
Fixes #956 Add features_group parameter to ExhaustiveFeatureSelector [WIP] #957

Merged · 36 commits · Aug 10, 2022
Conversation

@NimaSarajpoor (Contributor) commented Jul 8, 2022

Description

This PR addresses issue #956. The idea is to allow the user to specify groups of features that are treated as a unit (selected or dropped together) throughout the feature selection process.

  • Added a note about the modification or contribution to the ./docs/sources/CHANGELOG.md file (if applicable)
  • Added appropriate unit test functions in the ./mlxtend/*/tests directories (if applicable)
  • Modified documentation in the corresponding Jupyter Notebook under mlxtend/docs/sources/ (if applicable)
  • Ran PYTHONPATH='.' pytest ./mlxtend -sv and made sure that all current unit tests pass (for small modifications, it might be sufficient to only run the specific test file, e.g., PYTHONPATH='.' pytest ./mlxtend/classifier/tests/test_stacking_cv_classifier.py -sv)
  • Checked for style issues by running flake8 ./mlxtend

What I have done so far:

  • Added parameter features_group to mlxtend/feature_selection/exhaustive_feature_selector.py
  • All EXISTING tests for feature_selection pass

What needs to be done:

  • Review the changes and see whether the approach taken is reasonable
  • Write a test function for the new feature
  • Add this parameter to sequential_feature_selector.py

Please let me know what you think.


Side note:
I added some comments wherever I felt something should be pointed out. We should remove these comments before merging.

@NimaSarajpoor (Contributor, Author) commented Jul 16, 2022

Recently, I found another PR (#651) that adds fixed_features to ExhaustiveFeatureSelector. So, I think we are going to have a slight problem here.

Let's say our features are [1, 2, 3, 4, 5] and the user wants to:

  • consider feature_groups as [[1], [2], [3], [4, 5]]
  • And, consider fixed features: [1, 4]

So...how should we deal with this? We can:

  • throw a warning and consider [1, 4, 5] as fixed features?
  • Or, throw an error and let the user know that there is a conflict between these two parameters?
  • Or, ???

@rasbt (Owner) left a comment

This looks actually pretty good to me. If this works, this is a simple and elegant solution. I guess the next step would be unit tests. Would you like any help with it?

@rasbt (Owner) commented Jul 17, 2022

Thanks for this PR btw! I think having feature_group support would be an awesome feature (no pun intended) for so many people!

EDIT: We can also make this PR specific to the Exhaustive Feature Selector and revisit the sequential one at a later time. I think it will require more work for sure.

@rasbt (Owner) commented Jul 17, 2022

Recently, I found another PR (#651) that adds fixed_features to ExhaustiveFeatureSelector. So, I think we are going to have a slight problem here.
Let's say our features are [1, 2, 3, 4, 5] and the user wants to:
consider feature_groups as [[1], [2], [3], [4, 5]]
And, consider fixed features: [1, 4]
So...how should we deal with this? We can:
throw a warning and consider [1, 4, 5] as fixed features?
Or, throw an error and let the user know that there is a conflict between these two parameters?
Or, ???

Oh, good call. To be honest, I think it might be best to check for conflicts and raise an error so the user can fix it. It's probably better than having a silent behavior that is not clear to the user.

So in the case above, maybe an error message would make sense like

raise ValueError("Feature 4 was specified as a fixed feature but feature 4 has other features in its feature group.")

And in the docstring, we can maybe say something like

If both fixed_features and feature_groups are used, all group members of a fixed feature need to be specified as fixed features as well.

What do you think?

@NimaSarajpoor (Contributor, Author)

I think it might be best to check for conflicts and raise an error so the user can fix it. It's probably better than having a silent behavior that is not clear to the user.

You are right... it would be safer. We put a small extra load on users to resolve such conflicts themselves, but in return they get a better understanding of what's going on :)

@NimaSarajpoor (Contributor, Author)

Thanks for this PR btw! I think having feature_group support would be an awesome feature (no pun intended) for so many people!

Thanks for creating this opportunity :) I enjoy doing this :)

We can also make this PR specific to the Exhaustive Feature Selector and revisit the sequential one at a later time. I think it will require more work for sure.

Sure.

I guess the next step would be unit tests. Would you like any help with it?

I will probably need some help 😄 For now, I am going to take a look at the existing test function(s) and see how things are. I will try to do it myself. If I run into any issues, I will let you know.

@NimaSarajpoor (Contributor, Author) commented Jul 17, 2022

@rasbt
Also, during the weekend I do not have access to my main PC, which holds my local repo and my in-progress work. I use another PC on weekends and can only review my code, so you may sometimes notice that I review my own code and catch some issues (so I do not forget them later when I push the next set of commits).

@NimaSarajpoor (Contributor, Author) left a comment

I will resolve/address these comments in my next push.

@NimaSarajpoor (Contributor, Author) commented Jul 18, 2022

We also need to add two notes to the docstring:

  • for min_features / max_features: When the parameter feature_groups is provided (and not None), these counts refer to the number of feature groups rather than individual features. For instance, when feature_groups = [[0], [1], [2, 3], [4]], max_features can be at most 4, as there are only four groups.

  • Although two or more individual features may be treated as one group throughout the feature-selection process, this does not mean the individual features of that group have the same impact on the outcome. For instance, in linear regression, the coefficients of features 2 and 3 can differ even if they are grouped together in feature_groups.

@rasbt (Owner) commented Jul 19, 2022

I like the suggestion! Slightly more concise version for the first:

If parameter feature_groups is not None, the number of features is equal to the number of feature groups (len(feature_groups)). For example, if feature_groups=[[0], [1], [2, 3], [4]], the max_features value cannot exceed 4.

@NimaSarajpoor (Contributor, Author)

I like the suggestion! Slightly more concise version for the first:

Cool! Thanks for revising the note. I will add them to the docstring.

@pep8speaks commented Jul 20, 2022

Hello @NimaSarajpoor! Thanks for updating this PR.

Line 327:89: E501 line too long (94 > 88 characters)
Line 361:89: E501 line too long (93 > 88 characters)
Line 390:89: E501 line too long (89 > 88 characters)

Comment last updated at 2022-08-09 01:29:49 UTC

@NimaSarajpoor (Contributor, Author) commented Jul 20, 2022

@rasbt
UPDATE

  • I resolved the issues previously discussed (except one; please see the note below)
  • I added a new test function to test the new parameter feature_groups
  • All tests are passing :)

Notes:

  • I have not added the new parameter fixed_features (PR fixed_features for ExhaustiveFeatureSelector #579 #651) yet. Should I do it here in this branch, or should I work on it in its own dedicated branch?
  • I did not add a check for the validity of the input parameter feature_groups. I think I need to check whether the input is valid and provide an error message otherwise.

@rasbt changed the title from "Fixes #956 Add features_group parameter to features selection [WIP]" to "Fixes #956 Add features_group parameter to ExhaustiveFeatureSelector [WIP]" on Jul 22, 2022
@rasbt (Owner) commented Jul 22, 2022

Thanks so much for the update.

  1. If it is not too much hassle, maybe add the fixed_features also here. We could have a separate PR, but that would probably add extra work for you since some of the steps are basically repeated.

  2. I can take care of the documentation and the Changelog. I think these are the most "boring" things usually, and you are already doing so much.

For the check, maybe the following ones are good:

a) a check that the feature groups contain all feature indices

if _merge_lists(feature_groups) != set(range(X.shape[1])):
    raise ValueError(
        "`feature_groups` must contain all features within `range(X.shape[1])`"
    )

b) check that feature groups are not empty:

for fg in feature_groups:
    if not fg:
        raise ValueError("A feature group can't be empty.")

All tests are passing :)

Wohooo!

Ohh, check (a) reminds me that the ExhaustiveFeatureSelector actually supports pandas DataFrames (http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/#example-7-working-with-pandas-dataframes). I am not sure this would work with the current feature_groups implementation.

Option 1 (ideal):

I suppose what we want in this case is that the user can specify the feature groups by column names. For example, [['Sepal length', 'Petal length'], ['Sepal width'], ['Petal width']].

Option 2 (ok):

We could also keep the feature group selection integer-based, e.g., [[1, 2], [3], [4]], and select DataFrame columns by index for now. We can then add string support later on.

Option 3:

For now, we can just add a check and say that feature_groups are currently not supported for pandas DataFrames. We can then add Option 1 or 2 in the future in another PR.

What do you think, is option 1 feasible at this point or do you prefer Option 2/3 and to revisit this later?

@NimaSarajpoor (Contributor, Author)

maybe add the fixed_features also here

Sure! Will do so.

2. I can take care of the documentation and the Changelog. I think these are the most "boring" things usually, and you are already doing so much.

Thank you! :)

For the check, maybe the following ones are good:

a) a check that the feature groups contain all feature indices

Just for the record, I think we "can" allow users to provide not all but just some of the features in the feature_groups parameter. For instance, I THINK that even if the parameter value is [[0], [1, 2], [3]], the exhaustive feature selection will be performed no matter how many features the data actually has. Having said that, if a user wants to do that, they should simply pass the chunk of their data that contains only those features. So, I think error message (a) is good, as it prevents users from making accidental mistakes.

b) check that feature groups are not empty:

for fg in feature_groups:
    if not fg:
        raise ValueError("A feature group can't be empty.")

Cool

Ohh, check (a) reminds me that the ExhaustiveFeatureSelector actually supports pandas DataFrames (http://rasbt.github.io/mlxtend/user_guide/feature_selection/ExhaustiveFeatureSelector/#example-7-working-with-pandas-dataframes). I am not sure this would work with the current feature_groups implementation.

Option 1 (ideal):

I suppose what we want in this case is that the user can specify the feature groups by column names. For example, [['Sepal length', 'Petal length'], ['Sepal width'], ['Petal width']].

This is exactly what I thought after pushing the last set of commits! I am planning to work on this. And then, I will take care of fixed_features :)

@NimaSarajpoor (Contributor, Author)

@rasbt
I think everything should be good now. Something in my code was bothering me, and I fixed (cleaned) it. So, I am going to stop pushing now :)

@rasbt (Owner) commented Jul 27, 2022

Thanks for all the updates! On the one hand, I think the custom_feature_names parameter is very useful. However, it's from before we added pandas DataFrame support. I would say that it has become kind of redundant, and we can remove it for the sake of simplicity. Over time, things in the mlxtend library are becoming more and more complicated, so I guess it's fair that we remove one parameter in favor of adding the new feature_groups parameter 😉.

To keep it manageable right now, and to not overcomplicate things when we add feature_groups, I would say let's drop the custom_feature_names right away (versus deprecating it). Let me take care of it in this PR before I continue with the review. I can add a documentation example to illustrate how users can achieve the same thing with pandas:

from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import ExhaustiveFeatureSelector as EFS
from sklearn.datasets import load_iris
import pandas as pd

iris_data = load_iris()
X, y = iris_data.data, iris_data.target

df_X = pd.DataFrame(X, columns=["Sepal length", "Sepal width", "Petal length", "Petal width"])

knn = KNeighborsClassifier(n_neighbors=3)

efs1 = EFS(knn,
           min_features=1,
           max_features=4,
           scoring='accuracy',
           print_progress=True,
           cv=5)

efs1 = efs1.fit(df_X, y)

print('Best accuracy score: %.2f' % efs1.best_score_)
print('Best subset (indices):', efs1.best_idx_)
print('Best subset (corresponding names):', efs1.best_feature_names_)

Output:

Features: 15/15
Best accuracy score: 0.97
Best subset (indices): (0, 2, 3)
Best subset (corresponding names): ('Sepal length', 'Petal length', 'Petal width')

@NimaSarajpoor (Contributor, Author)

To keep it manageable right now, and to not overcomplicate things when we add feature_groups, I would say let's drop the custom_feature_names right away (versus deprecating it). Let me take care of it in this PR before I continue with the review.

Sure. Removing the feature makes sense. Thanks for your effort.

I can add a documentation example to illustrate how users can achieve the same thing with pandas:

Sounds good :)


@rasbt (Owner) commented Jul 27, 2022

Alright, just removed it to simplify the code. I may have to call it a day here, but I will be back with the code reviews in the next couple of days!

@NimaSarajpoor (Contributor, Author)

@rasbt
Thanks for the changes! I am going to go through my code. If I notice anything that should be revised, I will comment on it and then wait to see your opinion.

Also, if you have an idea about what I should do as the next step in this PR, please let me know.

@NimaSarajpoor (Contributor, Author) left a comment

@rasbt
I reviewed the code and left some comments. Please feel free to let me know if you disagree with some/all of them, and also please add your comments if you have any.

@rasbt (Owner) commented Aug 6, 2022

Thanks for the comments! I would say that other than those you mentioned, it looks really good. I wouldn't worry about any additional changes to the code.

@NimaSarajpoor (Contributor, Author)

Thanks for the comments! I would say that other than those you mentioned, it looks really good. I wouldn't worry about any additional changes to the code.

Cool! I will take care of the comments and push the changes.

@NimaSarajpoor (Contributor, Author)

@rasbt
I took care of the issues. I think things should be good now :)

@rasbt (Owner) commented Aug 10, 2022

Wow, thanks so much! Looks great now and will merge!

@rasbt merged commit 2ed7ecf into rasbt:master on Aug 10, 2022
@NimaSarajpoor (Contributor, Author)

@rasbt
Thanks a lot for your guidance throughout this journey! I enjoyed working on this! So, thank you!

@rasbt (Owner) commented Aug 11, 2022

Likewise, thanks so much for the high-quality contribution @NimaSarajpoor. This is well appreciated! Actually, I may make a small release in the following days if time permits!
