What Is the Approach to Select the Best Subset of Features? #238

sashml · 2017-09-08T08:16:04Z

Hi there,

mlxtend offers a good feature selection approach via SFS(k_features=(5,10)). I have a few questions about it:

1. When I set k_features=(5,7), I expected that only combinations of features 5, 6, 7 would be considered during the feature evaluation procedure.
   a. If that is correct, why do I see score estimates for the whole range from 1 to 7?
   b. If not, how can I achieve the flow I described?
2. How can I determine the best subset of features to pass to the k_features parameter? When I set the range (10, 20), it is just my first guess; for this particular dataset, the range (20, 25) might actually be best. Is there any mechanism to detect the (20, 25) range?

Thank you!


rasbt · 2017-09-08T16:18:08Z

When I put k_features=(5,7), I was thinking that only combinations of features 5,6,7 have to be considered during feature evaluation procedure.

If you are interested in different combinations of the features 5, 6, 7, I recommend using the ExhaustiveFeatureSelector. In the SequentialFeatureSelector, k_features means the number of features. If you set k_features=(5,10), it will return the best-performing subset that contains between 5 and 10 features.
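
To illustrate why scores for sizes 1 through 7 can show up even with a (5, 7) range, here is a minimal pure-Python sketch of greedy forward selection with a size range. It is not mlxtend's code: the `WEIGHTS`, `PENALTY`, `score`, and `forward_select` names are hypothetical, and the toy score stands in for a cross-validated metric. The point is that a forward search has to grow the subset one feature at a time, so the smaller sizes are evaluated on the way up, and the range only restricts which size is finally returned.

```python
# Toy sketch of greedy forward selection with a size range.
# All names and numbers here are hypothetical; this is not mlxtend code.

# Hypothetical usefulness of each of 10 features.
WEIGHTS = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.01, 0.0, 0.0, 0.0]
PENALTY = 0.02  # hypothetical cost per selected feature


def score(subset):
    """Toy stand-in for a cross-validated score of a feature subset."""
    return sum(WEIGHTS[j] for j in sorted(subset)) - PENALTY * len(subset)


def forward_select(n_features, k_range):
    """Grow a subset one feature at a time up to max_k, then return the
    best-scoring subset whose size lies in [min_k, max_k]."""
    min_k, max_k = k_range
    selected = frozenset()
    history = {}  # subset size -> (feature indices, score)
    while len(selected) < max_k:
        # Try adding each remaining feature and keep the best addition.
        candidates = [selected | {j} for j in range(n_features)
                      if j not in selected]
        selected = max(candidates, key=score)
        history[len(selected)] = (tuple(sorted(selected)), score(selected))
    # Only sizes inside the requested range compete for the final answer.
    best_k = max(range(min_k, max_k + 1), key=lambda k: history[k][1])
    return history[best_k][0], history


chosen, history = forward_select(len(WEIGHTS), k_range=(5, 7))
print(sorted(history))  # every subset size that was scored
print(chosen)           # best subset among sizes 5, 6, and 7
```

In this toy setup every size from 1 to 7 ends up in `history`, while `chosen` is restricted to the 5 to 7 range.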

How can I determine the best subset of features to be passed to k_features model parameter?

In this case, I recommend running backward selection all the way down to 1 feature (or forward selection all the way up to all m features in your dataset). Then you can look at the performance for each subset size (e.g., via the plotting function mentioned in the docs) and decide.
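
The idea of running backward selection down to a single feature and then inspecting the score at every size can be sketched as follows. This is a toy illustration under stated assumptions, not mlxtend's implementation: `WEIGHTS`, `PENALTY`, `score`, and `backward_eliminate` are hypothetical names, and the score function is a made-up stand-in for a cross-validated metric.

```python
# Toy sketch of backward elimination down to one feature.
# All names and numbers here are hypothetical; this is not mlxtend code.

# Hypothetical usefulness of each of 8 features.
WEIGHTS = [0.30, 0.25, 0.20, 0.10, 0.08, 0.05, 0.01, 0.0]
PENALTY = 0.02  # hypothetical cost per selected feature


def score(subset):
    """Toy stand-in for a cross-validated score of a feature subset."""
    return sum(WEIGHTS[j] for j in sorted(subset)) - PENALTY * len(subset)


def backward_eliminate(n_features):
    """Start from all features and repeatedly drop the feature whose
    removal gives the best score, recording the result at every size."""
    current = frozenset(range(n_features))
    trace = {len(current): (tuple(sorted(current)), score(current))}
    while len(current) > 1:
        current = max((current - {j} for j in sorted(current)), key=score)
        trace[len(current)] = (tuple(sorted(current)), score(current))
    return trace


trace = backward_eliminate(len(WEIGHTS))
# Print size, score, and features so the whole curve can be inspected.
for size in sorted(trace):
    features, value = trace[size]
    print(size, round(value, 3), features)

best_size = max(trace, key=lambda k: trace[k][1])
print("best size:", best_size)
```

Looking at the full table (or a plot of it) makes it possible to pick a size range after the fact instead of guessing one up front.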

sashml · 2017-09-08T17:08:49Z

Thank you!

Regarding #2, I was hoping that I had missed something and would be able to avoid such complex work :)

rasbt · 2017-09-08T18:02:20Z

One common way to decide which feature subset to choose (if the size of the feature subset doesn't matter) is to choose the smallest feature subset whose performance falls within 1 standard error of the best-performing one. I guess this could be easily automated and added via k_features='auto' or so.
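
A minimal sketch of the one-standard-error rule described above, using only the Python standard library. The per-fold scores in `cv_scores` are invented for illustration, and the standard error is assumed here to be the sample standard deviation of the fold scores divided by the square root of the number of folds; other definitions are possible.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical cross-validation scores per subset size (5 folds each).
cv_scores = {
    1: [0.60, 0.62, 0.58, 0.61, 0.59],
    2: [0.75, 0.77, 0.73, 0.76, 0.74],
    3: [0.88, 0.90, 0.86, 0.89, 0.87],
    4: [0.90, 0.92, 0.88, 0.91, 0.89],
    5: [0.91, 0.95, 0.87, 0.93, 0.89],
    6: [0.90, 0.93, 0.88, 0.92, 0.89],
}


def standard_error(scores):
    """Standard error of the mean across folds (assumed definition)."""
    return stdev(scores) / sqrt(len(scores))


means = {k: mean(v) for k, v in cv_scores.items()}

# Subset size with the highest mean score.
best_size = max(means, key=means.get)

# Any size scoring at least (best mean - 1 standard error) is acceptable.
threshold = means[best_size] - standard_error(cv_scores[best_size])

# Among the acceptable sizes, take the smallest one.
parsimonious_size = min(k for k, m in means.items() if m >= threshold)

print("best size:", best_size)
print("smallest size within one standard error:", parsimonious_size)
```

With these made-up numbers the best mean score occurs at 5 features, while 4 features already falls within one standard error of it and would be the more parsimonious choice.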

rasbt · 2017-09-09T08:52:47Z

I added some capabilities for 'best' and most 'parsimonious' feature selection via #240. Hope that's useful!

rasbt added the Question label Sep 8, 2017

rasbt mentioned this issue Sep 9, 2017

best and parsimonious features for sfs #240

Merged

rasbt closed this as completed Sep 9, 2017
