
RFECV with SVC & kernel != 'linear' == ValueError #5168

Closed
jmwoloso opened this issue Aug 27, 2015 · 18 comments


@jmwoloso
Contributor

I'm not even sure this is an issue worth mentioning, but I thought I'd put it here in case anyone else runs into it. If you specify SVC as the 'estimator' for RFECV and have the SVC 'kernel' set to 'rbf', you'll get: "ValueError: coef_ is only available when using linear kernel". I haven't tested this with any other kernel setting like poly or sigmoid (my model is still running as I type this), but I imagine it would happen with any kernel != linear.
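For reference, a minimal reproduction sketch with synthetic data (the exact exception type and wording may vary across scikit-learn versions):

```python
# RFE/RFECV rank features via coef_ or feature_importances_,
# which SVC only exposes for kernel='linear'.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

try:
    RFECV(SVC(kernel="rbf"), cv=3).fit(X, y)
except Exception as exc:
    # e.g. "ValueError: ... coef_ ..." (message differs between versions)
    print(type(exc).__name__, exc)
```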

@amueller
Member

This is by design: RFE ranks features by the estimator's coefficients, so it's not possible to use it with SVC and any non-linear kernel. Maybe a good reason to add SBS (http://rasbt.github.io/mlxtend/docs/feature_selection/sequential_backward_selection/)?

@jmwoloso
Contributor Author

Ok, cool. Sebastian's Ensemble Classifier from mlxtend sure is handy, so I have no doubt this would also be a fine addition! Thanks Andy!

@amueller
Member

He contributed the ensemble classifier to sklearn, btw ;)

@jmwoloso
Contributor Author

I know :D +1 for the addition! Haven't had a chance to take it out for a test drive...yet!

@rasbt
Contributor

rasbt commented Oct 8, 2015

I was just talking about this with @rhiever, and he was wondering where to find it in scikit-learn ;) So if there is more general interest, I could prepare a pull request for this (okay, I should tackle #5070 first this/next weekend though). The code is actually pretty lean so far (https://github.com/rasbt/mlxtend/blob/master/mlxtend/feature_selection/sequential_backward_select.py); maybe we add an option to toggle between forward and backward selection and then just call it SFS (Sequential Feature Selector/Selection) or so. What do you think, is there still interest @jmwoloso @amueller?

@jmwoloso
Contributor Author

jmwoloso commented Oct 8, 2015

I believe it would be a very useful addition! Though I can't speak for others :) @rasbt @amueller

@rasbt
Contributor

rasbt commented Oct 8, 2015

The concept is actually pretty simple, and a lot of people may already do something similar (without necessarily calling it Sequential Backward/Forward Selection). Even so, this could be a convenient wrapper (incl. CV and GridSearch), plus it would be a different "application" than RFE (not talking about better or worse here): whereas RFE selects based on the weights of linear models, here you'd select by a performance metric, using any classification algorithm.

Maybe we could even use multiprocessing for that; we'd just need to make sure that we don't set n_jobs > 1 for both the outer SBS loop and the inner CV.
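To make the idea concrete, here's a minimal SBS sketch (not mlxtend's or any official implementation; `sbs_select` is a made-up name): greedily drop the feature whose removal hurts the cross-validated score the least, until `k` features remain. It works with any estimator, including SVC with an rbf kernel.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def sbs_select(estimator, X, y, k, cv=3):
    """Sequential Backward Selection: prune features one at a time."""
    features = list(range(X.shape[1]))
    while len(features) > k:
        scored = []
        for f in features:
            subset = [g for g in features if g != f]
            score = cross_val_score(estimator, X[:, subset], y, cv=cv).mean()
            scored.append((score, f))
        # Drop the feature whose removal leaves the highest CV score.
        _, drop = max(scored)
        features.remove(drop)
    return sorted(features)

X, y = make_classification(n_samples=120, n_features=8, n_informative=4,
                           random_state=0)
selected = sbs_select(KNeighborsClassifier(), X, y, k=4)
print(selected)  # indices of the 4 surviving features
```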

@amueller
Member

amueller commented Oct 9, 2015

@rasbt I think it would be a nice addition if you want to move it from mlxtend to sklearn :)

@jmwoloso
Contributor Author

jmwoloso commented Oct 9, 2015

@rasbt @amueller Agreed!

@rasbt
Contributor

rasbt commented Oct 10, 2015

Nice, I'd definitely be up for it.

if you want to move it from mlxtend to sklearn

mlxtend is actually just more of a "playground" for me. The stuff there should all work fine, but it is not the nicest, most efficient code; more a "born out of need" kind of thing ;). I purposely avoid too much refactoring and spreading the code over different classes, since this keeps it readable and people can just copy & paste a certain function as needed without installing the whole package.

That being said, I am happy to contribute the SBS to scikit-learn; it is not only about "giving back to the nice community" but also a valuable learning experience, and personally, I would also prefer to use these things via the cleaner and battle-tested scikit-learn API :P

Coincidentally, I am planning to implement Sequential Forward Selection (SFS), Sequential Forward Floating Selection (SFFS), and Sequential Backward Floating Selection (SBFS) this weekend -- a colleague wants to use them for a study, and I happen to need them next week too for a project with my experimental biology collaborators. I will probably implement them all separately, but once that's done I will open a placeholder pull request where we can discuss the implementation in scikit-learn further, e.g., having one SFS (Sequential Feature Selector) with toggle options to switch between SFS, SBS, SFFS, and SBFS (if the latter two should be included at all).

@jmwoloso
Contributor Author

@rasbt Sounds like a busy weekend :D

@amueller
Member

I am not very familiar with the techniques, but I think for sklearn it is better to err on the side of including fewer options rather than too many. (SBFS is Sequential Backward Floating Selection, I guess?)

@rasbt
Contributor

rasbt commented Oct 12, 2015

Sure, I agree; I don't want to make it unnecessarily complex. However, I think both the "forward" and the "backward" approach have useful applications. Say you have 100 features and want to select the "best performing" subset of 10. Here, it is easier/computationally more efficient to start with an empty set and add features until you end up with 10; if you don't have the conditional exclusion step, that's basically 10 iterations. However, if you want to select 90 features out of 100, you are probably better off with backward selection. Especially if you use this approach as part of a grid search pipeline, this makes a huge difference (note that the results, i.e., the subsets of selected features, are not necessarily the same though).

Maybe it wouldn't even be that complex to have one SequentialFeatureSelector with options forward=True/False and conditional=True/False to cover the different scenarios. I think this would be simple to explain; I would only need to brainstorm a little about the refactoring, but that should be quite okay.

I have short descriptions of the different algorithms (they are really super simple) and some examples of the current usage/API, which we may want to change.

But I could create an early pull request with a SequentialFeatureSelector skeleton where we can discuss this further.
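As an aside, a selector with essentially this forward/backward toggle did eventually land in scikit-learn itself (sklearn.feature_selection.SequentialFeatureSelector, available since 0.24, with a direction parameter instead of a boolean flag). A quick sketch, assuming a recent scikit-learn:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=120, n_features=10, random_state=0)

# Forward: cheap when the target subset is small (builds up from empty).
fwd = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=3, direction="forward", cv=3
).fit(X, y)

# Backward: preferable when keeping most features (prunes down from all).
bwd = SequentialFeatureSelector(
    KNeighborsClassifier(), n_features_to_select=7, direction="backward", cv=3
).fit(X, y)

print(fwd.get_support(indices=True))  # indices of the 3 selected features
print(bwd.get_support(indices=True))  # indices of the 7 kept features
```

Note that the conditional exclusion ("floating") step discussed here did not make it into that class.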

@amueller
Member

I have to check on the floating versions, but I agree that both forward and backward would be helpful.

@amueller
Member

Do you have a reference for the floating versions?

@rasbt
Contributor

rasbt commented Oct 12, 2015

Good point. I wanted to look for some empirical studies, or maybe find the original papers for reference -- I implemented these algos from old notes I took in a pattern classification class. I think the prof may have used this paper as a "reference": http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=02CB16CB1C28EA6CB57E212861CFB180?doi=10.1.1.24.4369&rep=rep1&type=pdf

Floating versions are better in terms of classifier performance since you sample more feature subspaces, but they are also computationally more expensive. It might be interesting to compare SFS, SFFS, and the optimal (exhaustive) search not only on classifier performance but also on a time component.

So, again, it really depends on the application to choose the more appropriate one.

@arjunanil705

Hi guys,
Thank you for providing the SFS and SBS alternatives (to RFE with SVC).
I have a dataset with 561 features. I applied the SFS algorithm to select 20 features, but I am having difficulty accessing the indices of those 20 features.

@rasbt
Contributor

rasbt commented Mar 6, 2018

Which one are you using, the one that is currently under construction (#8684) or the one from mlxtend (http://rasbt.github.io/mlxtend/user_guide/feature_selection/SequentialFeatureSelector/)? Regarding the former, the API may change a bit depending on how the PR goes; for the latter, please feel free to ask questions about the indices via the mlxtend mailing list or GitHub issues.
