Description
In scikit-learn 0.17.1, the RFECV class hard-codes n_features_to_select to 1. For some classifiers with some parameter sets, the classifier will fail when given only one feature. In addition, since RFECV takes some time, it would be helpful to allow the class to be initialized with a min_features_to_select value other than one, chosen by the user based on domain knowledge. One may not be interested in any RFECV solutions with fewer than, say, five features. In that case, with the default step of 1, the final four iterations of the recursive fit are wasted cycles.
The change is backward-compatible: anyone using RFECV without specifying min_features_to_select gets the argument default of one, which is the same behavior that's hard-coded today.
It's a trivial change to add it to the set of initialization parameters for the class. I use a customized version of sklearn 0.17.1 with this change. If I weren't such a GitHub noob, I'd submit a pull or push or whatever it is.
Here's my code - modified RFECV init:
def __init__(self, estimator, step=1, cv=None, scoring=None,
             estimator_params=None, verbose=0, min_features_to_select=1):
    self.estimator = estimator
    self.step = step
    self.cv = cv
    self.scoring = scoring
    self.estimator_params = estimator_params
    self.min_features_to_select = min_features_to_select
    self.verbose = verbose
and in the fit() method:
        # Initialization
        cv = check_cv(self.cv, X, y, is_classifier(self.estimator))
        scorer = check_scoring(self.estimator, scoring=self.scoring)
        n_features = X.shape[1]
        n_features_to_select = self.min_features_to_select
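
To make the wasted-cycles point concrete, here is a rough standalone sketch of the feature counts a recursive elimination would visit for a given floor. It does not call scikit-learn. The helper name rfe_feature_counts is made up for illustration, and it assumes an integer step (the real class also accepts a fractional step), so treat it as an approximation of the loop rather than the library's code.

```python
def rfe_feature_counts(n_features, step=1, min_features_to_select=1):
    """Return the feature-subset sizes visited by recursive elimination.

    Hypothetical helper for illustration only; assumes an integer step.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    if not 1 <= min_features_to_select <= n_features:
        raise ValueError("min_features_to_select must be in [1, n_features]")

    counts = [n_features]
    remaining = n_features
    while remaining > min_features_to_select:
        # Never drop below the floor, even when step would overshoot it.
        remaining -= min(step, remaining - min_features_to_select)
        counts.append(remaining)
    return counts


# With 10 features and step=1, a floor of 5 skips the subset sizes 4, 3, 2, 1.
default_run = rfe_feature_counts(10)
floored_run = rfe_feature_counts(10, min_features_to_select=5)
print(default_run)
print(floored_run)
print(len(default_run) - len(floored_run), "subset sizes skipped")
```

Each skipped subset size corresponds to one refit per cross-validation fold, so the saving grows with the number of folds and the cost of the estimator.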