Using Several Parameters with GridSearchCV #8243

Open
ScientiaEtVeritas opened this issue Jan 29, 2017 · 4 comments

ScientiaEtVeritas commented Jan 29, 2017

When using (for example) the following transformers

  • CountVectorizer
  • TruncatedSVD
  • SelectKBest

with GridSearchCV, the search can pick a combination in which the number of features produced by CountVectorizer (n_features) is smaller than n_components for TruncatedSVD or k for SelectKBest.

This leads to an error: ValueError: n_components must be < n_features

For SelectKBest I found a temporary solution:

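from sklearn.feature_selection import SelectKBest

class SelectAtMostKBest(SelectKBest):
    # Override the private check: if k exceeds the number of columns in X, keep all features.
    def _check_params(self, X, y):
        if not (self.k == "all" or 0 <= self.k <= X.shape[1]):
            self.k = "all"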

But there is no equivalent for TruncatedSVD.
Is this behaviour intended? If yes, what can I do about this?

@jnothman jnothman added the API label Jan 30, 2017

jnothman (Member) commented Jan 30, 2017

Yes, this is a limitation of our API currently. One option is to use a parameter grid that only allows valid combinations, by using a list of parameter dicts wherein each setting is valid. A more robust forward-thinking solution might consider n_components and k optionally being functions of X and y.
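The first option can be sketched as follows. This is only an illustration, not an official recipe: the step names `vect`, `svd`, and `clf`, the candidate values, and the LogisticRegression classifier are all assumptions made for the example. Each dict in the list is expanded as its own grid, so `n_components` is only ever paired with a larger `max_features`.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline

# Hypothetical pipeline; the step names are chosen for this example only.
pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("svd", TruncatedSVD(random_state=0)),
    ("clf", LogisticRegression()),
])

# Illustrative candidate values (assumptions, not recommendations).
max_features_options = [50, 200, 1000]
n_components_options = [10, 100, 500]

# One dict per max_features value, keeping only the n_components
# candidates that are strictly smaller than it.
param_grid = [
    {
        "vect__max_features": [m],
        "svd__n_components": [n for n in n_components_options if n < m],
    }
    for m in max_features_options
]

# GridSearchCV accepts the list of dicts directly.
search = GridSearchCV(pipe, param_grid, cv=3)

# ParameterGrid shows which combinations would actually be tried.
combos = list(ParameterGrid(param_grid))
for combo in combos:
    print(combo)
```

Note that `max_features` is only an upper bound: on a small corpus CountVectorizer may still produce fewer features than requested, so the candidate values need to be chosen with the actual vocabulary size in mind.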

ScientiaEtVeritas (Author) commented Jan 30, 2017

@jnothman Thank you. Could you elaborate a bit on the function-based approach? I am also considering RandomizedSearchCV, but I will probably face the same problem there, right?

jnothman (Member) commented Jan 30, 2017

amueller (Member) commented Mar 4, 2017

How about using SelectPercentile instead? I'm not sure whether TruncatedSVD lets you specify a target explained variance, but it would be nice to have an option that is relative to the input feature size rather than absolute.
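A minimal sketch of the SelectPercentile suggestion, assuming a toy corpus, the chi2 score function, a LogisticRegression classifier, and the hypothetical step names `vect`, `select`, and `clf`. Because `percentile` is relative to however many features the vectorizer produces, every combination in the grid is valid.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectPercentile, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy data, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "a cat chased the mouse",
    "cats purr and cats nap",
    "the kitten likes the cat",
    "the dog barked at the mailman",
    "a dog fetched the ball",
    "dogs bark and dogs dig",
    "the puppy follows the dog",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

pipe = Pipeline([
    ("vect", CountVectorizer()),
    ("select", SelectPercentile(chi2)),  # keeps a share of the features, not a fixed count
    ("clf", LogisticRegression()),
])

# No combination can ask for more features than the vectorizer provides.
param_grid = {
    "vect__max_features": [5, 20],
    "select__percentile": [25, 50, 100],
}

search = GridSearchCV(pipe, param_grid, cv=2)
search.fit(docs, labels)
print(search.best_params_)
```

This covers the SelectKBest case only; the TruncatedSVD step would still need a restricted grid or some other workaround.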
