
Give feedback when svm.SVC is configured with kernel hyperparameters for a different kernel #19614

Open
PGijsbers opened this issue Mar 4, 2021 · 18 comments
Labels: Easy, help wanted, module:svm, New Feature

@PGijsbers (Contributor)

During our class we noticed that some students incorrectly configure hyperparameters that are irrelevant to the kernel used, for example setting gamma when using a linear kernel. We think it could make sense for scikit-learn to give feedback to the user when such ineffective settings are configured.

Describe the workflow you want to enable

from sklearn.datasets import load_iris
from sklearn.svm import SVC

x, y = load_iris(return_X_y=True)
clf = SVC(kernel='linear', gamma=1e-6)
clf.fit(x, y)
print(clf.score(x, y))

current output:

0.9933333333333333

proposed output, something similar to:

UserWarning: Gamma is set but not used because a linear kernel is configured.
0.9933333333333333
@NicolasHug (Member) commented Mar 4, 2021

Thanks for the report. I agree we should even error in this case (instead of a warning), as a bug fix. The same goes for coef0 and for SVR.

PR welcome @PGijsbers

@thomasjpfan (Member)

Currently, SVC ignores parameters depending on the kernel. For invalid combinations, I agree with @NicolasHug that we should error instead.

I think we should error when the parameter is not the default value and not compatible with the kernel. For example, if gamma!='scale' and kernel not in {'rbf', 'poly', 'sigmoid'}, we raise an error. This logic can extend to degree and coef0.
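A minimal sketch of that rule as a standalone check (the helper name and messages are hypothetical; the defaults match SVC's current signature):

def check_kernel_params(kernel, gamma="scale", degree=3, coef0=0.0):
    # Hypothetical fit-time check: reject parameters that were changed from
    # their defaults but are ignored by the chosen kernel.
    if gamma != "scale" and kernel not in ("rbf", "poly", "sigmoid"):
        raise ValueError(f"gamma={gamma!r} is ignored by kernel={kernel!r}.")
    if degree != 3 and kernel != "poly":
        raise ValueError(f"degree={degree!r} is only used by the 'poly' kernel.")
    if coef0 != 0.0 and kernel not in ("poly", "sigmoid"):
        raise ValueError(f"coef0={coef0!r} is only used by 'poly' and 'sigmoid'.")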

@PGijsbers (Contributor, Author)

I'll try to make the changes this weekend 👍

@PGijsbers (Contributor, Author)

Should the error be raised in fit, similar to incompatible configurations for LogisticRegression, or in __init__?

@NicolasHug (Member)

the validation should happen in fit (more details here if you're interested https://scikit-learn.org/stable/developers/develop.html#instantiation)

Also please make sure to write a non-regression test for the cases that are worth testing. The test should make sure that the error is now properly raised. Your snippet above is a great candidate.
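For instance, a sketch of such a test, assuming the fix raises a ValueError that mentions the parameter (the test name is made up):

import pytest
from sklearn.datasets import load_iris
from sklearn.svm import SVC

def test_svc_rejects_gamma_with_linear_kernel():
    # The snippet from the issue: gamma is set but the linear kernel ignores it.
    X, y = load_iris(return_X_y=True)
    clf = SVC(kernel="linear", gamma=1e-6)
    with pytest.raises(ValueError, match="gamma"):
        clf.fit(X, y)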

Thanks!

@PGijsbers (Contributor, Author)

This change breaks some tests, e.g. sparse_svm. Shall I go ahead and update those? Should I raise a DeprecationWarning first, or immediately set up a PR with ValueErrors?

@thomasjpfan (Member)

I would prefer to deprecate first to be on the safe side. For the first PR, let's do this for gamma first to see how other reviewers feel about it. Then we can have follow-up PRs for the other parameters.

@NicolasHug (Member)

I don't think there's anything to deprecate here: there's no feature, just a silent (minor) bug.

We could raise a temporary warning instead of an error, but for such bugfixes we tend to error directly. On top of that, the upcoming release is 1.0, so it'd be nice to include such changes directly.

Also, the failing tests mentioned above would have to be fixed whether we raise a warning or an error.

@PGijsbers (Contributor, Author)

I'll update the PR when there is a new consensus (the PR has a FutureWarning instead of a DeprecationWarning right now) :) just let me know

@NicolasHug (Member)

@thomasjpfan what made you change your mind from error to warning? The offending test sparse_svm is clearly wrong as it passes clf = svm.OneClassSVM(gamma=1, kernel=kernel) for all kernels.
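One way to repair that test, sketched (the helper name is hypothetical): only pass gamma to kernels that actually consume it.

from sklearn import svm

def make_one_class_svm(kernel):
    # Only set gamma explicitly for kernels that use it.
    if kernel in ("rbf", "poly", "sigmoid"):
        return svm.OneClassSVM(gamma=1, kernel=kernel)
    return svm.OneClassSVM(kernel=kernel)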

@jnothman (Member) commented Mar 7, 2021 via email

@thomasjpfan (Member)

The ignoring behavior is explicitly documented which makes me think it was intentional.

I do want to get to raising an error at some point and would be +1 on raising an error for 1.0.

@cmarmo removed the help wanted label Mar 7, 2021
@ogrisel (Member) commented Mar 15, 2021

> I do want to get to raising an error at some point and would be +1 on raising an error for 1.0.

Why not just go through the usual deprecation cycle? This is more user friendly than a breaking change.

@ogrisel (Member) commented Mar 15, 2021

The fact that the degree parameter is explicitly documented as ignored if kernel is not poly is a marker that the current behavior was intentional and should therefore not be considered a bug. Also, the current behavior can be (ab-)used to do simple yet efficient hyper-parameter search for all the kernels and their parametrizations at once using RandomizedSearchCV with distributions. Maybe that's an edge feature that few people use, and the educational value of the warning (then error in the future) is more important.
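For reference, that pattern looks roughly like this; a sketch with illustrative distributions, relying on the fact that parameters ignored by the sampled kernel are silently accepted:

from scipy.stats import loguniform, randint
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_distributions = {
    "kernel": ["linear", "poly", "rbf", "sigmoid"],
    "gamma": loguniform(1e-4, 1e1),  # silently ignored by the linear kernel
    "degree": randint(2, 6),         # only used by the poly kernel
    "coef0": loguniform(1e-3, 1e1),  # only used by poly and sigmoid
}
search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, random_state=0)
search.fit(X, y)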

Also, to be consistent, we should do the same for all 3 parameters (gamma, degree, coef0).

We should also issue the FutureWarning for SVC(gamma="auto", kernel="linear"), because the default is gamma="scale".

Since gamma="scale" is the default, it means that calling SVC(gamma="scale", kernel="linear") explicitly will not raise, which is a bit weird/surprising. We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it is no longer possible to see the meaning of the default value just by reading the prototype of the function. One would have to read the parameters section of the docstring instead. No strong opinion on that last point.
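A sketch of the None-as-marker idea for gamma (the resolution helper is hypothetical): the effective default moves out of the signature and is resolved at fit time, which also makes an explicit gamma="scale" with a linear kernel detectable.

def resolve_gamma(gamma=None, kernel="rbf"):
    # Hypothetical fit-time resolution: None means "use the effective default".
    if gamma is None:
        return "scale"  # today's documented default
    if kernel not in ("rbf", "poly", "sigmoid"):
        # With a None marker, even an explicit gamma="scale" can be flagged
        # when the kernel ignores gamma, removing the surprise mentioned above.
        raise ValueError(f"gamma={gamma!r} is ignored by kernel={kernel!r}.")
    return gamma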

@glemaitre (Member)

Raising an error will not be an issue with the *SearchCV estimators thanks to the error_score parameter.

> Since gamma="scale" is the default, it means that calling SVC(gamma="scale", kernel="linear") explicitly will not raise, which is a bit weird/surprising. We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it is no longer possible to see the meaning of the default value just by reading the prototype of the function.

I assume that we want to be consistent. But indeed getting gamma=None that defaults to gamma="scale" if kernel="rbf" is semantically weird. I would expect the default to be "auto", meaning that it would default to a value (that is never None). However, for gamma, "auto" already means something else.

So I assume that we are left with:

  • leave the code as it is, at the cost of users trying a non-meaningful set of hyperparameters if they are not experts and do not look at the documentation
  • change the code to use obscure default values, where you need to read the documentation to know what the real default will be

I am +0 on this. @ogrisel @NicolasHug could you give it a bit more thought and say what you think is best?
Depending on the direction, my review process will be different in the PR :)

@thomasjpfan (Member)

> Why not just go through the usual deprecation cycle? This is more user friendly than a breaking change.

Looking at this again, if we were to change behavior, I am +1 on deprecation.

> We could use None as a default marker for gamma, degree and coef0, but it's a bit sad because then it is no longer possible to see the meaning of the default value just by reading the prototype of the function.

Maybe that is a good thing? Currently, one needs to read the docs to know if the parameter is even active.

> Also, the current behavior can be (ab-)used to do simple yet efficient hyper-parameter search for all the kernels and their parametrizations at once using RandomizedSearchCV with distributions.

I think the current implementation is less efficient, because one can have search spaces with parameters that are ignored.
For example, if we had a search space kernel=['linear', 'poly', 'rbf', 'sigmoid'] x degree=[2, 3, 4, 5, 6], all the non-poly kernels would train the same model repeatedly. With error_score, the invalid combinations will exit early.
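A quick way to see that redundancy, as a sketch counting the distinct effective models in the grid above:

from itertools import product

kernels = ["linear", "poly", "rbf", "sigmoid"]
degrees = [2, 3, 4, 5, 6]

grid = list(product(kernels, degrees))  # 20 candidates a grid search would fit
# degree only matters for the poly kernel; collapse it everywhere else
effective = {(k, d if k == "poly" else None) for k, d in grid}
print(len(grid), len(effective))  # 20 candidates, only 8 distinct models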

I am +0.5 on deprecating the current behavior.

@glemaitre (Member)

So the deprecation could be a good option.

@thomasjpfan what are your thoughts regarding the gamma parameter:

> I assume that we want to be consistent. But indeed getting gamma=None that defaults to gamma="scale" if kernel="rbf" is semantically weird. I would expect the default to be "auto", meaning that it would default to a value (that is never None). However, for gamma, "auto" already means something else.

@thomasjpfan (Member)

Using 'auto' to mean 1/n_features is kind of weird in itself. To move forward, we could also rename 'auto' to 'reciprocal_n_features'?

@adrinjalali added the Easy and help wanted labels Mar 7, 2024
Issac-Kondreddy added a commit to Issac-Kondreddy/scikit-learn that referenced this issue Mar 17, 2024