
FIX Correct the definition of gamma=`scale` in svm #13221

Merged
7 commits merged from qinhanmin2014:svm-gamma into scikit-learn:master on Feb 24, 2019

Conversation

4 participants
qinhanmin2014 (Member) commented Feb 22, 2019

Closes #12741
See #13186 (comment):
And I'm wondering whether it's possible to solve #12741 in 0.20.3 by changing the definition of gamma='scale' directly, since it was introduced erroneously in 0.20. This is an embarrassing mistake, and I think it will be much more difficult to solve in 0.21.X (by then we may need to deprecate scale and introduce another option).
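For context on the fix itself: in 0.20, gamma='scale' was computed as 1 / (n_features * X.std()), while the intended definition (the one this PR switches to) is 1 / (n_features * X.var()). A minimal sketch of the two formulas, using the same toy array as the tests discussed below (this snippet is illustrative, not part of the PR diff):

```python
import numpy as np

# Toy data (the 6x2 array used in the reproduction snippet below).
X = np.array([[-2., -1.], [-1., -1.], [-1., -2.],
              [1., 1.], [1., 2.], [2., 1.]])
n_features = X.shape[1]

# 0.20 definition of gamma='scale' (erroneous): uses the standard deviation.
gamma_old = 1.0 / (n_features * X.std())   # ~0.3536 here

# 0.20.3 definition (this PR): uses the variance, so gamma scales with
# the squared spread of the data.
gamma_new = 1.0 / (n_features * X.var())   # 0.25 here

print(gamma_old, gamma_new)
```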

qinhanmin2014 added some commits Feb 22, 2019

@@ -1710,19 +1710,23 @@ def test_deprecated_grid_search_iid():
     depr_message = ("The default of the `iid` parameter will change from True "
                     "to False in version 0.22")
     X, y = make_blobs(n_samples=54, random_state=0, centers=2)
-    grid = GridSearchCV(SVC(gamma='scale'), param_grid={'C': [1]}, cv=3)
+    grid = GridSearchCV(SVC(gamma='scale', random_state=0),

qinhanmin2014 (Author, Member) commented Feb 22, 2019

We do not always warn.

    if self.iid == 'warn':
        warn = False
        for scorer_name in scorers.keys():
            scores = test_scores[scorer_name].reshape(n_candidates,
                                                      n_splits)
            means_weighted = np.average(scores, axis=1,
                                        weights=test_sample_counts)
            means_unweighted = np.average(scores, axis=1)
            if not np.allclose(means_weighted, means_unweighted,
                               rtol=1e-4, atol=1e-4):
                warn = True
                break
        if warn:
            warnings.warn("The default of the `iid` parameter will change "
                          "from True to False in version 0.22 and will be"
                          " removed in 0.24. This will change numeric"
                          " results when test-set sizes are unequal.",
                          DeprecationWarning)
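The allclose check above means the warning only fires when the weighted and unweighted fold means actually diverge, i.e. when test-set sizes are unequal. A small illustration with made-up numbers (not taken from the PR):

```python
import numpy as np

# Hypothetical per-candidate test scores across 3 folds.
scores = np.array([[0.9, 0.8, 0.6]])          # shape (n_candidates, n_splits)
test_sample_counts = np.array([10, 10, 34])   # unequal test-set sizes

means_weighted = np.average(scores, axis=1, weights=test_sample_counts)
means_unweighted = np.average(scores, axis=1)

# With unequal fold sizes the two averages diverge, so the check above
# would set warn = True; with equal sizes they coincide and it stays quiet.
print(not np.allclose(means_weighted, means_unweighted,
                      rtol=1e-4, atol=1e-4))  # True
```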

@@ -87,9 +87,9 @@ def test_svc():
     kernels = ["linear", "poly", "rbf", "sigmoid"]
     for dataset in datasets:
         for kernel in kernels:
-            clf = svm.SVC(gamma='scale', kernel=kernel, probability=True,
+            clf = svm.SVC(gamma=1, kernel=kernel, probability=True,

qinhanmin2014 (Author, Member) commented Feb 22, 2019

numerical issue here and below

import numpy as np
from scipy import sparse
from sklearn import svm

kernel = 'rbf'  # stand-in; in the test this comes from the kernel loop

X = np.array([[-2, -1], [-1, -1], [-1, -2], [1, 1], [1, 2], [2, 1]])
X_sp = sparse.lil_matrix(X)
Y = [1, 1, 1, 2, 2, 2]
clf = svm.SVC(gamma='scale', kernel=kernel, probability=True,
              random_state=0, decision_function_shape='ovo')
sp_clf = svm.SVC(gamma='scale', kernel=kernel, probability=True,
                 random_state=0, decision_function_shape='ovo')
clf.fit(X, Y)
print(clf._gamma)
print(clf.support_)
sp_clf.fit(X_sp, Y)
print(sp_clf._gamma)
print(sp_clf.support_)
# 0.25
# [0 1 3 4]
# 0.25000000000000006
# [1 2 3 5]
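The 0.25 printed above matches the corrected definition directly; a quick check (the E[X^2] - E[X]^2 remark is an assumption about why the sparse path can differ in the last ulp, not something stated in the PR):

```python
import numpy as np

X = np.array([[-2, -1], [-1, -1], [-1, -2], [1, 1], [1, 2], [2, 1]],
             dtype=float)

# Corrected gamma='scale' on the dense array: 1 / (n_features * X.var()).
gamma = 1.0 / (X.shape[1] * X.var())
print(gamma)  # 0.25

# Presumably the sparse path derives the variance as E[X^2] - E[X]^2 over
# the stored data; the two formulas are algebraically equal but need not
# be bitwise equal in floating point, which would explain the
# 0.25000000000000006 and the different support vectors reported above.
```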
@@ -344,7 +344,7 @@ once will overwrite what was learned by any previous ``fit()``::
         max_iter=-1, probability=False, random_state=None, shrinking=True,
         tol=0.001, verbose=False)
     >>> clf.predict(X_test)
-    array([1, 0, 1, 1, 0])
+    array([0, 0, 0, 1, 0])

qinhanmin2014 (Author, Member) commented Feb 22, 2019

these samples are generated randomly (maybe we should avoid doing so in the tutorial?)

qinhanmin2014 (Member, Author) commented Feb 22, 2019

Ready for a review @jnothman @amueller
We need to change some tests, see my reviews above.
Tagging 0.20.3.

@qinhanmin2014 qinhanmin2014 added this to the 0.20.3 milestone Feb 22, 2019

ogrisel (Member) left a comment

LGTM. But because gamma is now a fitted parameter, wouldn't it make sense to expose the estimated bandwidth as a public attribute gamma_ instead of a private attribute _gamma?

I would be +0 for making it public.

amueller (Member) commented Feb 22, 2019

lgtm (but github doesn't let me click approve?)

qinhanmin2014 (Member, Author) commented Feb 23, 2019

> these samples are generated randomly (maybe we should avoid doing so in the tutorial?)

Will soon open a PR to update the tutorial.

> But because gamma is now a fitted parameter, wouldn't it make sense to expose the estimated bandwidth as a public attribute gamma_ instead of a private attribute _gamma?

Also +0 from my side, and this is out of scope for this PR. Maybe open an issue/PR if someone wants it?

@jnothman jnothman merged commit 5c8167a into scikit-learn:master Feb 24, 2019

11 checks passed

- LGTM analysis: C/C++ (No code changes detected)
- LGTM analysis: JavaScript (No code changes detected)
- LGTM analysis: Python (No new or fixed alerts)
- ci/circleci: deploy (Your tests passed on CircleCI!)
- ci/circleci: doc (Your tests passed on CircleCI!)
- ci/circleci: doc-min-dependencies (Your tests passed on CircleCI!)
- ci/circleci: lint (Your tests passed on CircleCI!)
- codecov/patch (100% of diff hit, target 92.48%)
- codecov/project (Absolute coverage decreased by -<.01% but relative coverage increased by +7.51% compared to d19a5dc)
- continuous-integration/appveyor/pr (AppVeyor build succeeded)
- continuous-integration/travis-ci/pr (The Travis CI build passed)

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Feb 24, 2019

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:svm-gamma branch Feb 24, 2019

Kiku-git added a commit to Kiku-git/scikit-learn that referenced this pull request Mar 4, 2019

ivannz added a commit to ivannz/scikit-learn that referenced this pull request Mar 19, 2019
