Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Arbitrary SVC kernels #11296

Merged
merged 9 commits into from Feb 1, 2020

Conversation

georgipeev
Copy link
Contributor

Reference Issues/PRs

Fixes #10713

What does this implement/fix? Explain your changes.

Removed non-applicable input validation when a custom kernel is used. Allowed specifying X as a list when using custom kernels.

Any other comments?

WIP until I add a meaningful unit test aside from the current one that merely checks that no errors arise when using custom kernels.

@jnothman
Copy link
Member

And yet it seems you already have test failures :( Let us know if you need help

@@ -125,7 +126,7 @@ def test_precomputed():

kfunc = lambda x, y: np.dot(x, y.T)
clf = svm.SVC(kernel=kfunc)
clf.fit(X, Y)
clf.fit(np.array(X), Y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

basically here it's an API change. Before for a custom kernel we would convert X to array. I propose to delegate the input checking to kernel function.

@@ -980,7 +981,7 @@ def test_svc_bad_kernel():
def test_timeout():
a = svm.SVC(kernel=lambda x, y: np.dot(x, y.T), probability=True,
random_state=0, max_iter=1)
assert_warns(ConvergenceWarning, a.fit, X, Y)
assert_warns(ConvergenceWarning, a.fit, np.array(X), Y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. Due to API change.

@agramfort agramfort changed the title [WIP] Arbitrary SVC kernels [MRG] Arbitrary SVC kernels Jan 29, 2020
@rth rth added the module:svm label Jan 29, 2020
assert svc1.score(data, y) == svc3.score(K, y)
assert svc1.score(data, y) == svc2.score(X, y)
if hasattr(svc1, 'decision_function'): # classifier
assert_array_almost_equal(svc1.decision_function(data),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use assert_allclose instead of assert_array_almost_equal


if self.kernel == "precomputed":
if X.shape[1] != self.shape_fit_[0]:
raise ValueError("X.shape[1] = %d should be equal to %d, "
"the number of samples at training time" %
(X.shape[1], self.shape_fit_[0]))
elif n_features != self.shape_fit_[1]:
elif not callable(self.kernel) and n_features != self.shape_fit_[1]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If n_features is only used here, let's be explicit:

Suggested change
elif not callable(self.kernel) and n_features != self.shape_fit_[1]:
elif not callable(self.kernel) and X.shape[1] != self.shape_fit_[1]:


if self.kernel == "precomputed":
if X.shape[1] != self.shape_fit_[0]:
raise ValueError("X.shape[1] = %d should be equal to %d, "
"the number of samples at training time" %
(X.shape[1], self.shape_fit_[0]))
elif n_features != self.shape_fit_[1]:
elif not callable(self.kernel) and n_features != self.shape_fit_[1]:
raise ValueError("X.shape[1] = %d should be equal to %d, "
"the number of features at training time" %
(n_features, self.shape_fit_[1]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(n_features, self.shape_fit_[1]))
(X.shape[1], self.shape_fit_[1]))

@@ -454,14 +467,15 @@ def _validate_for_predict(self, X):
raise ValueError(
"cannot use sparse input in %r trained on dense data"
% type(self).__name__)
n_samples, n_features = X.shape
if not callable(self.kernel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove this assignment then

raise ValueError("sample_weight and X have incompatible shapes: "
"%r vs %r\n"
"Note: Sparse matrices cannot be indexed w/"
"boolean masks (use `indices=True` in CV)."
% (sample_weight.shape, X.shape))

if isinstance(self.gamma, str):
kernel = self.kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kernel = self.kernel
kernel = 'precomputed' if callable(self.kernel) else self.kernel

sklearn/svm/_base.py Show resolved Hide resolved
sklearn/svm/_base.py Show resolved Hide resolved
kernels in :class:`svm.SVC` and :class:`svm.SVR`.
:pr:`11296` by `Alexandre Gramfort`_ and :user:`Georgi Peev <georgipeev>`.

- |API| Do not enforce float entries in X when using custom kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add this change in Change of behaviour. I think this is fine since it could be considered as bug fix but better to document it as well.

Copy link
Contributor

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I thought I posted this yesterday. Hopefully not out of date

kernels in :class:`svm.SVC` and :class:`svm.SVR`.
:pr:`11296` by `Alexandre Gramfort`_ and :user:`Georgi Peev <georgipeev>`.

- |API| Do not enforce float entries in X when using custom kernel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand "API" to mean "there's a new way to do something I could do before". I'm not sure it quite fits here

@@ -185,6 +185,14 @@ Changelog
`probB_`, are now deprecated as they were not useful. :pr:`15558` by
`Thomas Fan`_.

- |Fix| Fix use of custom kernel not taking float entries such as string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be regarded as an enhancement

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well it was support to work but it was not tested. I prefer to see this as a bug fix.

order='C', accept_sparse='csr',
accept_large_sparse=False)
if callable(self.kernel):
check_consistent_length(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now do we have a backward incompatibility since before a kernel would have been passed a validated float array?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with that level of incompatibility though perhaps it deserves a note in the change log

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is mentioned in section Changed models

@agramfort
Copy link
Member

@jnothman feel free to merge. I tried to address your comments.

@jnothman
Copy link
Member

jnothman commented Feb 1, 2020

Thanks @georgipeev

@jnothman jnothman merged commit 84bc8d3 into scikit-learn:master Feb 1, 2020
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SVC: Do not enforce that input data is of type np.float64
6 participants