Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX degree in polynomial kernel should be a floating number #27668

Merged
merged 13 commits into from Dec 1, 2023
Merged

FIX degree in polynomial kernel should be a floating number #27668

merged 13 commits into from Dec 1, 2023

Conversation

NolantheNerd
Copy link
Contributor

Reference Issues/PRs

None - this fix is not associated with an issue.

What does this implement/fix? Explain your changes.

This fix only changes the docs.

The SpectralClustering class' degree parameter is constrained to positive integers, but the SpectralClustering class' docstring specifies that degree can be a float.

I've updated the docstring to require that degree be passed as an int.

The SpectralClustering class' parameter `degree` is constrained to
positive integers, but the docstring specifies that `degree` can be a
float.

I've updated the docstring to require that `degree` be passed as an int.
@github-actions
Copy link

github-actions bot commented Oct 25, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 588fd5e. Link to the linter CI: here

@NolantheNerd NolantheNerd changed the title Fixes Degree Type Discrepency in SpectralClustering Class Docstring Fix Degree Type Discrepency in SpectralClustering Class Docstring Oct 25, 2023
@glemaitre
Copy link
Member

I think the documentation is correct but the code is wrong.

The degree will be used by a the polynomial kernel:

@validate_params(
{
"X": ["array-like", "sparse matrix"],
"Y": ["array-like", "sparse matrix", None],
"degree": [Interval(Real, 1, None, closed="left")],
"gamma": [
Interval(Real, 0, None, closed="left"),
None,
Hidden(np.ndarray),
],
"coef0": [Interval(Real, None, None, closed="neither")],
},
prefer_skip_nested_validation=True,
)
def polynomial_kernel(X, Y=None, degree=3, gamma=None, coef0=1):

and this is constrained to be a float.

@jeremiedbb I see that I missed this thing in the following review: #24035

We also tolerate an integer at 0 that will be excluded by the kernel itself. I assume that we could fore all the degrees parameter to "degree": [Interval(Real, 0, None, closed="left")]

@jeremiedbb WDYT?

@glemaitre
Copy link
Member

Interacting IRL with @jeremiedbb, we agreed that this is a regression and that we should use a Real number instead of Integral her. @NolantheNerd Do you want to contribute this change?

@glemaitre glemaitre changed the title Fix Degree Type Discrepency in SpectralClustering Class Docstring FIX degree in polynomial kernel should be a floating number Oct 30, 2023
NolantheNerd and others added 2 commits October 30, 2023 11:46
Changed degree type requirement from Integral to Real in
SpectralClustering class. This now matches both the SpectralClustering
docstring and the expected behaviour of degree, given that it is passed
to the polynomial kernel.
@NolantheNerd
Copy link
Contributor Author

Done! Ready for review!

@glemaitre
Copy link
Member

Can you also update:

  • the degree parameter from KernelPCA to be a real as well
  • the degree parameter from KerelRidge to be a real as well

In addition, we will need an entry in the changelog to acknowledge the regression.

…ow/scikit-learn into docs/spectralclustering-degree-type
In the SpectralClustering, KernelPCA and KernelRidge constructors, the
degree parameter was modified to allow float values, not just integral
values. The respective class' have had their docstrings updated to
reflect this change.
@NolantheNerd
Copy link
Contributor Author

Changes complete! Now for cluster.SpectralClustering,kernel_ridge.KernelRidge and decomposition.KernelPCA:

  • The docstring reflects that degree is a float.
  • The default value has been changed from 3 -> 3.0 in the docstring to further clarify that floats are permitted.
  • The default value in the constructor has been changed from 3 -> 3.0.

There doesn't seem to be a consistent pattern for integer/float disambiguation in the codebase, for example, in the KernelPCA docstring, both 1 and 1.0 are used for float values. @glemaitre Is there a "right" way to do it?

coef0 : float, default=1
Independent term in poly and sigmoid kernels.
Ignored by other kernels.
kernel_params : dict, default=None
Parameters (keyword arguments) and
values for kernel passed as callable object.
Ignored by other kernels.
alpha : float, default=1.0
Hyperparameter of the ridge regression that learns the
inverse transform (when fit_inverse_transform=True).

@glemaitre
Copy link
Member

There doesn't seem to be a consistent pattern for integer/float disambiguation in the codebase

In this case both integral an float have the same behaviour so there is no need for disambiguation.

Sometimes we have float and integral that does not behave the same but usually the range of the accepted values will be different and the parameter validation will be here to help.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only think that we can keep 3 instead of 3.0 since this is already a valid Real.

Otherwise LGTM. Thanks @NolantheNerd

sklearn/decomposition/_kernel_pca.py Outdated Show resolved Hide resolved
sklearn/decomposition/_kernel_pca.py Outdated Show resolved Hide resolved
sklearn/kernel_ridge.py Outdated Show resolved Hide resolved
sklearn/kernel_ridge.py Outdated Show resolved Hide resolved
sklearn/cluster/_spectral.py Outdated Show resolved Hide resolved
sklearn/cluster/_spectral.py Outdated Show resolved Hide resolved
NolantheNerd and others added 6 commits November 3, 2023 20:40
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
@glemaitre glemaitre added Waiting for Second Reviewer First reviewer is done, need a second one! Blocker labels Nov 4, 2023
@glemaitre glemaitre added this to the 1.4 milestone Nov 4, 2023
Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @NolantheNerd

@OmarManzoor
Copy link
Contributor

@NolantheNerd I don't see any conflicts but Github is complaining that there are. Could you kindly check if there are any conflicts with main or otherwise maybe push an empty commit to retrigger the CI?

@NolantheNerd
Copy link
Contributor Author

@OmarManzoor looks like there was a conflict in the news file. I've fixed the conflict.

@glemaitre glemaitre merged commit 37f6802 into scikit-learn:main Dec 1, 2023
27 checks passed
@glemaitre
Copy link
Member

Merging then. Thanks @NolantheNerd

@NolantheNerd NolantheNerd deleted the docs/spectralclustering-degree-type branch December 3, 2023 04:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker module:cluster Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants