New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333
Conversation
b3e207b
to
6e1c01f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need an entry in what's new. If we considered it as a bug fix then we don't need a deprecation.
@@ -1260,6 +1260,12 @@ def test_quantile_transform_check_error(): | |||
assert_raise_message(ValueError, | |||
'Expected 2D array, got scalar array instead', | |||
transformer.transform, 10) | |||
# check that a warning is raised is n_quantiles > n_samples | |||
transformer = QuantileTransformer(n_quantiles=100) | |||
assert_warns_message(UserWarning, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use pytest to catch warning instead?
sklearn/preprocessing/data.py
Outdated
n_samples = X.shape[0] | ||
|
||
if self.n_quantiles > n_samples: | ||
self.n_quantiles = n_samples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be called n_quantiles_
isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes indeed I just fixed it
Thanks @glemaitre |
n_samples = X.shape[0] | ||
|
||
if self.n_quantiles > n_samples: | ||
warnings.warn("n_quantiles (%s) is greater than the total number " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is more verbose than is helpful. "n_quantiles (%d) is being reduced to the number of samples (%d)" if you want to add another phrase explaining why, okay, but I think it's intuitively okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise LGTM
Co-Authored-By: albertcthomas <albertthomas88@gmail.com>
d105f79
to
cfb2b19
Compare
Thanks @jnothman |
…eTransformer (scikit-learn#13333)" This reverts commit 6dd1404.
…eTransformer (scikit-learn#13333)" This reverts commit 6dd1404.
Fixes #13315
Values of
n_quantiles
larger thann_samples
either do not lead to a better approximation of the cdf estimator or to a wrong approximation. See #13315 for details.Don't know if this is considered as a bug or if this requires a deprecation cycle?