[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333

Merged: 9 commits merged into scikit-learn:master on Mar 1, 2019

Conversation

albertcthomas (Contributor):

Fixes #13315

Values of n_quantiles larger than n_samples either do not improve the approximation of the CDF estimator or lead to a wrong approximation. See #13315 for details.

I don't know whether this is considered a bug or whether it requires a deprecation cycle.
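
A minimal usage sketch (not from the PR) of the situation being addressed: requesting more quantiles than there are samples. The data shape and values below are assumptions for illustration; with this change, fit is expected to warn and reduce the effective number of quantiles to n_samples.

import numpy as np
from sklearn.preprocessing import QuantileTransformer

X = np.random.RandomState(0).normal(size=(100, 2))  # only 100 samples

qt = QuantileTransformer(n_quantiles=1000)  # more quantiles than samples
qt.fit(X)                                   # with this PR: warns and clips to 100
print(qt.quantiles_.shape)                  # (100, 2) after the change, not (1000, 2)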

glemaitre (Member) left a comment:

You need an entry in what's new. If we consider it a bug fix, then we don't need a deprecation.

@@ -1260,6 +1260,12 @@ def test_quantile_transform_check_error():
    assert_raise_message(ValueError,
                         'Expected 2D array, got scalar array instead',
                         transformer.transform, 10)
    # check that a warning is raised if n_quantiles > n_samples
    transformer = QuantileTransformer(n_quantiles=100)
    assert_warns_message(UserWarning,
Member:

Could you use pytest to catch the warning instead?
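
A sketch of what is being asked for here, using pytest.warns instead of assert_warns_message (the test name and values are assumptions, not the final test):

import numpy as np
import pytest
from sklearn.preprocessing import QuantileTransformer

def test_quantile_transform_n_quantiles_warning():
    # 10 samples but 100 requested quantiles should trigger the new warning
    X = np.arange(10, dtype=np.float64).reshape(-1, 1)
    transformer = QuantileTransformer(n_quantiles=100)
    with pytest.warns(UserWarning, match="n_quantiles"):
        transformer.fit(X)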

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
    self.n_quantiles = n_samples
Member:

It should be called n_quantiles_, shouldn't it?

albertcthomas (Contributor Author):

Yes indeed, I just fixed it.
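
A toy sketch of the convention being discussed: the constructor parameter n_quantiles stays untouched, while the value actually used during fitting is stored as an attribute with a trailing underscore. This is an illustrative class, not the scikit-learn implementation.

import numpy as np

class ToyQuantileTransformer:
    def __init__(self, n_quantiles=1000):
        # constructor parameter: never modified by fit
        self.n_quantiles = n_quantiles

    def fit(self, X):
        n_samples = X.shape[0]
        # fitted attribute (trailing underscore): the value actually used
        self.n_quantiles_ = min(self.n_quantiles, n_samples)
        self.references_ = np.linspace(0, 1, self.n_quantiles_, endpoint=True)
        return self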

glemaitre self-requested a review on February 28, 2019 at 16:28
albertcthomas (Contributor Author):

Thanks @glemaitre

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
    warnings.warn("n_quantiles (%s) is greater than the total number "
Member:

I think this is more verbose than is helpful. "n_quantiles (%d) is being reduced to the number of samples (%d)" would be enough; if you want to add another phrase explaining why, okay, but I think it is intuitive enough as is.
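
A sketch of the shorter warning being suggested (the wording in the merged code may differ):

import warnings

n_quantiles, n_samples = 1000, 100  # example values
if n_quantiles > n_samples:
    warnings.warn("n_quantiles (%d) is being reduced to the number of "
                  "samples (%d)." % (n_quantiles, n_samples))
    n_quantiles = n_samples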

jnothman (Member) left a comment:

Otherwise LGTM.

albertcthomas (Contributor Author):

Thanks @jnothman

jnothman merged commit 2ad8735 into scikit-learn:master on Mar 1, 2019
xhluca pushed commits to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019