Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333

Merged
merged 9 commits into from Mar 1, 2019

Conversation

@albertcthomas
Copy link
Collaborator

@albertcthomas albertcthomas commented Feb 28, 2019

Fixes #13315

Values of n_quantiles larger than n_samples either do not lead to a better approximation of the cdf estimator or to a wrong approximation. See #13315 for details.

Don't know if this is considered as a bug or if this requires a deprecation cycle?

Copy link
Contributor

@glemaitre glemaitre left a comment

You need an entry in what's new. If we considered it as a bug fix then we don't need a deprecation.

Loading

sklearn/preprocessing/data.py Outdated Show resolved Hide resolved
Loading
@@ -1260,6 +1260,12 @@ def test_quantile_transform_check_error():
assert_raise_message(ValueError,
'Expected 2D array, got scalar array instead',
transformer.transform, 10)
# check that a warning is raised is n_quantiles > n_samples
transformer = QuantileTransformer(n_quantiles=100)
assert_warns_message(UserWarning,
Copy link
Contributor

@glemaitre glemaitre Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use pytest to catch warning instead?

Loading

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
self.n_quantiles = n_samples
Copy link
Contributor

@glemaitre glemaitre Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be called n_quantiles_ isn't it?

Loading

Copy link
Collaborator Author

@albertcthomas albertcthomas Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes indeed I just fixed it

Loading

@glemaitre glemaitre self-requested a review Feb 28, 2019
@albertcthomas
Copy link
Collaborator Author

@albertcthomas albertcthomas commented Feb 28, 2019

Thanks @glemaitre

Loading

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
warnings.warn("n_quantiles (%s) is greater than the total number "
Copy link
Member

@jnothman jnothman Feb 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more verbose than is helpful. "n_quantiles (%d) is being reduced to the number of samples (%d)" if you want to add another phrase explaining why, okay, but I think it's intuitively okay.

Loading

Copy link
Member

@jnothman jnothman left a comment

otherwise LGTM

Loading

@albertcthomas
Copy link
Collaborator Author

@albertcthomas albertcthomas commented Mar 1, 2019

Thanks @jnothman

Loading

@jnothman jnothman merged commit 2ad8735 into scikit-learn:master Mar 1, 2019
16 checks passed
Loading
xhlulu added a commit to xhlulu/scikit-learn that referenced this issue Apr 28, 2019
xhlulu added a commit to xhlulu/scikit-learn that referenced this issue Apr 28, 2019
xhlulu added a commit to xhlulu/scikit-learn that referenced this issue Apr 28, 2019
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this issue Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants