[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333

Merged
jnothman merged 9 commits into scikit-learn:master from albertcthomas:n_quantiles on Mar 1, 2019

3 participants
@albertcthomas
Contributor

commented Feb 28, 2019

Fixes #13315

Values of n_quantiles larger than n_samples either do not improve the approximation of the CDF estimator or lead to a wrong approximation. See #13315 for details.

I don't know whether this is considered a bug fix or whether it requires a deprecation cycle.

@albertcthomas albertcthomas force-pushed the albertcthomas:n_quantiles branch from b3e207b to 6e1c01f Feb 28, 2019

@glemaitre
Contributor

left a comment

You need an entry in what's new. If we consider it a bug fix, then we don't need a deprecation cycle.

sklearn/preprocessing/data.py (outdated)
@@ -1260,6 +1260,12 @@ def test_quantile_transform_check_error():
     assert_raise_message(ValueError,
                          'Expected 2D array, got scalar array instead',
                          transformer.transform, 10)
     # check that a warning is raised if n_quantiles > n_samples
     transformer = QuantileTransformer(n_quantiles=100)
     assert_warns_message(UserWarning,

@glemaitre

glemaitre Feb 28, 2019

Contributor

Could you use pytest to catch the warning instead?
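For reference, a pytest-based check could look roughly like the sketch below. To keep it self-contained, `fit_n_quantiles` is a hypothetical stand-in for the behaviour under review, not part of scikit-learn, and the warning text is an assumption. In the actual test, the call inside the `with` block would presumably be the transformer's `fit`.

```python
import warnings

import pytest


def fit_n_quantiles(n_quantiles, n_samples):
    """Hypothetical stand-in: clip n_quantiles to n_samples, warning if clipped."""
    if n_quantiles > n_samples:
        warnings.warn(
            "n_quantiles (%d) is greater than the number of samples (%d)"
            % (n_quantiles, n_samples),
            UserWarning,
        )
    return min(n_quantiles, n_samples)


def test_warns_when_n_quantiles_exceeds_n_samples():
    # pytest.warns fails the test if no matching UserWarning is emitted;
    # `match` is a regular expression searched in the warning message.
    with pytest.warns(UserWarning, match="n_quantiles"):
        effective = fit_n_quantiles(n_quantiles=100, n_samples=10)
    assert effective == 10
```

Compared with `assert_warns_message`, the context manager keeps the call under test readable and lets the same test also assert on the returned value.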

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
    self.n_quantiles = n_samples

@glemaitre

glemaitre Feb 28, 2019

Contributor

It should be called n_quantiles_, shouldn't it?

@albertcthomas

albertcthomas Feb 28, 2019

Author Contributor

Yes indeed, I just fixed it.
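The point of the rename follows the scikit-learn convention that constructor parameters are left untouched and values derived during `fit` are stored in attributes with a trailing underscore. A minimal sketch of that pattern, using a hypothetical `QuantileSketch` class rather than the real `QuantileTransformer`:

```python
class QuantileSketch:
    """Hypothetical minimal estimator, not scikit-learn's QuantileTransformer."""

    def __init__(self, n_quantiles=1000):
        # Constructor parameters are stored as given and never modified.
        self.n_quantiles = n_quantiles

    def fit(self, X):
        n_samples = len(X)
        # The data-dependent value goes in a trailing-underscore attribute.
        self.n_quantiles_ = min(self.n_quantiles, n_samples)
        return self


est = QuantileSketch(n_quantiles=100).fit([[0.1], [0.4], [0.9]])
print(est.n_quantiles, est.n_quantiles_)  # prints "100 3"
```

Because `n_quantiles` keeps the value the user asked for, refitting the same object on a larger dataset starts again from 100 instead of from the previously clipped value.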

@glemaitre glemaitre self-requested a review Feb 28, 2019

@albertcthomas

Contributor Author

commented Feb 28, 2019

Thanks @glemaitre

n_samples = X.shape[0]

if self.n_quantiles > n_samples:
    warnings.warn("n_quantiles (%s) is greater than the total number "

@jnothman

jnothman Feb 28, 2019

Member

I think this is more verbose than is helpful. How about "n_quantiles (%d) is being reduced to the number of samples (%d)"? If you want to add another phrase explaining why, okay, but I think it's intuitively clear.
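As an illustration of the shorter wording, the sketch below emits the suggested message with `%d` formatting and captures it with the standard `warnings` module. The helper `reduce_n_quantiles` is hypothetical and only mirrors the logic discussed here; it is not the code that was merged.

```python
import warnings


def reduce_n_quantiles(n_quantiles, n_samples):
    """Hypothetical helper: return the effective number of quantiles."""
    if n_quantiles > n_samples:
        warnings.warn(
            "n_quantiles (%d) is being reduced to the number of samples (%d)"
            % (n_quantiles, n_samples),
            UserWarning,
        )
        return n_samples
    return n_quantiles


# Record warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    effective = reduce_n_quantiles(100, 10)

print(effective)  # prints 10
print(str(caught[0].message))
# prints "n_quantiles (100) is being reduced to the number of samples (10)"
```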

@jnothman
Member

left a comment

otherwise LGTM

@albertcthomas albertcthomas force-pushed the albertcthomas:n_quantiles branch from d105f79 to cfb2b19 Mar 1, 2019

@albertcthomas

Contributor Author

commented Mar 1, 2019

Thanks @jnothman

@jnothman jnothman merged commit 2ad8735 into scikit-learn:master Mar 1, 2019

9 checks passed

LGTM analysis: C/C++: No code changes detected
LGTM analysis: JavaScript: No code changes detected
LGTM analysis: Python: No new or fixed alerts
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: doc: Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies: Your tests passed on CircleCI!
ci/circleci: lint: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 92.21%)
codecov/project: 92.22% (+<.01%) compared to 04a5733

@jrbourbeau jrbourbeau referenced this pull request Mar 3, 2019

Merged

Fix sklearn dev tests #474

Kiku-git added a commit to Kiku-git/scikit-learn that referenced this pull request Mar 4, 2019
