Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Dynamically set n_quantiles to min(n_samples, n_quantiles) in QuantileTransformer #13333

Merged
merged 9 commits into from Mar 1, 2019

fix overwritting paramaters

  • Loading branch information...
albertcthomas committed Feb 28, 2019
commit a24dd8e1fcad883157b07c49f7808e4508247e7d
Copy path View file
@@ -2076,6 +2076,10 @@ class QuantileTransformer(BaseEstimator, TransformerMixin):
Attributes
----------
n_quantiles_ : integer
The actual number of quantiles used to discretize the cumulative
distribution function.
quantiles_ : ndarray, shape (n_quantiles, n_features)
The values corresponding the quantiles of reference.
@@ -2225,18 +2229,18 @@ def fit(self, X, y=None):
n_samples = X.shape[0]

if self.n_quantiles > n_samples:
self.n_quantiles = n_samples
warnings.warn("n_quantiles (%s) is greater than the total number "

This comment has been minimized.

Copy link
@jnothman

jnothman Feb 28, 2019

Member

I think this is more verbose than is helpful. "n_quantiles (%d) is being reduced to the number of samples (%d)" if you want to add another phrase explaining why, okay, but I think it's intuitively okay.

"of samples (%s). n_quantiles will be set to "
"n_samples as more quantiles do not lead to a "
"better approximation of the used cumulative "
"distribution function estimator."
% (self.n_quantiles, n_samples))
self.n_quantiles_ = max(1, min(self.n_quantiles, n_samples))

rng = check_random_state(self.random_state)

# Create the quantiles of reference
self.references_ = np.linspace(0, 1, self.n_quantiles,
self.references_ = np.linspace(0, 1, self.n_quantiles_,
endpoint=True)
if sparse.issparse(X):
self._sparse_fit(X, rng)
@@ -1265,7 +1265,7 @@ def test_quantile_transform_check_error():
assert_warns_message(UserWarning,

This comment has been minimized.

Copy link
@glemaitre

glemaitre Feb 28, 2019

Contributor

Could you use pytest to catch warning instead?

"n_quantiles will be set to n_samples",
transformer.fit, X)
assert transformer.n_quantiles == X.shape[0]
assert transformer.n_quantiles_ == X.shape[0]


def test_quantile_transform_sparse_ignore_zeros():
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.