[MRG+1] QuantileTransformer #8363

Merged
merged 107 commits on Jun 9, 2017
Changes from 102 commits
Commits (107)
14f3b15
resurrect quantile scaler
turian Jul 21, 2013
cc8d264
move the code in the pre-processing module
Feb 15, 2017
b78b689
first draft
Feb 15, 2017
bb1829a
Add tests.
tguillemot Feb 15, 2017
5c9bcbc
Fix bug in QuantileNormalizer.
tguillemot Feb 15, 2017
8d4b9cc
Add quantile_normalizer.
tguillemot Feb 15, 2017
45e48f7
Implement pickling
Feb 15, 2017
0a646c1
create a specific function for dense transform
Feb 15, 2017
4dbdb6e
Create a fit function for the dense case
Feb 15, 2017
f723edb
Create a toy examples
Feb 15, 2017
5c8d496
First draft with sparse matrices
Feb 16, 2017
bcbf79b
remove useless functions and non-negative sparse compatibility
Feb 16, 2017
1be3f5b
fix slice call
Feb 16, 2017
86b4a22
Fix tests of QuantileNormalizer.
tguillemot Feb 16, 2017
a742a61
Fix estimator compatibility
Feb 16, 2017
79927b6
fix doc
Feb 16, 2017
cc680a7
Add negative ValueError tests for QuantileNormalizer.
tguillemot Feb 16, 2017
1260a70
Fix cosmetics
Feb 16, 2017
c043c07
Fix compatibility numpy <= 1.8
Feb 16, 2017
0a7dc4d
Add n_features tests and correct ValueError.
tguillemot Feb 16, 2017
36a8870
PEP8
Feb 16, 2017
94e26ad
fix fill_value for early scipy compatibility
Feb 16, 2017
f552529
simplify sampling
Feb 16, 2017
8a4592c
Fix tests.
tguillemot Feb 16, 2017
9070871
removing last pring
Feb 16, 2017
cbe4da9
Change choice for permutation
Feb 16, 2017
1051fbb
cosmetics
Feb 16, 2017
790b0cb
fix remove remaining choice
Feb 16, 2017
9713089
DOC
Feb 16, 2017
a1052de
Fix inconsistencies
Feb 16, 2017
5b48b22
pep8
Feb 16, 2017
45172fa
Add checker for init parameters.
tguillemot Feb 17, 2017
ef3b403
hack bounds and make a test
Feb 17, 2017
adc1f37
FIX/TST bounds are provided by the fitting and not X at transform
Feb 17, 2017
22ea4f9
PEP8
Feb 17, 2017
81a3721
FIX/TST axis should be <= 1
Feb 17, 2017
055d8aa
PEP8
Feb 17, 2017
777e353
ENH Add parameter ignore_implicit_zeros
Feb 21, 2017
63708c2
ENH match output distribution
Feb 21, 2017
6e6eb52
ENH clip the data to avoid infinity due to output PDF
Feb 21, 2017
1aba0fe
FIX ENH restraint to uniform and norm
Feb 22, 2017
d1a94f5
[MRG] ENH Add example comparing the distribution of all scaling prepr…
raghavrv Feb 22, 2017
f1282f2
TST Validity of output_pdf
Feb 22, 2017
11709a3
EXA Use OrderedDict; Make it easier to add more transformations
raghavrv Feb 22, 2017
cf5fa8d
FIX PEP8 and replace scipy.stats by str in example
Feb 23, 2017
0150f62
FIX remove useless import
Feb 23, 2017
81c08cc
COSMET change variable names
Feb 27, 2017
adde8cf
FIX change output_pdf occurence to output_distribution
Feb 27, 2017
fe009c9
FIX partial fixies from comments
Feb 28, 2017
6ec43a8
COMIT change class name and code structure
Feb 28, 2017
e94cd48
COSMIT change direction to inverse
Feb 28, 2017
9c13d2a
FIX factorize transform in _transform_col
Feb 28, 2017
5d544ef
PEP8
Feb 28, 2017
d9b3e7a
FIX change the magic 10
Feb 28, 2017
23b3a91
FIX add interp1d to fixes
Feb 28, 2017
04dc89a
FIX/TST allow negative entries when ignore_implicit_zeros is True
Feb 28, 2017
9377cc2
FIX use np.interp instead of sp.interpolate.interp1d
Mar 1, 2017
c132211
FIX/TST fix tests
Mar 1, 2017
9b66d71
DOC start checking doc
Mar 2, 2017
fb88fa1
TST add test to check the behaviour of interp numpy
Mar 14, 2017
f46aea9
TST/EHN Add the possibility to add noise to compute quantile
Mar 14, 2017
d55295a
FIX factorize quantile computation
Mar 14, 2017
38127d5
FIX fixes issues
Mar 14, 2017
90fa3bd
PEP8
Mar 14, 2017
ba8339d
FIX/DOC correct doc
Mar 14, 2017
9a1b79e
TST/DOC improve doc and add random state
Mar 15, 2017
dabd403
EXA add examples to illustrate the use of smoothing_noise
Mar 15, 2017
29c24e0
FIX/DOC fix some grammar
Mar 15, 2017
3023a2f
DOC fix example
Mar 15, 2017
17db1ff
DOC/EXA make plot titles more succint
Mar 15, 2017
1de03ba
EXA improve explanation
Mar 15, 2017
79f6e97
EXA improve the docstring
Mar 15, 2017
12a3f47
DOC add a bit more documentation
Mar 15, 2017
9226f73
FIX advance review
Apr 5, 2017
b47158f
TST add subsampling test
Apr 6, 2017
bd928ed
DOC/TST better example for the docstring
Apr 7, 2017
c70aba0
DOC add ellipsis to docstring
Apr 7, 2017
9a9556c
FIX address olivier comments
Apr 8, 2017
6b105a9
FIX remove random_state in sparse.rand
Apr 8, 2017
dc39f9e
FIX spelling doc
Apr 8, 2017
c3cf631
FIX cite example in user guide and docstring
Apr 11, 2017
570c5d0
FIX olivier comments
Apr 12, 2017
da5604d
EHN improve the example comparing all the pre-processing methods
Apr 13, 2017
7871513
FIX/DOC remove title
Apr 13, 2017
52e4edf
FIX change the scaling of the figure
Apr 13, 2017
28cc2af
FIX plotting layout
Apr 19, 2017
6cdf964
FIX ratio w/h
Apr 19, 2017
58c64c2
Reorder and reword the plot_all_scaling example
ogrisel Apr 18, 2017
1a181fa
Fix aspect ratio and better explanations in the plot_all_scaling.py e…
ogrisel Apr 20, 2017
cb04d53
Fix broken link and remove useless sentence
ogrisel Apr 20, 2017
eac7071
FIX fix couples of spelling
Apr 20, 2017
37afa44
FIX comments joel
glemaitre Apr 23, 2017
a4719b4
FIX/DOC address documentation comments
glemaitre Apr 23, 2017
07906cc
FIX address comments joel
glemaitre May 7, 2017
d4d6bb4
FIX inline sparse and dense transform
glemaitre May 7, 2017
0b5be04
PEP8
glemaitre May 7, 2017
c740628
TST/DOC temporary skipping test
glemaitre May 7, 2017
6c2d7cf
FIX raise an error if n_quantiles > subsample
glemaitre May 16, 2017
22708c9
FIX wording in smoothing_noise example
ogrisel Jun 6, 2017
4d2fe63
EXA Denis comments
glemaitre Jun 7, 2017
49c94b3
FIX rephrasing
glemaitre Jun 7, 2017
2c85eb3
FIX make smoothing_noise to be a boolearn and change doc
glemaitre Jun 7, 2017
0f1bc24
Merge remote-tracking branch 'origin/master' into quantile_scaler
glemaitre Jun 7, 2017
db08c55
FIX address comments
glemaitre Jun 7, 2017
be207c7
FIX verbose the doc slightly more
glemaitre Jun 7, 2017
7b17f14
PEP8/DOC
glemaitre Jun 8, 2017
7046a6d
ENH: 2-ways interpolation to avoid smoothing_noise
GaelVaroquaux Jun 9, 2017
2 changes: 2 additions & 0 deletions doc/modules/classes.rst
@@ -1194,6 +1194,7 @@ See the :ref:`metrics` section of the user guide for further details.
preprocessing.Normalizer
preprocessing.OneHotEncoder
preprocessing.PolynomialFeatures
preprocessing.QuantileTransformer
preprocessing.RobustScaler
preprocessing.StandardScaler

@@ -1207,6 +1208,7 @@ See the :ref:`metrics` section of the user guide for further details.
preprocessing.maxabs_scale
preprocessing.minmax_scale
preprocessing.normalize
preprocessing.quantile_transform
preprocessing.robust_scale
preprocessing.scale

82 changes: 75 additions & 7 deletions doc/modules/preprocessing.rst
@@ -10,6 +10,13 @@ The ``sklearn.preprocessing`` package provides several common
utility functions and transformer classes to change raw feature vectors
into a representation that is more suitable for the downstream estimators.

In general, learning algorithms benefit from standardization of the data set. If
some outliers are present in the set, robust scalers or transformers are more
appropriate. The behaviors of the different scalers, transformers, and
normalizers on a dataset containing marginal outliers are highlighted in
:ref:`sphx_glr_auto_examples_preprocessing_plot_all_scaling.py`.


.. _preprocessing_scaler:

Standardization, or mean removal and variance scaling
@@ -39,10 +46,10 @@ operation on a single array-like dataset::

>>> from sklearn import preprocessing
>>> import numpy as np
>>> X = np.array([[ 1., -1., 2.],
... [ 2., 0., 0.],
... [ 0., 1., -1.]])
>>> X_scaled = preprocessing.scale(X)
>>> X_train = np.array([[ 1., -1., 2.],
... [ 2., 0., 0.],
... [ 0., 1., -1.]])
>>> X_scaled = preprocessing.scale(X_train)

>>> X_scaled # doctest: +ELLIPSIS
array([[ 0. ..., -1.22..., 1.33...],
@@ -71,7 +78,7 @@ able to later reapply the same transformation on the testing set.
This class is hence suitable for use in the early steps of a
:class:`sklearn.pipeline.Pipeline`::

>>> scaler = preprocessing.StandardScaler().fit(X)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> scaler
StandardScaler(copy=True, with_mean=True, with_std=True)

@@ -81,7 +88,7 @@ This class is hence suitable for use in the early steps of a
>>> scaler.scale_ # doctest: +ELLIPSIS
array([ 0.81..., 0.81..., 1.24...])

>>> scaler.transform(X) # doctest: +ELLIPSIS
>>> scaler.transform(X_train) # doctest: +ELLIPSIS
array([[ 0. ..., -1.22..., 1.33...],
[ 1.22..., 0. ..., -0.26...],
[-1.22..., 1.22..., -1.06...]])
@@ -90,7 +97,8 @@ This class is hence suitable for use in the early steps of a
The scaler instance can then be used on new data to transform it the
same way it did on the training set::

>>> scaler.transform([[-1., 1., 0.]]) # doctest: +ELLIPSIS
>>> X_test = [[-1., 1., 0.]]
>>> scaler.transform(X_test) # doctest: +ELLIPSIS
array([[-2.44..., 1.22..., -0.26...]])

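The arithmetic behind this doctest can be sketched with plain numpy (an illustrative sketch of the statistics ``StandardScaler`` stores as ``mean_`` and ``scale_``, not the library code):

```python
import numpy as np

X_train = np.array([[1., -1., 2.],
                    [2., 0., 0.],
                    [0., 1., -1.]])

# Per-feature statistics estimated on the training set
mean = X_train.mean(axis=0)
scale = X_train.std(axis=0)

# Applying the stored statistics to new data mirrors scaler.transform(X_test)
X_test = np.array([[-1., 1., 0.]])
X_test_scaled = (X_test - mean) / scale
```

The result matches the ``array([[-2.44..., 1.22..., -0.26...]])`` output shown above.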
It is possible to disable either centering or scaling by either
@@ -248,6 +256,66 @@ a :class:`KernelCenterer` can transform the kernel matrix
so that it contains inner products in the feature space
defined by :math:`\phi` followed by removal of the mean in that space.

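The double-centering this paragraph describes can be sketched in numpy (an illustrative sketch, not the scikit-learn implementation):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 3)
K = X @ X.T  # linear kernel, so phi is the identity map

# Double centering: K_c = K - 1_n K - K 1_n + 1_n K 1_n,
# where 1_n is the n x n matrix with all entries 1/n
n = K.shape[0]
one_n = np.full((n, n), 1.0 / n)
K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n

# For the linear kernel this equals the kernel of mean-centered data
Xc = X - X.mean(axis=0)
```

For the linear kernel, ``K_centered`` coincides with ``Xc @ Xc.T``, i.e. inner products after removing the mean in feature space.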
.. _preprocessing_transformer:

Non-linear transformation
=========================

Like scalers, :class:`QuantileTransformer` puts each feature into the same
range or distribution. However, by performing a rank transformation, it smooths
out unusual distributions and is less influenced by outliers than scaling
methods. It does, however, distort correlations and distances within and across
features.

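The underlying idea, mapping each value through the empirical quantile function, can be sketched in a few lines of numpy (illustrative only; the actual estimator also handles subsampling, sparse input, and inverse transforms):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.lognormal(size=1000)  # a heavily skewed feature

# "Fit": estimate a grid of quantiles from the data
references = np.linspace(0, 1, 100)
quantiles = np.percentile(x, references * 100)

# "Transform": interpolate each value's rank, giving a roughly
# uniform output between 0 and 1
x_uniform = np.interp(x, quantiles, references)
```

Because the output is (approximately) the empirical CDF of the input, outliers can only move to the edges of [0, 1] rather than stretching the whole scale.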
:class:`QuantileTransformer` and :func:`quantile_transform` provide a
non-parametric transformation based on the quantile function to map the data to
a uniform distribution with values between 0 and 1::

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> quantile_transformer = preprocessing.QuantileTransformer(
... smoothing_noise=1e-12)
Review comment (Member):

This should just be:

quantile_transformer = preprocessing.QuantileTransformer()

>>> X_train_trans = quantile_transformer.fit_transform(X_train)
>>> X_test_trans = quantile_transformer.transform(X_test)
Review comment (Member):

This doesn't show anything but usage. Can we put some kind of assertion regarding the transformed data here, e.g. report np.percentile(..., [0, 25, 50, 75, 100]) for each of X_train[:, 0], X_train_trans[:, 0], X_test_trans[:, 0]


>>> np.percentile(X_train[:, 0], [0, 25, 50, 75, 100])
... # doctest: +ELLIPSIS, +SKIP
array([...])
>>> np.percentile(X_train_trans[:, 0], [0, 25, 50, 75, 100])
... # doctest: +ELLIPSIS, +SKIP
array([...])
>>> np.percentile(X_test[:, 0], [0, 25, 50, 75, 100])
... # doctest: +ELLIPSIS, +SKIP
array([...])
>>> np.percentile(X_test_trans[:, 0], [0, 25, 50, 75, 100])
... # doctest: +ELLIPSIS, +SKIP
array([...])
Review comment (Member):

I don't see the point of having doctests with "array([...])" as the output. I think we should display the actual percentiles up to 2 digits. We need to fix the random_state=0 value to ensure that the results are reproducible.

Reply (Member, Author):

It was due to some numpy < 1.8 support (it was skipped for the moment)

Review comment (Member):

I get non deterministic results even if I fix the random_state=0 in QuantileTransformer.


It is also possible to map the transformed data to a normal distribution by
setting ``output_distribution='normal'``::

>>> quantile_transformer = preprocessing.QuantileTransformer(
... smoothing_noise=True, output_distribution='normal')
Review comment (Member):

Let's not put smoothing_noise=True on doctest snippets that document another parameter. Here the following would be enough:

>>> quantile_transformer = preprocessing.QuantileTransformer(
...     output_distribution='normal')

>>> X_trans = quantile_transformer.fit_transform(X)
>>> quantile_transformer.quantiles_ # doctest: +ELLIPSIS
array([...])

Review comment (Member):

Say "Thus the median of the input becomes the mean of the output, centered at 0. The normal output is clipped so that the input's maximum and minimum do not become infinite under the transformation."

Thus the median of the input becomes the mean of the output, centered at 0. The
normal output is clipped so that the input's minimum and maximum ---
corresponding to the 1e-7 and 1 - 1e-7 quantiles respectively --- do not
become infinite under the transformation.

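The clipping step can be illustrated with a small numpy/scipy sketch (illustrative only, not the library implementation):

```python
import numpy as np
from scipy import stats

rng = np.random.RandomState(0)
x = rng.lognormal(size=1000)

# Map each value to its empirical quantile in [0, 1]
references = np.linspace(0, 1, 100)
u = np.interp(x, np.percentile(x, references * 100), references)

# Clip away from 0 and 1 before applying the inverse normal CDF,
# so the minimum and maximum map to finite values rather than -inf/+inf
u = np.clip(u, 1e-7, 1 - 1e-7)
x_normal = stats.norm.ppf(u)
```

Without the ``np.clip`` call, ``stats.norm.ppf`` would return ``-inf`` for the input's minimum and ``+inf`` for its maximum.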
:class:`QuantileTransformer` provides a ``smoothing_noise`` parameter to
Review comment (Member):

provides a smoothing_noise parameter (set to True by default) to ....

make the interpretation more intuitive when inspecting the
transformation. This is particularly useful when feature values are
replicated identically many times in the training set (e.g. prices, ordinal
values such as user ratings, coarse-grained units of time, etc.). See
:ref:`sphx_glr_auto_examples_preprocessing_plot_smoothing_noise_quantile_transform.py`
for more details.

.. _preprocessing_normalization:

Normalization
13 changes: 10 additions & 3 deletions doc/whats_new.rst
@@ -57,6 +57,13 @@ New features
during the first epochs of ridge and logistic regression.
By `Arthur Mensch`_.

- Added :class:`preprocessing.QuantileTransformer` class and
:func:`preprocessing.quantile_transform` function for feature
normalization based on quantiles.
:issue:`8363` by :user:`Denis Engemann <dengemann>`,
:user:`Guillaume Lemaitre <glemaitre>`, `Olivier Grisel`_, `Raghav RV`_,
and :user:`Thierry Guillemot <tguillemot>`.

Enhancements
............

@@ -161,7 +168,7 @@ Enhancements
- Add ``sample_weight`` parameter to :func:`metrics.cohen_kappa_score` by
Victor Poughon.

- In :class:`gaussian_process.GaussianProcessRegressor`, method ``predict``
- In :class:`gaussian_process.GaussianProcessRegressor`, method ``predict``
is a lot faster with ``return_std=True`` by :user:`Hadrien Bertrand <hbertrand>`.

Bug fixes
@@ -254,7 +261,7 @@ Bug fixes
multiple inheritance context.
:issue:`8316` by :user:`Holger Peters <HolgerPeters>`.

- Fix :func:`sklearn.linear_model.BayesianRidge.fit` to return
- Fix :func:`sklearn.linear_model.BayesianRidge.fit` to return
ridge parameter `alpha_` and `lambda_` consistent with calculated
coefficients `coef_` and `intercept_`.
:issue:`8224` by :user:`Peter Gedeck <gedeck>`.
@@ -5059,4 +5066,4 @@ David Huard, Dave Morrill, Ed Schofield, Travis Oliphant, Pearu Peterson.
.. _Anish Shah: https://github.com/AnishShah

.. _Neeraj Gangwar: http://neerajgangwar.in
.. _Arthur Mensch: https://amensch.fr
.. _Arthur Mensch: https://amensch.fr