
[MRG] Ignore and pass-through NaN values in MaxAbsScaler and maxabs_scale #11011

Merged
merged 10 commits into from Jun 23, 2018
5 changes: 5 additions & 0 deletions doc/whats_new/v0.20.rst
@@ -230,6 +230,11 @@ Preprocessing
:issue:`10404` and :issue:`11243` by :user:`Lucija Gregov <LucijaGregov>` and
:user:`Guillaume Lemaitre <glemaitre>`.

- :class:`preprocessing.MaxAbsScaler` and :func:`preprocessing.maxabs_scale`
  handle and ignore NaN values.
  :issue:`11011` by :user:`Lucija Gregov <LucijaGregov>` and
  :user:`Guillaume Lemaitre <glemaitre>`.

Model evaluation and meta-estimators

- A scorer based on :func:`metrics.brier_score_loss` is also available.
22 changes: 16 additions & 6 deletions sklearn/preprocessing/data.py
@@ -800,6 +800,9 @@ class MaxAbsScaler(BaseEstimator, TransformerMixin):

Notes
-----
NaNs are treated as missing values: disregarded in fit, and maintained in
transform.

For a comparison of the different scalers, transformers, and normalizers,
see :ref:`examples/preprocessing/plot_all_scaling.py
<sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
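A quick sketch of the behaviour documented in this Notes section (assumes a scikit-learn release that includes this change, i.e. 0.20 or later): NaNs are skipped when fitting and preserved on output.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy matrix with one missing entry; values chosen for easy mental math.
X = np.array([[1.0, np.nan],
              [-4.0, 2.0],
              [2.0, -1.0]])

scaler = MaxAbsScaler()
Xt = scaler.fit_transform(X)

# The per-column max absolute value is computed ignoring the NaN: [4., 2.]
print(scaler.max_abs_)
# The NaN is maintained in the transformed output.
print(Xt)
```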
@@ -851,13 +854,14 @@ def partial_fit(self, X, y=None):
Ignored
"""
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
-                       estimator=self, dtype=FLOAT_DTYPES)
+                       estimator=self, dtype=FLOAT_DTYPES,
+                       force_all_finite='allow-nan')

        if sparse.issparse(X):
-           mins, maxs = min_max_axis(X, axis=0)
+           mins, maxs = min_max_axis(X, axis=0, ignore_nan=True)
            max_abs = np.maximum(np.abs(mins), np.abs(maxs))
        else:
-           max_abs = np.abs(X).max(axis=0)
+           max_abs = np.nanmax(np.abs(X), axis=0)

# First pass
if not hasattr(self, 'n_samples_seen_'):
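The dense branch above swaps `np.abs(X).max(axis=0)` for `np.nanmax`, which skips missing entries column-wise. A minimal check of that behaviour:

```python
import numpy as np

X = np.array([[1.0, np.nan],
              [-4.0, 2.0]])

# np.abs propagates the NaN, but np.nanmax ignores it per column,
# so the column containing NaN still yields a finite maximum.
max_abs = np.nanmax(np.abs(X), axis=0)
print(max_abs)  # [4. 2.]
```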
@@ -881,7 +885,8 @@ def transform(self, X):
"""
check_is_fitted(self, 'scale_')
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
-                       estimator=self, dtype=FLOAT_DTYPES)
+                       estimator=self, dtype=FLOAT_DTYPES,
+                       force_all_finite='allow-nan')

if sparse.issparse(X):
inplace_column_scale(X, 1.0 / self.scale_)
@@ -899,7 +904,8 @@ def inverse_transform(self, X):
"""
check_is_fitted(self, 'scale_')
        X = check_array(X, accept_sparse=('csr', 'csc'), copy=self.copy,
-                       estimator=self, dtype=FLOAT_DTYPES)
+                       estimator=self, dtype=FLOAT_DTYPES,
+                       force_all_finite='allow-nan')

if sparse.issparse(X):
inplace_column_scale(X, self.scale_)
@@ -937,6 +943,9 @@ def maxabs_scale(X, axis=0, copy=True):

Notes
-----
NaNs are treated as missing values: disregarded to compute the statistics,
and maintained during the data transformation.

For a comparison of the different scalers, transformers, and normalizers,
see :ref:`examples/preprocessing/plot_all_scaling.py
<sphx_glr_auto_examples_preprocessing_plot_all_scaling.py>`.
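The same behaviour through the functional API (again assuming scikit-learn 0.20 or later): statistics are computed disregarding the NaN, and the NaN is passed through the transformation.

```python
import numpy as np
from sklearn.preprocessing import maxabs_scale

X = np.array([[1.0, np.nan],
              [-4.0, 2.0],
              [2.0, -1.0]])

# Column scales are 4 and 2 (NaN disregarded); the NaN survives scaling.
Xt = maxabs_scale(X)
print(Xt)
```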
@@ -945,7 +954,8 @@ def maxabs_scale(X, axis=0, copy=True):

# If copy is required, it will be done inside the scaler object.
    X = check_array(X, accept_sparse=('csr', 'csc'), copy=False,
-                   ensure_2d=False, dtype=FLOAT_DTYPES)
+                   ensure_2d=False, dtype=FLOAT_DTYPES,
+                   force_all_finite='allow-nan')
original_ndim = X.ndim

if original_ndim == 1:
5 changes: 4 additions & 1 deletion sklearn/preprocessing/tests/test_common.py
@@ -8,9 +8,11 @@

from sklearn.base import clone

+from sklearn.preprocessing import maxabs_scale
from sklearn.preprocessing import minmax_scale
from sklearn.preprocessing import quantile_transform

+from sklearn.preprocessing import MaxAbsScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.preprocessing import QuantileTransformer

@@ -27,7 +29,8 @@ def _get_valid_samples_by_column(X, col):

@pytest.mark.parametrize(
"est, func, support_sparse",
-    [(MinMaxScaler(), minmax_scale, False),
+    [(MaxAbsScaler(), maxabs_scale, True),
+     (MinMaxScaler(), minmax_scale, False),
(QuantileTransformer(n_quantiles=10), quantile_transform, True)]
)
def test_missing_value_handling(est, func, support_sparse):

Member:
I've realised we should have assert_no_warnings in here when fitting and transforming

Member:

It is a good point. I just caught that the QuantileTransformer was still raising a warning at inverse_transform. Since it is a single line, I would include it here.

2 changes: 1 addition & 1 deletion sklearn/utils/estimator_checks.py
@@ -77,7 +77,7 @@
'RandomForestRegressor', 'Ridge', 'RidgeCV']

ALLOW_NAN = ['Imputer', 'SimpleImputer', 'MICEImputer',
-             'MinMaxScaler', 'QuantileTransformer']
+             'MinMaxScaler', 'MaxAbsScaler', 'QuantileTransformer']


def _yield_non_meta_checks(name, estimator):