SimpleImputer doesn't support infinite values #13419

@gjimzhou

Description

SimpleImputer raises a ValueError if the array passed in contains np.inf, which was not the case with the old Imputer.

This may be caused by fit calling _validate_input before any actual fitting or transforming: _validate_input passes the array to check_array, which rejects infinite values.
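To illustrate the suspected cause, here is a minimal sketch of the finiteness check as it appears in the traceback below. It is a simplified stand-in written against plain Python lists, not scikit-learn's actual implementation; the function name assert_all_finite and the shortened error message are assumptions made for illustration.

```python
import math


def assert_all_finite(values, allow_nan=False):
    """Hypothetical, simplified model of the check seen in the traceback."""
    has_inf = any(math.isinf(v) for v in values)
    has_nan = any(math.isnan(v) for v in values)
    # Infinity is always rejected; NaN is rejected only when allow_nan is False.
    if has_inf or (has_nan and not allow_nan):
        kind = "infinity" if allow_nan else "NaN, infinity"
        raise ValueError("Input contains {}.".format(kind))


data = [-math.inf, 2.0, 3.0, 4.0, 5.0, 6.0, math.inf]

# NaN alone passes when allow_nan=True ...
assert_all_finite([1.0, math.nan], allow_nan=True)

# ... but infinity is rejected even with allow_nan=True.
try:
    assert_all_finite(data, allow_nan=True)
    rejected = False
    message = ""
except ValueError as err:
    rejected = True
    message = str(err)

print(rejected, message)
```

If the real check behaves like this sketch, allowing NaN as the missing-value marker is not enough to let infinite values through, which would match the error reported here.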

Steps/Code to Reproduce

import numpy as np
from sklearn.impute import SimpleImputer

s = np.array([-np.inf, 2, 3, 4, 5, 6, np.inf]).reshape(-1, 1)
transformer = SimpleImputer(strategy='median')
transformer.fit(s)  # raises ValueError
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-28-ee05450ab4c6> in <module>()
      3 transformer = SimpleImputer(strategy='median')
      4 
----> 5 transformer.fit(s)

~/anaconda3/lib/python3.6/site-packages/sklearn/impute.py in fit(self, X, y)
    221         self : SimpleImputer
    222         """
--> 223         X = self._validate_input(X)
    224 
    225         # default fill_value is 0 for numerical input and "missing_value"

~/anaconda3/lib/python3.6/site-packages/sklearn/impute.py in _validate_input(self, X)
    195                                  "".format(self.strategy, X.dtype.kind))
    196             else:
--> 197                 raise ve
    198 
    199         _check_inputs_dtype(X, self.missing_values)

~/anaconda3/lib/python3.6/site-packages/sklearn/impute.py in _validate_input(self, X)
    188         try:
    189             X = check_array(X, accept_sparse='csc', dtype=dtype,
--> 190                             force_all_finite=force_all_finite, copy=self.copy)
    191         except ValueError as ve:
    192             if "could not convert" in str(ve):

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    571         if force_all_finite:
    572             _assert_all_finite(array,
--> 573                                allow_nan=force_all_finite == 'allow-nan')
    574 
    575     shape_repr = _shape_repr(array.shape)

~/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan)
     54                 not allow_nan and not np.isfinite(X).all()):
     55             type_err = 'infinity' if allow_nan else 'NaN, infinity'
---> 56             raise ValueError(msg_err.format(type_err, X.dtype))
     57 
     58 

ValueError: Input contains infinity or a value too large for dtype('float64').

Expected Results

import numpy as np
from sklearn.preprocessing import Imputer  # deprecated since 0.20

s = np.array([-np.inf, 2, 3, 4, 5, 6, np.inf]).reshape(-1, 1)
transformer = Imputer(strategy='median')
transformer.fit(s)  # fits without error, only a DeprecationWarning
/home/jzhou/anaconda3/lib/python3.6/site-packages/sklearn/utils/deprecation.py:58: DeprecationWarning:

Class Imputer is deprecated; Imputer was deprecated in version 0.20 and will be removed in 0.22. Import impute.SimpleImputer from sklearn instead.

Imputer(axis=0, copy=True, missing_values='NaN', strategy='median', verbose=0)
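For what it's worth, the median of this column is well defined even with the infinite endpoints, which is part of why I expected fit to succeed. A quick sanity check using only the standard library (this does not go through scikit-learn's code path):

```python
import math
import statistics

# Same values as the single column in the reproduction above.
values = [-math.inf, 2, 3, 4, 5, 6, math.inf]

# Infinities sort to the two ends, so the middle element is unaffected.
median = statistics.median(values)
print(median)  # prints 4
```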

Versions

System:
    python: 3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34)  [GCC 7.3.0]
executable: /home/jzhou/anaconda3/bin/python
   machine: Linux-4.10.0-42-generic-x86_64-with-debian-stretch-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/jzhou/anaconda3/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 19.0.3
setuptools: 40.8.0
   sklearn: 0.20.2
     numpy: 1.16.2
     scipy: 1.2.1
    Cython: 0.28.2
    pandas: 0.24.1
