MinCovDet can't deal with singular matrix #21818

Open
vnmabus opened this issue Nov 29, 2021 · 3 comments

@vnmabus
Contributor

vnmabus commented Nov 29, 2021

Describe the bug

MinCovDet cannot handle a singular covariance matrix, for example the all-zero covariance case. Note that the plain EmpiricalCovariance handles this case without error.

Steps/Code to Reproduce

from sklearn.covariance import EmpiricalCovariance, MinCovDet
import numpy as np

data = np.zeros((3, 2))
print(EmpiricalCovariance().fit(data).covariance_)
print(MinCovDet().fit(data).covariance_)

Expected Results

[[0. 0.]
 [0. 0.]]
[[0. 0.]
 [0. 0.]]

Actual Results

[[0. 0.]
 [0. 0.]]
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:806: RuntimeWarning: invalid value encountered in true_divide
  self.dist_ /= correction
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:848: RuntimeWarning: Mean of empty slice.
  location_reweighted = data[mask].mean(0)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/core/_methods.py:182: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:380: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_empirical_covariance.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
  covariance = np.cov(X.T, bias=1)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: invalid value encountered in multiply
  c *= np.true_divide(1, fact)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-037a31d6709d> in <module>
      3 data = np.zeros((3, 2))
      4 print(EmpiricalCovariance().fit(data).covariance_)
----> 5 print(MinCovDet().fit(data).covariance_)

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py in fit(self, X, y)
    763         self.correct_covariance(X)
    764         # re-weight estimator
--> 765         self.reweight_covariance(X)
    766 
    767         return self

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py in reweight_covariance(self, data)
    852         support_reweighted = np.zeros(n_samples, dtype=bool)
    853         support_reweighted[mask] = True
--> 854         self._set_covariance(covariance_reweighted)
    855         self.location_ = location_reweighted
    856         self.support_ = support_reweighted

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_empirical_covariance.py in _set_covariance(self, covariance)
    185             is computed.
    186         """
--> 187         covariance = check_array(covariance)
    188         # set covariance
    189         self.covariance_ = covariance

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    790 
    791         if force_all_finite:
--> 792             _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    793 
    794     if ensure_min_samples > 0:

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    114             raise ValueError(
    115                 msg_err.format(
--> 116                     type_err, msg_dtype if msg_dtype is not None else X.dtype
    117                 )
    118             )

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
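
The warnings above suggest how the NaN values arise: a division by a zero correction factor, followed by a mean over an empty selection. The following NumPy-only sketch reproduces that chain under stated assumptions. The all-zero `dist` array, the zero `correction`, and the threshold of 7.38 (roughly the 97.5% chi-square quantile for 2 degrees of freedom) are illustrative stand-ins, not values read from the scikit-learn source.

```python
import warnings

import numpy as np

data = np.zeros((3, 2))

# Assumption: for constant data every distance is zero, and a correction
# factor derived from those distances is zero as well.
dist = np.zeros(3)
correction = 0.0

with warnings.catch_warnings():
    # Silence the same RuntimeWarnings that appear in the report.
    warnings.simplefilter("ignore", RuntimeWarning)
    dist = dist / correction        # 0 / 0 yields NaN
    mask = dist < 7.38              # any comparison with NaN is False
    location = data[mask].mean(0)   # mean of an empty slice yields NaN

print(dist)
print(mask)
print(location)
```

Once the mask selects no samples, every statistic computed from `data[mask]` is NaN, which would explain why the final `check_array` call rejects the reweighted covariance.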

Versions

System:
python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
executable: /home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/bin/python
machine: Linux-5.4.0-90-generic-x86_64-with-debian-buster-sid

Python dependencies:
pip: 21.3.1
setuptools: 58.5.2
sklearn: 1.0.1
numpy: 1.21.2
scipy: 1.6.2
Cython: 0.29.14
pandas: 1.3.4
matplotlib: 3.4.2
joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True

@glemaitre
Member

glemaitre commented Dec 17, 2021

I am unsure whether this is a bug or expected behaviour. If we expect to return a covariance matrix, whether finite or full of NaN, then we currently fail. If we expect to actually fail, then we should improve the error message.

@GaelVaroquaux will be in a better position to tell if this is the expected behaviour here.
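
If failing is the intended behaviour, a clearer error could come from checking the rank of the sample covariance before fitting. The sketch below is only an illustration: `check_nondegenerate` and its message are hypothetical and are not part of scikit-learn.

```python
import numpy as np


def check_nondegenerate(X):
    """Hypothetical pre-check: raise a descriptive error for rank-deficient data."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    # Sample covariance with features as columns; atleast_2d covers one feature.
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    rank = np.linalg.matrix_rank(cov)
    if rank < n_features:
        raise ValueError(
            f"The sample covariance has rank {rank} but the data has "
            f"{n_features} features; the covariance matrix is singular."
        )
    return X


try:
    check_nondegenerate(np.zeros((3, 2)))
except ValueError as exc:
    message = str(exc)
    print(message)
```

The message names the rank and the number of features, so the user learns what is wrong with the input rather than only that a NaN appeared somewhere downstream.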

@GaelVaroquaux
Member

GaelVaroquaux commented Dec 17, 2021 via email

@vnmabus
Contributor Author

vnmabus commented Dec 23, 2021

I think that if it is possible to do something sensible (like EmpiricalCovariance does), that should be done. If you have to compute a matrix and some entry cannot be computed, it is better to return the matrix with an invalid value, such as NaN, in that entry than to raise an exception. In the first case you can at least deal with the problematic part, but in the latter you don't even know which part was problematic.
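
Until the estimator handles this case itself, a user-side workaround along these lines is possible. This is a minimal sketch, assuming that falling back to EmpiricalCovariance is acceptable whenever the sample covariance is rank deficient. The helper name `fit_covariance_with_fallback` is hypothetical, and the rank check is not guaranteed to catch every input on which MinCovDet fails.

```python
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet


def fit_covariance_with_fallback(X, random_state=0):
    """Hypothetical helper: use MinCovDet unless the sample covariance is singular."""
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    if np.linalg.matrix_rank(cov) < n_features:
        # Degenerate data: use the non-robust estimator, which handles it.
        return EmpiricalCovariance().fit(X)
    return MinCovDet(random_state=random_state).fit(X)


# Degenerate input from the report: falls back to EmpiricalCovariance.
zero_fit = fit_covariance_with_fallback(np.zeros((3, 2)))
print(zero_fit.covariance_)

# Well-conditioned input: MinCovDet is used as usual.
rng = np.random.default_rng(0)
robust_fit = fit_covariance_with_fallback(rng.normal(size=(100, 2)))
print(type(robust_fit).__name__)
```

The caller can inspect the type of the returned estimator to tell whether the robust or the fallback path was taken.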
