MinCovDet can't deal with singular matrix #21818

vnmabus · 2021-11-29T15:56:53Z

Describe the bug

MinCovDet can't deal with a singular covariance matrix, for example in the zero-covariance case. Note that the usual EmpiricalCovariance handles that case without error.

Steps/Code to Reproduce

from sklearn.covariance import EmpiricalCovariance, MinCovDet
import numpy as np

data = np.zeros((3, 2))
print(EmpiricalCovariance().fit(data).covariance_)
print(MinCovDet().fit(data).covariance_)

Expected Results

[[0. 0.]
 [0. 0.]]
[[0. 0.]
 [0. 0.]]

Actual Results

[[0. 0.]
 [0. 0.]]
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:739: UserWarning: The covariance matrix associated to your dataset is not full rank
  "The covariance matrix associated to your dataset is not full rank"
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:806: RuntimeWarning: invalid value encountered in true_divide
  self.dist_ /= correction
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py:848: RuntimeWarning: Mean of empty slice.
  location_reweighted = data[mask].mean(0)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/core/_methods.py:182: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:380: RuntimeWarning: Mean of empty slice.
  avg = a.mean(axis)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_empirical_covariance.py:94: RuntimeWarning: Degrees of freedom <= 0 for slice
  covariance = np.cov(X.T, bias=1)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: divide by zero encountered in true_divide
  c *= np.true_divide(1, fact)
/home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/numpy/lib/function_base.py:2542: RuntimeWarning: invalid value encountered in multiply
  c *= np.true_divide(1, fact)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-037a31d6709d> in <module>
      3 data = np.zeros((3, 2))
      4 print(EmpiricalCovariance().fit(data).covariance_)
----> 5 print(MinCovDet().fit(data).covariance_)

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py in fit(self, X, y)
    763         self.correct_covariance(X)
    764         # re-weight estimator
--> 765         self.reweight_covariance(X)
    766 
    767         return self

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_robust_covariance.py in reweight_covariance(self, data)
    852         support_reweighted = np.zeros(n_samples, dtype=bool)
    853         support_reweighted[mask] = True
--> 854         self._set_covariance(covariance_reweighted)
    855         self.location_ = location_reweighted
    856         self.support_ = support_reweighted

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/covariance/_empirical_covariance.py in _set_covariance(self, covariance)
    185             is computed.
    186         """
--> 187         covariance = check_array(covariance)
    188         # set covariance
    189         self.covariance_ = covariance

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    790 
    791         if force_all_finite:
--> 792             _assert_all_finite(array, allow_nan=force_all_finite == "allow-nan")
    793 
    794     if ensure_min_samples > 0:

~/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/lib/python3.7/site-packages/sklearn/utils/validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
    114             raise ValueError(
    115                 msg_err.format(
--> 116                     type_err, msg_dtype if msg_dtype is not None else X.dtype
    117                 )
    118             )

ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
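
The warnings above hint at how the NaN appears: a 0 / 0 division when the distances are rescaled, then a mean over an empty selection. The NumPy-only sketch below replays that arithmetic. It is not scikit-learn's code: the variable names, the all-zero distances, and the closed-form chi-square quantiles (valid only for 2 degrees of freedom) are assumptions made for illustration.

```python
import math
import warnings

import numpy as np

data = np.zeros((3, 2))

# Assumption: with all-zero data, every squared distance to the location is 0.
dist = np.zeros(data.shape[0])

# Assumption: distances are rescaled and thresholded with chi-square quantiles.
# For 2 degrees of freedom the survival function is exp(-x / 2), so the
# quantile with upper-tail probability q is -2 * log(q).
median_quantile = -2.0 * math.log(0.5)
threshold = -2.0 * math.log(0.025)

with warnings.catch_warnings(), np.errstate(invalid="ignore", divide="ignore"):
    warnings.simplefilter("ignore")  # silence "Mean of empty slice"
    correction = np.median(dist) / median_quantile  # 0.0
    dist = dist / correction                        # 0 / 0 gives NaN
    mask = dist < threshold                         # NaN < x is always False
    location = data[mask].mean(axis=0)              # mean of zero rows gives NaN

print(correction, dist, mask, location)
```

Once the location is NaN, any covariance computed from the same empty selection is NaN as well, which is what `check_array` then rejects.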

Versions

System:
python: 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0]
executable: /home/carlos/Programas/Utilidades/Lenguajes/miniconda3/envs/fda/bin/python
machine: Linux-5.4.0-90-generic-x86_64-with-debian-buster-sid

Python dependencies:
pip: 21.3.1
setuptools: 58.5.2
sklearn: 1.0.1
numpy: 1.21.2
scipy: 1.6.2
Cython: 0.29.14
pandas: 1.3.4
matplotlib: 3.4.2
joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True


glemaitre · 2021-12-17T17:14:00Z

I am unsure whether this is a bug or expected behaviour. If we expect to return a covariance matrix, whether finite or filled with NaN, then we currently fail to do so. If we expect to fail, then we should at least improve the error message.

@GaelVaroquaux will be in a better position to tell if this is the expected behaviour here.

GaelVaroquaux · 2021-12-17T17:55:45Z

This is probably expected behavior. The problem is not well defined for a singular matrix.

vnmabus · 2021-12-23T14:30:37Z

I think that if it is possible to do something sensible (as EmpiricalCovariance does), that should be done. If you have to compute a matrix and some entry cannot be computed, it is better to return the matrix with an invalid value, such as NaN, in that entry than to raise an exception. In the first case you can at least deal with the problematic part, but in the latter you don't even know which part was problematic.
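
Until the library settles on a behavior, a caller could guard against the singular case before fitting. The following is a minimal sketch, not part of scikit-learn: `covariance_is_singular` and `robust_or_empirical_covariance` are hypothetical helper names, and falling back to the biased empirical covariance is just one possible choice.

```python
import numpy as np


def covariance_is_singular(X, tol=None):
    """Return True if the empirical covariance of X is rank deficient."""
    X = np.asarray(X, dtype=float)
    emp_cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    return bool(np.linalg.matrix_rank(emp_cov, tol=tol) < emp_cov.shape[0])


def robust_or_empirical_covariance(X):
    """Hypothetical helper: MinCovDet when possible, empirical otherwise."""
    X = np.asarray(X, dtype=float)
    if covariance_is_singular(X):
        # Fall back to the biased (maximum likelihood) estimate, the same
        # np.cov(..., bias=1) call that appears in the traceback above.
        return np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    # Imported lazily so the singular branch only needs NumPy.
    from sklearn.covariance import MinCovDet

    return MinCovDet().fit(X).covariance_


print(robust_or_empirical_covariance(np.zeros((3, 2))))
```

A full-rank empirical covariance does not guarantee that every subset examined by MinCovDet is non-singular, so this guard only covers the obvious cases such as constant data.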

vnmabus added the Bug: triage label Nov 29, 2021

vnmabus mentioned this issue Nov 29, 2021

MS-Plot throws error if FD data consists of only straight lines GAA-UAM/scikit-fda#398

Open

cmarmo added the module:covariance label Nov 30, 2021

glemaitre added the Enhancement label and removed the Bug: triage label Dec 17, 2021
