Robust covaraince (fast_mcd) does not handle singular covariance matrices #3367

Closed
ojy opened this Issue Jul 11, 2014 · 3 comments

Comments

Projects
None yet
3 participants
@ojy

ojy commented Jul 11, 2014

When the c_step of fast_mcd encounters a covariance matrix with a 0 determinant, it raises an error. According to the paper (Rousseeuw & Van Driessen 1999), it shouldn't be the case. Technically, 0-determinant covariance is the optimal solution. A singular case requires a special treatment. (Section 5 of the paper). I'm new here, but I can contribute to this.

@amueller amueller added the Bug label Jan 22, 2015

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Jan 22, 2015

Member

Sorry for the long silence. If you are still interested a fix would be very welcome.

Member

amueller commented Jan 22, 2015

Sorry for the long silence. If you are still interested a fix would be very welcome.

@ThatGeoGuy

This comment has been minimized.

Show comment
Hide comment
@ThatGeoGuy

ThatGeoGuy Apr 26, 2015

Contributor

Hey, sorry for the long wait, I haven't taken the time to commit to this issue since I have been busy with final projects / exams. I submitted a PR which I believe fixes this issue (PR #4635), but does manage to produce some warnings with one of the sample datasets I used to test the fix.

I believe the issue with that sample dataset (described in the PR) is different than what caused this issue. Overall, I think this issue is solved, as despite the warnings on some of the trials, this still produces the correct answer for exact fit scenarios.

Contributor

ThatGeoGuy commented Apr 26, 2015

Hey, sorry for the long wait, I haven't taken the time to commit to this issue since I have been busy with final projects / exams. I submitted a PR which I believe fixes this issue (PR #4635), but does manage to produce some warnings with one of the sample datasets I used to test the fix.

I believe the issue with that sample dataset (described in the PR) is different than what caused this issue. Overall, I think this issue is solved, as despite the warnings on some of the trials, this still produces the correct answer for exact fit scenarios.

@amueller amueller modified the milestone: 0.19 Sep 29, 2016

@ThatGeoGuy

This comment has been minimized.

Show comment
Hide comment
@ThatGeoGuy

ThatGeoGuy Feb 9, 2017

Contributor

I'm starting fresh to try and tackle this again, two years later (sorry, it seems I really delayed my old fix, which is now incompatible to an extent with the old code). I'm gonna try and write some test cases and submit a new PR. Sorry for being such a wreck about actually fixing this, I think I was quite close to getting everything correct last time.

For the record, I will also test whatever data I generate against the reference implementation in LIBRA as well, which should let me know if I change things too much.

Contributor

ThatGeoGuy commented Feb 9, 2017

I'm starting fresh to try and tackle this again, two years later (sorry, it seems I really delayed my old fix, which is now incompatible to an extent with the old code). I'm gonna try and write some test cases and submit a new PR. Sorry for being such a wreck about actually fixing this, I think I was quite close to getting everything correct last time.

For the record, I will also test whatever data I generate against the reference implementation in LIBRA as well, which should let me know if I change things too much.

@agramfort agramfort closed this in #8328 Jun 8, 2017

agramfort added a commit that referenced this issue Jun 8, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

Sundrique added a commit to Sundrique/scikit-learn that referenced this issue Jun 14, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

dmohns added a commit to dmohns/scikit-learn that referenced this issue Aug 7, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

NelleV added a commit to NelleV/scikit-learn that referenced this issue Aug 11, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

paulha added a commit to paulha/scikit-learn that referenced this issue Aug 19, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this issue Aug 29, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this issue Nov 15, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this issue Dec 18, 2017

[MRG+1] Fixes issue #3367 -> MCD fails on data with singular covarian…
…ce matrix (#8328)

* Adds failing test for issue 3367

* Adds extra comments to describe what I'm testing

Basically in this instance I discovered this bug independently from
\#3367 because I was trying to use principle components to estimate
plane normals / geometry of points in 3D space. When you have a set of
points that specify a perfect plane though (or in the case of wanting to
use MCD, you have a subset of your points that specify a perfect plane),
then the code fails because the covariance matrix is singular. However,
if your covariance matrix is singular, you've already found the set of
points with the lowest determinant. As per Rousseeuw & Van Driessen
1999, at this point you can stop searching.

The code did stop searching, however, it raised a ValueError on singular
matrices for no reason. So the correct fix should be to remove that.

* Fixes issue 3367

This should work with the test case provided.

* Adds missing argument to test

* Style corrections to pass flake runner

Implements the style corrections as mentioned in pull request #8328

* Adds entry for PR #8328 to what's new

* Adds review changes from @jnothman
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment