
[MRG] Fix bug in graph lasso when n_features=2 #13276

Merged: 6 commits into scikit-learn:master from bellet:glasso_fix, Feb 27, 2019

Conversation

4 participants
bellet commented Feb 26, 2019

Reference Issues/PRs

Fixes a bug introduced in #9858
Related to some discussions in #12228

What does this implement/fix? Explain your changes.

This PR fixes a subtle bug in the graph lasso implementation, introduced in PR #9858, which proposed updating the matrix sub_covariance in place (leading to significant speed-ups for large dense matrices). The bug affects both the CD and LARS solvers, as it lies in the outer loop.

More precisely, the problem originates from the fact that np.ascontiguousarray does not always copy the original array:

In [1]: 
   ...: import numpy as np
   ...: a = np.arange(10)
   ...: print(a)
   ...: b = np.ascontiguousarray(a)
   ...: b[0] = -1
   ...: print(a)
   ...: 
[0 1 2 3 4 5 6 7 8 9]
[-1  1  2  3  4  5  6  7  8  9]

As a result, the matrix covariance_ was incorrectly modified whenever sub_covariance was updated in place.
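One way to see whether np.ascontiguousarray returned a view or a copy is np.shares_memory. The following sketch is illustrative only (it is not necessarily the exact fix applied in this PR); the defensive pattern shown is to force a copy explicitly when an independent buffer is required:

```python
import numpy as np

a = np.arange(10)
b = np.ascontiguousarray(a)
# a is already C-contiguous, so b is the same buffer, not a copy.
print(np.shares_memory(a, b))  # True

# Forcing an explicit copy gives an independent buffer:
c = np.ascontiguousarray(a).copy()
c[0] = -1
print(a[0])  # 0: the original is untouched
```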

Example of incorrect behavior on a simple PSD matrix taken from #12228 (comment):

In [1]: import numpy as np 
   ...: from sklearn.covariance import graphical_lasso
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: cov, _ = graphical_lasso(A, alpha=0.1, verbose=True, max_iter=5)
   ...: print(cov)
   ...: 
[graphical_lasso] Iteration   0, cost  1.24e+01, dual gap 4.728e-01
[graphical_lasso] Iteration   1, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   2, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   3, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   4, cost  1.19e+01, dual gap -9.262e-01
/home/aurelien/Documents/research/github/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 5 iteration: dual gap: -9.262e-01
  % (max_iter, d_gap), ConvergenceWarning)
[[6.  1.9]
 [1.9 6. ]]

With this PR:

In [1]: import numpy as np 
   ...: from sklearn.covariance import graphical_lasso
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: cov, _ = graphical_lasso(A, alpha=0.1, verbose=True, max_iter=5)
   ...: print(cov)
   ...: 
[graphical_lasso] Iteration   0, cost  1.02e+01, dual gap 3.331e-16
[[6.  1.9]
 [1.9 1. ]]

This is consistent with the output of graph lasso as implemented in the skggm package:

In [1]: import numpy as np 
   ...: from inverse_covariance import quic
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1)
   ...: print(cov)
   ...:               
[[6.00000005 1.9       ]
 [1.9        1.00000001]]

UPDATE: the problem arises only when n_features=2, hence has small scope. This explains why the tests passed for PR #9858. The root cause of the problem can still be triggered for larger matrices, as seen in this example:

In [1]: import numpy as np
   ...: a = np.ones((3, 3))
   ...: print(a)
   ...: b = np.ascontiguousarray(a)
   ...: b[0] = [-1, -2, -1]
   ...: print(a)
   ...: 
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[-1. -2. -1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

However, slicing along both dimensions at once (as done in graph lasso) forces np.ascontiguousarray to make a copy when n_features > 2:

In [1]: import numpy as np
   ...: a = np.ones((3, 3))
   ...: print(a)
   ...: b = np.ascontiguousarray(a[1:, 1:])
   ...: b[0] = [-1, -2]
   ...: print(a)
   ...: 
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

When n_features=2, however, the slice contains a single element, which is already contiguous, so no copy is made; hence the bug.
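This corner case is easy to check directly; a small illustration of why only n_features=2 is affected:

```python
import numpy as np

# 2x2: dropping one row and one column leaves a 1x1 slice, which is
# trivially C-contiguous, so np.ascontiguousarray returns a view.
a2 = np.ones((2, 2))
view = np.ascontiguousarray(a2[1:, 1:])
print(view.flags['C_CONTIGUOUS'], np.shares_memory(a2, view))  # True True

# 3x3 and larger: the same slice crosses strides, is not contiguous,
# and a copy is forced.
a3 = np.ones((3, 3))
sub = np.ascontiguousarray(a3[1:, 1:])
print(np.shares_memory(a3, sub))  # False
```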

@bellet bellet changed the title [MRG] Fix bug in graph lasso [WIP] Fix bug in graph lasso Feb 26, 2019

bellet commented Feb 26, 2019

I will try to add some tests

bellet commented Feb 26, 2019

See my update in the PR text: it looks like the bug has small scope as it only occurs when n_features=2. I will add a specific test for this case.

@bellet bellet changed the title [WIP] Fix bug in graph lasso [WIP] Fix bug in graph lasso when n_features=2 Feb 26, 2019

vene commented Feb 26, 2019

Interesting but makes sense that it only happens when n_features=2; because it ends up with a size-1 tensor, right? This is the only case when such a slicing could yield a contiguous view; otherwise you can't avoid crossing some strides. Neat, I guess... 😵

the test is a good idea

bellet commented Feb 26, 2019

Interesting but makes sense that it only happens when n_features=2; because it ends up with a size-1 tensor, right? This is the only case when such a slicing could yield a contiguous view; otherwise you can't avoid crossing some strides. Neat, I guess... 😵

Yes, this is exactly what happens

bellet commented Feb 26, 2019

I added a test for the case n_features=2, which failed before and passes now.
This should now be ready to merge.
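For illustration, a regression test for this case might look like the following sketch (hypothetical variable names and tolerances; the actual test added in the PR may differ):

```python
import numpy as np
from sklearn.covariance import graphical_lasso

# The 2x2 PSD matrix from the discussion above.
A = np.array([[6., 2.],
              [2., 1.]])
A_before = A.copy()

cov, _ = graphical_lasso(A, alpha=0.1)

# The input must not be modified in place (this was the bug).
assert np.array_equal(A, A_before)
# The solution should match the PR output (and skggm's quic).
assert np.allclose(cov, [[6., 1.9], [1.9, 1.]], atol=1e-2)
```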

@bellet bellet changed the title [WIP] Fix bug in graph lasso when n_features=2 [MRG] Fix bug in graph lasso when n_features=2 Feb 26, 2019

@agramfort agramfort added this to In progress in Sprint Paris 2019 Feb 26, 2019

@GaelVaroquaux GaelVaroquaux moved this from In progress to Needs review in Sprint Paris 2019 Feb 26, 2019

@jnothman jnothman added this to the 0.20.3 milestone Feb 26, 2019

jnothman left a comment

Does this have your approval @vene? At a glance it looks good to me, but I don't feel I've looked carefully at the context.

Please add an entry to the change log at doc/whats_new/v0.20.rst under version 0.20.3. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

vene commented Feb 26, 2019

@jnothman the change itself has my approval, yes.

Since I'm out of date with our deprecation policies, I was waiting for someone else to weigh in on whether or not there should also be a test for the deprecated graph_lasso function, which is just an alias for graphical_lasso.

bellet added some commits Feb 26, 2019

bellet commented Feb 26, 2019

All done!

vene approved these changes Feb 26, 2019

Looks good to me, thanks @bellet!

jnothman commented Feb 26, 2019

Since I'm out of date with our deprecation policies, I was waiting for someone else to weigh in on whether or not there should also be a test for the deprecated graph_lasso function, which is just an alias for graphical_lasso.

My policy is that worrying about this sort of question for deprecated things is more effort than it's worth.

@agramfort agramfort merged commit adc1e59 into scikit-learn:master Feb 27, 2019

8 of 9 checks passed


Sprint Paris 2019 automation moved this from Needs review to Done Feb 27, 2019

agramfort commented Feb 27, 2019

thx @bellet

@bellet bellet deleted the bellet:glasso_fix branch Feb 27, 2019

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Feb 27, 2019

[MRG] Fix bug in graph lasso when n_features=2 (scikit-learn#13276)
* fix bug in graph lasso

* better fix

* add test for case n_features=2

* remove test for deprecated version

* update whats new

* note that the bug was a regression

Kiku-git added a commit to Kiku-git/scikit-learn that referenced this pull request Mar 4, 2019


@wdevazelhes wdevazelhes referenced this pull request Mar 5, 2019

Merged

[MRG] FIX: make proposal for sdml formulation #162
