added LinearOperator support in safe_sparse_dot() #16463

PavelStishenko · 2020-02-17T15:39:55Z

Reference Issues/PRs

PavelStishenko · 2020-02-18T06:39:55Z

Please help! I cannot see the details of the codecov/patch check; it shows a 500 server error.

rth · 2020-02-18T09:32:25Z

sklearn/utils/extmath.py

@@ -148,7 +149,10 @@ def safe_sparse_dot(a, b, dense_output=False):
        else:
            ret = np.dot(a, b)
    else:
-        ret = a @ b
+        if isinstance(b, LinearOperator) and not isinstance(a, LinearOperator):
+            ret = (b.T @ a.T).T


Can you explain this line? Does a @ b not work? Do we actually need to handle this case? I'm OK with the rest of the changes, but if we can avoid this special case, it would be better IMO.

Yes, a @ b doesn't work if b is a LinearOperator and a is not, because the left operand of the @ operator determines which implementation does the work. A LinearOperator can be multiplied by anything, but ndarrays and sparse matrices know nothing about LinearOperator and do not know how to proceed.

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

A = np.random.random((4, 4))
op = aslinearoperator(A)
print(op @ A)  # works fine
print(A @ op)  # ValueError: matmul: Input operand 1 does not have enough dimensions (has 0, gufunc core with signature (n?,k),(k,m?)->(n?,m?) requires 1)
```
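The dispatch rule behind this is Python's general binary-operator protocol: x @ y first calls x.__matmul__(y), and only if that returns NotImplemented does Python try y.__rmatmul__(x). A left operand that raises instead of returning NotImplemented (as the ndarray does in the example above) never gives the right operand a chance. A minimal sketch with hypothetical toy classes, where Operator plays the role of LinearOperator:

```python
class Raising:
    """Hypothetical left operand that raises on unknown types, like ndarray in the example above."""

    def __matmul__(self, other):
        if not isinstance(other, Raising):
            raise ValueError("don't know how to multiply by %r" % type(other).__name__)
        return "Raising @ Raising"


class Deferring:
    """Hypothetical left operand that returns NotImplemented on unknown types."""

    def __matmul__(self, other):
        if not isinstance(other, Deferring):
            return NotImplemented  # lets Python fall back to other.__rmatmul__
        return "Deferring @ Deferring"


class Operator:
    """Hypothetical operator-like right operand (stands in for LinearOperator)."""

    def __matmul__(self, other):
        return "Operator.__matmul__"

    def __rmatmul__(self, other):
        return "Operator.__rmatmul__"


print(Operator() @ Deferring())  # left operand decides: Operator.__matmul__ is used
print(Deferring() @ Operator())  # left operand deferred: Operator.__rmatmul__ is used
try:
    Raising() @ Operator()
except ValueError as exc:
    print("ValueError:", exc)    # the right operand is never consulted
```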

And so this is necessary for randomized_svd?

Definitely. A LinearOperator can encapsulate anything that behaves like a matrix by implementing matvec(). Obvious examples are an SVD decomposition, an einsum() contraction, or any tensor network with two free legs.
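As a minimal sketch of the SVD case, assuming real-valued factors, the hypothetical helper low_rank_operator below wraps U @ diag(s) @ Vt in a scipy.sparse.linalg.LinearOperator that never forms the dense product; it only supplies matvec and rmatvec callbacks.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator


def low_rank_operator(U, s, Vt):
    """Hypothetical helper: matrix-free operator for U @ diag(s) @ Vt (real-valued factors assumed)."""
    m, n = U.shape[0], Vt.shape[1]

    def matvec(x):
        x = np.ravel(x)                # LinearOperator may pass shape (n,) or (n, 1)
        return U @ (s * (Vt @ x))      # O((m + n) k) work instead of O(m n)

    def rmatvec(y):
        y = np.ravel(y)
        return Vt.T @ (s * (U.T @ y))  # transposed product, used by op.T / op.H

    return LinearOperator(shape=(m, n), matvec=matvec, rmatvec=rmatvec, dtype=U.dtype)


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
op = low_rank_operator(U, s, Vt)
x = rng.standard_normal(4)
print(np.allclose(op @ x, A @ x))
```

Products with 2D arrays also work here, because LinearOperator's default matmat applies matvec column by column.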

Why not return b @ a then? It should work, since here b and a are 1D vectors.

They can be 2D.
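For 2D operands the reason is the identity (AB)ᵀ = BᵀAᵀ: b @ a is not a substitute for a @ b (the shapes may not line up, and even when they do it computes a different product), whereas (b.T @ a.T).T keeps the LinearOperator on the left of every @ and still yields a @ b. A small check, assuming a recent SciPy where LinearOperator supports .T and @:

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

rng = np.random.default_rng(0)
a = rng.standard_normal((3, 4))   # dense left operand
B = rng.standard_normal((4, 5))
b = aslinearoperator(B)           # LinearOperator right operand

# The operator is on the left of each @, so LinearOperator.__matmul__ does the work.
ret = (b.T @ a.T).T
print(ret.shape, np.allclose(ret, a @ B))

# b @ a is not a fallback: shapes (4, 5) and (3, 4) do not even line up.
try:
    b @ a
except ValueError as exc:
    print("b @ a failed:", exc)

# Even with square operands, b @ a computes B a, which generally differs from a B.
a_sq, B_sq = rng.standard_normal((2, 4, 4))
print(np.allclose(aslinearoperator(B_sq) @ a_sq, a_sq @ B_sq))
```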

PavelStishenko · 2020-02-21T09:34:11Z

Any help with codecov/patch, please? It still shows a 500 server error.

rth

Please add a unit test, testing the behaviour of safe_sparse_dot(array, LinearOperator) and safe_sparse_dot(LinearOperator, array) to sklearn/utils/tests/test_extmath.py.

That would also fix the coverage error.
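A sketch of such a test. Because the patched safe_sparse_dot only exists in this PR, the test below exercises a hypothetical stand-in, safe_dot_sketch, that reproduces just the branch from the diff under review; in the PR itself the test would call the patched safe_sparse_dot from sklearn.utils.extmath instead.

```python
import numpy as np
from numpy.testing import assert_allclose
from scipy.sparse.linalg import LinearOperator, aslinearoperator


def safe_dot_sketch(a, b):
    """Hypothetical stand-in for the branch added to safe_sparse_dot in this PR."""
    if isinstance(b, LinearOperator) and not isinstance(a, LinearOperator):
        # ndarray.__matmul__ cannot handle a LinearOperator, so keep the operator on the left
        return (b.T @ a.T).T
    return a @ b


def test_safe_dot_linear_operator():
    rng = np.random.RandomState(0)
    A = rng.rand(4, 3)
    B = rng.rand(3, 5)
    expected = A @ B

    # array @ LinearOperator goes through the transpose trick
    assert_allclose(safe_dot_sketch(A, aslinearoperator(B)), expected)
    # LinearOperator @ array is handled by LinearOperator.__matmul__
    assert_allclose(safe_dot_sketch(aslinearoperator(A), B), expected)
    # plain arrays are unaffected
    assert_allclose(safe_dot_sketch(A, B), expected)


test_safe_dot_linear_operator()
```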

Also please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

rth

Currently the added test fails on some CI jobs with:

AttributeError: 'MatrixLinearOperator' object has no attribute '_transpose'

Maybe there is a minimal scipy version required? If so, that is fine, but it should be indicated in the what's new, with a comment in the code and the test skipped using,

pytest.importorskip("scipy", minversion="???")

rth · 2020-02-21T12:15:59Z

doc/whats_new/v0.23.rst

@@ -341,6 +341,10 @@ Changelog
  pandas sparse DataFrame.
  :pr:`16021` by :user:`Rushabh Vasani <rushabh-v>`.

+- |Feature| :func:`utils.extmath.randomized_svd` now accepts 


and utils.extmath.safe_sparse_dot

rth · 2020-02-21T13:52:19Z

doc/whats_new/v0.23.rst

@@ -341,7 +341,8 @@ Changelog
  pandas sparse DataFrame.
  :pr:`16021` by :user:`Rushabh Vasani <rushabh-v>`.

- |Feature| :func:`utils.extmath.randomized_svd` now accepts 
+- |Feature| :func:`utils.extmath.randomized_svd` and 


Tests are still failing. Also please add a test for randomized_svd on a LinearOperator.

Fixed. The test has been added.

rth

Minor comment, otherwise LGTM. Thanks @Mazay0!

sklearn/utils/tests/test_extmath.py

PavelStishenko · 2020-02-27T12:02:06Z

@glemaitre review please.

NicolasHug

Thanks for the PR @Mazay0. Made a few comments.

@rth I can't say I'm sold on the need to support LinearOperator. The changes seem minimal here, but who knows how they might complicate things in the future? I'm not saying we shouldn't do this, but I think we need to be super careful when introducing support for a new data structure. WDYT?

NicolasHug · 2020-04-08T12:03:27Z

sklearn/utils/extmath.py

@@ -148,7 +149,10 @@ def safe_sparse_dot(a, b, dense_output=False):
        else:
            ret = np.dot(a, b)
    else:
-        ret = a @ b
+        if isinstance(b, LinearOperator) and not isinstance(a, LinearOperator):
+            ret = (b.T @ a.T).T


Why not return b @ a then? It should work, since here b and a are 1D vectors.

sklearn/utils/tests/test_extmath.py

rth · 2020-04-08T12:47:39Z

@NicolasHug I generally share your feeling. However, the changes are indeed minimal, and LinearOperator is part of scipy.sparse, so it's not like we are adding support for a new library. Of course, I'm not saying that we would support it everywhere, nor will we provide much official support. But if it makes life easier for some contributors, they are willing to make a PR to fix it, and the changes are minimal, then why not.

PavelStishenko · 2020-04-09T05:17:30Z

LinearOperator is meant to mimic matrix behavior, and its development is moving in that direction, so in the future it should be even simpler to support. This is especially true given that sklearn already supports scipy's sparse matrices and shouldn't rely on their internal data representation anyway.
A key feature of randomized SVD is that it is a matrix-free method, and supporting LinearOperator is the most natural way to exploit this feature.

rth · 2020-04-19T21:18:35Z

@NicolasHug Would be good to make a decision on this one (in one direction or another). Has your position changed since last review?

NicolasHug · 2020-04-20T13:31:27Z

Not really, honestly. Maybe other @scikit-learn/core-devs should chime in.

adrinjalali

Where would this apply? Only when people directly pass a LinearOperator to these functions? What are the consequences of the changed return value for the user in places where these functions are used internally?

adrinjalali · 2020-04-20T14:27:14Z

sklearn/utils/extmath.py

@@ -131,7 +132,7 @@ def safe_sparse_dot(a, b, dense_output=False):
    dot_product : array or sparse matrix
        sparse if ``a`` and ``b`` are sparse and ``dense_output=False``.
    """
-    if a.ndim > 2 or b.ndim > 2:
+    if len(a.shape) > 2 or len(b.shape) > 2:


I vaguely remember this potentially being expensive compared to a.ndim. Is that not a concern here?

Probably it is more expensive, but the difference is negligible compared with subsequent operations.

The SciPy folks provisionally agreed to accept a PR adding an ndim attribute to LinearOperator: scipy/scipy#11908 (comment).
So this line can be reverted to its original state.

The SciPy folks merged the PR adding the ndim attribute to LinearOperator: scipy/scipy#11915. So in the future this line can be reverted to use ndim again.

adrinjalali · 2020-04-20T14:29:46Z

sklearn/utils/extmath.py

@@ -131,7 +132,7 @@ def safe_sparse_dot(a, b, dense_output=False):
    dot_product : array or sparse matrix


This PR also changes the output type.

Yes, indeed. I will fix the docstring and add an appropriate test if you agree, at least in principle, that this PR can be accepted.
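To illustrate the change in output type: once a LinearOperator is involved, the result is no longer limited to "array or sparse matrix". An operator times an ndarray gives an ndarray, but an operator times another operator gives a lazy LinearOperator. A quick check using only SciPy's public operators:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, aslinearoperator

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

dense_result = aslinearoperator(A) @ B                    # operator @ ndarray -> ndarray
lazy_result = aslinearoperator(A) @ aslinearoperator(B)   # operator @ operator -> LinearOperator

print(type(dense_result).__name__, isinstance(lazy_result, LinearOperator))
# A lazy product can still be materialised explicitly when a dense array is needed.
print(np.allclose(lazy_result @ np.eye(2), A @ B))
```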

rth · 2020-04-20T15:53:37Z

Where would this apply? only when people directly pass a LinearOperator to these functions?

Yes, only if you pass LinearOperator as input.

what are the consequences of the changed returned value for the user in places where these functions are used internally?

Right now it would have no consequence internally, since safe_sparse_dot is never used with LinearOperator. We do use LinearOperator in other parts of the code base. However, it would allow users to pass a LinearOperator as input to randomized_svd (cf. parent issue) in their own code. The proposed changes seem to me like a small price for that. Of course, we are not promising any official support for it.

Several scipy functions, particularly for matrix factorization (svds, etc.), support LinearOperator as input alongside sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#matrix-factorizations
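For example, scipy.sparse.linalg.svds computes a truncated SVD of a LinearOperator directly. A minimal sketch; to avoid relying on the order in which svds returns the singular values, they are sorted before comparison.

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator, svds

rng = np.random.default_rng(0)
A = rng.standard_normal((40, 20))
op = aslinearoperator(A)          # any LinearOperator with matvec/rmatvec could be used here

u, s, vt = svds(op, k=3)          # svds accesses op only through operator products
top3 = np.sort(s)[::-1]           # sort to avoid relying on svds' output order
print(np.allclose(top3, np.linalg.svd(A, compute_uv=False)[:3]))
```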

PavelStishenko · 2020-04-21T10:56:54Z

Only when people pass objects derived from LinearOperator. This is useful if you want to encapsulate an internal data structure in some custom way other than the scipy.sparse classes. The returned value does not change when numpy arrays or scipy sparse matrices are used. LinearOperators could not be used before, so no existing code will be broken by this PR.

adrinjalali · 2020-04-21T12:14:43Z

Thanks for following up the ndim issue on scipy @Mazay0 .

I checked the LinearOperator doc, and it seems it's kind of a separate thing compared to __array_function__. I was hoping that when we move to supporting either __array_function__ or __array_module__, then LinearOperator would automatically be supported, but it seems that's not the case.

I guess the other question is, why is LinearOperator not implementing __array_function__?
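One existing NumPy hook does bear on the A @ op failure discussed earlier in this PR: per NEP 13, a class that sets __array_ufunc__ = None makes ndarray's binary operators return NotImplemented, so Python falls back to that class's reflected method (here __rmatmul__). Whether a given SciPy version's LinearOperator opts in this way is not covered in this thread; the class below is a hypothetical minimal wrapper illustrating the mechanism, not a proposal for SciPy's API.

```python
import numpy as np


class RightMatmulWrapper:
    """Hypothetical wrapper showing NumPy's NEP 13 opt-out for binary operators."""

    # With __array_ufunc__ set to None, ndarray.__matmul__ returns NotImplemented for
    # this operand, so Python calls RightMatmulWrapper.__rmatmul__ instead.
    __array_ufunc__ = None

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix)

    def __rmatmul__(self, other):
        # Handles `other @ self` when the left operand is an ndarray.
        return np.asarray(other) @ self.matrix


A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
print(np.allclose(A @ RightMatmulWrapper(B), A @ B))
```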

jnothman · 2020-04-27T13:58:13Z

We should probably have discussed this at the monthly meeting. Labelling as needs decision.

jnothman

Unless we explicitly document some estimators or equivalent functions as accepting a LinearOperator, I find this support in a utility a bit obscure.

added LinearOperator support in safe_sparse_dot()

41af221

PavelStishenko requested a review from glemaitre February 18, 2020 06:38

rth reviewed Feb 18, 2020

Merge remote-tracking branch 'upstream/master' into rand_svd_linop

f2a870e

rth reviewed Feb 21, 2020

pvstishenko added 2 commits February 21, 2020 17:16

Added test for safe_sparse_dot(array, LinearOperator) and safe_sparse_dot(LinearOperator, array), adjusted docstrings and added whats_new entry

ebe8993

fix for flake8 compliance

2fdd447

rth reviewed Feb 21, 2020

pvstishenko added 3 commits February 21, 2020 19:07

make test work with old scipy

3c1ecfa

set minimal scipy version for test

a04957e

use testing class for LinearOperator stub

367ef98

rth reviewed Feb 21, 2020

pvstishenko added 4 commits February 25, 2020 15:40

fixed lint errors

5c11cb5

added explicit _transpose in TestingLinearOperator for compatibility with old SciPy

81aaebe

added test for randomized_svd with LinearOperator

8c61f0b

removed unnecessary method from TestingLinearOperator

0b0cef1

rth approved these changes Feb 26, 2020

sklearn/utils/tests/test_extmath.py (outdated)

Renamed TestingLinearOperator to CustomLinearOperator for consistency with naming convention and added comment about motivation behind this class

8d06262

github-actions bot added the module:utils label Mar 2, 2020

Merge branch 'master' into rand_svd_linop

c8e4e6c

rth requested a review from NicolasHug April 8, 2020 11:36

NicolasHug reviewed Apr 8, 2020

adrinjalali reviewed Apr 20, 2020

PavelStishenko mentioned this pull request Apr 21, 2020

LinearOperator should have ndim attribute scipy/scipy#11908 (closed)

jnothman added the Needs Decision Requires decision label Apr 27, 2020

jnothman reviewed Apr 27, 2020

Base automatically changed from master to main January 22, 2021 10:52

glemaitre removed their request for review April 10, 2021 16:58
