added LinearOperator support in safe_sparse_dot() #16463
base: main
Conversation
Please help! I cannot see details of the codecov/patch check; it shows a 500 server error.
```diff
@@ -148,7 +149,10 @@ def safe_sparse_dot(a, b, dense_output=False):
         else:
             ret = np.dot(a, b)
     else:
-        ret = a @ b
+        if isinstance(b, LinearOperator) and not isinstance(a, LinearOperator):
+            ret = (b.T @ a.T).T
```
Can you explain this line? `a @ b` does not work? Do we actually need to handle this case? I'm OK with the rest of the changes, but if we can avoid this special case it would be better IMO.
Yes, `a @ b` doesn't work if `b` is a `LinearOperator` and `a` is not, because the first argument of the `@` operator determines which implementation does the job. A `LinearOperator` can be multiplied by anything, but `ndarray`s and sparse matrices know nothing about `LinearOperator` and do not know how to proceed.
```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

A = np.random.random((4, 4))
op = aslinearoperator(A)
print(op @ A)  # works fine
print(A @ op)  # ValueError: matmul: Input operand 1 does not have enough
               # dimensions (has 0, gufunc core with signature
               # (n?,k),(k,m?)->(n?,m?) requires 1)
```
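As a sanity check of the `(b.T @ a.T).T` workaround (my own sketch, not code from the PR, assuming a recent SciPy where `LinearOperator.T` is available): transposing both operands puts the `LinearOperator` on the left, so its `__matmul__` is used, and transposing the result back recovers the ordinary product.

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator

rng = np.random.default_rng(0)
A = rng.random((4, 3))       # plain ndarray on the left
M = rng.random((3, 5))
op = aslinearoperator(M)     # LinearOperator on the right

# A @ op raises ValueError, but op.T @ A.T dispatches to the
# LinearOperator; transposing back gives the same result as A @ M.
ret = (op.T @ A.T).T
assert np.allclose(ret, A @ M)
```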
And so this is necessary for `randomized_svd`?
Definitely. LinearOperator can encapsulate anything that behaves like a matrix via a matvec() implementation. Obvious examples are an SVD decomposition, an einsum() routine, or any tensor network with two free legs.
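For instance (a sketch of my own, not code from the PR), a `LinearOperator` can be built from nothing but callables, without ever materializing the matrix:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator

# A 5x5 operator representing diag(1..5), defined only through matvec
# (and rmatvec, so the transpose also works).
d = np.arange(1.0, 6.0)
op = LinearOperator(
    shape=(5, 5),
    matvec=lambda v: d * v,
    rmatvec=lambda v: d * v,  # symmetric, so rmatvec == matvec
)

print(op @ np.ones(5))  # [1. 2. 3. 4. 5.]
```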
Why not return `b @ a` then? It should work, since here `b` and `a` are 1d vectors?
They can be 2D.
Any help with codecov/patch please? It still shows a 500 server error.
Please add a unit test, testing the behaviour of `safe_sparse_dot(array, LinearOperator)` and `safe_sparse_dot(LinearOperator, array)`, to `sklearn/utils/tests/test_extmath.py`. That would also fix the coverage error.
Also please add an entry to the change log at `doc/whats_new/v0.23.rst`. Like the other entries there, please reference this pull request with `:pr:` and credit yourself (and other contributors if applicable) with `:user:`.
…_dot(LinearOperator, array) adjusted docstrings and added whats_new entry
Currently the added test fails in some CI with:
AttributeError: 'MatrixLinearOperator' object has no attribute '_transpose'
Maybe there is a minimal scipy version required? If so, that is fine, but it should be indicated in the what's new, with a comment in the code, and the test skipped using `pytest.importorskip("scipy", minversion="???")`.
doc/whats_new/v0.23.rst
```diff
@@ -341,6 +341,10 @@ Changelog
     pandas sparse DataFrame.
     :pr:`16021` by :user:`Rushabh Vasani <rushabh-v>`.
 
+- |Feature| :func:`utils.extmath.randomized_svd` now accepts
```
and `utils.extmath.safe_sparse_dot`
```diff
@@ -341,7 +341,8 @@ Changelog
     pandas sparse DataFrame.
     :pr:`16021` by :user:`Rushabh Vasani <rushabh-v>`.
 
-- |Feature| :func:`utils.extmath.randomized_svd` now accepts
+- |Feature| :func:`utils.extmath.randomized_svd` and
```
Tests are still failing. Also please add a test for `randomized_svd` on a LinearOperator.
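For comparison (my own sketch, not code from this PR): SciPy's own `svds` already accepts a `LinearOperator`, which is the kind of behaviour such a test would pin down for `randomized_svd` — the singular values computed through the operator should match the dense computation.

```python
import numpy as np
from scipy.sparse.linalg import aslinearoperator, svds

rng = np.random.RandomState(0)
A = rng.rand(20, 10)

# svds accepts a LinearOperator directly
U, s, Vt = svds(aslinearoperator(A), k=3)

# the top-3 singular values match the dense SVD (sort both, since
# svds does not guarantee the same ordering as np.linalg.svd)
s_dense = np.linalg.svd(A, compute_uv=False)
assert np.allclose(np.sort(s), np.sort(s_dense[:3]))
```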
Fixed; the test is added.
Minor comment otherwise LGTM. Thanks @Mazay0 !
… with naming convention and added comment about motivation behind this class
@glemaitre review please.
Thanks for the PR @Mazay0. Made a few comments.
@rth I can't say I'm sold on the need to support LinearOperator. The changes seem minimal here, but who knows how they might complicate things in the future? I'm not saying we shouldn't do this, but I think we need to be super careful when introducing support for a new data structure. WDYT?
@NicolasHug I generally share your feeling. However, the changes are indeed minimal, and LinearOperator is supposed to mimic matrix behavior. Progress in its development goes in this direction, so in the future it should be even simpler to support. This is especially true given that sklearn supports scipy's sparse matrices and shouldn't rely on internal data representation anyway.
@NicolasHug Would be good to make a decision on this one (in one direction or another). Has your position changed since the last review?
Not really, honestly. Maybe other @scikit-learn/core-devs should chime in.
Where would this apply? Only when people directly pass a `LinearOperator` to these functions? What are the consequences of the changed return value for the user in places where these functions are used internally?
```diff
@@ -131,7 +132,7 @@ def safe_sparse_dot(a, b, dense_output=False):
     dot_product : array or sparse matrix
         sparse if ``a`` and ``b`` are sparse and ``dense_output=False``.
     """
-    if a.ndim > 2 or b.ndim > 2:
+    if len(a.shape) > 2 or len(b.shape) > 2:
```
I vaguely remember this potentially being expensive compared to `a.ndim`. Is that not a concern here?
Probably it is more expensive, but the difference is negligible compared with subsequent operations.
scipy folks provisionally agreed to accept a PR adding an `ndim` attribute to `LinearOperator`: scipy/scipy#11908 (comment). So this line can be reverted to the original state.
SciPy folks merged the PR adding an `ndim` attribute to `LinearOperator`: scipy/scipy#11915. So in the future this line can be reverted to `ndim` again.
```diff
@@ -131,7 +132,7 @@ def safe_sparse_dot(a, b, dense_output=False):
     dot_product : array or sparse matrix
```
This PR also changes the output type.
Yes, indeed. I will fix the docstring and add an appropriate test, if you agree, at least in principle, that this PR can be accepted.
Yes, only if you pass LinearOperator as input.
Right now it would have no consequence internally since it's […]. Several scipy functions, particularly for matrix factorizations (svds etc.), support LinearOperator as input alongside sparse matrices: https://docs.scipy.org/doc/scipy/reference/sparse.linalg.html#matrix-factorizations
Only when people pass objects derived from `LinearOperator`.
Thanks for following up the […]. I checked the […]. I guess the other question is, why is […]?
We should probably have discussed this at the monthly meeting. Labelling as needs decision.
Unless we explicitly document some estimators or equivalent functions as accepting a LinearOperator, I find this support in a utility a bit obscure.
Reference Issues/PRs
Fixes #16461.