-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Use GEMM in _update_dict #11420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Use GEMM in _update_dict #11420
Conversation
23228fe
to
e48f0f1
Compare
This fuses the multiplication and addition together into the same computation. Also includes the sign change as well. Not to mention that SciPy linalg routines return Fortran ordered arrays (even if they were C-ordered originally) unlike NumPy's `dot`. So this avoids a copy as well.
e48f0f1
to
78c0570
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth showing a benchmark?
ger, = linalg.get_blas_funcs(('ger',), (dictionary, code)) | ||
# Residuals, computed with BLAS for speed and efficiency | ||
# R <- -1.0 * U * V^T + 1.0 * Y | ||
R = gemm(-1.0, dictionary, code, 1.0, Y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This produces a Fortran array I presume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth making it explicit that we expect a fortran array in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the SciPy linalg functions correctly handle C ordered arrays as input.
In [1]: import numpy as np
In [2]: import scipy.linalg as linalg
In [3]: np.random.seed(0)
In [4]: a = np.random.random((2, 3))
In [5]: b = np.random.random((3, 4))
In [6]: c = np.random.random((2, 4))
In [7]: gemm, = linalg.get_blas_funcs(('gemm',), (a, b, c))
In [8]: gemm(1.0, a, b, 1.0, c)
Out[8]:
array([[1.62736178, 1.79020759, 1.92593582, 2.17344607],
[1.08121316, 1.54678646, 0.89707181, 1.77876957]])
In [9]: gemm(1.0, a, b, 1.0, c).flags
Out[9]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
That said, we appear to already be forcing dictionary
and code
to Fortran ordered arrays anywhere _update_dict
is called.
This is the main take away. In [1]: import numpy as np
In [2]: a = np.random.random((5, 6)).copy(order="F")
In [3]: b = np.random.random((6, 7)).copy(order="F")
In [4]: np.isfortran(np.dot(a, b))
Out[4]: False Meaning calling |
Ah, I get it now. LGTM. But I'm no blas expert, so let's get a second opinion. |
Any ideas who might be a good person to ask to look at this? |
Would you be able to review, @ogrisel? |
Friendly nudge 😉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Out of curiosity could you please run a quick benchmark to evaluate the performance impact of that fix on a typical (smallish) problem of yours?
ger, = linalg.get_blas_funcs(('ger',), (dictionary, code)) | ||
# Residuals, computed with BLAS for speed and efficiency | ||
# R <- -1.0 * U * V^T + 1.0 * Y | ||
R = gemm(-1.0, dictionary, code, 1.0, Y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth making it explicit that we expect a fortran array in the comment.
Benchmarking shows the time change is pretty close to negligible, which is expected since the same operations are done in either case. Though memory usage should be reduced as we are dealing only in Fortran arrays now instead of converting as before thus avoiding a copy. |
@ogrisel I added the comment you suggested. Merging. |
Thanks @agramfort. Sorry @ogrisel, thought you were concerned about the requirements of input arrays. |
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Avoids a copy in
_update_dict
and fuses two operations together using the BLAS GEMM operation.Any other comments?