Skip to content

[MRG] Use GEMM in _update_dict #11420

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jul 18, 2018

Conversation

jakirkham
Copy link
Contributor

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Avoids a copy in _update_dict and fuses two operations together using the BLAS GEMM operation.

Any other comments?

@jakirkham jakirkham force-pushed the use_gemm__update_dict branch from 23228fe to e48f0f1 Compare July 3, 2018 22:53
This fuses the multiplication and addition together into the same
computation. Also includes the sign change as well. Not to mention that
SciPy linalg routines return Fortran ordered arrays (even if they were
C-ordered originally) unlike NumPy's `dot`. So this avoids a copy as
well.
@jakirkham jakirkham force-pushed the use_gemm__update_dict branch from e48f0f1 to 78c0570 Compare July 3, 2018 22:57
@jakirkham jakirkham changed the title Use GEMM in _update_dict [MRG] Use GEMM in _update_dict Jul 3, 2018
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth showing a benchmark?

ger, = linalg.get_blas_funcs(('ger',), (dictionary, code))
# Residuals, computed with BLAS for speed and efficiency
# R <- -1.0 * U * V^T + 1.0 * Y
R = gemm(-1.0, dictionary, code, 1.0, Y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This produces a Fortran array I presume?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be worth making it explicit that we expect a fortran array in the comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, the SciPy linalg functions correctly handle C ordered arrays as input.

In [1]: import numpy as np

In [2]: import scipy.linalg as linalg

In [3]: np.random.seed(0)

In [4]: a = np.random.random((2, 3))

In [5]: b = np.random.random((3, 4))

In [6]: c = np.random.random((2, 4))

In [7]: gemm, = linalg.get_blas_funcs(('gemm',), (a, b, c))

In [8]: gemm(1.0, a, b, 1.0, c)
Out[8]: 
array([[1.62736178, 1.79020759, 1.92593582, 2.17344607],
       [1.08121316, 1.54678646, 0.89707181, 1.77876957]])

In [9]: gemm(1.0, a, b, 1.0, c).flags
Out[9]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

That said, we appear to already be forcing dictionary and code to Fortran ordered arrays anywhere _update_dict is called.

@jakirkham
Copy link
Contributor Author

This is the main take away.

In [1]: import numpy as np

In [2]: a = np.random.random((5, 6)).copy(order="F")

In [3]: b = np.random.random((6, 7)).copy(order="F")

In [4]: np.isfortran(np.dot(a, b))
Out[4]: False

Meaning calling np.asfortranarray before was copying the array. This avoids that copy.

@jnothman
Copy link
Member

jnothman commented Jul 4, 2018

Ah, I get it now. LGTM. But I'm no blas expert, so let's get a second opinion.

@jakirkham
Copy link
Contributor Author

Any ideas who might be a good person to ask to look at this?

@jakirkham
Copy link
Contributor Author

Would you be able to review, @ogrisel?

@jakirkham
Copy link
Contributor Author

Friendly nudge 😉

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. Out of curiosity could you please run a quick benchmark to evaluate the performance impact of that fix on a typical (smallish) problem of yours?

ger, = linalg.get_blas_funcs(('ger',), (dictionary, code))
# Residuals, computed with BLAS for speed and efficiency
# R <- -1.0 * U * V^T + 1.0 * Y
R = gemm(-1.0, dictionary, code, 1.0, Y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be worth making it explicit that we expect a fortran array in the comment.

@jakirkham
Copy link
Contributor Author

Benchmarking shows the time change is pretty close to negligible, which is expected since the same operations are done in either case. Though memory usage should be reduced as we are dealing only in Fortran arrays now instead of converting as before thus avoiding a copy.

@agramfort
Copy link
Member

@ogrisel I added the comment you suggested.

Merging.

@agramfort agramfort merged commit fc3a6cc into scikit-learn:master Jul 18, 2018
@jakirkham jakirkham deleted the use_gemm__update_dict branch July 18, 2018 14:58
@jakirkham
Copy link
Contributor Author

Thanks @agramfort. Sorry @ogrisel, thought you were concerned about the requirements of input arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants