New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] Use GEMM in _update_dict #11420

Merged
merged 3 commits into from Jul 18, 2018

Conversation

Projects
None yet
4 participants
@jakirkham
Contributor

jakirkham commented Jul 3, 2018

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Avoids a copy in _update_dict and fuses two operations together using the BLAS GEMM operation.

Any other comments?

jakirkham added some commits Jul 3, 2018

Use BLAS GEMM routine to compute residuals
This fuses the multiplication and addition together into the same
computation. Also includes the sign change as well. Not to mention that
SciPy linalg routines return Fortran ordered arrays (even if they were
C-ordered originally) unlike NumPy's `dot`. So this avoids a copy as
well.

@jakirkham jakirkham changed the title from Use GEMM in _update_dict to [MRG] Use GEMM in _update_dict Jul 3, 2018

@jnothman

Is it worth showing a benchmark?

ger, = linalg.get_blas_funcs(('ger',), (dictionary, code))
# Residuals, computed with BLAS for speed and efficiency
# R <- -1.0 * U * V^T + 1.0 * Y
R = gemm(-1.0, dictionary, code, 1.0, Y)

This comment has been minimized.

@jnothman

jnothman Jul 3, 2018

Member

This produces a Fortran array I presume?

This comment has been minimized.

@jakirkham

jakirkham Jul 4, 2018

Contributor

Yep.

This comment has been minimized.

@ogrisel

ogrisel Jul 17, 2018

Member

Maybe it would be worth making it explicit that we expect a fortran array in the comment.

This comment has been minimized.

@jakirkham

jakirkham Jul 17, 2018

Contributor

Interestingly, the SciPy linalg functions correctly handle C ordered arrays as input.

In [1]: import numpy as np

In [2]: import scipy.linalg as linalg

In [3]: np.random.seed(0)

In [4]: a = np.random.random((2, 3))

In [5]: b = np.random.random((3, 4))

In [6]: c = np.random.random((2, 4))

In [7]: gemm, = linalg.get_blas_funcs(('gemm',), (a, b, c))

In [8]: gemm(1.0, a, b, 1.0, c)
Out[8]: 
array([[1.62736178, 1.79020759, 1.92593582, 2.17344607],
       [1.08121316, 1.54678646, 0.89707181, 1.77876957]])

In [9]: gemm(1.0, a, b, 1.0, c).flags
Out[9]: 
  C_CONTIGUOUS : False
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False

That said, we appear to already be forcing dictionary and code to Fortran ordered arrays anywhere _update_dict is called.

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 4, 2018

This is the main take away.

In [1]: import numpy as np

In [2]: a = np.random.random((5, 6)).copy(order="F")

In [3]: b = np.random.random((6, 7)).copy(order="F")

In [4]: np.isfortran(np.dot(a, b))
Out[4]: False

Meaning calling np.asfortranarray before was copying the array. This avoids that copy.

@jnothman

This comment has been minimized.

Member

jnothman commented Jul 4, 2018

Ah, I get it now. LGTM. But I'm no blas expert, so let's get a second opinion.

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 10, 2018

Any ideas who might be a good person to ask to look at this?

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 11, 2018

Would you be able to review, @ogrisel?

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 17, 2018

Friendly nudge 😉

@ogrisel

This looks good. Out of curiosity could you please run a quick benchmark to evaluate the performance impact of that fix on a typical (smallish) problem of yours?

ger, = linalg.get_blas_funcs(('ger',), (dictionary, code))
# Residuals, computed with BLAS for speed and efficiency
# R <- -1.0 * U * V^T + 1.0 * Y
R = gemm(-1.0, dictionary, code, 1.0, Y)

This comment has been minimized.

@ogrisel

ogrisel Jul 17, 2018

Member

Maybe it would be worth making it explicit that we expect a fortran array in the comment.

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 18, 2018

Benchmarking shows the time change is pretty close to negligible, which is expected since the same operations are done in either case. Though memory usage should be reduced as we are dealing only in Fortran arrays now instead of converting as before thus avoiding a copy.

@agramfort

This comment has been minimized.

Member

agramfort commented Jul 18, 2018

@ogrisel I added the comment you suggested.

Merging.

@agramfort agramfort merged commit fc3a6cc into scikit-learn:master Jul 18, 2018

0 of 2 checks passed

ci/circleci: python2 Your tests are queued behind your running builds
Details
ci/circleci: python3 Your tests are queued behind your running builds
Details

@jakirkham jakirkham deleted the jakirkham:use_gemm__update_dict branch Jul 18, 2018

@jakirkham

This comment has been minimized.

Contributor

jakirkham commented Jul 18, 2018

Thanks @agramfort. Sorry @ogrisel, thought you were concerned about the requirements of input arrays.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment