
Try using unroll+clblas GEMM #16

Closed
hughperkins opened this issue Apr 25, 2015 · 0 comments
Following this article, http://petewarden.com/2015/04/20/why-gemm-is-at-the-heart-of-deep-learning/ (discussed at http://www.reddit.com/r/MachineLearning/comments/338lfs/why_gemm_is_at_the_heart_of_deep_learning/ ), I decided I should try this, in case it gives an easy way to speed up DeepCL for large image sizes.

My verdict? Not useful :-(

I tried this on my laptop, and on a K520, and the results were:

  • unroll + matmult on the CPU is a bit faster than direct CPU convolution. I suppose this is because the memory access patterns are better
  • unroll + clblas was faster again
  • the most naive convolutional OpenCL kernel, i.e. not using any kind of unroll or GEMM, was the fastest
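For concreteness, the "unroll" step above is the standard im2col trick: flatten each filter-sized patch of the input into one row of a matrix, so the whole convolution collapses into a single GEMM call. The sketch below is a hypothetical pure-Python illustration of that idea (it is not DeepCL's actual implementation, which runs in OpenCL); names like `im2col` and `gemm` are mine:

```python
# Hypothetical sketch of the unroll (im2col) + GEMM approach: each
# convolution output becomes one dot product between a row of the
# unrolled input and a column of the flattened filter matrix.

def im2col(image, filter_size):
    """Unroll a 2D image (list of lists) into a matrix whose rows are
    the flattened filter-sized patches (valid convolution, stride 1)."""
    h, w = len(image), len(image[0])
    out_h, out_w = h - filter_size + 1, w - filter_size + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            patch = [image[i + di][j + dj]
                     for di in range(filter_size)
                     for dj in range(filter_size)]
            rows.append(patch)
    return rows  # shape: (out_h * out_w) x (filter_size * filter_size)

def gemm(a, b):
    """Naive matrix multiply: a is m x k, b is k x n."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)]
            for row in a]

# Convolution then becomes: im2col(input) x flattened_filters
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
filt = [[1, 0],
        [0, 1]]
flat_filt = [[v] for row in filt for v in row]  # one filter, as a column
result = gemm(im2col(image, 2), flat_filt)
# result rows are the output positions, row-major:
# [[6], [8], [12], [14]]
```

On a GPU, the `gemm` call would be replaced by a clBLAS SGEMM, which is where the "unroll + clblas" timings below come from.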

For batchsize=128, inputplanes=32, inputsize=128, numfilters=32, filtersize=5, on a K520, I got:

  • convolution + cpu: 318s
  • unrolled + cpu: 218s
  • unrolled + clblas: invalid command queue
  • no unrolling, propagate1: 2s

The matrices are apparently a bit too big for unroll + clblas, so I tried using a smaller batchsize:
batchsize=16, inputplanes=32, inputsize=128, numfilters=32, filtersize=5:

  • convolution + cpu: 39s
  • unroll + cpu: 26s
  • unroll + clblas GEMM: 2.2s
  • propagate1: 0.27s

Note that propagate1 is DeepCL's most generic, least optimized kernel. It doesn't use local memory (which is why it's generic, and works on pretty much anything, unless it runs out of GPU global memory). Kernels using local memory are around 3-10 times faster than propagate1.
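As a rough illustration of what the most naive kernel computes (one output element per work-item, no unrolling, no GEMM), here is a hypothetical direct-convolution sketch in plain Python; the actual propagate1 is an OpenCL kernel in the DeepCL repo, so this only mirrors the arithmetic, not the parallelism:

```python
def conv_direct(image, filt):
    """Direct (naive) valid convolution, stride 1: each output element
    is computed independently, like one GPU work-item would."""
    fs = len(filt)
    out_h = len(image) - fs + 1
    out_w = len(image[0]) - fs + 1
    return [[sum(image[i + di][j + dj] * filt[di][dj]
                 for di in range(fs) for dj in range(fs))
             for j in range(out_w)]
            for i in range(out_h)]

out = conv_direct([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]],
                  [[1, 0],
                   [0, 1]])
# out == [[6, 8], [12, 14]]
```

Because every work-item reads its patch straight from global memory, neighbouring work-items re-fetch overlapping pixels; the faster DeepCL kernels stage those shared pixels in local memory instead.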

Overall, my current conclusion: unroll + clblas GEMM doesn't seem promising.

=> closing issue.

@hughperkins hughperkins self-assigned this Apr 25, 2015