Improve CUDA softmax performance #4973

apaszke · 2018-01-31T22:33:34Z

Simple fix with a very large perf benefit for smaller sizes. Below are some plots (dim_size = size of the softmaxed dimension, outer_size = batch size, z-axis = ratio of old time to new time). In general, as long as dim_size < 1024 you get at least a 2x speedup with this code, 4x if you fit in 256, and even 12x for sizes around 100 and smaller.

I tried playing with some other potential improvements like replacing the blockReduce function with a shuffle-based one, but it gave mixed results (-20% time in some cases, +20% time in other cases).
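For readers unfamiliar with the idea: a shuffle-based reduction lets the lanes of a warp exchange values directly in registers, halving the stride at each step. On CUDA this is done with the `__shfl_down` family of intrinsics. The sketch below is only a host-side C++ model of that access pattern, not the code that was benchmarked for this PR. The name `simulated_warp_sum` and the fixed warp width of 32 are assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWarpSize = 32;  // assumption: 32 lanes per warp

// Hypothetical host-side model of a shuffle-down warp sum (illustrative only).
// Each pass adds the value held by lane + offset into lane, then halves offset.
inline float simulated_warp_sum(std::array<float, kWarpSize> lanes) {
  for (std::size_t offset = kWarpSize / 2; offset > 0; offset /= 2) {
    std::array<float, kWarpSize> next = lanes;
    for (std::size_t lane = 0; lane + offset < kWarpSize; ++lane) {
      // On the GPU, this read would come straight from the other lane's register.
      next[lane] = lanes[lane] + lanes[lane + offset];
    }
    // Lanes without an in-range partner are left unchanged in this model;
    // their values never feed into lane 0.
    lanes = next;
  }
  return lanes[0];  // after log2(32) = 5 passes, lane 0 holds the full sum
}
```

The model needs five passes for 32 lanes. Whether that beats the existing reduction on real hardware depends on the sizes involved, which is consistent with the mixed results reported above.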

Thanks to @nikitakit for reporting #4893 (which is fixed in this PR).

apaszke · 2018-02-01T18:31:09Z

All the build failures are spurious and unrelated to this PR.

colesbury

Nice speed-up!

aten/src/THCUNN/SoftMaxCommon.cuh

@@ -57,6 +57,15 @@ void SpatialSoftMax_getLaunchSizes(
  grid = SpatialSoftMax_getGridSize(block, max_active_blocks, outer_size, dim_size, inner_size);
 }

+inline dim3 SoftMax_getBlockSize(int ILP, uint64_t dim_size) {
+  uint64_t block_size = 1;
+  uint64_t max_block_size = std::min(dim_size / ILP, static_cast<uint64_t>(1024));
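The diff excerpt above is cut off after the `max_block_size` line, so the rest of the function is not shown here. As a rough sketch of how a block size could be derived from that bound, the standalone helper below rounds up to a power of two and never goes below one warp. Both of those choices, the name `pick_block_size`, and the plain `uint64_t` return type (instead of `dim3`) are assumptions for illustration and may not match the merged code.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper, not taken from the PR: choose a 1-D block size for a
// softmax over `dim_size` elements when each thread handles `ilp` elements.
// Assumes ilp > 0.
inline uint64_t pick_block_size(int ilp, uint64_t dim_size) {
  const uint64_t kMaxThreads = 1024;  // cap shown in the diff excerpt
  const uint64_t kWarpSize = 32;      // assumption: launch at least one warp
  uint64_t max_block_size = std::min(dim_size / ilp, kMaxThreads);
  uint64_t block_size = 1;
  // Assumption: round up to the next power of two.
  while (block_size < max_block_size) block_size *= 2;
  return std::max(block_size, kWarpSize);
}
```

Under these assumptions, `pick_block_size(2, 100)` gives 64 threads rather than a full 1024-thread block, so a small `dim_size` no longer launches mostly idle threads. That would fit the larger speedups reported for small sizes.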


onnxbot-worker-1 mentioned this pull request Jan 31, 2018

[auto] pytorch-pr-4973 onnxbot/onnx-fb-universe#502

Closed

Improve CUDA softmax performance

80a5fc4

apaszke force-pushed the softmax_speedup branch from db2e612 to 80a5fc4 February 1, 2018 00:18

apaszke closed this Feb 1, 2018

apaszke reopened this Feb 1, 2018

apaszke closed this Feb 1, 2018

apaszke reopened this Feb 1, 2018

apaszke requested a review from colesbury February 1, 2018 16:51

colesbury approved these changes Feb 1, 2018

apaszke merged commit 8e22f84 into master Feb 2, 2018

apaszke deleted the softmax_speedup branch February 2, 2018 12:24

soumith added the 0.3.1 label Feb 5, 2018

soumith mentioned this pull request Feb 15, 2018

GPU Softmax over last dimension of 3D tensor is slow #4893

Closed

ezyang added the open source label Jun 24, 2019
