Speed up maxout by exploiting parallelism better #579
Merged
I was profiling the parser refactor and noticed that the maxout kernel takes more time than I expected.
The kernel's maximum parallelism is currently determined by the batch size: each thread processes one row of the batch. This leaves the GPU underused. E.g. by default we launch nr_blocks: 128 * nr_threads_per_block: 128 = 16384 threads, which is much larger than the typical batch size, so most launched threads sit idle.
This PR changes the maxout kernel to parallelize at the output level (each thread computes one output). This makes the maxout kernel about 4.7 times faster with a batch size of 1024 on an RTX 2060 Super.
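For reference, here is a minimal NumPy sketch (not the actual CUDA kernel; the shapes and loop structure are illustrative assumptions) of what "one thread per output" means for a maxout forward pass over an array of shape (batch, outputs, pieces):

```python
import numpy as np

def maxout_per_output(X):
    """Reference maxout forward pass. X has shape (B, O, P):
    batch rows, outputs, and pieces per output. Each (b, o)
    iteration is independent, so in a GPU kernel each one can be
    assigned to its own thread: B * O units of parallelism
    instead of only B when parallelizing over batch rows."""
    B, O, P = X.shape
    best = np.empty((B, O), dtype=X.dtype)
    which = np.empty((B, O), dtype=np.int32)
    for b in range(B):
        for o in range(O):
            # One "thread": reduce over the P pieces of one output.
            p = int(np.argmax(X[b, o]))
            which[b, o] = p
            best[b, o] = X[b, o, p]
    return best, which
```

With a batch size of 1024 and, say, a few hundred outputs per row, this scheme yields hundreds of thousands of independent work items, comfortably more than the 16384 threads launched by default, instead of only 1024.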
I haven't updated the backprop variant yet, since it barely appears in the profiles.