
Speed up maxout by exploiting parallelism better #579

Merged
2 commits merged into explosion:master from the maxout-speedup branch on Feb 4, 2022

Conversation

@danieldk (Contributor) commented Jan 25, 2022

I was profiling the parser refactor and noticed that the maxout kernel was taking more time than I expected:

[Screenshot (2022-01-25): profile showing the maxout kernel taking a large share of GPU time]

The kernel's maximum parallelism is currently bounded by the batch size, which leaves the GPU underused. For example, by default we launch nr_blocks: 128 * nr_threads_per_block: 128 = 16384 threads, which is far more than the typical batch size.

This PR changes the maxout kernel to parallelize at the output level (each thread computes one output). This makes the maxout kernel about 4.7 times faster with a batch size of 1024 on an RTX 2060 Super:

[Screenshot (2022-01-25): profile after the change, showing the maxout kernel roughly 4.7× faster]

I haven't updated the backprop variant yet, since it barely appears in the profiles.
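For illustration, here is a minimal sketch (not the actual Thinc kernel) of the output-level parallelization described above: each GPU thread computes one output element, so the usable parallelism scales with `batch_size * n_out` rather than with the batch size alone. The kernel and helper names here are assumptions made for the example.

```python
import cupy as cp

_maxout_src = r"""
extern "C" __global__
void maxout(float* best, int* which, const float* X,
            int B, int O, int P)
{
    // One thread per output element (b, o).
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= B * O)
        return;

    const float* x = X + (long long)idx * P;  // the P candidate values for this output
    int best_p = 0;
    float best_val = x[0];
    for (int p = 1; p < P; ++p) {
        if (x[p] > best_val) {
            best_val = x[p];
            best_p = p;
        }
    }
    best[idx] = best_val;
    which[idx] = best_p;
}
"""

_maxout_kernel = cp.RawKernel(_maxout_src, "maxout")


def maxout(X):
    """X: float32 array of shape (B, O, P); returns (best, which) of shape (B, O)."""
    X = cp.ascontiguousarray(X, dtype=cp.float32)
    B, O, P = X.shape
    best = cp.empty((B, O), dtype=cp.float32)
    which = cp.empty((B, O), dtype=cp.int32)
    n = B * O
    threads = 128
    blocks = (n + threads - 1) // threads  # cover all B * O outputs
    _maxout_kernel((blocks,), (threads,),
                   (best, which, X, cp.int32(B), cp.int32(O), cp.int32(P)))
    return best, which
```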

@danieldk marked this pull request as draft on January 25, 2022 14:08
@honnibal (Member) commented:

Awesome!

Commit added: Use `which` to select the max elements from the original array and check this against the expected maxout output.
@danieldk marked this pull request as ready for review on January 25, 2022 16:34
@danieldk (Contributor, Author) commented:

Added a test which checks the `which` output as well.
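A hedged sketch of that test idea: use `which` to gather the claimed maxima from the original array and compare them with the returned maxout output. `ops.maxout` and `ops.xp` follow Thinc's Ops API; the helper itself is illustrative, not the PR's actual test.

```python
def check_maxout(ops, X):
    # X: float array of shape (batch, n_out, n_pieces)
    xp = ops.xp
    best, which = ops.maxout(X)
    # Gather the elements that `which` points at...
    selected = xp.take_along_axis(X, which[..., None], axis=-1)[..., 0]
    # ...and they should match both `best` and a plain max over the pieces.
    assert xp.allclose(selected, best)
    assert xp.allclose(selected, X.max(axis=-1))
```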

@danieldk (Contributor, Author) commented:

@explosion-bot please test_gpu

@explosion-bot (Collaborator) commented Jan 27, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/thinc-gpu-test-suite/builds/16

@danieldk (Contributor, Author) commented:

The test failure is unrelated, but interesting (inconsistent results in the grad scaler test).

@danieldk (Contributor, Author) commented:

@explosion-bot please test_gpu

@explosion-bot (Collaborator) commented Jan 28, 2022

🪁 Successfully triggered build on Buildkite

URL: https://buildkite.com/explosion-ai/thinc-gpu-test-suite/builds/17

@danieldk added the `performance` (Speed and memory use) and `feat / ops` (Backends and maths) labels on Jan 28, 2022
@svlandeg merged commit 31d638c into explosion:master on Feb 4, 2022
@danieldk deleted the maxout-speedup branch on February 4, 2022 12:17