[cuda] Replace L1 access with warp shuffles in layer_norm_grad_input_kernel #110203

valentinandrei · 2023-09-28T05:44:08Z

Replaces L1 accesses for gamma_val, c_h, and c_loss with warp shuffles. This is guaranteed to work because the unroll factor is lower than the warp width. In our measurements, speedups range from 0 to 25%.
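For readers unfamiliar with the pattern, here is a minimal standalone sketch of the general idea, not the code from this PR. It assumes a hypothetical kernel `weighted_sum_shfl` with an unroll factor `kUnroll = 4`: lanes 0 to 3 each load one `gamma` value once, and every lane then receives it through `__shfl_sync` inside the unrolled loop instead of re-reading the same address through the cache. All names, sizes, and the launch configuration are illustrative assumptions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kWarpSize = 32;
constexpr int kUnroll = 4;  // hypothetical unroll factor, not taken from the PR
static_assert(kUnroll <= kWarpSize, "each unrolled value needs its own source lane");
constexpr unsigned kFullMask = 0xffffffffu;

// Each thread computes out[t] = sum_k gamma[k] * x[t * kUnroll + k].
// Launch with exactly one full warp so that kFullMask is valid.
__global__ void weighted_sum_shfl(const float* gamma, const float* x, float* out) {
  const int lane = threadIdx.x % kWarpSize;
  // Lanes 0..kUnroll-1 each load one gamma value; the other lanes hold a dummy.
  const float my_gamma = (lane < kUnroll) ? gamma[lane] : 0.0f;
  float acc = 0.0f;
#pragma unroll
  for (int k = 0; k < kUnroll; ++k) {
    // Broadcast lane k's register to the whole warp instead of reloading gamma[k].
    const float g = __shfl_sync(kFullMask, my_gamma, k);
    acc += g * x[threadIdx.x * kUnroll + k];
  }
  out[threadIdx.x] = acc;
}

// Runs the kernel on one warp and compares against a CPU reference.
bool run_shuffle_demo() {
  float h_gamma[kUnroll] = {1.0f, 2.0f, 3.0f, 4.0f};
  float h_x[kWarpSize * kUnroll];
  for (int i = 0; i < kWarpSize * kUnroll; ++i) {
    h_x[i] = static_cast<float>(i % 7);  // small integers keep float sums exact
  }

  float* d_gamma = nullptr;
  float* d_x = nullptr;
  float* d_out = nullptr;
  if (cudaMalloc(&d_gamma, sizeof(h_gamma)) != cudaSuccess) return false;
  if (cudaMalloc(&d_x, sizeof(h_x)) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, kWarpSize * sizeof(float)) != cudaSuccess) return false;
  cudaMemcpy(d_gamma, h_gamma, sizeof(h_gamma), cudaMemcpyHostToDevice);
  cudaMemcpy(d_x, h_x, sizeof(h_x), cudaMemcpyHostToDevice);

  weighted_sum_shfl<<<1, kWarpSize>>>(d_gamma, d_x, d_out);

  float h_out[kWarpSize];
  const cudaError_t err =
      cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
  cudaFree(d_gamma);
  cudaFree(d_x);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int t = 0; t < kWarpSize; ++t) {
    float ref = 0.0f;
    for (int k = 0; k < kUnroll; ++k) ref += h_gamma[k] * h_x[t * kUnroll + k];
    if (h_out[t] != ref) return false;
  }
  return true;
}

int main() {
  const bool ok = run_shuffle_demo();
  std::printf("%s\n", ok ? "shuffle result matches CPU reference" : "mismatch");
  return ok ? 0 : 1;
}
```

The `static_assert` captures the constraint mentioned above: with one source lane per unrolled iteration, the scheme only works if the unroll factor does not exceed the warp width.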

We measured using the benchmark described in #107287.

Fixes #ISSUE_NUMBER

pytorch-bot · 2023-09-28T05:44:11Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110203

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (6 Unrelated Failures)

As of commit 1e7947b with merge base 1e7947b:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot bot added the `release notes: cuda` label Sep 28, 2023

valentinandrei changed the title ~~[cuda] Replace smem sync with warp shuffles in layer_norm_grad_input_kernel~~ [cuda] Replace L1 access with warp shuffles in layer_norm_grad_input_kernel Sep 28, 2023

valentinandrei force-pushed the main branch from 1aea999 to f791641 Compare September 29, 2023 04:37

valentinandrei closed this Oct 11, 2023

valentinandrei force-pushed the main branch from 79b94fe to 1e7947b Compare October 11, 2023 05:59
