
Conversation

@oniononion36 (Contributor) commented Jul 3, 2025

Summary: Add a pass `use_triton_fp8_swish_replace_normal_swish` that replaces `_triton_swish_rms_norm` with its fp8-capable counterpart `triton_swish_rms_norm`, and enables fp8 during inference.
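For context, below is a minimal sketch of what such an FX node-replacement pass generally looks like. The op definitions, the `use_fp8` kwarg, and the pass body here are illustrative placeholders under assumed names, not the fbcode implementation this PR adds.

```python
# Minimal sketch of an FX pass that retargets swish RMS-norm call sites to an
# fp8-capable variant. All names (_triton_swish_rms_norm, triton_swish_rms_norm,
# use_fp8) are illustrative placeholders, not the real kernels.
import torch
import torch.fx as fx


def _triton_swish_rms_norm(x, weight, eps=1e-6):
    # Illustrative eager stand-in: RMS-normalize, then apply swish (y * sigmoid(y)).
    y = x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + eps) * weight
    return y * torch.sigmoid(y)


def triton_swish_rms_norm(x, weight, eps=1e-6, use_fp8=False):
    # A real kernel would dispatch to an fp8 Triton code path when use_fp8=True.
    return _triton_swish_rms_norm(x, weight, eps)


def use_triton_fp8_swish_replace_normal_swish(gm: fx.GraphModule) -> fx.GraphModule:
    """Retarget _triton_swish_rms_norm call sites to the fp8-capable op."""
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is _triton_swish_rms_norm:
            node.target = triton_swish_rms_norm
            node.kwargs = {**node.kwargs, "use_fp8": True}
    gm.graph.lint()
    gm.recompile()
    return gm
```

A traced `fx.GraphModule` passed through this function would have its swish RMS-norm nodes retargeted in place before lowering.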

Test Plan:

```
buck2 run mode/opt mode/inplace -c fbcode.platform010_cuda_version=12.4 -c fbcode.nvcc_arch=h100 caffe2/torch/fb/model_transform/experimental/benchmark:mts_gpu_benchmark -- --lower-backend=AOT_INDUCTOR --model-snapshot-id=899072727_0 --node-replacement-dict="{}" --gpu-trace --add-passes=use_triton_fp8_swish_replace_normal_swish
```

The perf improvement on the 100x model with this pass is roughly 7%; details are recorded [here](https://docs.google.com/document/d/1eIV_OTQyQcf_DlEDxwycTwhyGxT5OJkLzs8cPL6EMYc/edit?tab=t.0).

Rollback Plan:

Reviewed By: frank-wei

Differential Revision: D76531303

cc @ezyang @SherlockNoMad @EikanWang @jgong5 @wenzhe-nrv

@pytorch-bot (bot) commented Jul 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157574

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 153bf89 with merge base 19ae5af:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot added the `release notes: fx` label on Jul 3, 2025
@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76531303

@pytorch-bot added the `ciflow/trunk` label on Jul 3, 2025
@frank-wei (Contributor)

LGTM, but let's make sure all tests pass.

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D76531303

@facebook-github-bot (Contributor)

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/trunk, fb-exported, fx, release notes: fx