introduce triton sdpa kernel to cuda backend #15877

Gasoonjia · 2025-11-18T23:58:03Z

Summary:
Introduce Triton SDPA Kernel to CUDA Backend

This diff adds a kernel-generator-driven (https://github.com/meta-pytorch/KernelAgent), Triton-optimized implementation of the scaled dot-product attention (SDPA) kernel to the CUDA backend. The new kernel is designed to replace the default Edge SDPA operator during graph transformation, which accelerates model inference and avoids SDPA decomposition.
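For readers unfamiliar with the operator, here is a minimal pure-Python sketch of what SDPA computes for a single attention head: softmax(Q K^T * scale) V, with scale defaulting to 1/sqrt(d). This is only a reference for the math, not the Triton kernel added in this diff. The function name `sdpa_reference` and its list-of-lists inputs are assumptions made for illustration.

```python
import math


def sdpa_reference(q, k, v, scale=None):
    """Naive single-head scaled dot-product attention (illustrative only).

    q: list of query vectors, k: list of key vectors,
    v: list of value vectors (one per key). Returns one output per query.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d) if scale is None else scale
    out = []
    for q_row in q:
        # Scaled similarity of this query against every key.
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Numerically stable softmax over the key dimension.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Attention output is the weighted sum of the value vectors.
        out.append(
            [sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))]
        )
    return out


# A zero query scores every key equally, so the output is the mean of the
# value rows: [[2.0, 3.0]].
result = sdpa_reference([[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```

A fused kernel typically produces the same result without materializing the intermediate score matrix as separate graph ops, which is the kind of decomposition this diff aims to avoid.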

Changes

Added a new file sdpa.py to the fbcode/executorch/backends/cuda/triton/kernels directory, which contains the Triton-optimized SDPA kernel implementation.
Added a new fbcode/executorch/backends/cuda/triton/replacement_pass, which replaces the given edge ops with the target Triton kernels.
Added tests for exporting SDPA with the Triton kernel. Without the Triton kernel, the SDPA model cannot be exported.
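The idea behind a replacement pass can be sketched with a toy graph in plain Python. This is not the ExecuTorch implementation: the `Node` class, the `REPLACEMENTS` table, and the operator names `edge.scaled_dot_product_attention`, `edge.linear`, and `triton.sdpa` are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy graph node (hypothetical; a real pass works on the exported graph)."""
    name: str
    target: str          # operator identifier
    args: tuple = ()


# Hypothetical mapping from edge-op names to replacement kernel names.
REPLACEMENTS = {
    "edge.scaled_dot_product_attention": "triton.sdpa",
}


def replace_ops(nodes, replacements=REPLACEMENTS):
    """Return (new_nodes, count) with matching targets swapped.

    Arguments and node order are preserved; only the target changes.
    """
    new_nodes = []
    replaced = 0
    for node in nodes:
        new_target = replacements.get(node.target)
        if new_target is not None:
            node = Node(node.name, new_target, node.args)
            replaced += 1
        new_nodes.append(node)
    return new_nodes, replaced


graph = [
    Node("q_proj", "edge.linear", ("x",)),
    Node("attn", "edge.scaled_dot_product_attention", ("q", "k", "v")),
]
new_graph, num_replaced = replace_ops(graph)
```

Keeping the mapping in a table means other edge ops could be routed to other kernels by adding entries, without touching the traversal logic.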

Purpose

The purpose of this diff is to provide a high-performance SDPA kernel for the CUDA backend, which can be used to accelerate attention-based models on NVIDIA GPUs.

Reviewed By: larryliu0820

Differential Revision: D87259044

pytorch-bot · 2025-11-18T23:58:07Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15877

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 35f22b8 with merge base f66b79f:

NEW FAILURE - The following job has failed:

Test CUDA Builds / test-model-cuda-e2e (openai, whisper-small, non-quantized) / linux-job (gh)
RuntimeError: Command docker exec -t b9d54eeadc73cfde6d041139bc5cddd7c249da199fea94ba7bbe18aad047168b /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / linux / linux-job (gh) (trunk failure)
exir/backend/test/test_lowered_backend_module.py::TestBackendAPI::test_emit_nested_lowered_backend_module
pull / unittest-editable / linux / linux-job (gh) (trunk failure)
exir/backend/test/test_lowered_backend_module.py::TestBackendAPI::test_emit_nested_lowered_backend_module

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2025-11-18T23:58:12Z

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87259044.

larryliu0820

Review automatically exported from Phabricator review in Meta.

github-actions · 2025-11-18T23:58:54Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync · 2025-11-19T04:18:29Z

@Gasoonjia has imported this pull request. If you are a Meta employee, you can view this in D87259044.

Gasoonjia requested review from jackzhxng and lucylq as code owners November 18, 2025 23:58

meta-cla bot added the CLA Signed label Nov 18, 2025

meta-codesync bot added the fb-exported and meta-exported labels Nov 18, 2025

larryliu0820 approved these changes Nov 18, 2025

Gasoonjia and others added 2 commits November 18, 2025 20:11

solve gemma3 export issue

fc87357

Merge branch 'main' into export-D87259044

35f22b8

Gasoonjia merged commit 9952aef into main Nov 19, 2025
171 of 175 checks passed

Gasoonjia deleted the export-D87259044 branch November 19, 2025 07:53
