

@Gasoonjia
Contributor

Summary:
Introduce Triton SDPA Kernel to CUDA Backend

This diff adds a Triton-optimized implementation of the scaled dot-product attention (SDPA) kernel to the CUDA backend. The kernel was produced with the kernel generator KernelAgent (https://github.com/meta-pytorch/KernelAgent). It is designed to replace the default Edge SDPA operator during graph transformation, which accelerates model inference and removes the need to decompose SDPA.
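For reference, the operation the Triton kernel replaces computes softmax(Q Kᵀ / √d) V. The pure-Python version below is only a sketch of those semantics for a single unbatched head with optional causal masking and no dropout, following the default 1/√d scaling of `torch.nn.functional.scaled_dot_product_attention`. It is not the Triton kernel from this diff, and the helper names (`sdpa_reference`, `softmax_rows`, etc.) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Row-by-column product of an (n x m) and an (m x p) matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]  # exp(-inf) == 0.0 for masked slots
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sdpa_reference(q: Matrix, k: Matrix, v: Matrix, is_causal: bool = False) -> Matrix:
    """softmax(Q K^T / sqrt(d)) V for one unbatched head, no dropout."""
    scale = 1.0 / math.sqrt(len(q[0]))  # d = head dimension of Q
    scores = [[s * scale for s in row] for row in matmul(q, transpose(k))]
    if is_causal:
        # Query i may only attend to keys j <= i (lower-triangular mask).
        scores = [
            [s if j <= i else float("-inf") for j, s in enumerate(row)]
            for i, row in enumerate(scores)
        ]
    return matmul(softmax_rows(scores), v)
```

For example, if every key is identical, each query weights all value rows equally, so every output row is the mean of the rows of `v`. A fused GPU kernel computes the same result without materializing these intermediate lists as separate ops.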

Changes

  • Added a new file, sdpa.py, to the fbcode/executorch/backends/cuda/triton/kernels directory. It contains the Triton-optimized SDPA kernel implementation.
  • Added a new replacement pass, fbcode/executorch/backends/cuda/triton/replacement_pass, which replaces specified existing Edge ops with their target Triton kernels.
  • Added tests for exporting SDPA with the Triton kernel. Without the Triton kernel, models that use SDPA cannot be exported.
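The replacement pass is only described at a high level here. As a rough sketch of the general idea (walk the graph, match nodes by op target, and swap in a kernel while keeping their inputs), the toy pass below operates on a hypothetical `Node` list rather than ExecuTorch's real graph types. `Node`, `replace_ops`, and the op-name strings are illustrative assumptions, not the API added in this diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a graph node: a name, an op target, and inputs."""
    name: str
    target: str
    args: List[str] = field(default_factory=list)


def replace_ops(nodes: List[Node], replacements: Dict[str, str]) -> int:
    """Retarget every node whose op appears in `replacements`, in place.

    Node names and inputs are kept, so downstream users of a rewritten
    node need no rewiring. Returns the number of nodes rewritten.
    """
    count = 0
    for node in nodes:
        new_target = replacements.get(node.target)
        if new_target is not None:
            node.target = new_target
            count += 1
    return count


if __name__ == "__main__":
    graph = [
        Node("q_proj", "aten.linear.default", ["x"]),
        Node("attn", "aten.scaled_dot_product_attention.default", ["q", "k", "v"]),
    ]
    # Hypothetical mapping from an Edge op name to a Triton kernel name.
    n = replace_ops(graph, {"aten.scaled_dot_product_attention.default": "triton.sdpa"})
    print(n, [node.target for node in graph])
```

Retargeting in place, rather than inserting a new node and rerouting its users, keeps the sketch short. A real pass must also ensure that the replacement kernel accepts the same arguments and returns outputs of the same shape as the op it replaces.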

Purpose

The purpose of this diff is to provide a high-performance SDPA kernel for the CUDA backend that can accelerate attention-based models on NVIDIA GPUs.

Reviewed By: larryliu0820

Differential Revision: D87259044

@pytorch-bot

pytorch-bot bot commented Nov 18, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/15877

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 35f22b8 with merge base f66b79f:

NEW FAILURE - The following job has failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Nov 18, 2025
@meta-codesync

meta-codesync bot commented Nov 18, 2025

@Gasoonjia has exported this pull request. If you are a Meta employee, you can view the originating Diff in D87259044.

Contributor

@larryliu0820 larryliu0820 left a comment


Review automatically exported from Phabricator review in Meta.

@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync

meta-codesync bot commented Nov 19, 2025

@Gasoonjia has imported this pull request. If you are a Meta employee, you can view this in D87259044.

@Gasoonjia Gasoonjia merged commit 9952aef into main Nov 19, 2025
171 of 175 checks passed
@Gasoonjia Gasoonjia deleted the export-D87259044 branch November 19, 2025 07:53

Labels

CLA Signed, fb-exported, meta-exported


3 participants