
Conversation


@larryliu0820 larryliu0820 commented Oct 6, 2025

This pull request introduces comprehensive support for the CUDA backend in ExecuTorch, enabling model export, build, and runtime execution with CUDA acceleration. It adds new CMake build logic, implements the CUDA backend runtime, updates workflow automation for CUDA model testing, and improves type and error handling for CUDA-specific operations.

CUDA Backend Integration

  • Added new CUDA backend build logic to CMakeLists.txt, including registration of the aoti_cuda backend and dependencies on common AOTI and CUDA-specific sources. (CMakeLists.txt, [1]; backends/cuda/CMakeLists.txt, [2])
  • Implemented the CudaBackend runtime in cuda_backend.cpp, handling dynamic loading of model containers, GPU tensor management, and execution flow for CUDA kernels. (backends/cuda/runtime/cuda_backend.cpp, backends/cuda/runtime/cuda_backend.cppR1-R383)

Workflow and Testing Automation

  • Updated and renamed the CUDA workflow file to add a matrix job for CUDA model testing, running tests for multiple models on GPU hardware. (.github/workflows/cuda.yml, .github/workflows/cuda.ymlR64-R87)
  • Enhanced the CI test script to support CUDA backend selection, model export, and execution, including artifact preparation. (.ci/scripts/test_model.sh, [1] [2] [3])

Type and Error Handling Improvements

  • Extended supported data types for the CUDA backend, adding INT64 and updating error messages for unsupported dtypes. (backends/cuda/runtime/shims/utils.h, [1] [2] [3])
  • Added new type aliases and fields for CUDA delegate and tensor handles to support runtime operations. (backends/aoti/aoti_model_container.h, [1] [2])
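The dtype-validation change described above amounts to a whitelist check with an actionable error message. A hedged sketch of that shape — the `DType` enum and function names here are invented for illustration; the real check in backends/cuda/runtime/shims/utils.h operates on ExecuTorch scalar types:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative dtype codes only; the actual backend validates
// ExecuTorch ScalarType values.
enum class DType : int32_t { Bool, Int32, Int64, Float32, Float64 };

// True for dtypes this sketch of the CUDA backend accepts. The PR's
// change is the addition of INT64 to the supported set.
bool is_supported_dtype(DType d) {
  switch (d) {
    case DType::Int64:
    case DType::Float32:
      return true;
    default:
      return false;
  }
}

// Reject unsupported dtypes with a message naming the offending code,
// matching the PR's "updated error messages for unsupported dtypes".
void check_dtype(DType d) {
  if (!is_supported_dtype(d)) {
    throw std::invalid_argument(
        "Unsupported dtype " + std::to_string(static_cast<int32_t>(d)) +
        " for CUDA backend; supported: INT64, FLOAT32");
  }
}
```

Keeping the check in one function means the error message and the supported set cannot drift apart as more dtypes are added.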


pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14827

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 120 Pending

As of commit a9bb409 with merge base d8e07bd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Oct 6, 2025
@@ -0,0 +1,374 @@
/*
Contributor

Why not keep the bulk of the code under backends/aoti and keep only the CUDA-specific runtime AOTI bits here? The rationale is code dedup across all AOTI backends.

Contributor Author

Yeah good point

Contributor Author

Next PR


} // extern "C"

// AOTI Delegate Handle structure
Contributor

nit: if this backend can't be instantiated directly, then perhaps s/aoti/_aoti?

Contributor Author

Can you say more?

exec_program = delegated_program.to_executorch()
save_pte_program(exec_program, args.model_name, args.output_dir)
if args.generate_etrecord:
    exec_program.get_etrecord().save(f"{args.model_name}_cuda_etrecord.bin")
Contributor

can we do etdump on aoti runtime?

Contributor Author

Probably. Still trying to figure out how to do etdump for AOTI; will probably defer to @Gasoonjia

Contributor

We can definitely do something on the ET side (e.g., treating every delegate call as a black box), but we need some time to support it inside the delegate.

@larryliu0820 larryliu0820 marked this pull request as ready for review October 7, 2025 05:33
@larryliu0820 larryliu0820 added the release notes: desktop label Oct 7, 2025
extern "C" {

// Type definitions
using AOTITensorHandle = Tensor*;
Contributor

I think we can use Tensor* directly; in the other places we've removed the alias.

@larryliu0820 larryliu0820 merged commit 697078b into main Oct 7, 2025
278 of 279 checks passed
@larryliu0820 larryliu0820 deleted the aoti_backend_cpp branch October 7, 2025 22:57