
Conversation


@larryliu0820 larryliu0820 commented Oct 6, 2025

This pull request introduces comprehensive support for the CUDA backend in ExecuTorch, enabling model export, build, and runtime execution with CUDA acceleration. It adds new CMake build logic, implements the CUDA backend runtime, updates workflow automation for CUDA model testing, and improves type and error handling for CUDA-specific operations.

CUDA Backend Integration

  • Added new CUDA backend build logic to CMakeLists.txt, including registration of the aoti_cuda backend and dependencies on common AOTI and CUDA-specific sources. (CMakeLists.txt, [1]; backends/cuda/CMakeLists.txt, [2])
  • Implemented the CudaBackend runtime in cuda_backend.cpp, handling dynamic loading of model containers, GPU tensor management, and execution flow for CUDA kernels. (backends/cuda/runtime/cuda_backend.cpp, backends/cuda/runtime/cuda_backend.cppR1-R383)

Workflow and Testing Automation

  • Updated and renamed the CUDA workflow file to add a matrix job for CUDA model testing, running tests for multiple models on GPU hardware. (.github/workflows/cuda.yml, .github/workflows/cuda.ymlR64-R87)
  • Enhanced the CI test script to support CUDA backend selection, model export, and execution, including artifact preparation. (.ci/scripts/test_model.sh, [1] [2] [3])

Type and Error Handling Improvements

  • Extended supported data types for the CUDA backend, adding INT64 and updating error messages for unsupported dtypes. (backends/cuda/runtime/shims/utils.h, [1] [2] [3])
  • Added new type aliases and fields for CUDA delegate and tensor handles to support runtime operations. (backends/aoti/aoti_model_container.h, [1] [2])
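The dtype-validation change described above amounts to a whitelist check with an actionable error message. A hedged sketch of that shape — the `DType` enum and function names here are invented for illustration; the real check in backends/cuda/runtime/shims/utils.h operates on ExecuTorch scalar types:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Illustrative dtype codes only; the actual backend validates
// ExecuTorch ScalarType values.
enum class DType : int32_t { Bool, Int32, Int64, Float32, Float64 };

// True for dtypes this sketch of the CUDA backend accepts. The PR's
// change is the addition of INT64 to the supported set.
bool is_supported_dtype(DType d) {
  switch (d) {
    case DType::Int64:
    case DType::Float32:
      return true;
    default:
      return false;
  }
}

// Reject unsupported dtypes with a message naming the offending code,
// matching the PR's "updated error messages for unsupported dtypes".
void check_dtype(DType d) {
  if (!is_supported_dtype(d)) {
    throw std::invalid_argument(
        "Unsupported dtype " + std::to_string(static_cast<int32_t>(d)) +
        " for CUDA backend; supported: INT64, FLOAT32");
  }
}
```

Keeping the check in one function means the error message and the supported set cannot drift apart as more dtypes are added.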


pytorch-bot bot commented Oct 6, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14827

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 1 New Failure, 120 Pending

As of commit a9bb409 with merge base d8e07bd:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label Oct 6, 2025
@@ -0,0 +1,374 @@
/*
Contributor

Why not keep the bulk of the code under backends/aoti and keep only the CUDA-specific runtime AOTI bits here? The rationale is code dedup across all AOTI backends.

Contributor Author

Yeah good point

Contributor Author

Next PR


} // extern "C"

// AOTI Delegate Handle structure
Contributor

nit: if this backend can't be instantiated directly, then perhaps s/aoti/_aoti?

Contributor Author

Can you say more?

exec_program = delegated_program.to_executorch()
save_pte_program(exec_program, args.model_name, args.output_dir)
if args.generate_etrecord:
    exec_program.get_etrecord().save(f"{args.model_name}_cuda_etrecord.bin")
Contributor

can we do etdump on aoti runtime?

Contributor Author

Probably. Still trying to figure out how to do etdump for AOTI; will probably defer to @Gasoonjia

Contributor

We can definitely do something on the ET side (e.g., treating every delegate call as a black box), but we need some time to support it inside the delegate.

@larryliu0820 larryliu0820 marked this pull request as ready for review October 7, 2025 05:33
@larryliu0820 larryliu0820 added the release notes: desktop label Oct 7, 2025
extern "C" {

// Type definitions
using AOTITensorHandle = Tensor*;
Contributor

I think we can use Tensor* directly; in the other places we've removed the alias.

@larryliu0820 larryliu0820 merged commit 697078b into main Oct 7, 2025
278 of 279 checks passed
@larryliu0820 larryliu0820 deleted the aoti_backend_cpp branch October 7, 2025 22:57