Qualcomm AI Engine Direct - CDSP Direct Mode #17326
abhinaykukkadapu merged 6 commits into pytorch:main
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17326
Note: Links to docs will display an error until the docs builds have been completed.
❌ 4 New Failures, 3 Unrelated Failures as of commit 02bf8db with merge base 96672a4.
NEW FAILURES - The following jobs have failed:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for putting this up. It looks like most of this is setting up the IDL/stub/skel, while the backend runtime itself remained pretty much the same? One thing we also need is that the backend runtime should not perform any system configuration whatsoever. Is the code that manages DMA buffers and sets QNN/HTP perf mode still running in this direct mode? If not, could you explain how it is disabled in this PR?
Force-pushed 48d08f1 to dafc1bc.
Hi @sxu, For minimal runtime,
@winskuo-quic What I meant to say is that we don't want the ExecuTorch backend to perform any type of system configuration at all. We have a dedicated system service that makes global decisions on HTP power state in conjunction with all other components of the SoC, including CPU, GPU, memory subsystems, etc., taking into account the aggregated set of features currently running on the device. We can't have individual models put in their own power votes for HTP or enable RPC QoS, for example, as that interferes with our settings. My concern is, given the existing runtime is compiled as is for Hexagon, is it still setting some of the default configurations which we don't want?
@winskuo-quic Just to clarify, for the above requirements I'm assuming we are asked to use what's in the OSS runtime exactly as is on our Hexagon deployment (that's the impression I got from QC representatives and the PyTorch team). For that, we are basically asking for the current OSS runtime (the part that runs on Hexagon) to be stripped down to the bare minimum, so it only does inference and nothing else. I personally don't see a strong technical reason to use the runtime exactly as is, since we have very different requirements than the average developer deploying to mobile phones. An alternative is for us to take the OSS context binary and just create a Meta-internal version of the runtime that satisfies our needs. We already have a prototype and it's just a couple hundred LOC; all it does is parse the OSS AoT-generated binary, then call contextCreateFromBinary -> graphRetrieve -> graphExecute x N -> contextFree and nothing else. cc @cccclai
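For readers unfamiliar with the flow described above, a rough sketch of such a bare-minimum runtime loop follows. The handle, error, and function-pointer types are hypothetical stand-ins for the real QNN SDK types (e.g. `Qnn_ContextHandle_t`, `Qnn_ErrorHandle_t`, the `QnnInterface` function table), so this illustrates only the shape of the call sequence, not working QNN code:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for QNN SDK handle/error types (the real ones
// come from the QNN SDK headers).
using ContextHandle = void*;
using GraphHandle = void*;
using Err = int;  // 0 == success, mirroring QNN_SUCCESS

// Function-pointer table mirroring the four calls the prototype needs:
// contextCreateFromBinary -> graphRetrieve -> graphExecute x N -> contextFree.
struct QnnCalls {
  Err (*contextCreateFromBinary)(const uint8_t* blob, size_t size, ContextHandle* out);
  Err (*graphRetrieve)(ContextHandle ctx, const char* name, GraphHandle* out);
  Err (*graphExecute)(GraphHandle graph);
  Err (*contextFree)(ContextHandle ctx);
};

// The entire "inference only" runtime: no perf votes, no RPC QoS,
// no system configuration of any kind.
inline Err run_inference(const QnnCalls& qnn, const uint8_t* blob, size_t size,
                         const char* graph_name, int iterations) {
  ContextHandle ctx = nullptr;
  GraphHandle graph = nullptr;
  if (Err e = qnn.contextCreateFromBinary(blob, size, &ctx)) return e;
  if (Err e = qnn.graphRetrieve(ctx, graph_name, &graph)) {
    qnn.contextFree(ctx);
    return e;
  }
  for (int i = 0; i < iterations; ++i) {
    if (Err e = qnn.graphExecute(graph)) {
      qnn.contextFree(ctx);
      return e;
    }
  }
  return qnn.contextFree(ctx);
}
```

The point of the sketch is that everything outside this sequence (power configuration, QoS, DMA buffer policy) would be absent by construction rather than compiled out.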
Is there not an inherent, strong versioning risk? There's no explicit contract, to my knowledge, in the AoT flow on the output blob today. In the ET design it's tightly coupled with the backend that's intended to crack it open. When you fork the runtime, it imposes a contract that is difficult to maintain in OSS, because the second runtime is not visible.
There is a tradeoff; I just want to bring this up to the QC engineering team tasked to work on this to get their thoughts. The way the runtime is currently structured, I honestly find it pretty complex and the initialization flow hard to trace. If the host and target flows continue to be intermingled, with if-else and conditional compilation sprinkled here and there to satisfy our requirements, I worry it will be hard to maintain. I would be much more comfortable if there were a separate bare-bones runtime. Regarding the implicit contract for a fork, I think it would be between the OSS AoT flow and the fork; it would not be breaking any broader design decision that ET made. The majority of the compatibility is actually handled by QNN itself: we can use the OSS utility (QnnCustomProtocol) to extract the context binary, and that should be everything that's needed for inference.
I mostly just don't want to be in a state where changes to the OSS AoT flow cannot merge because they start breaking the fork, where the tests and logic are not available in OSS. QnnCustomProtocol seems like a fairly stable API surface; I just worry that if pressures mount, more hacky things will start getting done on the fork.
@sxu, Our codebase doesn't contain many HTP-specific configs. For the perf config, if you would like a short workaround in this PR to disable the feature, it can be done by removing the logic under https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/htp/HtpDevice.cpp and https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/htp/HtpDevice.h, so it looks something like the GPU backend, where we don't provide any config. For GPU reference, see: https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/gpu/GpuDevice.h. Please let me know if you run into any issues; I can provide a patch if needed.

I understand the motivation for a bare-minimum inference runtime. However, creating a minimal runtime specifically for CDSP direct mode would be challenging, because certain backends require some QNN configs in order to function correctly (e.g., LPAI). In such cases, we would need to build a minimal runtime for each backend, which would be difficult to maintain in the long run. That said, I agree with your concern about the codebase structure being complex. To address this, we plan to work on file restructuring and CMake refactoring. This should reduce the number of macros required to enable a minimal runtime and improve the overall readability of the code.
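The "looks like the GPU, where we don't provide any config" pattern can be sketched as follows. The class and struct names here are illustrative, not the actual `GpuDevice` or `QnnDevice_Config_t` definitions from the tree:

```cpp
#include <vector>

// Illustrative stand-in for a QNN device config entry; the real type is
// QnnDevice_Config_t from the QNN SDK.
struct DeviceConfig {
  int option;
  int value;
};

// Sketch of the GPU-style pattern: the device exposes an empty config
// list, so no perf mode, power votes, or RPC QoS settings ever reach QNN.
class MinimalDevice {
 public:
  // Returns the device configs to pass down at device creation; empty here
  // because a direct-mode runtime should do inference only, with no system
  // configuration of any kind.
  std::vector<const DeviceConfig*> device_configs() const { return {}; }
};
```

With this shape, removing the HTP perf logic reduces to making the HTP device return the same empty list instead of populating power/QoS entries.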
Can you rebase so I can check the CI signal? I feel like we can have this as the first step and iterate on top of @sxu's feedback.
Force-pushed dafc1bc to bf68399.
Internal CI looks quite bad. I feel like it might be related to the file dataloader, but I'm not sure. For example: `executorch/runtime/executor/test:pte_data_map_test -- --exact 'fbcode//executorch/runtime/executor/test:pte_data_map_test - PteDataMapTest.UnimplementedMethods'`
@cccclai, |
Yeah, done. Let's wait for the signal.
Seems better now, and this is a different error. Can we apply the same compiler flag in the CMake build?
Force-pushed d67773d to 35adcc0.
Thanks for sharing the error log.
@winskuo-quic @cccclai I put up a forward fix for the internal CI issues; will let you know once that passes, and we can then land the changes.
@winskuo-quic My fix seems to fix all CI signals. Instead of me landing this diff and putting up a forward fix, can you apply this patch and rebase? I will then re-import and push.
Force-pushed 35adcc0 to 02bf8db.
Hi @abhinaykukkadapu, |
Summary: failing on trunk after pytorch#17326. Test plan: ci.
Summary
Support CDSP direct mode by defining ExecuTorch's customized rpc protocol.
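As a rough illustration of what a customized RPC protocol involves (the actual message layout lives in this PR's IDL/stub/skel code; the fields, magic value, and framing below are purely hypothetical):

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Purely hypothetical wire format for a host -> CDSP request; the real
// protocol in the PR may differ in fields, framing, and endianness.
struct RpcHeader {
  uint32_t magic;         // sanity check on the skel side
  uint32_t opcode;        // e.g. 0 = load context, 1 = execute, 2 = free
  uint32_t payload_size;  // bytes following the header
};

static constexpr uint32_t kMagic = 0x45545250u;  // arbitrary 'ETRP' tag

// Serialize a request into a flat buffer suitable for a shared memory region.
std::vector<uint8_t> pack_request(uint32_t opcode, const uint8_t* payload,
                                  uint32_t size) {
  RpcHeader hdr{kMagic, opcode, size};
  std::vector<uint8_t> buf(sizeof(hdr) + size);
  std::memcpy(buf.data(), &hdr, sizeof(hdr));
  if (size) std::memcpy(buf.data() + sizeof(hdr), payload, size);
  return buf;
}

// Parse the header back out on the receiving (CDSP) side.
bool unpack_header(const std::vector<uint8_t>& buf, RpcHeader* out) {
  if (buf.size() < sizeof(RpcHeader)) return false;
  std::memcpy(out, buf.data(), sizeof(RpcHeader));
  return out->magic == kMagic;
}
```

The stub packs requests on the host, the skel unpacks and dispatches them on the CDSP; the real implementation additionally handles tensor I/O buffers.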
We have validated this PR with the following spec:

Please refer to the README file under `backends/qualcomm/runtime/backends/direct_mode/README.md` for setup. Please be aware of the Note section if you observe that the total execution time is slower than traditional mode.

Example Script

To build: `backends/qualcomm/scripts/build.sh --enable_hexagon`

To run traditional mode (same as usual): `python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_adaptive_avg_pool2d --model SM8750 --device $DEVICE --build_folder build-android`

To run direct mode (add `--direct_build_folder build-hexagon`): `python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_adaptive_avg_pool2d --model SM8750 --device $DEVICE --build_folder build-android --direct_build_folder build-hexagon/`

Test plan

Add `--direct_build_folder build-hexagon/` at the end of any `TestQNNQuantizedUtils`, `TestQNNQuantizedModel`, `TestQNNFloatingPointModel`, or `TestQNNFloatingPointOperator` test.

Author: @haowhsu-quic, @winskuo-quic