
Qualcomm AI Engine Direct - CDSP Direct Mode#17326

Merged
abhinaykukkadapu merged 6 commits into pytorch:main from CodeLinaro:dev1/winskuo/htp_direct_mode
Mar 6, 2026

Conversation

@winskuo-quic (Collaborator) commented Feb 10, 2026

Summary

  • Support CDSP direct mode by defining ExecuTorch's customized RPC protocol.

  • We have validated this PR with the following spec:

    • QNN 2.42.0
    • Hexagon SDK 6.4.02
    • Hexagon Tools Root 19.0.04 (the toolchain bundled with Hexagon SDK 6.4.02)
    • V79 & V81 devices
  • Please refer to the README file at backends/qualcomm/runtime/backends/direct_mode/README.md for setup. Please be aware of the Note section if you observe that total execution time is slower than traditional mode.

Example Script

To build:
backends/qualcomm/scripts/build.sh --enable_hexagon

To run traditional mode (Same as usual):
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_adaptive_avg_pool2d --model SM8750 --device $DEVICE --build_folder build-android

To run direct mode (add --direct_build_folder build-hexagon):
python backends/qualcomm/tests/test_qnn_delegate.py -k TestQNNQuantizedOperator.test_qnn_backend_adaptive_avg_pool2d --model SM8750 --device $DEVICE --build_folder build-android --direct_build_folder build-hexagon/

Test plan

Add --direct_build_folder build-hexagon/ at the end of any TestQNNQuantizedUtils, TestQNNQuantizedModel, TestQNNFloatingPointModel, or TestQNNFloatingPointOperator test.

Authors: @haowhsu-quic, @winskuo-quic


pytorch-bot bot commented Feb 10, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/17326

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 3 Unrelated Failures

As of commit 02bf8db with merge base 96672a4:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla bot added the "CLA Signed" label on Feb 10, 2026 (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed).
@github-actions

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.


cccclai commented Feb 10, 2026

cc: @sxu @billmguo @mohankumarkumar


meta-codesync bot commented Feb 10, 2026

@cccclai has imported this pull request. If you are a Meta employee, you can view this in D92849263.


sxu commented Feb 10, 2026

Thanks for putting this up. It looks like most of this is setting up the IDL/stub/skel, and the backend runtime itself remained pretty much the same? One thing we also need is that the backend runtime should not perform any system configuration whatsoever. Is the code that manages DMA buffers and sets QNN/HTP perf mode still running in this direct mode? If not, could you explain how it is disabled in this PR?

winskuo-quic force-pushed the dev1/winskuo/htp_direct_mode branch 2 times, most recently from 48d08f1 to dafc1bc on February 11, 2026 05:11
@winskuo-quic (Collaborator, Author)

Thanks for putting this up. It looks like most of this is setting up the IDL/stub/skel, and the backend runtime itself remained pretty much the same? One thing we also need is that the backend runtime should not perform any system configuration whatsoever. Is the code that manages DMA buffers and sets QNN/HTP perf mode still running in this direct mode? If not, could you explain how it is disabled in this PR?

Hi @sxu,
Yes, your understanding is correct; the backend runtime pretty much remains the same. We made slight changes to the backend so the code compiles successfully with both the NDK and the Hexagon toolchain.

Regarding the minimal runtime:
For DMA buffers, are you referring to the Shared Buffer mechanism (zero-copy)? If so, we do not support Shared Buffer in this PR; that will be a follow-up PR in the future.
For perf mode control, I would like to confirm whether you are asking what the options are to control perf mode.
Perf config is mainly handled in this file: https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/htp/HtpDevice.cpp
If the above assumption is correct, there are three options:

  1. AOT compile option: this is the easiest quick workaround. You can modify this line so the default behavior won't set performance mode to burst:
    htp_options.performance_mode = QnnExecuTorchHtpPerformanceMode.kHtpBurst
  2. Runtime option: this requires adding a couple of lines of code so qnn_direct_executor_runner (direct mode) supports the feature. Below is an example usage in qnn_executor_runner (traditional mode):
    executorch::runtime::BackendOptions<3> backend_options;
  3. A global config API to control perf mode (this will be enabled in the future).
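The fixed-capacity key/value idea behind the BackendOptions<3> snippet in option 2 can be sketched as below. This models the concept only: the class name, method names, and int-valued options are illustrative assumptions, not the real ExecuTorch BackendOptions API.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Schematic model of a fixed-capacity options store; the template
// parameter plays the role of the "3" in BackendOptions<3>.
template <std::size_t N>
class OptionStore {
 public:
  // Set or overwrite an option; fails once N distinct keys are stored.
  bool set(std::string_view key, int value) {
    for (std::size_t i = 0; i < size_; ++i) {
      if (keys_[i] == key) {
        values_[i] = value;
        return true;
      }
    }
    if (size_ == N) {
      return false;  // fixed capacity exhausted
    }
    keys_[size_] = key;
    values_[size_] = value;
    ++size_;
    return true;
  }

  // Read an option back, falling back to a default when unset.
  int get(std::string_view key, int fallback) const {
    for (std::size_t i = 0; i < size_; ++i) {
      if (keys_[i] == key) {
        return values_[i];
      }
    }
    return fallback;
  }

 private:
  std::array<std::string_view, N> keys_{};
  std::array<int, N> values_{};
  std::size_t size_ = 0;
};
```

The fixed capacity keeps the store heap-free, which matches why a runtime option mechanism (rather than an AOT default) suits an embedded runner.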


sxu commented Feb 11, 2026

@winskuo-quic What I meant to say is that we don't want the ExecuTorch backend to perform any type of system configuration at all. We have a dedicated system service that makes global decisions on HTP power state in conjunction with all other components of the SoC (CPU, GPU, memory subsystems, etc.), taking into account the aggregated set of features currently running on the device. We can't have individual models put in their own power votes for HTP or enable RPC QoS, for example, as that interferes with our settings.

My concern is that, given the existing runtime is compiled as-is for Hexagon, it may still be setting some of the default configurations we don't want.


sxu commented Feb 11, 2026

@winskuo-quic Just to clarify: for the above requirements, I'm assuming we are being asked to use what's in the OSS runtime exactly as-is on our Hexagon deployment (that's the impression I got from QC representatives and the PyTorch team). For that, we are basically asking for the current OSS runtime (the part that runs on Hexagon) to be stripped down to the bare minimum: inference and nothing else.

I personally don't see a strong technical reason to use the runtime exactly as-is, since we have very different requirements than the average developer deploying to mobile phones. An alternative is for us to take the OSS context binary and create a Meta-internal version of the runtime that satisfies our needs. We already have a prototype, and it's just a couple hundred LOC; all it does is parse the OSS AoT-generated binary and call contextCreateFromBinary -> graphRetrieve -> graphExecute x N -> contextFree, and nothing else.

cc @cccclai
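The minimal lifecycle described above can be sketched with stubs. The qnn_* functions below are stand-ins for the real QNN SDK API (actual handles and signatures live in the QNN headers); only the call order matters here: create from binary, retrieve the graph, execute N times, free the context.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy handle types standing in for QNN context and graph handles.
struct Context { bool live = false; };
struct Graph { Context* ctx = nullptr; };

Context* qnn_context_create_from_binary(const std::vector<uint8_t>& bin) {
  assert(!bin.empty());  // a real context binary is never empty
  return new Context{true};
}

Graph qnn_graph_retrieve(Context* ctx) { return Graph{ctx}; }

int qnn_graph_execute(const Graph& g, int input) {
  assert(g.ctx != nullptr && g.ctx->live);
  return input * 2;  // stand-in for real inference work
}

void qnn_context_free(Context* ctx) { delete ctx; }

// Drives the whole lifecycle: one context, one graph, n executions.
int run_n_inferences(const std::vector<uint8_t>& ctx_bin, int n) {
  Context* ctx = qnn_context_create_from_binary(ctx_bin);
  Graph graph = qnn_graph_retrieve(ctx);
  int last = 0;
  for (int i = 0; i < n; ++i) {
    last = qnn_graph_execute(graph, i);
  }
  qnn_context_free(ctx);
  return last;
}
```

The point of the sketch is that the long-lived state is just two handles, which is why a stripped-down runner can stay at a couple hundred LOC.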


JacobSzwejbka commented Feb 11, 2026

I personally don't see a strong technical reason to use the runtime exactly as is ....... An alternative to this is for us to take the OSS context binary, and just create a Meta internal version of the runtime that satisfies our need.

cc @cccclai

Is there not an inherent, strong versioning risk? There's no explicit contract, to my knowledge, in the AoT flow on the output blob today. In ET's design it's tightly coupled with the backend that is intended to crack it open. When you fork the runtime, it imposes a contract that is difficult to maintain in OSS, because the second runtime is not visible.


sxu commented Feb 11, 2026

I personally don't see a strong technical reason to use the runtime exactly as is ....... An alternative to this is for us to take the OSS context binary, and just create a Meta internal version of the runtime that satisfies our need.
cc @cccclai

Is there not an inherent, strong versioning risk? There's no explicit contract, to my knowledge, in the AoT flow on the output blob today. In ET's design it's tightly coupled with the backend that is intended to crack it open. When you fork the runtime, it imposes a contract that is difficult to maintain in OSS, because the second runtime is not visible.

There is a tradeoff; I just wanted to bring this up to the QC engineering team tasked with working on this to get their thoughts. The way the runtime is currently structured, I honestly find it pretty complex and the initialization flow hard to trace. If the host and target flows continue to be intermingled, with if-else and conditional compilation sprinkled here and there to satisfy our requirements, I worry it will be hard to maintain. I would be much more comfortable if there were a separate bare-bones runtime.

Regarding the implicit contract for a fork, I think it would be between the OSS AoT flow and the fork; it would not break any broader design decision that ET made. The majority of the compatibility is actually handled by QNN itself: we can use the OSS utility (QnnCustomProtocol) to extract the context binary, and that should be everything that's needed for inference.
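The extraction step can be pictured as a length-prefixed header in front of the raw context binary. This is a schematic sketch of that pattern; the real layout lives in backends/qualcomm/runtime/backends/QnnCustomProtocol.h, and the magic number and field order below are illustrative assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

constexpr uint32_t kMagic = 0x51435350;  // hypothetical magic value

// Wrap a context binary with a (magic, size) header.
std::vector<uint8_t> pack_context(const std::vector<uint8_t>& ctx_bin) {
  std::vector<uint8_t> out(sizeof(uint32_t) + sizeof(uint64_t));
  const uint64_t size = ctx_bin.size();
  std::memcpy(out.data(), &kMagic, sizeof(kMagic));
  std::memcpy(out.data() + sizeof(kMagic), &size, sizeof(size));
  out.insert(out.end(), ctx_bin.begin(), ctx_bin.end());
  return out;
}

// Validate the header and return the raw context binary payload.
std::optional<std::vector<uint8_t>> unpack_context(
    const std::vector<uint8_t>& blob) {
  uint32_t magic = 0;
  uint64_t size = 0;
  if (blob.size() < sizeof(magic) + sizeof(size)) {
    return std::nullopt;  // too short to hold the header
  }
  std::memcpy(&magic, blob.data(), sizeof(magic));
  std::memcpy(&size, blob.data() + sizeof(magic), sizeof(size));
  if (magic != kMagic) {
    return std::nullopt;  // not wrapped by this protocol
  }
  auto begin = blob.begin() + sizeof(magic) + sizeof(size);
  if (static_cast<uint64_t>(blob.end() - begin) < size) {
    return std::nullopt;  // truncated payload
  }
  return std::vector<uint8_t>(
      begin, begin + static_cast<std::ptrdiff_t>(size));
}
```

A fork that only needs inference would call the unpack step and hand the payload straight to QNN, which is why the protocol is the main compatibility surface.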

@JacobSzwejbka (Contributor)

Regarding the implicit contract for a fork, I think it would be between the OSS AoT and the fork

I mostly just don't want to be in a state where changes to AoT OSS cannot merge because they start breaking the fork, where the tests and logic are not available in OSS. QnnCustomProtocol seems like a fairly stable API surface; I just worry that if pressures mount, more hacky things will start getting done on the fork.

@winskuo-quic (Collaborator, Author)

@sxu,

Our codebase doesn't contain many HTP-specific configs. For perf config, if you would like a short workaround in this PR to disable the feature, this can be done by removing the logic under https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/htp/HtpDevice.cpp and https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/htp/HtpDevice.h, so it looks something like the GPU backend, where we don't provide any config. For GPU reference, see: https://github.com/pytorch/executorch/blob/main/backends/qualcomm/runtime/backends/gpu/GpuDevice.h. Please let me know if you run into any issues; I can provide a patch if needed.

I understand the motivation for a bare-minimum inference runtime. However, creating a minimal runtime specifically for CDSP direct mode would be challenging. The reason is that certain backends require some QNN configs in order to function correctly (e.g., LPAI). In such cases, we would need to build a minimal runtime for each backend, which would be difficult to maintain in the long run.

That said, I agree with your concern about the codebase structure being complex. To address this, we plan to work on file restructuring and CMake refactoring. This should reduce the number of macros required to enable a minimal runtime and improve overall readability of the code.


cccclai commented Feb 20, 2026

Can you rebase so I can check the CI signal? I feel like we can land this as the first step and iterate on top of @sxu's feedback.

winskuo-quic force-pushed the dev1/winskuo/htp_direct_mode branch from dafc1bc to bf68399 on February 23, 2026 02:47
@winskuo-quic (Collaborator, Author)

Can you rebase so I can check the CI signal? I feel like we can have this as the first step and iterate on top of @sxu 's feedback.

Hi @cccclai,
I have rebased the PR.
Thanks


cccclai commented Feb 24, 2026

Internal CI looks quite bad. I feel like it might be related to the file data loader, but I'm not sure. For example:

executorch/runtime/executor/test:pte_data_map_test -- --exact 'fbcode//executorch/runtime/executor/test:pte_data_map_test - PteDataMapTest.UnimplementedMethods'

Note: Google Test filter = PteDataMapTest.UnimplementedMethods
[==========] Running 1 test from 1 test suite.
[----------] Global test environment set-up.
[----------] 1 test from PteDataMapTest
[ RUN      ] PteDataMapTest.UnimplementedMethods

=================================================================
==9840==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x505000000032 at pc 0x000000398d8a bp 0x7fff42cbb430 sp 0x7fff42cbabe8
WRITE of size 35 at 0x505000000032 thread T0
SCARINESS: 45 (multi-byte-write-heap-buffer-overflow)
    #0 0x000000398d89 in strcpy (/data/sandcastle/boxes/trunk-hg-full-fbsource/buck-out/v2/art/fbcode/3a3a0d59e6906d36/executorch/runtime/executor/test/__pte_data_map_test__/pte_data_map_test+0x398d89)
    #1 0x7f5f943c8f3d in executorch::extension::FileDataLoader::from(char const*, unsigned long) fbcode/executorch/extension/data_loader/file_data_loader.cpp:132
    #2 0x0000002f212f in PteDataMapTest::SetUp() fbcode/executorch/runtime/executor/test/pte_data_map_test.cpp:88
    #3 0x7f5f94558e86 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) fbsource/src/gtest.cc:2745
    #4 0x7f5f9455890a in testing::Test::Run() fbsource/src/gtest.cc:2757
    #5 0x7f5f9455c1f3 in testing::TestInfo::Run() fbsource/src/gtest.cc:2908
    #6 0x7f5f94562c8b in testing::TestSuite::Run() fbsource/src/gtest.cc:3086
    #7 0x7f5f9459188c in testing::internal::UnitTestImpl::RunAllTests() fbsource/src/gtest.cc:6077
    #8 0x7f5f94590382 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) fbsource/src/gtest.cc:2745
    #9 0x7f5f9458fad5 in testing::UnitTest::Run() fbsource/src/gtest.cc:5617
    #10 0x7f5f9444a880 in RUN_ALL_TESTS() fbsource/gtest/gtest.h:2341
    #11 0x7f5f9444a5ac in main fbcode/common/gtest/LightMain.cpp:20
    #12 0x7f5f93e2c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #13 0x7f5f93e2c717 in [__libc_start_main@GLIBC_2.2.5](mailto:__libc_start_main@GLIBC_2.2.5) /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #14 0x0000002f1680 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

0x505000000032 is located 0 bytes after 34-byte region [0x505000000010,0x505000000032)
allocated by thread T0 here:
    #0 0x000000504b22 in operator new(unsigned long, std::align_val_t) (/data/sandcastle/boxes/trunk-hg-full-fbsource/buck-out/v2/art/fbcode/3a3a0d59e6906d36/executorch/runtime/executor/test/__pte_data_map_test__/pte_data_map_test+0x504b22)
    #1 0x7f5f943c945c in executorch::extension::(anonymous namespace)::et_aligned_alloc(unsigned long, std::align_val_t) fbcode/executorch/extension/data_loader/file_data_loader.cpp:46
    #2 0x7f5f943c8dfb in executorch::extension::FileDataLoader::from(char const*, unsigned long) fbcode/executorch/extension/data_loader/file_data_loader.cpp:125
    #3 0x0000002f212f in PteDataMapTest::SetUp() fbcode/executorch/runtime/executor/test/pte_data_map_test.cpp:88
    #4 0x7f5f94558e86 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) fbsource/src/gtest.cc:2745
    #5 0x7f5f9455890a in testing::Test::Run() fbsource/src/gtest.cc:2757
    #6 0x7f5f9455c1f3 in testing::TestInfo::Run() fbsource/src/gtest.cc:2908
    #7 0x7f5f94562c8b in testing::TestSuite::Run() fbsource/src/gtest.cc:3086
    #8 0x7f5f9459188c in testing::internal::UnitTestImpl::RunAllTests() fbsource/src/gtest.cc:6077
    #9 0x7f5f94590382 in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) fbsource/src/gtest.cc:2745
    #10 0x7f5f9458fad5 in testing::UnitTest::Run() fbsource/src/gtest.cc:5617
    #11 0x7f5f9444a880 in RUN_ALL_TESTS() fbsource/gtest/gtest.h:2341
    #12 0x7f5f9444a5ac in main fbcode/common/gtest/LightMain.cpp:20
    #13 0x7f5f93e2c656 in __libc_start_call_main /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/nptl/libc_start_call_main.h:58:16
    #14 0x7f5f93e2c717 in [__libc_start_main@GLIBC_2.2.5](mailto:__libc_start_main@GLIBC_2.2.5) /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../csu/libc-start.c:409:3
    #15 0x0000002f1680 in _start /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/csu/../sysdeps/x86_64/start.S:116

SUMMARY: AddressSanitizer: heap-buffer-overflow (/data/sandcastle/boxes/trunk-hg-full-fbsource/buck-out/v2/art/fbcode/3a3a0d59e6906d36/executorch/runtime/executor/test/__pte_data_map_test__/pte_data_map_test+0x398d89) in strcpy
Shadow bytes around the buggy address:
  0x504ffffffd80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x504ffffffe00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x504ffffffe80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x504fffffff00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x504fffffff80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x505000000000: fa fa 00 00 00 00[02]fa fa fa fa fa fa fa fa fa
  0x505000000080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x505000000100: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x505000000180: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x505000000200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x505000000280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==9840==ABORTING

@winskuo-quic (Collaborator, Author)

@cccclai,
Could you help trigger internal CI again?
I added 1 to the file_len so it includes the null terminator.
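The off-by-one behind that ASan report follows a classic pattern: strcpy writes strlen(src) + 1 bytes (the payload plus '\0'), so the destination must be allocated with room for the terminator. A minimal illustration of the fix pattern (not the actual file_data_loader.cpp diff):

```cpp
#include <cstddef>
#include <cstring>
#include <memory>

// Without the "+ 1", strcpy writes the terminating '\0' one byte past
// the allocation, which is exactly the multi-byte-write
// heap-buffer-overflow ASan reported in the log above.
std::unique_ptr<char[]> copy_path(const char* src) {
  const std::size_t len = std::strlen(src);
  auto dst = std::make_unique<char[]>(len + 1);  // +1 for the '\0'
  std::strcpy(dst.get(), src);  // safe: room for len chars + terminator
  return dst;
}
```

Note that the overflow size in the report (35 bytes, one past a 34-byte region) is the signature of exactly this missing terminator byte.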


cccclai commented Feb 25, 2026

Yeah, done. Let's wait for the signal.


cccclai commented Feb 26, 2026

Seems better now, and this is a different error. Can we apply the same compiler flag in the CMake build?

Remote command returned non-zero exit code 1
Remote action, reproduce with: `frecli cas download-action 572b3092acf1870ff2eae669d19d283ab446edf5cd38ce1fdf8ad99f57f8eee0:148`
Stdout: <empty>
Stderr:
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:23:28: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   23 |     uint64_t buffer_size = QNN_CTX_BIN_ALIGNMENT + binary_size_;
      |                            ^
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:64:27: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   64 |   uint32_t padding_size = QNN_CTX_BIN_ALIGNMENT - magic_number_proto_size -
      |                           ^
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:90:10: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   90 |   return QNN_CTX_BIN_ALIGNMENT;

winskuo-quic force-pushed the dev1/winskuo/htp_direct_mode branch from d67773d to 35adcc0 on March 2, 2026 01:35
@winskuo-quic (Collaborator, Author)

Seems better now and this is a different error. Can we apply the same compiler flag in the cmake build?

Remote command returned non-zero exit code 1
Remote action, reproduce with: `frecli cas download-action 572b3092acf1870ff2eae669d19d283ab446edf5cd38ce1fdf8ad99f57f8eee0:148`
Stdout: <empty>
Stderr:
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:23:28: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   23 |     uint64_t buffer_size = QNN_CTX_BIN_ALIGNMENT + binary_size_;
      |                            ^
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:64:27: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   64 |   uint32_t padding_size = QNN_CTX_BIN_ALIGNMENT - magic_number_proto_size -
      |                           ^
fbcode/executorch/backends/qualcomm/runtime/backends/QnnCustomProtocol.cpp:90:10: error: use of undeclared identifier 'QNN_CTX_BIN_ALIGNMENT'
   90 |   return QNN_CTX_BIN_ALIGNMENT;

Thanks for sharing the error log.
I believe this error comes from the Buck build.
I initially defined the macro inside CMakeLists.txt and have now moved it to a header file.
Please refer to the latest commit for the change.
Thanks
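Moving the constant into a header can be sketched as below. The value 4096 is an assumed placeholder (the real value is whatever the PR defines), and padded_buffer_size mirrors the use site from the error log.

```cpp
#include <cstdint>

// Defining the constant in a header (guarded so a build flag can still
// override it) makes it visible to both the CMake and Buck builds,
// instead of only the build that passed a -D flag on the command line.
#ifndef QNN_CTX_BIN_ALIGNMENT
#define QNN_CTX_BIN_ALIGNMENT 4096  // assumed placeholder value
#endif

// Mirrors the use site from the error log: the buffer reserves one
// alignment block ahead of the context binary payload.
uint64_t padded_buffer_size(uint64_t binary_size) {
  return QNN_CTX_BIN_ALIGNMENT + binary_size;
}
```

This is why the original failure only showed up internally: the Buck build never saw the CMake-only definition, so the identifier was undeclared.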

@abhinaykukkadapu (Contributor)

@winskuo-quic @cccclai I put up a forward fix for the internal CI issues; I will let you know once that passes, and we can then land the changes.

@abhinaykukkadapu (Contributor)

@winskuo-quic my fix seems to fix all CI signals. Instead of me landing this diff and putting up a forward fix, can you apply this patch and rebase? I will then re-import and push.

  diff --git a/executorch/backends/qualcomm/runtime/targets.bzl b/executorch/backends/qualcomm/runtime/targets.bzl
  index abcdef1..1234567 100644                                                                                                                                                                                                                 
  --- a/executorch/backends/qualcomm/runtime/targets.bzl
  +++ b/executorch/backends/qualcomm/runtime/targets.bzl                                                                                                                                                                                        
  @@ -47,9 +47,9 @@                                         
                       "backends/gpu/*.cpp",
                       "backends/htp/*.cpp",
                       "backends/ir/*.cpp",
  -                ] + (["backends/gpu/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/gpu/aarch64/*.cpp"]) + (
  -                    ["backends/htp/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htp/aarch64/*.cpp"]) + (
  -                    ["backends/ir/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/ir/aarch64/*.cpp"]
  +                ] + (["backends/gpu/host/*.cpp"] if include_aot_qnn_lib else ["backends/gpu/target/*.cpp"]) + (
  +                    ["backends/htp/host/*.cpp"] if include_aot_qnn_lib else ["backends/htp/target/*.cpp"]) + (
  +                    ["backends/ir/host/*.cpp"] if include_aot_qnn_lib else ["backends/ir/target/*.cpp"]
                   ),
                   exclude = ["Logging.cpp"],
               ),

winskuo-quic force-pushed the dev1/winskuo/htp_direct_mode branch from 35adcc0 to 02bf8db on March 5, 2026 02:24
@winskuo-quic (Collaborator, Author)

@winskuo-quic my fix seems to fix all CI signals. Instead of me landing this diff and putting up a forward fix, can you apply this patch and rebase? I will then re-import and push.

  diff --git a/executorch/backends/qualcomm/runtime/targets.bzl b/executorch/backends/qualcomm/runtime/targets.bzl
  index abcdef1..1234567 100644                                                                                                                                                                                                                 
  --- a/executorch/backends/qualcomm/runtime/targets.bzl
  +++ b/executorch/backends/qualcomm/runtime/targets.bzl                                                                                                                                                                                        
  @@ -47,9 +47,9 @@                                         
                       "backends/gpu/*.cpp",
                       "backends/htp/*.cpp",
                       "backends/ir/*.cpp",
  -                ] + (["backends/gpu/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/gpu/aarch64/*.cpp"]) + (
  -                    ["backends/htp/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/htp/aarch64/*.cpp"]) + (
  -                    ["backends/ir/x86_64/*.cpp"] if include_aot_qnn_lib else ["backends/ir/aarch64/*.cpp"]
  +                ] + (["backends/gpu/host/*.cpp"] if include_aot_qnn_lib else ["backends/gpu/target/*.cpp"]) + (
  +                    ["backends/htp/host/*.cpp"] if include_aot_qnn_lib else ["backends/htp/target/*.cpp"]) + (
  +                    ["backends/ir/host/*.cpp"] if include_aot_qnn_lib else ["backends/ir/target/*.cpp"]
                   ),
                   exclude = ["Logging.cpp"],
               ),

Hi @abhinaykukkadapu,
Thanks for sharing the patch to fix the Buck build issues.
I have applied the patch and rebased.
Thanks

@abhinaykukkadapu abhinaykukkadapu merged commit c3a140f into pytorch:main Mar 6, 2026
295 of 303 checks passed
@lucylq lucylq mentioned this pull request Mar 7, 2026
dayanruben pushed a commit to dayanruben/executorch that referenced this pull request Mar 7, 2026
### Summary
failing on trunk after pytorch#17326

### Test plan
ci