ONNX Runtime Compatibility: >= 1.24.1 (compiled with v1.24.4)
QAIRT SDK Compatibility: 2.47.0
pip install onnxruntime==1.24.4
pip install onnxruntime-qnn==2.3.0
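Before loading the QNN package, it can help to confirm that the installed `onnxruntime` meets the compatibility floor stated above. The following is a minimal sketch using only the standard library; `parse_version`, `meets_floor`, and `installed_version` are hypothetical helper names, and the parser handles plain dotted versions only, not the full packaging version grammar.

```python
from __future__ import annotations

from importlib import metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '1.24.4' into (1, 24, 4), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_floor(installed: str, floor: str) -> bool:
    """True when the installed version is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
```

For example, `meets_floor(installed_version("onnxruntime") or "0", "1.24.1")` evaluates to `False` when the package is missing or older than the floor passed in.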
Packaging
New in 2.3.0
- NuGet: ARM64 (ARM64X) package support added. Previously ARM64-only.
- Linux x86_64 Python wheels: New preview wheels for Ubuntu 22.04 (`manylinux_2_35_x86_64`), Python 3.11–3.14. Requires GLIBC >= 2.35 due to QAIRT library dependencies.
- Maven (Android): New Android ARM64 package. Group ID / Artifact ID: `com.qualcomm.qti:onnxruntime-android-qnn`.
For instructions on building wheels across different architectures, see the Build Guide.
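The GLIBC >= 2.35 requirement for the Linux x86_64 wheels can be checked from Python before installing. This is a rough sketch built on the standard `platform.libc_ver()` call; `glibc_at_least` is a hypothetical helper name, and the check reports what the running interpreter is linked against, which is assumed to match the system C library.

```python
from __future__ import annotations

import platform


def glibc_at_least(required: tuple[int, int], reported: str | None = None) -> bool:
    """Compare the C library version against a (major, minor) floor.

    `reported` defaults to what platform.libc_ver() finds for the running
    interpreter; pass a string such as '2.35' to evaluate another value.
    """
    if reported is None:
        lib, reported = platform.libc_ver()
        if lib != "glibc":
            return False  # not glibc (for example musl, macOS, or Windows)
    try:
        major, minor = (int(part) for part in reported.split(".")[:2])
    except ValueError:
        return False  # empty or unparseable version string
    return (major, minor) >= required
```

Calling `glibc_at_least((2, 35))` on the target machine returns `True` only when a glibc of at least that version is detected.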
Platform Support
| Package | Windows ARM64 | Windows x64 | Linux ARM64 | Linux x86_64 | Android ARM64 |
|---|---|---|---|---|---|
| Python Wheel | Inference | AOT compilation | Inference | AOT compilation | — |
| NuGet | Inference | — | — | — | — |
| ZIP | Inference | — | — | — | — |
| tgz | — | — | Inference | — | — |
| Maven | — | — | — | — | Inference |
New Operators and Fusions
- NonZero (#217)
- RandomNormalLike (#266)
- Identity (#268)
- Gelu Pattern 3: New `Erf*0.5 + 0.5` decomposition variant; fixes models previously not fused. (#236)
- DynamicQuantizeLinear + MatMulInteger: Fuses `DQL → MatMulInteger → Cast → Mul → [Add]` into a float QNN MatMul. (#367)
- DynamicQuantizeLinear + ConvInteger: Fuses `DQL → ConvInteger → Cast → Mul → [Add]` into a float QNN Conv2d. (#364)
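The Gelu variant named above is an algebraic rearrangement of the standard erf-based Gelu. The sketch below compares the two forms numerically; the exact node order that Pattern 3 matches is not stated in these notes, so the second function is only one plausible reading of `Erf*0.5 + 0.5`.

```python
import math


def gelu_reference(x: float) -> float:
    """Standard erf-based Gelu: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_erf_half_plus_half(x: float) -> float:
    """Assumed decomposition: x * (erf(x / sqrt(2)) * 0.5 + 0.5)."""
    return x * (math.erf(x / math.sqrt(2.0)) * 0.5 + 0.5)


def max_abs_difference(samples: list[float]) -> float:
    """Largest absolute gap between the two forms over the given inputs."""
    return max(abs(gelu_reference(x) - gelu_erf_half_plus_half(x)) for x in samples)
```

Because both forms compute the same function up to floating-point rounding, a model exported with either decomposition can map to the same fused Gelu.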
For the full list of supported operators, see Supported ONNX Operators; for supported fusions, see Supported Operator Fusions.
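To illustrate why the `DQL → MatMulInteger → Cast → Mul → [Add]` chain can be replaced by a single float MatMul, the sketch below evaluates the chain step by step in pure Python and compares it with a float product against the dequantized weight. It follows the ONNX definition of DynamicQuantizeLinear (uint8, range widened to include zero). The function names are hypothetical, the all-zero guard is my own choice, and the fused node is assumed to multiply the float activation by the dequantized weight, which these notes do not spell out.

```python
def dynamic_quantize_linear(values):
    """ONNX-style DynamicQuantizeLinear on a flat list: (uint8 values, scale, zero point)."""
    lo = min(0.0, min(values))
    hi = max(0.0, max(values))
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # guard for all-zero input (assumption, avoids division by zero)
    zero_point = min(255, max(0, round(-lo / scale)))
    quantized = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point


def quantized_path(a, b_q, b_scale, b_zero_point, bias=None):
    """Evaluate DQL -> MatMulInteger -> Cast -> Mul -> [Add] one step at a time."""
    inner = len(a[0])
    flat_q, a_scale, a_zp = dynamic_quantize_linear([v for row in a for v in row])
    a_q = [flat_q[i:i + inner] for i in range(0, len(flat_q), inner)]
    out = []
    for a_row in a_q:
        out_row = []
        for c in range(len(b_q[0])):
            # MatMulInteger: integer accumulation of zero-point-corrected values
            acc = sum((a_row[k] - a_zp) * (b_q[k][c] - b_zero_point) for k in range(inner))
            y = float(acc) * (a_scale * b_scale)  # Cast, then Mul by the combined scale
            out_row.append(y + bias[c] if bias is not None else y)  # optional Add
        out.append(out_row)
    return out


def float_path(a, b_q, b_scale, b_zero_point, bias=None):
    """Float MatMul of the original activation with the dequantized weight."""
    b = [[(q - b_zero_point) * b_scale for q in row] for row in b_q]
    out = []
    for a_row in a:
        out_row = []
        for c in range(len(b[0])):
            y = sum(a_row[k] * b[k][c] for k in range(len(a_row)))
            out_row.append(y + bias[c] if bias is not None else y)
        out.append(out_row)
    return out
```

The two paths differ only by the rounding error introduced when the activation is quantized, which is why the float form is a reasonable stand-in for the whole chain.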
Improvements
- Added `htp_share_resource_optimization` and `ep.enable_htp_prepare_only` provider options. See Configuration Options. (#107, #347)
- Added int32 input support for ScatterElements (QAIRT 2.45+). (#247)
- GatherND now uses shared index-normalization primitives for consistency with ScatterND/ScatterElements. (#336)
- Gemm with `beta=0.0` now maps to QNN FullyConnected without bias instead of falling back to CPU. (#375)
- RoiAlign now accepts `coordinate_transformation_mode=half_pixel` and `sampling_ratio=0`. (#389)
- MatMulNBits extended to HTP with 2-bit and 4-bit support (block sizes 32/64). (#288)
- Graph verification in tests migrated to the public `GetEpGraphAssignmentInfo` API. (#346)
- Conv now supports block-quantized weights on HTP via the `BW_FLOAT_BLOCK` kernel, including int2 support. (#429)
- Static MSVC runtime linkage enabled for Windows x86_64 builds. (#432)
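For readers unfamiliar with the block-quantized weights mentioned in the MatMulNBits and Conv items, the sketch below shows generic 4-bit block-wise dequantization, where each block of weights shares one scale. It is an illustration only, not the HTP kernel: the low-nibble-first packing, the default zero point of 8, and the function names are all assumptions that do not come from these notes.

```python
def unpack_int4(packed: bytes) -> list[int]:
    """Split each byte into two unsigned 4-bit values, low nibble first (assumed order)."""
    values = []
    for byte in packed:
        values.append(byte & 0x0F)
        values.append(byte >> 4)
    return values


def dequantize_blocks(packed: bytes, scales: list[float],
                      block_size: int = 32, zero_point: int = 8) -> list[float]:
    """Dequantize 4-bit weights where every `block_size` values share one scale."""
    values = unpack_int4(packed)
    if len(values) != block_size * len(scales):
        raise ValueError("packed data does not match the number of scales")
    # Block i // block_size supplies the scale for element i.
    return [(q - zero_point) * scales[i // block_size] for i, q in enumerate(values)]
```

With a block size of 32, each block occupies 16 packed bytes plus one scale, which is where the memory saving over float weights comes from.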
Bug Fixes
- BatchNormalization: incorrect QNN offset handling for `QNN_DATATYPE_UFIXED_POINT_16` scale inputs. (#135)
- ThresholdedRelu: stale `add → relu → sign → mul` pattern replaced with QAIRT-aligned `Greater → Select`. (#221)
- Graph composition failure when `offload_graph_io_quantization=1` and a graph input fans out to multiple QDQ pairs. (#295)
- Softmax `axis ≠ rank-1` falling back to CPU due to missing upstream tensor wrappers at validation time. (#304)
- ScatterND/ScatterElements silent CPU fallback for negative or INT_64 indices. (#311, #317)
- Build failure on Ubuntu 24.04 / GCC 13 due to false-positive `-Wmaybe-uninitialized`. (#387)
- QNN EP failure on devices where DXCore cannot discover the NPU. (#12)
- ORT Core version floor raised to `>= 1.24.2`, preventing accidental downgrade. (#448)
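The negative-index handling referenced for ScatterND/ScatterElements (and the shared index normalization used by GatherND) follows the usual ONNX convention that a negative index counts from the end of the axis. A minimal sketch of that convention on a 1-D tensor, with hypothetical helper names and no claim to mirror the EP's internal primitives:

```python
def normalize_index(index: int, dim_size: int) -> int:
    """Map an index in [-dim_size, dim_size - 1] to its non-negative equivalent."""
    if not -dim_size <= index < dim_size:
        raise IndexError(f"index {index} out of range for axis of size {dim_size}")
    return index + dim_size if index < 0 else index


def scatter_elements_1d(data: list, indices: list[int], updates: list) -> list:
    """1-D ScatterElements: copy `data`, then write each update at its normalized index."""
    out = list(data)
    for index, update in zip(indices, updates):
        out[normalize_index(index, len(out))] = update
    return out
```

For instance, scattering with index `-1` on an axis of size 5 writes to position 4.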
Full Changelog: rel/ort-qnn-ep/2.2.0...rel-2.3.0
Known Issues
- WoS AMD64 (Python 3.11 installer issue): `ep.get_library_path()` returns the `amd64` path instead of `arm64ec`; manually construct the path to the `arm64ec` library as a workaround. Ongoing.
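One way to construct the `arm64ec` path by hand is to swap the architecture directory in the path that `ep.get_library_path()` reports. The sketch below assumes the two builds live in sibling directories literally named `amd64` and `arm64ec`, which these notes do not confirm; `swap_arch_dir` and the example file name are hypothetical, so verify the resulting path exists before using it.

```python
from pathlib import PureWindowsPath


def swap_arch_dir(reported: str, wrong: str = "amd64", wanted: str = "arm64ec") -> str:
    """Replace the last path component named `wrong` with `wanted`.

    Assumes a layout such as ...\\amd64\\<library> next to ...\\arm64ec\\<library>.
    """
    parts = list(PureWindowsPath(reported).parts)
    matches = [i for i, part in enumerate(parts) if part.lower() == wrong]
    if not matches:
        raise ValueError(f"no '{wrong}' directory found in {reported!r}")
    parts[matches[-1]] = wanted
    return str(PureWindowsPath(*parts))
```

The function only rewrites the string; it does not check that the target library is present on disk.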
Contributors
This release includes contributions from:
Ashwath Shankarnarayan, Badri Narayanan, Chun-Chih Teng, Hua-Yu Chou, Hung-Jui Wang, Jaykumar Luhar, Kuan-Yu Lin, Min Fong Hong, Mu-Chien Hsu, Shubham Patel, Tirupathi Reddy T, Xia Han, Yathindra Kota, Yu-Hung Chuang, Yuduo Wu