Release ONNX Runtime QNN Execution Provider v2.3.0 · onnxruntime/onnxruntime-qnn

ONNX Runtime Compatibility: >= 1.24.1 (compiled with v1.24.4)

QAIRT SDK Compatibility: 2.47.0

pip install onnxruntime==1.24.4
pip install onnxruntime-qnn==2.3.0
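To confirm that an existing environment meets the compatibility floor stated above, comparing version tuples is usually enough. The following is a minimal sketch, not part of the package: the helper names are hypothetical, and it assumes plain numeric `X.Y.Z` version strings with no suffixes such as `.dev0`.

```python
from importlib import metadata


def parse_version(text):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def meets_floor(installed, floor="1.24.1"):
    """Return True when the installed version is at or above the floor."""
    return parse_version(installed) >= parse_version(floor)


def installed_ort_version():
    """Return the installed onnxruntime version string, or None if absent."""
    try:
        return metadata.version("onnxruntime")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_ort_version()
    if version is None:
        print("onnxruntime is not installed")
    else:
        print(version, "OK" if meets_floor(version) else "below the floor")
```

The floor is a parameter so the same check can be reused if the minimum version changes.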

Packaging

NuGet — ARM64 (ARM64X) package support added. Previously ARM64-only.
Linux x86_64 Python wheels — New preview wheels for Ubuntu 22.04 (manylinux_2_35_x86_64), Python 3.11–3.14. Requires GLIBC >= 2.35 due to QAIRT library dependencies.
Maven (Android) — New Android ARM64 package. Group ID / Artifact ID: com.qualcomm.qti:onnxruntime-android-qnn.
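Because the Linux x86_64 preview wheels require GLIBC >= 2.35, it can help to check the host before installing. The sketch below uses only the standard library; the helper names are hypothetical, and it assumes the C library reports a numeric `major.minor` version.

```python
import platform


def parse_glibc(version):
    """Convert a 'major.minor' string such as '2.35' into an int tuple."""
    return tuple(int(part) for part in version.split(".")[:2])


def glibc_ok(version, minimum=(2, 35)):
    """Return True when the given glibc version meets the minimum."""
    return parse_glibc(version) >= minimum


def detect_glibc():
    """Return the running glibc version as a tuple, or None if not glibc."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return parse_glibc(version)


if __name__ == "__main__":
    found = detect_glibc()
    if found is None:
        print("glibc not detected on this platform")
    else:
        print(found, "OK" if found >= (2, 35) else "too old for the preview wheels")
```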

For instructions on building wheels across different architectures, see the Build Guide.

Package	Windows ARM64	Windows x64	Linux ARM64	Linux x86_64	Android ARM64
Python Wheel	Inference	AOT compilation	Inference	AOT compilation	—
NuGet	Inference	—	—	—	—
ZIP	Inference	—	—	—	—
tgz	—	—	Inference	—	—
Maven	—	—	—	—	Inference

New Operators and Fusions

NonZero (#217)
RandomNormalLike (#266)
Identity (#268)
Gelu Pattern 3 — New Erf*0.5 + 0.5 decomposition variant; fixes models previously not fused. (#236)
DynamicQuantizeLinear + MatMulInteger — Fuses DQL → MatMulInteger → Cast → Mul → [Add] into a float QNN MatMul. (#367)
DynamicQuantizeLinear + ConvInteger — Fuses DQL → ConvInteger → Cast → Mul → [Add] into a float QNN Conv2d. (#364)
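As a numeric illustration of the Gelu item above, the `Erf*0.5 + 0.5` form is algebraically the same function as the common `0.5 * x * (1 + Erf(x / sqrt(2)))` form, which is why a matcher needs a separate pattern to recognize it. This is only a scalar sketch; the exact node wiring that the EP matches is not given in these notes, and the function names are hypothetical.

```python
import math


def gelu_reference(x):
    """Common exact Gelu form: 0.5 * x * (1 + erf(x / sqrt(2)))."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_erf_half_plus_half(x):
    """Variant form: x * (erf(x / sqrt(2)) * 0.5 + 0.5)."""
    return x * (math.erf(x / math.sqrt(2.0)) * 0.5 + 0.5)


if __name__ == "__main__":
    for value in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(value, gelu_reference(value), gelu_erf_half_plus_half(value))
```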

For the full list of supported operators, see Supported ONNX Operators; for supported fusions, see Supported Operator Fusions.
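To see why the DynamicQuantizeLinear → MatMulInteger → Cast → Mul → [Add] chain can be replaced by a single float MatMul, the sketch below evaluates both on a small input in pure Python. It follows the ONNX DynamicQuantizeLinear formulas for uint8 output; the helper names, the sample shapes, and the per-tensor weight scale are assumptions made for illustration, and the EP's actual lowering is not shown here.

```python
def saturate_u8(value):
    """Clamp an integer to the uint8 range."""
    return max(0, min(255, value))


def dynamic_quantize_linear(matrix):
    """Quantize a non-constant float matrix to uint8; returns (q, scale, zp)."""
    flat = [v for row in matrix for v in row]
    lo = min(0.0, min(flat))
    hi = max(0.0, max(flat))
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        raise ValueError("all-zero input is not handled in this sketch")
    zero_point = saturate_u8(round(-lo / scale))  # round() ties to even
    quantized = [[saturate_u8(round(v / scale) + zero_point) for v in row]
                 for row in matrix]
    return quantized, scale, zero_point


def matmul_integer(a, a_zp, b, b_zp):
    """Integer matmul with zero points subtracted, as in MatMulInteger."""
    inner, cols = len(b), len(b[0])
    return [[sum((row[k] - a_zp) * (b[k][j] - b_zp) for k in range(inner))
             for j in range(cols)] for row in a]


def unfused_chain(x, w_q, w_scale, w_zp, bias=None):
    """DQL -> MatMulInteger -> Cast -> Mul -> optional Add."""
    x_q, x_scale, x_zp = dynamic_quantize_linear(x)
    acc = matmul_integer(x_q, x_zp, w_q, w_zp)
    out = [[float(v) * (x_scale * w_scale) for v in row] for row in acc]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out


def float_matmul(x, w_q, w_scale, w_zp, bias=None):
    """Single float MatMul against the dequantized weights."""
    w = [[(q - w_zp) * w_scale for q in row] for row in w_q]
    out = [[sum(row[k] * w[k][j] for k in range(len(w)))
            for j in range(len(w[0]))] for row in x]
    if bias is not None:
        out = [[v + b for v, b in zip(row, bias)] for row in out]
    return out


if __name__ == "__main__":
    x = [[0.5, -1.0, 0.25]]
    w_q = [[10, -3], [4, 7], [-8, 2]]
    print(unfused_chain(x, w_q, 0.02, 0, bias=[0.1, -0.2]))
    print(float_matmul(x, w_q, 0.02, 0, bias=[0.1, -0.2]))
```

The two results differ only by the activation quantization error, which the float version avoids.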

Improvements

Added htp_share_resource_optimization and ep.enable_htp_prepare_only provider options. See Configuration Options. (#107, #347)
Added int32 input support for ScatterElements (QAIRT 2.45+). (#247)
GatherND now uses shared index-normalization primitives for consistency with ScatterND/ScatterElements. (#336)
Gemm with beta=0.0 now maps to QNN FullyConnected without bias instead of falling back to CPU. (#375)
RoiAlign now accepts coordinate_transformation_mode=half_pixel and sampling_ratio=0. (#389)
MatMulNBits extended to HTP with 2-bit and 4-bit support (block sizes 32/64). (#288)
Graph verification in tests migrated to the public GetEpGraphAssignmentInfo API. (#346)
Conv now supports block-quantized weights on HTP via the BW_FLOAT_BLOCK kernel, including int2 support. (#429)
Static MSVC runtime linkage enabled for Windows x86_64 builds. (#432)
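The Gemm item above follows from the ONNX definition `Y = alpha * A * B + beta * C`: when `beta` is 0.0 the `C` term contributes nothing, so a bias-free matrix product gives the same result. The pure-Python sketch below demonstrates that equivalence; it omits `transA`/`transB`, treats `C` as a per-column vector, and uses hypothetical helper names. It does not show how the EP builds the QNN FullyConnected node.

```python
def matmul(a, b):
    """Plain matrix product of two row-major nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def gemm(a, b, c=None, alpha=1.0, beta=1.0):
    """Y = alpha * (A @ B) + beta * C, with C broadcast across rows."""
    out = [[alpha * v for v in row] for row in matmul(a, b)]
    if c is not None:
        out = [[v + beta * bias for v, bias in zip(row, c)] for row in out]
    return out


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[5.0, 6.0], [7.0, 8.0]]
    c = [100.0, -100.0]
    # With beta=0.0 the bias has no effect on the result.
    print(gemm(a, b, c, beta=0.0) == gemm(a, b))
```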

Bug Fixes

BatchNormalization: incorrect QNN offset handling for QNN_DATATYPE_UFIXED_POINT_16 scale inputs. (#135)
ThresholdedRelu: stale add → relu → sign → mul pattern replaced with QAIRT-aligned Greater → Select. (#221)
Graph composition failure when offload_graph_io_quantization=1 and a graph input fans out to multiple QDQ pairs. (#295)
Softmax axis ≠ rank-1 falling back to CPU due to missing upstream tensor wrappers at validation time. (#304)
ScatterND/ScatterElements silent CPU fallback for negative or INT_64 indices. (#311, #317)
Build failure on Ubuntu 24.04 / GCC 13 due to false-positive -Wmaybe-uninitialized. (#387)
QNN EP failure on devices where DXCore cannot discover the NPU. (#12)
ORT Core version floor raised to >= 1.24.2, preventing accidental downgrade. (#448)
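For the ScatterND/ScatterElements item above, ONNX allows an index in the range `[-s, s-1]` along an axis of size `s`, with negative values counting from the end. The sketch below shows that normalization on a one-dimensional ScatterElements; it illustrates the semantics only, the function names are hypothetical, and it is not the EP's implementation.

```python
def normalize_index(index, dim_size):
    """Map an index in [-dim_size, dim_size - 1] to [0, dim_size - 1]."""
    if not -dim_size <= index < dim_size:
        raise IndexError(f"index {index} out of range for size {dim_size}")
    return index + dim_size if index < 0 else index


def scatter_elements_1d(data, indices, updates):
    """Return a copy of data with updates written at the normalized indices."""
    out = list(data)
    for index, value in zip(indices, updates):
        out[normalize_index(index, len(data))] = value
    return out


if __name__ == "__main__":
    # -1 addresses the last element, 0 the first.
    print(scatter_elements_1d([1, 2, 3, 4], [-1, 0], [9, 7]))
```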

Full Changelog: rel/ort-qnn-ep/2.2.0...rel-2.3.0

Known Issues

WoS AMD64 — Python 3.11 installer issue — ep.get_library_path() returns the amd64 path instead of arm64ec; manually construct the path to the arm64ec library as a workaround. Ongoing.
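One way to build the corrected path is to rewrite the architecture component of whatever `ep.get_library_path()` returned. The sketch below is an assumption-heavy illustration: it assumes the architecture appears as a directory component named `amd64`, the helper name is hypothetical, and the example path is made up. Check the installed package layout before relying on it.

```python
from pathlib import PureWindowsPath


def swap_arch_component(path, wrong="amd64", right="arm64ec"):
    """Replace any directory component equal to `wrong` with `right`."""
    parts = PureWindowsPath(path).parts
    fixed = [right if part.lower() == wrong else part for part in parts]
    return str(PureWindowsPath(*fixed))


if __name__ == "__main__":
    # Hypothetical path, standing in for the value returned by the package.
    reported = r"C:\site-packages\example_pkg\libs\amd64\example.dll"
    print(swap_arch_component(reported))
```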

This release includes contributions from: