v0.13.0
What's Changed
- update WebGPU buffer memory info name by @fs-eire in #1957
- Add enable_profiling in Runtime Options by @xiaofeihan1 in #1949
- Fix uninitialized tools variable and improve exception debug messages by @sheller-ms in #1971
- Add common download to Phi-3 tutorial by @kunal-vaishnavi in #1973
- Add support for InternLM2 model architecture by @amdrajeevp1 in #1958
- Update cmake cuda architecture and use win-arm64 pool workaround by @baijumeswani in #1976
- Update examples after 0.12.0 release by @kunal-vaishnavi in #1980
- Add CI pipeline for WebGPU EP model testing by @qjia7 in #1956
- Fix Python nightly build by @kunal-vaishnavi in #1981
- Add missing Quark 0.11 weight patterns for ChatGLM3 output layer by @poganesh in #1983
- Support Qwen2.5-VL pre-quantized models in qwen.py by @poganesh in #1985
- [VitisAI] external_ep_libray support fix for WinML by @akholodnamdcom in #1984
- Fix guidance bug by @baijumeswani in #1988
- Fix incorrect batch responses when using multiple prompts by @lnigam in #1986
- Enable webgpu graph capture in base.py by @qjia7 in #1991
- Harden CUDA error checking across the codebase by @Copilot in #1994
- allow pruned models for prefill by @fs-eire in #1995
- Fix WinML Packaging Pipeline by @baijumeswani in #1998
- Add small changes after pruning prefill by @kunal-vaishnavi in #2000
- webgpu: Optimize Copyfrom by @qjia7 in #1992
- Add support for CUDA 13 by @baijumeswani in #2001
- add webgpu to qmoe path by @guschmue in #2005
- Fix ERNIE 4.5 model builder: rope_attrs and config architecture name by @xiaoyao9184 in #2007
- Bug fix in Continuous Decoding by @chilukam-qti in #2008
- Update Phi-4 mm README links by @kunal-vaishnavi in #2014
- Add Qwen3-VL model support + multi-image input support in Qwen VL family by @hanbitmyths in #2003
- Add Qwen3.5 model support and optimize multi-image handling by @apsonawane in #2019
- Reuse a single generator via RewindTo(0) in benchmark instead of creating multiple generators by @qjia7 in #2002
- [RyzenAI] WinML compatibility fix by @akholodnamdcom in #2026
- Nemotron ASR Support for Streaming by @nenad1002 in #1997
- [WebGPU] Fix the prefill regression when graph capture is ON by @qjia7 in #2021
- Support 4 inputs for nemotron model by @jiafatom in #2036
- Updated java packaging based on python packaging logic by @EPNW-Eric in #2029
- Fix android packaging pipeline by @baijumeswani in #2039
- Add OpenAI's Whisper to model builder by @kunal-vaishnavi in #2018
- [Java] Add a dependency on onnxruntime (#2030) by @EPNW-Eric in #2040
- Fix mutually exclusive inputs for language models by @kunal-vaishnavi in #2046
- Decouple plugin execution providers (EPs) from the USE_WINML pre-processor macro by @baijumeswani in #2038
- Route pipeline model RunOptions through SetRunOption for proper special key handling by @Copilot in #2044
- Add ort_build_version and ort_build_source parameters to nuget and python packaging pipelines, remove ROCm support by @Copilot in #2049
- Add batched multi-image vision path and window_size config for Qwen VL by @hanbitmyths in #2050
- docs: fix formatting and syntax highlighting in documentation by @riddles-the-one in #2051
- Add Silero VAD Support to Nemotron Streaming ASR by @sayanshaw24 in #2035
- Add Qwen3.5 hybrid decoder export support (GatedDeltaNet + Attention) by @apsonawane in #2043
- Add support for QNN stateful models by @qti-ashimaj in #2012
- Allocate recurrent state via device allocator to enable CUDA graph capture by @apsonawane in #2057
- Speed up CI pipelines by @Copilot in #2052
- Fix tool calling for TRT-RTX models by @kunal-vaishnavi in #2048
- Fix vision pipeline EP hardcoding and pixel_values rank mismatch for Qwen VL models by @apsonawane in #2060
New Contributors
- @sheller-ms made their first contribution in #1971
- @amdrajeevp1 made their first contribution in #1958
- @poganesh made their first contribution in #1983
- @xiaoyao9184 made their first contribution in #2007
- @chilukam-qti made their first contribution in #2008
- @EPNW-Eric made their first contribution in #2029
Full Changelog: v0.12.0...v0.13.0