Release v0.13.0 · microsoft/onnxruntime-genai

What's Changed

update WebGPU buffer memory info name by @fs-eire in #1957
Add enable_profiling in Runtime Options by @xiaofeihan1 in #1949
Fix uninitialized tools variable and improve exception debug messages by @sheller-ms in #1971
Add common download to Phi-3 tutorial by @kunal-vaishnavi in #1973
Add support for InternLM2 model architecture by @amdrajeevp1 in #1958
Update cmake cuda architecture and use win-arm64 pool workaround by @baijumeswani in #1976
Update examples after 0.12.0 release by @kunal-vaishnavi in #1980
Add CI pipeline for WebGPU EP model testing by @qjia7 in #1956
Fix Python nightly build by @kunal-vaishnavi in #1981
Add missing Quark 0.11 weight patterns for ChatGLM3 output layer by @poganesh in #1983
Support Qwen2.5-VL pre-quantized models in qwen.py by @poganesh in #1985
[VitisAI] external_ep_library support fix for WinML by @akholodnamdcom in #1984
Fix guidance bug by @baijumeswani in #1988
Fix incorrect batch responses when using multiple prompts by @lnigam in #1986
Enable webgpu graph capture in base.py by @qjia7 in #1991
Harden CUDA error checking across the codebase by @Copilot in #1994
allow pruned models for prefill by @fs-eire in #1995
Fix WinML Packaging Pipeline by @baijumeswani in #1998
Add small changes after pruning prefill by @kunal-vaishnavi in #2000
webgpu: Optimize Copyfrom by @qjia7 in #1992
Add support for CUDA 13 by @baijumeswani in #2001
add webgpu to qmoe path by @guschmue in #2005
Fix ERNIE 4.5 model builder: rope_attrs and config architecture name by @xiaoyao9184 in #2007
Bug fix in Continuous Decoding by @chilukam-qti in #2008
Update Phi-4 mm README links by @kunal-vaishnavi in #2014
Add Qwen3-VL model support + multi-image input support in Qwen VL family by @hanbitmyths in #2003
Add Qwen3.5 model support and optimize multi-image handling by @apsonawane in #2019
Reuse a single generator via RewindTo(0) in benchmark instead of creating multiple generators by @qjia7 in #2002
[RyzenAI] WinML compatibility fix by @akholodnamdcom in #2026
Nemotron ASR Support for Streaming by @nenad1002 in #1997
[WebGPU] Fix the prefill regression when graph capture is ON by @qjia7 in #2021
Support 4 inputs for nemotron model by @jiafatom in #2036
Updated java packaging based on python packaging logic by @EPNW-Eric in #2029
Fix android packaging pipeline by @baijumeswani in #2039
Add OpenAI's Whisper to model builder by @kunal-vaishnavi in #2018
[Java] Add a dependency on onnxruntime (#2030) by @EPNW-Eric in #2040
Fix mutually exclusive inputs for language models by @kunal-vaishnavi in #2046
Decouple plugin execution providers (EPs) from the USE_WINML pre-processor macro by @baijumeswani in #2038
Route pipeline model RunOptions through SetRunOption for proper special key handling by @Copilot in #2044
Add ort_build_version and ort_build_source parameters to nuget and python packaging pipelines, remove ROCm support by @Copilot in #2049
Add batched multi-image vision path and window_size config for Qwen VL by @hanbitmyths in #2050
docs: fix formatting and syntax highlighting in documentation by @riddles-the-one in #2051
Add Silero VAD Support to Nemotron Streaming ASR by @sayanshaw24 in #2035
Add Qwen3.5 hybrid decoder export support (GatedDeltaNet + Attention) by @apsonawane in #2043
Add support for QNN stateful models by @qti-ashimaj in #2012
Allocate recurrent state via device allocator to enable CUDA graph capture by @apsonawane in #2057
Speed up CI pipelines by @Copilot in #2052
Fix tool calling for TRT-RTX models by @kunal-vaishnavi in #2048
Fix vision pipeline EP hardcoding and pixel_values rank mismatch for Qwen VL models by @apsonawane in #2060

New Contributors

@sheller-ms made their first contribution in #1971
@amdrajeevp1 made their first contribution in #1958
@poganesh made their first contribution in #1983
@xiaoyao9184 made their first contribution in #2007
@chilukam-qti made their first contribution in #2008
@EPNW-Eric made their first contribution in #2029

Full Changelog: v0.12.0...v0.13.0
