Release v0.12.0 · microsoft/onnxruntime-genai

What's Changed

Update versions after making 0.11.0 branch by @kunal-vaishnavi in #1867
Fix guidance usage in continuous decoding by @kunal-vaishnavi in #1870
Fix HelloPhi C# example by @kunal-vaishnavi in #1871
Fix regex by @apsonawane in #1875
Update extensions commit by @apsonawane in #1874
Revert removal of eps_without_if_support by @xiaofeihan1 in #1878
Fix condition for NPU by @apsonawane in #1880
Model builder refactoring by @tianleiwu in #1862
Add lintrunner to format code by @tianleiwu in #1884
Remove empty submodule leftover. by @xkszltl in #1883
Fix build for lack of RTLD_DI_ORIGIN support by @jaeyoonjung in #1888
Enable graph capture for webgpu by @qjia7 in #1848
Generic shared emb_tokens/lm_head implementation by @jixiongdeng in #1885
Fix bug in Squeeze for getting the value of total_seq_len by @Honry in #1886
Extra_options disable_qkv_fusion to untie qkv_projs from upstream choice by @jixiongdeng in #1893
Fix mac pipeline by @apsonawane in #1904
whisper: Support a variant of the whisper pipeline where encoder / decoder are stateful. by @RyanMetcalfeInt8 in #1857
Add model builder for Qwen2_5_VLTextModel by @tianleiwu in #1882
Integrate FARA-7B model by @apsonawane in #1902
Fix gpt-oss model export by @apsonawane in #1861
OpenVINO: Add support for model caching via 'cache_dir' provider option by @RyanMetcalfeInt8 in #1900
WinML - Remove the inclusive Microsoft.WindowsAppSDK.ML range check by @chrisdMSFT in #1907
Run the model in text mode by @apsonawane in #1908
Update extensions commit by @apsonawane in #1914
Fix gpt-oss export by @apsonawane in #1915
Support Olive new uint8 quantization format by @xiaoyu-work in #1916
Disable CUDA graph for Phi LongRoPE models with IF nodes on TRT-RTX by @anujj in #1921
Add support for CUDA and CPU arch for Qwen-2.5-VL and Fara-7B by @apsonawane in #1919
Add Gemma-3 vision tutorial to ONNX Runtime GenAI by @kunal-vaishnavi in #1793
Quark GPT-OSS support by @thpereir in #1903
Fix sliding window alignment regression in QNN models by @apsonawane in #1938
AMD RyzenAI EP Support by @akholodnamdcom in #1935
Update README by @natke in #1934
[RyzenAI] Non-pruned models backward compatibility by @akholodnamdcom in #1942
[VitisAI] EP loader by @akholodnamdcom in #1918
Set default top_k and top_p if it is None by @xiaoyu-work in #1944
Ensure dlls are signed in the c and nuget packages. by @baijumeswani in #1947
Bump torch from 2.7.1 to 2.7.1+cpu in /test/python/directml/torch by @dependabot[bot] in #1868
Add linker flags for 16 KB page size on Android by @sheetalarkadam in #1860
Only manually load DLLs if onnxruntime.dll is not already loaded. by @chemwolf6922 in #1800
Add a doc showing how to run GPT OSS 20B with WebGPU by @natke in #1945
Add C#, Java, and Objective-C APIs for Config by @kunal-vaishnavi in #1946
Fix GatherBlockQuantized node to support symmetric quantized LM_HEAD by @sushraja-msft in #1951
Fix QMoE blockwise quantization support for TRT-RTX execution provider by @anujj in #1926
Revert "Add a doc showing how to run GPT OSS 20B with WebGPU" by @kunal-vaishnavi in #1950
Add custom model path support for unit tests by @mpasumarthi-git in #1917
fix: patch llguidance to remove reference to ring crate by @sanaa-hamel-microsoft in #1948
Implement graph models for EPs by @qjia7 in #1895
Update handling EOS token id detection by @kunal-vaishnavi in #1925
Remove onnxruntime-genai-cuda from the foundry package by @baijumeswani in #1954
Include linux builds in the foundry ort-genai package by @baijumeswani in #1955
Support pre-registered plug-in NvTensorRtRtx execution provider library by @anujj in #1889
[RyzenAI] Linux compatibility fixes by @akholodnamdcom in #1959
Use cuda 12.8 to build ort-genai by @baijumeswani in #1960
Bump protobuf from 5.29.5 to 6.33.5 in /test/python by @dependabot[bot] in #1961
Add RAII wrappers for ORT Model Editor API types by @qjia7 in #1953
Rewrite all examples using standardization by @kunal-vaishnavi in #1939
Add versioning to the onnxruntime-genai-cuda.dll by @baijumeswani in #1965
[Build][Packaging] macOS packaging to skip building x86_64 by @baijumeswani in #1966
Sync packaging changes with ONNX Runtime by @baijumeswani in #1967
Release 0.12.0 cherry-pick PR by @baijumeswani in #1978

New Contributors

@xkszltl made their first contribution in #1883
@jaeyoonjung made their first contribution in #1888
@jixiongdeng made their first contribution in #1885
@Honry made their first contribution in #1886
@thpereir made their first contribution in #1903
@akholodnamdcom made their first contribution in #1935
@sheetalarkadam made their first contribution in #1860
@sanaa-hamel-microsoft made their first contribution in #1948

Full Changelog: v0.11.4...v0.12.0
