v0.12.0
What's Changed
- Update versions after making 0.11.0 branch by @kunal-vaishnavi in #1867
- Fix guidance usage in continuous decoding by @kunal-vaishnavi in #1870
- Fix HelloPhi C# example by @kunal-vaishnavi in #1871
- Fix regex by @apsonawane in #1875
- Update extensions commit by @apsonawane in #1874
- Revert removal of eps_without_if_support by @xiaofeihan1 in #1878
- Fix condition for NPU by @apsonawane in #1880
- Model builder refactoring by @tianleiwu in #1862
- Add lintrunner to format code by @tianleiwu in #1884
- Remove empty submodule leftover. by @xkszltl in #1883
- Fix build for lack of RTLD_DI_ORIGIN support by @jaeyoonjung in #1888
- Enable graph capture for webgpu by @qjia7 in #1848
- Generic shared emb_tokens/lm_head implementation by @jixiongdeng in #1885
- Fix bug in Squeeze for getting the value of total_seq_len by @Honry in #1886
- Extra_options `disable_qkv_fusion` to untie qkv_projs from upstream choice by @jixiongdeng in #1893
- Fix mac pipeline by @apsonawane in #1904
- whisper: Support a variant of the whisper pipeline where encoder / decoder are stateful. by @RyanMetcalfeInt8 in #1857
- Add model builder for Qwen2_5_VLTextModel by @tianleiwu in #1882
- Integrate FARA-7B model by @apsonawane in #1902
- Fix gpt-oss model export by @apsonawane in #1861
- OpenVINO: Add support for model caching via 'cache_dir' provider option by @RyanMetcalfeInt8 in #1900
- WinML - Remove the inclusive Microsoft.WindowsAppSDK.ML range check by @chrisdMSFT in #1907
- Run the model in text mode by @apsonawane in #1908
- Update extensions commit by @apsonawane in #1914
- Fix gpt-oss export by @apsonawane in #1915
- Support Olive new uint8 quantization format by @xiaoyu-work in #1916
- Disable CUDA graph for Phi LongRoPE models with IF nodes on TRT-RTX by @anujj in #1921
- Add support for CUDA and CPU arch for Qwen-2.5-VL and Fara-7B by @apsonawane in #1919
- Add Gemma-3 vision tutorial to ONNX Runtime GenAI by @kunal-vaishnavi in #1793
- Quark GPT-OSS support by @thpereir in #1903
- Fix sliding window alignment regression in QNN models by @apsonawane in #1938
- AMD RyzenAI EP Support by @akholodnamdcom in #1935
- Update README by @natke in #1934
- [RyzenAI] Non-pruned models backward compatibility by @akholodnamdcom in #1942
- [VitisAI] EP loader by @akholodnamdcom in #1918
- Set default top_k and top_p if it is None by @xiaoyu-work in #1944
- Ensure dlls are signed in the c and nuget packages. by @baijumeswani in #1947
- Bump torch from 2.7.1 to 2.7.1+cpu in /test/python/directml/torch by @dependabot[bot] in #1868
- Add linker flags for 16 KB page size on Android by @sheetalarkadam in #1860
- Only manually load DLLs if onnxruntime.dll is not already loaded. by @chemwolf6922 in #1800
- Add a doc showing how to run GPT OSS 20B with WebGPU by @natke in #1945
- Add C#, Java, and Objective-C APIs for Config by @kunal-vaishnavi in #1946
- Fix GatherBlockQuantized node to support symmetric quantized LM_HEAD by @sushraja-msft in #1951
- Fix QMoE blockwise quantization support for TRT-RTX execution provider by @anujj in #1926
- Revert "Add a doc showing how to run GPT OSS 20B with WebGPU" by @kunal-vaishnavi in #1950
- Add custom model path support for unit tests by @mpasumarthi-git in #1917
- fix: patch `llguidance` to remove reference to `ring` crate by @sanaa-hamel-microsoft in #1948
- Implement graph models for EPs by @qjia7 in #1895
- Update handling EOS token id detection by @kunal-vaishnavi in #1925
- Remove onnxruntime-genai-cuda from the foundry package by @baijumeswani in #1954
- Include linux builds in the foundry ort-genai package by @baijumeswani in #1955
- Support pre-registered plug-in NvTensorRtRtx execution provider library by @anujj in #1889
- [RyzenAI] Linux compatibility fixes by @akholodnamdcom in #1959
- Use cuda 12.8 to build ort-genai by @baijumeswani in #1960
- Bump protobuf from 5.29.5 to 6.33.5 in /test/python by @dependabot[bot] in #1961
- Add RAII wrappers for ORT Model Editor API types by @qjia7 in #1953
- Rewrite all examples using standardization by @kunal-vaishnavi in #1939
- Add versioning to the onnxruntime-genai-cuda.dll by @baijumeswani in #1965
- [Build][Packaging] macOS packaging to skip building x86_64 by @baijumeswani in #1966
- Sync packaging changes with ONNX Runtime by @baijumeswani in #1967
- Release 0.12.0 cherry-pick PR by @baijumeswani in #1978
New Contributors
- @xkszltl made their first contribution in #1883
- @jaeyoonjung made their first contribution in #1888
- @jixiongdeng made their first contribution in #1885
- @Honry made their first contribution in #1886
- @thpereir made their first contribution in #1903
- @akholodnamdcom made their first contribution in #1935
- @sheetalarkadam made their first contribution in #1860
- @sanaa-hamel-microsoft made their first contribution in #1948
Full Changelog: v0.11.4...v0.12.0