Olive 0.13.0

New Features

MobiusBuilder pass for Mobius-backed ONNX export (#2406, #2447, #2472, #2471, by @justinchuby and @xiaoyu-work): Added a new pass (originally MobiusModelBuilder, renamed to MobiusBuilder) that exports ONNX via Mobius and produces loadable ORT GenAI composite packages with caching. Also added a CLI option to capture the ONNX graph.
QairtPipeline pass for QCOM devices (#2465, by @qti-kromero): Added a single-pass QAIRT LLM pipeline driven by a YAML recipe that runs model loading, quantization, and compilation end-to-end, replacing the multi-step QairtPreparation→QairtGenAIBuilder workflow.
PyTorch-native K-quant pass (#2479, by @jambayk): Added a KQuant pass implementing ggml-style weight-only K-quant quantization (asymmetric and symmetric, 2/4/8-bit), with Rtn and KQuant now advertising uint2/int2 precisions.
ONNX K-quant quantization pass (#2428, by @jiafatom): Added an OnnxKquantQuantization pass for K-quant quantization of ONNX models.
INT8 embedding quantization surgeries (#2464, by @apsonawane): Added QuantizeEmbeddingInt8 and ShareEmbeddingLmHead graph surgeries for INT8 embedding quantization and shared embedding/LM-head weights.
SimplifiedLayerNormToRMSNorm surgery (#2348, by @unnim-qti): Added a graph surgery to convert SimplifiedLayerNorm nodes to RMSNorm.
LFM2 hybrid model support (#2410, by @ykhrustalev): Added support for LFM2 hybrid models.
ONNX discrepancy check pass (#2478, by @xadupre): Added a pass to measure numerical discrepancies on a test model to help validate conversions and optimizations.
AMD VitisAI SD1.5 support (#2359, by @liujij): Added Stable Diffusion 1.5 support for the VitisAI execution path.
QNN ABI execution provider support (#2434, by @rM-planet): Added support for the QNN ABI execution provider.
Whisper recipe integration (#2450, by @kunal-vaishnavi): Integrated Olive with Whisper recipes.
Speech evaluation metrics (#2444, by @jiafatom): Added WER and RTFx speech evaluation metrics to the Olive evaluator.
Vision evaluation metrics and inference path (#2476, #2488, by @jiafatom): Added vision evaluation metrics (exact_match, relaxed_accuracy, word_sort_ratio) and a vision GenAI inference path for multi-file VLM evaluation.
HY-MT evaluation workflows (#2482, by @hanbitmyths): Added support for HY-MT evaluation workflows.
ORTGenAI backend option for benchmark CLI (#2420, by @GopalakrishnanN): Added a --backend option (auto/ort/ortgenai) to the olive benchmark command for ONNX models while preserving existing defaults.
Chat-template hooks for ORT GenAI LM evaluation (#2462, by @ykhrustalev): Added chat-template hooks to LMEvalORTGenAIEvaluator.
Test CLI path for small random models (#2459, by @Copilot): Added a --test HF CLI path for 2-layer random model configs with olive run and ModelBuilder support.
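
For readers unfamiliar with the speech metrics added in #2444, the following is a minimal sketch of how WER and RTFx are commonly computed. It is not taken from the Olive evaluator: the function names are hypothetical, text normalization (casing, punctuation) is omitted, and RTFx is assumed to follow the usual definition of audio duration divided by processing time.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def inverse_real_time_factor(audio_seconds: float, processing_seconds: float) -> float:
    """RTFx under the assumed definition: seconds of audio per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Under these definitions a lower WER is better and a higher RTFx is better; an RTFx above 1 means transcription runs faster than real time.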

Improvements

Selective mixed-precision enhancements (#2475, by @jambayk): Added QKV-aware overrides, an AUTO memory mode, and MULTI_GPU dispatch to the selective mixed-precision pass.
Model package CLI alignment (#2495, #2445, by @xiaoyu-work): Aligned the generate-model-package CLI with onnxruntime-genai and updated it to match the latest schema.
ORT GenAI generation comparison in discrepancy check (#2487, by @xadupre): Added an ONNX Runtime GenAI generation comparison to the OnnxDiscrepancyCheck pass.
Vision VQA evaluation alignment (#2499, by @jiafatom): Improved vision VQA evaluation with dynamic choice detection, configurable max_length, and more robust error handling.
Faster ORT GenAI evaluation (#2452, by @justinchuby): Switched the ORT GenAI evaluator to get_logits(), avoiding a large GPU→CPU logits copy.
Tie-word embedding surgery update (#2430, by @apsonawane): Updated the tie-word embedding graph surgery.
Deprecate auto-opt command (#2442, by @shaahji): Marked the auto-opt command as deprecated.

Security

Disable trusting remote code by default (#2413, by @shaahji): Stopped implicitly trusting remote code so it is no longer executed unless explicitly enabled.

Bug Fixes

Fix optimize CLI EP and device (#2418, by @jambayk): Fixed the optimize CLI to correctly set the system execution provider and device.
Fix MTEBEvaluator embedding evaluation (#2415, by @natke): Fixed device mapping, padding-free GenAI inference, last-token pooling, and L2 normalization, closing the score gap between HF and GenAI evaluation.
Fix node output issues (#2497, by @apsonawane): Fixed node output handling issues.
Fix input validation and multiple-choice handling (#2501, by @apsonawane): Fixed input validation issues and updated the handling of multiple-choice options.
Handle list eos_token_id in ORT GenAI evaluator (#2449, by @justinchuby): Fixed handling of a list-valued eos_token_id in the ORT GenAI evaluator.
Fix typos and bugs (#2438, by @xiaoyu-work): Fixed assorted typos and bugs.
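
The MTEBEvaluator fix above (#2415) mentions last-token pooling and L2 normalization. As a rough illustration of what those two steps mean for a single sequence, here is a minimal pure-Python sketch. It is not the Olive implementation: the function names are hypothetical, and the hidden states are assumed to be a list of per-token vectors with a matching 0/1 attention mask.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden vector of the last non-padding token."""
    real_positions = [i for i, m in enumerate(attention_mask) if m == 1]
    if not real_positions:
        raise ValueError("attention_mask contains no real tokens")
    # Taking the highest unmasked index works for both left and right padding.
    return list(hidden_states[real_positions[-1]])


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


# Three token positions, the last one is padding.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
embedding = l2_normalize(last_token_pool(hidden, mask))
```

Normalizing to unit length makes the dot product of two embeddings equal to their cosine similarity, which is why a missing normalization step can shift retrieval scores between two otherwise equivalent backends.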
