MobiusBuilder pass for Mobius-backed ONNX export (#2406, #2447, #2472, #2471, by @justinchuby and @xiaoyu-work): Added a new pass (originally MobiusModelBuilder, renamed to MobiusBuilder) that exports ONNX via Mobius and produces loadable ORT GenAI composite packages with caching, plus a CLI option to capture the ONNX graph.
QairtPipeline pass for QCOM devices (#2465, by @qti-kromero): Added a single-pass QAIRT LLM pipeline driven by a YAML recipe that runs model loading, quantization, and compilation end-to-end, replacing the multi-step QairtPreparation→QairtGenAIBuilder workflow.
PyTorch-native K-quant pass (#2479, by @jambayk): Added a KQuant pass implementing ggml-style weight-only K-quant quantization (asymmetric and symmetric, 2/4/8-bit), with Rtn and KQuant now advertising uint2/int2 precisions.
ONNX K-quant quantization pass (#2428, by @jiafatom): Added an OnnxKquantQuantization pass for K-quant quantization of ONNX models.
INT8 embedding quantization surgeries (#2464, by @apsonawane): Added QuantizeEmbeddingInt8 and ShareEmbeddingLmHead graph surgeries for INT8 embedding quantization and shared embedding/LM-head weights.
SimplifiedLayerNormToRMSNorm surgery (#2348, by @unnim-qti): Added a graph surgery to convert SimplifiedLayerNorm nodes to RMSNorm.
LFM2 hybrid model support (#2410, by @ykhrustalev): Added support for LFM2 hybrid models.
ONNX discrepancy check pass (#2478, by @xadupre): Added a pass to measure numerical discrepancies on a test model to help validate conversions and optimizations.
AMD VitisAI SD1.5 support (#2359, by @liujij): Added Stable Diffusion 1.5 support for the VitisAI execution path.
QNN ABI execution provider support (#2434, by @rM-planet): Added support for the QNN ABI execution provider.
Whisper recipe integration (#2450, by @kunal-vaishnavi): Integrated Olive with Whisper recipes.
Speech evaluation metrics (#2444, by @jiafatom): Added WER and RTFx speech evaluation metrics to the Olive evaluator.
Vision evaluation metrics and inference path (#2476, #2488, by @jiafatom): Added vision evaluation metrics (exact_match, relaxed_accuracy, word_sort_ratio) and a vision GenAI inference path for multi-file VLM evaluation.
HY-MT evaluation workflows (#2482, by @hanbitmyths): Added support for HY-MT evaluation workflows.
ORTGenAI backend option for benchmark CLI (#2420, by @GopalakrishnanN): Added a --backend option (auto/ort/ortgenai) to the olive benchmark command for ONNX models while preserving existing defaults.
Chat-template hooks for ORT GenAI LM evaluation (#2462, by @ykhrustalev): Added chat-template hooks to LMEvalORTGenAIEvaluator.
Test CLI path for small random models (#2459, by @Copilot): Added a --test HF CLI path for 2-layer random model configs with olive run and ModelBuilder support.
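For readers unfamiliar with the speech metrics named above, the following is a minimal sketch of how WER and RTFx are conventionally defined. It is not Olive's implementation: the function names `word_error_rate` and `rtfx` are hypothetical, and the sketch assumes whitespace tokenization with no text normalization, which a real evaluator would likely apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; assumes plain whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds
```

Lower WER is better, while higher RTFx is better: an RTFx of 20 means one minute of audio is transcribed in three seconds.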
Improvements
Selective mixed-precision enhancements (#2475, by @jambayk): Added QKV-aware overrides, an AUTO memory mode, and MULTI_GPU dispatch to the selective mixed-precision pass.
Model package CLI alignment (#2495, #2445, by @xiaoyu-work): Aligned the generate-model-package CLI with onnxruntime-genai and updated it to match the latest schema.
ORT GenAI generation comparison in discrepancy check (#2487, by @xadupre): Added an ONNX Runtime GenAI generation comparison to the OnnxDiscrepancyCheck pass.
Vision VQA evaluation alignment (#2499, by @jiafatom): Improved vision VQA evaluation with dynamic choice detection, configurable max_length, and more robust error handling.
Faster ORT GenAI evaluation (#2452, by @justinchuby): Switched the ORT GenAI evaluator to get_logits() to avoid a large GPU→CPU logits copy.
Tie-word embedding surgery update (#2430, by @apsonawane): Updated the tie-word embedding graph surgery.
Deprecate auto-opt command (#2442, by @shaahji): Marked the auto-opt command as deprecated.
Security
Disable trusting remote code by default (#2413, by @shaahji): Stopped implicitly trusting remote code so it is no longer executed unless explicitly enabled.
Bug Fixes
Fix optimize CLI EP and device (#2418, by @jambayk): Fixed the optimize CLI to correctly set the system execution provider and device.
Fix MTEBEvaluator embedding evaluation (#2415, by @natke): Fixed device mapping, padding-free GenAI inference, last-token pooling, and L2 normalization, closing the score gap between HF and GenAI evaluation.
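The last-token pooling and L2 normalization steps mentioned in the MTEBEvaluator fix can be sketched as follows. This is an illustrative sketch only, not the evaluator's code: `last_token_pool` and `l2_normalize` are hypothetical names, and the inputs are assumed to be plain Python lists for a single sequence (one hidden vector per token, plus a 0/1 attention mask).

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden vector of the last non-padded token.

    hidden_states: list of per-token vectors for one sequence.
    attention_mask: list of 0/1 flags, 1 marking a real (non-padding) token.
    """
    real_positions = [i for i, flag in enumerate(attention_mask) if flag == 1]
    if not real_positions:
        raise ValueError("attention_mask marks no real tokens")
    return hidden_states[real_positions[-1]]


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against a zero norm)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


# Three token positions, the last one is padding.
hidden = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
embedding = l2_normalize(last_token_pool(hidden, mask))
```

With padding-free inference the mask is all ones, so pooling reduces to taking the final token's vector. Normalizing to unit length makes the dot product of two embeddings equal to their cosine similarity, which is why a missing or misplaced normalization step can shift retrieval scores.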