Olive-ai 0.10.0

xiaoyu-work released this 05 Nov 19:24

4705557

New Features

Quark Quantization for ONNX Models (#2236) — New QuarkQuantization pass via olive run with support for int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 and CLE/SmoothQuant/AdaRound/AdaQuant.
Embedding Quantization & RTN Improvements (#2238) — Added QuantEmbedding, a composable Rtn pass, and a unified checkpoint format aligned with MatMulNBits/GatherBlockQuantized (block/shape constraints enforced; AutoGPTQ/AutoAWQ export updated to 2D params).
Word Embedding Tying Surgery (#2240) — TieWordEmbeddings ties input embeddings and lm_head for both unquantized (Gemm) and quantized (MatMulNBits + GatherBlockQuantized) graphs.
Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for HF/ONNX models to OpenVINO or compressed ONNX.

Improvements

AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in quantize CLI, and supports manual precision overrides.
GPTQ Updates (#2202, #2203) — Supports user-provided module overrides and transformers >= 4.53.
Quantization Export Compatibility (#2218) — Updates checks for ort-genai > 0.9.0 and fixes minor OnnxDAG name clashes.
Torch Dynamo Export Alignment (#2185) — extract_adapter recovers folded LoRA and decomposes DORA-fused Gemm to MatMul for quantization.
Post-Surgery Deduplication (#2228) — Runs DeduplicateHashedInitializersPass after surgeries to remove duplicate initializers.
QNN Execution Provider: GPU Enablement (#2220) — Enables QNN-EP GPU, updates StaticLLM and ContextBinaryGeneration, keeps NPU default.
Run API Ergonomics (#2199) — olive.run() now accepts a dict run_config.
OpenVINO Config Overrides (#2191) — Allows overriding genai_config.json properties in OV encapsulation.
ReplaceAttentionMaskValue Robustness (#2213) — Adds Shape to ALLOWED_CONSUMER_OPS for text-encoder graphs.
Implicit Olive Version Tagging (#2183) — Automatically embeds the Olive version in saved ONNX model protos.
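
Example for the Run API change (#2199): a minimal sketch of passing a dict instead of a JSON file path. The helper names build_run_config and run_workflow are hypothetical, the model path and output directory are placeholders, and the call shape olive.run(run_config) is an assumption based on the note above; check the Olive documentation for the exact signature.

```python
import json


def build_run_config(model_path: str, output_dir: str) -> dict:
    """Build a small Olive workflow config as a plain dict (hypothetical helper)."""
    return {
        # Input model description; HfModel points at a Hugging Face model.
        "input_model": {"type": "HfModel", "model_path": model_path},
        # A single pass that converts the model to ONNX.
        "passes": {"conversion": {"type": "OnnxConversion"}},
        # Where the workflow writes its output.
        "output_dir": output_dir,
    }


def run_workflow(run_config: dict):
    """Hand the dict to Olive (assumed call shape, see the note above)."""
    import olive  # deferred so the config can be built without Olive installed

    return olive.run(run_config)


if __name__ == "__main__":
    # Placeholder values; replace with a real model and output location.
    config = build_run_config("my-org/my-model", "olive-output")
    # The dict stays JSON-serializable, so it can still be saved as a config file.
    print(json.dumps(config, indent=2))
```

Building the config in code makes it easy to generate variants (for example, one output_dir per model) without writing temporary JSON files.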
