Quark Quantization for ONNX Models (#2236) — New QuarkQuantization pass via olive run with support for int8/uint8/int16/uint16/int32/uint32/bf16/bfp16 and CLE/SmoothQuant/AdaRound/AdaQuant.
Embedding Quantization & RTN Improvements (#2238) — Added QuantEmbedding, a composable Rtn pass, and a unified checkpoint format aligned with MatMulNBits/GatherBlockQuantized (block/shape constraints enforced; AutoGPTQ/AutoAWQ export updated to 2D params).
Word Embedding Tying Surgery (#2240) — TieWordEmbeddings ties input embeddings and lm_head for both unquantized (Gemm) and quantized (MatMulNBits + GatherBlockQuantized) graphs.
Custom ONNX Model Naming (#2235) — Allows specifying a custom ONNX model name in the output directory.
Intel OpenVINO Weight Compression Pass (#2180) — Adds NNCF-based weight compression for HF/ONNX models, producing either an OpenVINO model or a compressed ONNX model.
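For readers unfamiliar with the round-to-nearest (RTN) technique behind the Rtn pass in #2238, here is a minimal pure-Python sketch of block-wise RTN quantization. It is not Olive's implementation: the function names (`rtn_quantize_block`, `quantize_rows`, `dequantize_rows`), the asymmetric unsigned scheme, and the list-of-lists layout are assumptions made for illustration. The sketch only mirrors two ideas from the notes above: each row must divide evenly into blocks, and scales and zero points are stored as 2D (row by block) parameters.

```python
from typing import List, Tuple


def rtn_quantize_block(block: List[float], bits: int = 4) -> Tuple[List[int], float, int]:
    """Asymmetric round-to-nearest quantization of one block to unsigned ints."""
    qmax = (1 << bits) - 1
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(block), 0.0)
    hi = max(max(block), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = max(0, min(qmax, round(-lo / scale)))
    # Round each value to the nearest grid point and clamp to [0, qmax].
    q = [max(0, min(qmax, round(x / scale) + zero_point)) for x in block]
    return q, scale, zero_point


def quantize_rows(weights: List[List[float]], block_size: int = 4, bits: int = 4):
    """Quantize each row block by block; returns (q, scales, zero_points)."""
    q_rows, scales, zero_points = [], [], []
    for row in weights:
        # Hypothetical shape constraint: rows must split into whole blocks.
        if len(row) % block_size != 0:
            raise ValueError("row length must be a multiple of block_size")
        q_row, s_row, z_row = [], [], []
        for start in range(0, len(row), block_size):
            q, s, z = rtn_quantize_block(row[start:start + block_size], bits)
            q_row.extend(q)
            s_row.append(s)
            z_row.append(z)
        q_rows.append(q_row)
        scales.append(s_row)       # 2D: one scale per (row, block)
        zero_points.append(z_row)  # 2D: one zero point per (row, block)
    return q_rows, scales, zero_points


def dequantize_rows(q_rows, scales, zero_points, block_size: int = 4):
    """Reconstruct approximate floats from quantized values."""
    return [
        [(q - zero_points[r][i // block_size]) * scales[r][i // block_size]
         for i, q in enumerate(q_row)]
        for r, q_row in enumerate(q_rows)
    ]


if __name__ == "__main__":
    w = [[0.0, 0.5, 1.0, 1.5, -1.0, -0.5, 0.0, 0.5]]
    q, s, z = quantize_rows(w, block_size=4, bits=4)
    print(q, s, z)
    print(dequantize_rows(q, s, z, block_size=4))
```

RTN needs no calibration data, which is why it is commonly used as a cheap baseline. Smaller blocks generally track the weights more closely at the cost of storing more scales and zero points.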
Improvements
AIMET Enhancements (#2158, #2187, #2215) — Adds Sequential MSE, enables AIMET in quantize CLI, and supports manual precision overrides.