Release v2.0.0 · huggingface/optimum-intel

Breaking Changes

The Intel Neural Compressor (INC) and Intel Extension for PyTorch (IPEX) integrations have been fully removed (#1687). Both were deprecated in v1.27.0. Users who rely on these integrations should stay on v1.27.
The ONNX dependency has been removed from the package requirements (#1753)
OpenVINO and NNCF are now installed by default; the [openvino] and [nncf] extras are deprecated (#1602)

Extended dataset options for calibration: Datasets can now be specified with parameters, e.g. wikitext2:seq_len=128 (#1564)
Default 8-bit quantization configs with configurable dynamic quantization group size (#1570)
NNCF CB4 mode renamed to cb4_f8e4m3 for newer NNCF versions (#1597)
Data-aware AWQ added to the Qwen3-30B quantization configuration (#1620)
Fix quantized model save path: saving immediately after quantization now writes to the correct path (#1576)
Fix per_layer_inputs value error during quantization (#1714)
Fix calibration data collection (#1778)
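The only dataset-with-parameters form given in these notes is `wikitext2:seq_len=128`. The sketch below shows one way such a spec string decomposes into a dataset name plus keyword parameters. `parse_dataset_spec` is a hypothetical helper, not optimum-intel's parser. The comma separator for multiple parameters and the int coercion of numeric values are assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

ParamValue = Union[int, str]


def parse_dataset_spec(spec: str) -> Tuple[str, Dict[str, ParamValue]]:
    """Split "name:key=value[,key=value...]" into (name, params).

    Hypothetical helper: the comma separator and int coercion are assumptions,
    not documented optimum-intel behaviour.
    """
    name, _, raw_params = spec.partition(":")
    if not name:
        raise ValueError(f"missing dataset name in {spec!r}")
    params: Dict[str, ParamValue] = {}
    if raw_params:
        for item in raw_params.split(","):  # assumption: multiple params are comma-separated
            key, sep, value = item.partition("=")
            if not sep or not key:
                raise ValueError(f"malformed parameter {item!r} in {spec!r}")
            # Treat purely numeric values (like seq_len=128) as integers.
            params[key] = int(value) if value.isdigit() else value
    return name, params


if __name__ == "__main__":
    print(parse_dataset_spec("wikitext2:seq_len=128"))  # ('wikitext2', {'seq_len': 128})
    print(parse_dataset_spec("wikitext2"))              # ('wikitext2', {})
```

A bare name with no `:` still parses, so plain dataset names keep working alongside the parameterized form.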
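To illustrate what a dynamic quantization group size controls, the plain-Python sketch below quantizes a vector to int8 at run time with one scale per contiguous group of `group_size` values. It is a generic illustration, not optimum-intel's or OpenVINO's implementation; the symmetric int8 scheme and the helper names `quantize_groups` / `dequantize_groups` are assumptions chosen for clarity.

```python
from typing import List, Sequence, Tuple


def quantize_groups(values: Sequence[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric int8 quantization with one scale per contiguous group.

    Scales are computed from the values being quantized (i.e. at run time),
    which is what makes the scheme "dynamic".
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 127 if max_abs > 0 else 1.0  # avoid dividing by zero for all-zero groups
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in group)
    return codes, scales


def dequantize_groups(codes: Sequence[int], scales: Sequence[float], group_size: int) -> List[float]:
    """Map int8 codes back to floats using the scale of the group each code belongs to."""
    return [code * scales[i // group_size] for i, code in enumerate(codes)]


if __name__ == "__main__":
    # Small activations followed by large ones: a single shared scale wastes precision on the first half.
    x = [0.01, -0.02, 0.015, 4.0, -2.0, 1.0]
    for gs in (3, 6):
        codes, scales = quantize_groups(x, gs)
        restored = dequantize_groups(codes, scales, gs)
        err = max(abs(a - b) for a, b in zip(x[:3], restored[:3]))
        print(f"group_size={gs}: {len(scales)} scale(s), max error on first 3 values {err:.6f}")
```

Smaller groups let each scale follow local magnitudes more closely, at the cost of storing and computing more scales; the group size is the knob that trades accuracy against that overhead.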

Transformers v5 compatibility (#1589)
Hybrid attention models: past_key_values in attention_mask now supported for stateful inference (#1641)
beam_idx is now connected to linear attention layers (CausalConv1D, SSM, GDN), so beam search works correctly with recurrent models (#1619)
Fix long-context inference for Phi-3.5 and Phi-4 (#1744)
Fix SpeechT5 dynamic batch inference (#1664)
Fix MoE patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation (#1741)
Improved NumPy input handling for model inputs with mixed types (#1646)
Fix task inference for Phi-4-multimodal-instruct (#1610)
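Why connecting beam_idx matters for recurrent layers: after each beam search step, new beam i continues from some old beam beam_idx[i], so every per-beam state (KV cache, conv state, SSM state) has to be gathered with that index. Otherwise a layer keeps the state of a beam that was dropped. Stateful OpenVINO models take `beam_idx` as an input and apply this gather to their internal states; the plain-Python sketch below only illustrates the same gather on a hypothetical dict-of-lists state layout and is not optimum-intel code.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: layer name -> one flattened recurrent state per beam.
BeamStates = Dict[str, List[List[float]]]


def reorder_states(states: BeamStates, beam_idx: Sequence[int]) -> BeamStates:
    """Return new states where beam i continues from old beam beam_idx[i].

    The same gather is applied to every layer (e.g. a conv state and an SSM state),
    just as a KV cache is reordered for attention layers.
    """
    reordered: BeamStates = {}
    for layer_name, per_beam in states.items():
        for src in beam_idx:
            if not 0 <= src < len(per_beam):
                raise IndexError(f"beam index {src} out of range for layer {layer_name!r}")
        # Copy each row so beams that share a parent do not alias the same list.
        reordered[layer_name] = [list(per_beam[src]) for src in beam_idx]
    return reordered


if __name__ == "__main__":
    states = {
        "conv": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
        "ssm": [[10.0], [11.0], [12.0]],
    }
    # Beams 0 and 1 both extend old beam 2; beam 2 extends old beam 0.
    print(reorder_states(states, [2, 2, 0]))
    # {'conv': [[2.0, 2.0], [2.0, 2.0], [0.0, 0.0]], 'ssm': [[12.0], [12.0], [10.0]]}
```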

Full Changelog: v1.27.0...v2.0.0

Compatible with transformers>=v4.45,<v5.1
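A quick way to check an installed transformers version against the supported range above (4.45 inclusive, 5.1 exclusive) is sketched below using only the standard library. It handles plain numeric releases only (an assumption; versions such as "5.0.0rc1" raise ValueError). For full PEP 440 semantics, `packaging.specifiers.SpecifierSet(">=4.45,<5.1").contains(...)` from the third-party `packaging` library is the more robust choice.

```python
from importlib import metadata
from typing import Tuple

# Bounds taken from "transformers>=v4.45,<v5.1": lower inclusive, upper exclusive.
LOWER = "4.45"
UPPER = "5.1"


def parse_release(version: str) -> Tuple[int, ...]:
    """Parse a plain numeric release such as "4.57.1" or "v5.0" into a tuple of ints.

    Pre-release, dev and local suffixes (e.g. "5.0.0rc1") are not handled and raise ValueError.
    """
    return tuple(int(part) for part in version.removeprefix("v").split("."))


def is_compatible(version: str) -> bool:
    """Return True if `version` lies in [LOWER, UPPER)."""
    v, lo, hi = (parse_release(s) for s in (version, LOWER, UPPER))
    width = max(len(v), len(lo), len(hi))
    # Pad with zeros so that "4.45" and "4.45.0" compare as equal.
    v, lo, hi = (t + (0,) * (width - len(t)) for t in (v, lo, hi))
    return lo <= v < hi


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        try:
            ok = is_compatible(installed)
        except ValueError:
            print(f"cannot parse {installed!r}; check it against the range manually")
        else:
            print(f"transformers {installed}: {'within' if ok else 'outside'} the supported range")
```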