
v2.0.0

echarlaix released this on 10 Jun at 12:59

Breaking Changes

  • Intel Neural Compressor (INC) and Intel Extension for PyTorch (IPEX) integrations have been fully removed (#1687). Both were deprecated in v1.27.0. Users relying on these integrations should stay on v1.27.
  • The ONNX dependency was removed from the package requirements (#1753)
  • OpenVINO and NNCF are now installed by default; the [openvino] and [nncf] extras are deprecated (#1602)

New Model Support

  • Arcee Trinity (AFMoE) (#1569)
  • Qwen3VL (#1551)
  • Qwen3-next (hybrid SSM/attention) (#1523)
  • Qwen3.5, Qwen3.5-MoE, Qwen3.6 (#1689)
  • Gemma 4 (#1688)
  • Eagle3: Speculative decoding draft model support (#1588)
  • LFM2-MoE (#1691)
  • Kokoro TTS (#1653)
  • Qwen3-ASR (#1677)
  • CohereLabs/tiny-aya-base (Command-R family) (#1623)
  • HY-MT1.5-1.8B (#1621)
  • VideoChat (#1637)
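Eagle3 support (#1588) adds draft models for speculative decoding. As a rough illustration of the general idea only, the toy sketch below implements plain greedy draft-and-verify decoding; it is not Eagle3's drafting scheme and not the optimum-intel API. The "models" are hypothetical functions that map a token prefix to the next greedy token.

```python
from typing import Callable, List

# Toy stand-in for a model: token prefix -> next greedy token.
NextToken = Callable[[List[int]], int]


def speculative_greedy(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-then-verify decoding; output equals greedy decoding with `target`."""
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new:
        # 1. The cheap draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target model checks each proposed position. (A real system
        #    scores all k positions in a single batched forward pass.)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft token was accepted: the target adds one bonus token.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    # Each iteration may overshoot by up to k tokens; trim to max_new.
    return tokens[:len(prompt) + max_new]
```

Because every appended token is one the target model would have chosen itself, the output matches ordinary greedy decoding; the draft model only changes how many target calls can be batched per step.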

Quantization & Compression

  • Extended dataset options for calibration: Datasets can now be specified with parameters, e.g. wikitext2:seq_len=128 (#1564)
  • Default 8-bit quantization configs with configurable dynamic quantization group size (#1570)
  • NNCF CB4 mode renamed to cb4_f8e4m3 for newer NNCF versions (#1597)
  • Data-Aware AWQ for Qwen3-30B added to configuration (#1620)
  • Fix quantized model save path: Immediate save after quantization now writes to the correct path (#1576)
  • Fix per_layer_inputs value error during quantization (#1714)
  • Fix calibration data collection (#1778)
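The configurable dynamic quantization group size (#1570) controls how many consecutive values share one scale. The library-independent sketch below shows symmetric int8 quantization with one scale per group of `group_size` values; the function names are hypothetical, and this is not how OpenVINO implements dynamic quantization internally (there, such scales are computed for activations at inference time).

```python
from typing import List, Tuple


def quantize_groups(values: List[float], group_size: int) -> Tuple[List[int], List[float]]:
    """Symmetric int8 quantization with one scale per contiguous group."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    q: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        max_abs = max(abs(v) for v in group)
        # Map the group's largest magnitude to 127; avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to nearest and clamp to the symmetric int8 range.
        q.extend(max(-127, min(127, round(v / scale))) for v in group)
    return q, scales


def dequantize_groups(q: List[int], scales: List[float], group_size: int) -> List[float]:
    """Recover approximate floats using each value's group scale."""
    return [qv * scales[i // group_size] for i, qv in enumerate(q)]
```

Smaller groups limit how far a single outlier can coarsen the resolution of its neighbours, at the cost of computing and storing more scales.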

Improvements

  • Transformers v5 compatibility (#1589)
  • Hybrid attention models: past_key_values in attention_mask now supported for stateful inference (#1641)
  • beam_idx connected to linear attention layers (CausalConv1D, SSM, GDN) for correct beam search with recurrent models (#1619)
  • Fix long-context inference for Phi-3.5 and Phi-4 (#1744)
  • Fix SpeechT5 dynamic batch inference (#1664)
  • Fix MoE patching to enable ConvertTiledMoeBlockToGatherMatmuls transformation (#1741)
  • Improved numpy input handling for model inputs with mixed types (#1646)
  • Fix task inference for Phi-4-multimodal-instruct (#1610)
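Connecting `beam_idx` to linear attention layers (#1619) matters because beam search re-ranks hypotheses at every step: `beam_idx[i]` names the previous beam that new beam `i` extends, so every cached per-beam state must be gathered the same way before the next step. The toy sketch below uses plain Python lists in place of tensors and a made-up linear recurrence; the names are hypothetical and it is not the OpenVINO stateful-model API.

```python
from typing import List, Sequence

# Toy stand-in: one cached recurrent state (a short vector) per beam.
State = List[float]


def reorder_states(states: Sequence[State], beam_idx: Sequence[int]) -> List[State]:
    """Gather per-beam states so new beam i continues from old beam beam_idx[i]."""
    # Copy each row so two new beams sharing a parent do not alias one list.
    return [list(states[src]) for src in beam_idx]


def recurrent_step(states: Sequence[State], beam_idx: Sequence[int],
                   inputs: Sequence[float], decay: float = 0.5) -> List[State]:
    """Toy linear recurrence h <- decay * h + x, applied after reordering."""
    reordered = reorder_states(states, beam_idx)
    return [[decay * h + x for h in row] for row, x in zip(reordered, inputs)]
```

Without the gather, each recurrent state would keep advancing along the history of a beam that may have been pruned; the same reordering already applies to attention KV caches.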

Full Changelog: v1.27.0...v2.0.0

Compatible with transformers>=4.45,<5.1
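A minimal sketch for checking an installed transformers version against this range, assuming plain numeric release versions; the helper names are hypothetical, and pre-release or dev suffixes are deliberately ignored (`packaging.specifiers.SpecifierSet` gives full PEP 440 semantics).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

# Bounds from the compatibility note: transformers>=4.45,<5.1
LOWER: Tuple[int, ...] = (4, 45)
UPPER: Tuple[int, ...] = (5, 1)


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. "4.57.1" -> (4, 57, 1).

    Simplification: pre-release, dev and local suffixes are ignored.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", ver)
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_supported(ver: str) -> bool:
    """True if the release segment lies in [LOWER, UPPER)."""
    return LOWER <= release_tuple(ver) < UPPER


if __name__ == "__main__":
    try:
        installed = version("transformers")
        status = "supported" if is_supported(installed) else "outside the tested range"
        print(f"transformers {installed}: {status}")
    except PackageNotFoundError:
        print("transformers is not installed")
```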