Intel® Neural Compressor v2.5 Release

@chensuyue chensuyue released this 26 Mar 10:21
· 87 commits to master since this release
24419c9
  • Highlights
  • Features
  • Improvement
  • Productivity
  • Bug Fixes
  • External Contributions
  • Validated Configurations

Highlights

  • Integrated the Weight-Only Quantization algorithm AutoRound and verified it on Gaudi2, Intel CPU, and NVIDIA GPU
  • Applied SmoothQuant & Weight-Only Quantization algorithms to 15+ popular LLMs for INT8 & INT4 quantization and published the recipes
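
For readers new to the term, the following is a minimal sketch of what weight-only INT4 quantization does to a weight vector. It uses plain group-wise round-to-nearest, which is only a baseline: it is not AutoRound and not the Intel Neural Compressor API. The function names, the group size, and the sample weights are all assumptions made for illustration.

```python
from typing import List, Tuple


def quantize_rtn_int4(weights: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest INT4 quantization with one scale per group."""
    q_values: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group to 7; fall back to 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            # Clamp to the signed 4-bit range [-8, 7].
            q_values.append(max(-8, min(7, round(w / scale))))
    return q_values, scales


def dequantize(q_values: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Recover approximate float weights from INT4 codes and per-group scales."""
    return [q * scales[i // group_size] for i, q in enumerate(q_values)]


# Hypothetical weights: two groups of four values with different dynamic ranges.
weights = [0.7, -0.3, 0.12, 0.0, 2.8, -1.5, 0.45, 0.2]
q, s = quantize_rtn_int4(weights, group_size=4)
restored = dequantize(q, s, group_size=4)
```

Only the weights are stored in low precision here; activations stay in floating point, which is what distinguishes weight-only quantization from the INT8 SmoothQuant path. Using one scale per group keeps the rounding error of each weight within half of that group's scale.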

Features

  • [Quantization] Integrate Weight-Only Quantization algorithm AutoRound (5c7f33, dfd083, 9a7ddd, cf1de7)
  • [Quantization] Quantize weight with in-place mode in Weight-Only Quantization (deb1ed)
  • [Pruning] Enable SNIP on multiple cards using DeepSpeed ZeRO-3 (49ab28)
  • [Pruning] Support new pruning approaches Wanda and DSnoT for PyTorch LLM (7a3671)
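
As background on the Wanda entry above, here is a rough sketch of the scoring idea as commonly described: each weight is scored by its magnitude times the L2 norm of the matching input feature over calibration data, and the lowest-scoring weights in each output row are zeroed. This is an illustrative pure-Python version under those assumptions, not the code added in this release; `wanda_prune` and the sample data are hypothetical.

```python
import math
from typing import List


def wanda_prune(weight: List[List[float]],
                activations: List[List[float]],
                sparsity: float) -> List[List[float]]:
    """Zero the lowest-scoring weights in each output row.

    weight:      [out_features][in_features]
    activations: [samples][in_features] calibration inputs
    """
    in_features = len(weight[0])
    # L2 norm of each input feature across the calibration samples.
    feat_norm = [math.sqrt(sum(row[j] ** 2 for row in activations))
                 for j in range(in_features)]
    n_prune = int(in_features * sparsity)
    pruned = []
    for row in weight:
        scores = [abs(w) * feat_norm[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores within this output row.
        drop = set(sorted(range(in_features), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Hypothetical data: input feature 1 is consistently large, feature 3 is tiny.
weight = [[0.5, -0.1, 0.3, 2.0],
          [1.0, 0.2, -0.4, 0.05]]
activations = [[1.0, 10.0, 1.0, 0.1],
               [1.0, 10.0, 2.0, 0.1]]
pruned = wanda_prune(weight, activations, sparsity=0.5)
```

In the first row the largest weight (2.0) is pruned because its input feature is almost always near zero, while the small weight -0.1 survives because its input feature is large. Plain magnitude pruning would have made the opposite choice.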

Improvement

  • [Quantization] SmoothQuant code structure refactor (a8d81c)
  • [Quantization] Optimize the workflow of parsing Keras model (b816d7)
  • [Quantization] Support static_groups options in GPTQ API (1c426a)
  • [Quantization] Update TEQ train dataloader (d1e994)
  • [Quantization] WeightOnlyLinear keeps self.weight after recover (2835bd)
  • [Quantization] Add version condition for IPEX prepare init (d96e14)
  • [Quantization] Enhance the ORT node name checking (f1597a)
  • [Pruning] Stop the tuning process early when enabling SmoothQuant (844a03)
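
Several items above touch SmoothQuant. As a reminder of the underlying idea, the sketch below computes the per-input-channel smoothing scale s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), divides the activations by it, and multiplies the matching weight rows by it, so the layer output is unchanged while activation outliers shrink. This is a toy pure-Python illustration and not the refactored implementation; all names and values are assumptions.

```python
from typing import List


def smooth_scales(act_absmax: List[float], weight_absmax: List[float],
                  alpha: float = 0.5) -> List[float]:
    """Per-input-channel scale s_j = max|X_j|**alpha / max|W_j|**(1 - alpha).

    Assumes every absmax value is strictly positive.
    """
    return [a ** alpha / w ** (1.0 - alpha) for a, w in zip(act_absmax, weight_absmax)]


def matvec(x: List[float], weight: List[List[float]]) -> List[float]:
    """Compute y = x @ weight for weight laid out as [in_features][out_features]."""
    out_features = len(weight[0])
    return [sum(x[j] * weight[j][k] for j in range(len(x))) for k in range(out_features)]


# Hypothetical layer: input channel 1 carries an activation outlier.
x = [0.5, 40.0, -1.0]
weight = [[0.2, -0.1], [0.05, 0.1], [-0.3, 0.4]]

act_absmax = [abs(v) for v in x]  # a single calibration sample, for brevity
weight_absmax = [max(abs(v) for v in row) for row in weight]
s = smooth_scales(act_absmax, weight_absmax, alpha=0.5)

x_smooth = [v / sj for v, sj in zip(x, s)]                       # X' = X / s
w_smooth = [[v * sj for v in row] for row, sj in zip(weight, s)]  # W' = s * W

y = matvec(x, weight)
y_smooth = matvec(x_smooth, w_smooth)
```

The two outputs agree up to floating-point error, but the largest activation magnitude drops from 40 to about 2, which makes the activations much easier to quantize to INT8. The `alpha` parameter controls how much of the difficulty is moved from activations to weights.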

Productivity

  • ORT LLM examples support latest optimum version (26b260)
  • Add coding style docs and recommended VS Code setting (c1f23c)
  • Adapt transformers 4.37 loading (6133f4)
  • Upgrade pre-commit checker for black/blacken-docs/ruff (7763ed)
  • Support CI summary in PR comments (d4bcdd)
  • Notebook example update to install latest INC & TF, add metric in fit (4239d3)

Bug Fixes

  • Fix QA IPEX example fp32 input issue (c4de19)
  • Update Conditions of Getting min-max during TF MatMul Requantize (d07175)
  • Fix TF saved_model issues (d8e60b)
  • Fix comparison of module_type and MulLinear (ba3aba)
  • Fix ORT calibration issue (cd6d24)
  • Fix ORT example bart export failure (b0dc0d)
  • Fix TF example accuracy diff during benchmark and quantization (5943ea)
  • Fix bugs for GPTQ exporting with static_groups (b4e37b)
  • Fix ORT quant issue caused by tensors having same name (0a20f3)
  • Fix Neural Solution SQL/CMD injection (14b7b0)
  • Fix the best qmodel recovery issue (f2d9b7)
  • Fix logger issue (83bc77)
  • Store token in protected file (c6f9cc)
  • Define the default SSL context (b08725)
  • Fix IPEX stats bug (5af383)
  • Fix ORT calibration for Dml EP (c58aea)
  • Fix wrong socket number retrieval for non-English system (5b2a88)
  • Fix trust remote for llm examples (2f2c9a)

External Contributions

  • Intel Mac support (21cfeb)
  • Add PTQ example for PyTorch CV Segment Anything Model (bd5e69)

Validated Configurations

  • CentOS 8.4 & Ubuntu 22.04 & Windows 11 & macOS Ventura 13.5
  • Python 3.8, 3.9, 3.10, 3.11
  • TensorFlow 2.13, 2.14, 2.15
  • ITEX 2.13.0, 2.14.0
  • PyTorch/IPEX 2.0, 2.1, 2.2
  • ONNX Runtime 1.15, 1.16, 1.17