Intel® Neural Compressor v2.5 Release

chensuyue released this 26 Mar 10:21


Highlights
Features
Improvement
Productivity
Bug Fixes
External Contributions
Validated Configurations

Highlights

Integrated the Weight-Only Quantization algorithm AutoRound and verified it on Gaudi2, Intel CPU, and NVIDIA GPU
Applied SmoothQuant & Weight-Only Quantization algorithms with 15+ popular LLMs for INT8 & INT4 quantization and published the recipes
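
For readers new to the term, the following is a minimal sketch of what group-wise INT4 weight-only quantization means. It uses plain symmetric round-to-nearest; it is not the AutoRound algorithm and not the Intel® Neural Compressor API. The function names and the group size are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# One quantized group: signed 4-bit integer codes plus the scale that maps them back to floats.
Group = Tuple[List[int], float]


def quantize_group_int4(weights: List[float]) -> Group:
    """Symmetric round-to-nearest quantization of one weight group to signed 4 bits."""
    max_abs = max(abs(w) for w in weights)
    # Signed 4-bit codes span -8..7; mapping max_abs to 7 keeps every code in range.
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale


def quantize_row(row: List[float], group_size: int = 4) -> List[Group]:
    """Split one weight row into fixed-size groups, each with its own scale."""
    return [quantize_group_int4(row[i:i + group_size]) for i in range(0, len(row), group_size)]


def dequantize_row(groups: List[Group]) -> List[float]:
    """Rebuild approximate float weights; activations stay in floating point."""
    return [code * scale for codes, scale in groups for code in codes]
```

Only the weights are stored in low precision here. A smaller group size gives each scale fewer values to cover, which usually lowers the rounding error at the cost of storing more scales.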

Features

[Quantization] Integrate Weight-Only Quantization algorithm AutoRound (5c7f33, dfd083, 9a7ddd, cf1de7)
[Quantization] Quantize weight with in-place mode in Weight-Only Quantization (deb1ed)
[Pruning] Enable SNIP on multiple cards using DeepSpeed ZeRO-3 (49ab28)
[Pruning] Support new pruning approach Wanda and DSNOT for PyTorch LLM (7a3671)
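
As background for the Wanda entry above, the sketch below illustrates the pruning criterion as described in the Wanda paper: each weight is scored by its magnitude multiplied by the L2 norm of the matching input feature over a calibration batch, and the lowest-scoring weights within each output row are zeroed. This is a pure-Python illustration under that assumption, not the Intel® Neural Compressor implementation; `wanda_prune` and its parameters are hypothetical names.

```python
import math
from typing import List


def wanda_prune(
    weights: List[List[float]],      # shape: [out_features][in_features]
    activations: List[List[float]],  # calibration inputs, shape: [samples][in_features]
    sparsity: float = 0.5,
) -> List[List[float]]:
    """Zero the lowest-scoring weights in each output row.

    Score of weight (i, j) = |W[i][j]| * L2 norm of input feature j.
    """
    n_in = len(weights[0])
    # Per-input-feature L2 norm over all calibration samples.
    norms = [math.sqrt(sum(x[j] ** 2 for x in activations)) for j in range(n_in)]
    n_prune = int(n_in * sparsity)

    pruned = []
    for row in weights:
        scores = [abs(w) * norms[j] for j, w in enumerate(row)]
        # Indices of the n_prune smallest scores in this row.
        drop = set(sorted(range(n_in), key=lambda j: scores[j])[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned
```

With `weights = [[1.0, 0.1, 0.5, 0.2]]` and calibration inputs whose second feature is much larger than the others, the small weight `0.1` survives while the large weight `1.0` on a near-silent feature is removed, which is where this criterion differs from plain magnitude pruning.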

Improvement

[Quantization] SmoothQuant code structure refactor (a8d81c)
[Quantization] Optimize the workflow of parsing Keras model (b816d7)
[Quantization] Support static_groups options in GPTQ API (1c426a)
[Quantization] Update TEQ train dataloader (d1e994)
[Quantization] WeightOnlyLinear keeps self.weight after recover (2835bd)
[Quantization] Add version condition for IPEX prepare init (d96e14)
[Quantization] Enhance the ORT node name checking (f1597a)
[Pruning] Stop the tuning process early when enabling smooth quant (844a03)

Productivity

ORT LLM examples support latest optimum version (26b260)
Add coding style docs and recommended VS Code setting (c1f23c)
Adapt transformers 4.37 loading (6133f4)
Upgrade pre-commit checker for black/blacken-docs/ruff (7763ed)
Support CI summary in PR comments (d4bcdd)
Notebook example update to install latest INC & TF, add metric in fit (4239d3)

Bug Fixes

Fix QA IPEX example fp32 input issue (c4de19)
Update conditions for getting min-max during TF MatMul requantize (d07175)
Fix TF saved_model issues (d8e60b)
Fix comparison of module_type and MulLinear (ba3aba)
Fix ORT calibration issue (cd6d24)
Fix ORT example bart export failure (b0dc0d)
Fix TF example accuracy diff during benchmark and quantization (5943ea)
Fix bugs for GPTQ exporting with static_groups (b4e37b)
Fix ORT quant issue caused by tensors having same name (0a20f3)
Fix Neural Solution SQL/CMD injection (14b7b0)
Fix the best qmodel recovery issue (f2d9b7)
Fix logger issue (83bc77)
Store token in protected file (c6f9cc)
Define the default SSL context (b08725)
Fix IPEX stats bug (5af383)
Fix ORT calibration for Dml EP (c58aea)
Fix wrong socket number retrieval for non-English system (5b2a88)
Fix trust remote for LLM examples (2f2c9a)

External Contributions

Intel Mac support (21cfeb)
Add PTQ example for PyTorch CV Segment Anything Model (bd5e69)

Validated Configurations

CentOS 8.4 & Ubuntu 22.04 & Windows 11 & macOS Ventura 13.5
Python 3.8, 3.9, 3.10, 3.11
TensorFlow 2.13, 2.14, 2.15
ITEX 2.13.0, 2.14.0
PyTorch/IPEX 2.0, 2.1, 2.2
ONNX Runtime 1.15, 1.16, 1.17
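
To compare a local environment against the list above, something like the following sketch may help. It relies only on the standard library. The distribution names `tensorflow`, `torch`, and `onnxruntime` are assumptions about how the frameworks were installed, ITEX and IPEX are left out for brevity, and a version outside the list is not necessarily unsupported, only not validated for this release.

```python
import sys
from importlib import metadata
from typing import Dict, Set

# major.minor versions copied from the validated configurations above.
VALIDATED_PYTHON: Set[str] = {"3.8", "3.9", "3.10", "3.11"}
VALIDATED: Dict[str, Set[str]] = {
    "tensorflow": {"2.13", "2.14", "2.15"},
    "torch": {"2.0", "2.1", "2.2"},
    "onnxruntime": {"1.15", "1.16", "1.17"},
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '2.1.0+cpu' to 'major.minor'."""
    return ".".join(version.split(".")[:2])


def is_validated(version: str, validated: Set[str]) -> bool:
    return major_minor(version) in validated


def check_environment() -> Dict[str, str]:
    """Report, per component, whether the installed version is in the validated list."""
    py = f"{sys.version_info.major}.{sys.version_info.minor}"
    report = {"python": "validated" if py in VALIDATED_PYTHON else f"not validated ({py})"}
    for name, validated in VALIDATED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        ok = is_validated(installed, validated)
        report[name] = "validated" if ok else f"not validated ({installed})"
    return report


if __name__ == "__main__":
    for component, status in check_environment().items():
        print(f"{component}: {status}")
```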
