Intel® Neural Compressor v2.3.2 Release

chensuyue released this 23 Nov 15:30

Features
Bug Fixes

Features

Reduce memory consumption in ONNXRT adaptor (f64833)
Support MatMulFpQ4 for onnxruntime 1.16 (1beb43)
Support MatMulNBits for onnxruntime 1.17 (67a31b)
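
The two operator entries above are tied to specific onnxruntime versions. As a rough illustration of version-gated operator selection, the sketch below maps an onnxruntime version string to an operator name. The helper names `parse_major_minor` and `pick_woq_op` are hypothetical and not part of the Neural Compressor API, and the exact version boundaries (MatMulFpQ4 only on 1.16.x, MatMulNBits on 1.17 and later) are an assumption inferred from the wording of these notes.

```python
from typing import Optional, Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.16.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def pick_woq_op(ort_version: str) -> Optional[str]:
    """Hypothetical helper: choose a 4-bit MatMul operator name by version.

    Assumed boundaries (not confirmed by the release notes):
      - 1.17 and later -> "MatMulNBits"
      - 1.16.x         -> "MatMulFpQ4"
      - anything older -> None (no such operator assumed available)
    """
    major_minor = parse_major_minor(ort_version)
    if major_minor >= (1, 17):
        return "MatMulNBits"
    if major_minor == (1, 16):
        return "MatMulFpQ4"
    return None


if __name__ == "__main__":
    for v in ("1.15.1", "1.16.3", "1.17.0"):
        print(v, "->", pick_woq_op(v))
```

In practice the version string would likely come from `onnxruntime.__version__`; it is passed in explicitly here so the sketch runs without onnxruntime installed.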

Bug Fixes

Update ITREX version in ONNXRT WOQ example and fix bugs in HF models (0ca51a)
Update ONNXRT WOQ example to llama-2-7b (7f2063)
Fix ONNXRT WOQ failure when model_path is None (cbd0a4)

Validated Configurations

CentOS 8.4 & Ubuntu 22.04
Python 3.10
TensorFlow 2.13
ITEX 2.13
PyTorch/IPEX 2.0.1+cpu
ONNX Runtime 1.15.1
MXNet 1.9.1
