Intel Neural Compressor Release 3.8

Latest

thuang6 released this 02 Jun 01:20

62ffebf

Highlights
Features
Improvements
Validated Hardware
Validated Configurations

Highlights

Introduced experimental support for the JAX framework

Features

Support FP8 quantization for Keras/JAX (experimental)
Support FP8 KV cache static quantization (experimental)
Support FP8 Attention static quantization (experimental)
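
For readers new to the terms used above, here is a minimal pure-Python sketch of what static per-tensor FP8 (E4M3) quantization generally involves: a scale is fixed ahead of time from calibration data, then values are divided by that scale, rounded to the FP8 grid, and multiplied back. This is not the Intel Neural Compressor API or its implementation; the function names (`round_to_e4m3`, `calibrate_scale`, `quantize_dequantize`) are hypothetical, and the choice of a single per-tensor scale with saturation at the E4M3 maximum of 448 is an assumption made for illustration.

```python
import math

E4M3_MAX = 448.0           # largest finite value in FP8 E4M3 (the "fn" variant)
E4M3_MIN_NORMAL_EXP = -6   # exponent of the smallest normal number
MANTISSA_BITS = 3          # explicit mantissa bits in E4M3


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest value on the E4M3 grid, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    x = max(-E4M3_MAX, min(E4M3_MAX, x))  # saturate instead of overflowing
    # frexp gives abs(x) = m * 2**e with m in [0.5, 1), so the binade exponent is e - 1.
    _, e = math.frexp(abs(x))
    exp = max(e - 1, E4M3_MIN_NORMAL_EXP)  # subnormals share the lowest exponent
    step = 2.0 ** (exp - MANTISSA_BITS)    # spacing between neighbours in this binade
    return round(x / step) * step          # Python's round() is round-half-to-even


def calibrate_scale(calibration_values):
    """Static scale (hypothetical helper): map the calibration abs-max onto E4M3_MAX."""
    amax = max(abs(v) for v in calibration_values)
    return amax / E4M3_MAX if amax > 0 else 1.0


def quantize_dequantize(values, scale):
    """Simulate FP8 storage: scale down, round to E4M3, scale back up."""
    return [round_to_e4m3(v / scale) * scale for v in values]


if __name__ == "__main__":
    scale = calibrate_scale([-896.0, 100.0])
    print(scale)
    print(quantize_dequantize([896.0, 1.0, 2000.0], scale))
```

Because the scale is fixed after calibration, inputs larger than anything seen during calibration saturate rather than rescale, which is the usual trade-off of static versus dynamic quantization.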

Improvements

New Gemma3 FP8 PTQ example for Keras/JAX
New ViT FP8 PTQ example for Keras/JAX
Llama 3 series MXFP4 / MXFP8 PTQ example with FP8 KV & Attention
Llama 4 Scout MXFP4 / MXFP8 PTQ example with FP8 KV & Attention
Qwen3 MXFP4 / MXFP8 PTQ example with FP8 KV & Attention
DeepSeek R1 MXFP4 / MXFP8 PTQ example with FP8 KV & Attention
Transformers v5 support
Removal of deprecated 2.x API

Validated Hardware 

Intel Gaudi AI Accelerators (Gaudi 2 and 3)
Intel Xeon Scalable processors (4th, 5th and 6th Gen)
Intel® Arc™ B-Series Graphics GPU (B580 and B60)

Validated Configurations

Ubuntu 24.04 & Win 11
Python 3.11, 3.12, 3.13
PyTorch 2.9, 2.10
JAX 0.9

Notes

Using v3.8 or later is recommended to mitigate CVEs in the code.
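
One way to act on this recommendation is to check the installed version at startup. The sketch below uses only the Python standard library; the distribution name `neural-compressor` is an assumption (your environment may use a different package name), and the helpers `parse_release` and `meets_minimum` are hypothetical names introduced for illustration. Pre-release suffixes such as `rc1` are ignored by this simple parser.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '3.8.1.dev0' -> (3, 8, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(version: str, minimum: tuple = (3, 8)) -> bool:
    """True if the version is at least the minimum (compared numerically, not as text)."""
    return parse_release(version) >= minimum


if __name__ == "__main__":
    dist_name = "neural-compressor"  # assumed distribution name; adjust if needed
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        print(f"{dist_name} is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "older than v3.8, consider upgrading"
        print(f"{dist_name} {installed}: {status}")
```

Comparing parsed tuples rather than raw strings avoids ranking a hypothetical `3.10` below `3.8`.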
