
Releases: onnx/neural-compressor

ONNX Neural Compressor v1.0 Release

02 Aug 02:13
71c2484

Neural Compressor provides ONNX model quantization techniques inherited from Intel Neural Compressor, including Post-training Quantization and Weight-only Quantization.
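To make the post-training idea concrete, here is a minimal numpy sketch of the scale/zero-point arithmetic that asymmetric uint8 post-training quantization rests on. The function names (`compute_qparams`, `quantize`, `dequantize`) are illustrative only and are not the library's API; the library derives the range from calibration data rather than a single tensor.

```python
import numpy as np

def compute_qparams(tensor, num_bits=8):
    """Derive an asymmetric uint8 scale/zero-point from the observed
    value range, as static post-training quantization does from
    calibration data. The range is widened to include zero so that
    0.0 maps exactly to an integer."""
    qmin, qmax = 0, 2 ** num_bits - 1
    rmin = min(float(tensor.min()), 0.0)
    rmax = max(float(tensor.max()), 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0:                      # all-zero tensor guard
        scale = 1.0
    zero_point = int(round(qmin - rmin / scale))
    return scale, zero_point

def quantize(tensor, scale, zero_point):
    q = np.round(tensor / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale
```

Round-tripping a tensor through `quantize`/`dequantize` bounds the per-element error by roughly half a quantization step, which is why calibration-derived ranges matter: a wider range means a larger `scale` and coarser steps.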

  • Features
  • Validated Configurations

Features

  • Support Post-training Quantization, including static and dynamic approaches
  • Support SmoothQuant for Post-training Quantization
  • Support Weight-only Quantization with several algorithms, including RTN, GPTQ, and AWQ
  • Support layer-wise quantization for RTN and GPTQ
  • Validate popular LLMs such as Llama3, Phi-3, and Qwen2 with weight-only quantization on Intel hardware, including Intel Xeon Scalable processors and Intel Core Ultra processors
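Of the weight-only algorithms listed above, RTN (round-to-nearest) is the simplest: weights are grouped along the input dimension, each group gets one symmetric scale, and values are rounded to low-bit integers with no data-driven optimization. The sketch below illustrates that scheme in numpy; the function name and defaults are illustrative, not the library's API.

```python
import numpy as np

def rtn_quantize(weight, num_bits=4, group_size=32):
    """Round-to-nearest (RTN) weight-only quantization sketch:
    each group of `group_size` values along the last axis shares one
    symmetric scale, and values are rounded to signed `num_bits`-bit
    integers. Returns the dequantized (fake-quantized) weights so the
    introduced error can be inspected directly."""
    maxq = 2 ** (num_bits - 1) - 1           # 7 for signed 4-bit
    out = np.empty_like(weight, dtype=np.float32)
    for start in range(0, weight.shape[-1], group_size):
        group = weight[..., start:start + group_size]
        scale = np.abs(group).max(axis=-1, keepdims=True) / maxq
        scale = np.where(scale == 0, 1.0, scale)   # all-zero group guard
        q = np.clip(np.round(group / scale), -maxq - 1, maxq)
        out[..., start:start + group_size] = q * scale
    return out
```

Smaller `group_size` gives each scale fewer values to cover, reducing error at the cost of storing more scales; GPTQ and AWQ improve on RTN by using calibration data to choose the rounding and scaling more carefully.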
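SmoothQuant, also listed above, addresses activation outliers by migrating quantization difficulty from activations to weights: each input channel is divided by a smoothing factor on the activation side and multiplied by the same factor on the weight side, leaving the matmul output mathematically unchanged. A minimal numpy sketch of that per-channel scaling, with illustrative names (not the library's API):

```python
import numpy as np

def smooth_scales(act_absmax, w_absmax, alpha=0.5):
    """SmoothQuant-style per-input-channel smoothing factors.
    alpha balances how much difficulty moves from activations to
    weights; 0.5 is the commonly used default."""
    s = act_absmax ** alpha / w_absmax ** (1 - alpha)
    return np.where(s == 0, 1.0, s)

# The transform X' = X / s, W' = s * W preserves X @ W exactly:
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 0] *= 50                       # simulate one outlier channel
W = rng.normal(size=(8, 3))
s = smooth_scales(np.abs(X).max(axis=0), np.abs(W).max(axis=1))
assert np.allclose(X @ W, (X / s) @ (s[:, None] * W))
```

After smoothing, the outlier channel's activation magnitude shrinks by its factor `s`, so a single per-tensor activation scale wastes far less of the integer range.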

Validated Configurations

  • OS version: CentOS 8.4, Ubuntu 22.04
  • Python version: 3.10
  • ONNX Runtime version: 1.18.1