Version 2.6.0

aimetci released this 16 May 23:52

1b87a6a

What's Changed

New Features

ONNX
- Support for passing onnxruntime execution providers (EPs) directly to QuantizationSimModel.__init__
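
As a rough illustration of what an EP list looks like, the sketch below builds one in the format onnxruntime accepts for its `providers` argument: provider names, or `(name, options)` tuples, in priority order. The helper `build_providers` is hypothetical and not part of AIMET, and the `providers` keyword shown for QuantizationSimModel in the trailing comment is an assumption that should be checked against the aimet_onnx API docs.

```python
from typing import Dict, List, Tuple, Union

# An onnxruntime provider entry is either a name or a (name, options) tuple.
Provider = Union[str, Tuple[str, Dict[str, int]]]


def build_providers(use_cuda: bool, device_id: int = 0) -> List[Provider]:
    """Hypothetical helper: build an EP list in priority order.

    CUDA (when requested) comes first, with CPU as the fallback.
    """
    providers: List[Provider] = []
    if use_cuda:
        providers.append(("CUDAExecutionProvider", {"device_id": device_id}))
    providers.append("CPUExecutionProvider")
    return providers


# Assumed usage (keyword name not confirmed by these notes):
#   sim = QuantizationSimModel(model, providers=build_providers(use_cuda=True))
```

The same list can be passed to `onnxruntime.InferenceSession(..., providers=...)`, which is the format this sketch follows.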
PyTorch
- Support for simulating float8 quantization
- Experimental: Added aimet_torch.onnx.export API for exporting a QuantizationSimModel to an ONNX QDQ graph
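
To give a feel for what simulating float8 means numerically, here is a minimal pure-Python sketch that rounds a value to the nearest number representable in an 8-bit float. It assumes the E4M3 layout (4 exponent bits, 3 mantissa bits, bias 7, maximum finite value 448) with saturation on overflow; these notes do not say which float8 variant or overflow behavior AIMET uses, and the function `simulate_float8_e4m3` is illustrative, not an AIMET API.

```python
import math

E4M3_MAX = 448.0          # largest finite E4M3 value (assumed variant)
E4M3_MIN_NORMAL_EXP = -6  # smallest normal exponent: 1 - bias, with bias 7
E4M3_MANTISSA_BITS = 3


def simulate_float8_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    magnitude = min(abs(x), E4M3_MAX)
    # frexp returns m in [0.5, 1) with magnitude = m * 2**exp,
    # so floor(log2(magnitude)) equals exp - 1.
    _, exp = math.frexp(magnitude)
    # Below the smallest normal exponent, the grid spacing stays fixed (subnormals).
    exponent = max(exp - 1, E4M3_MIN_NORMAL_EXP)
    step = 2.0 ** (exponent - E4M3_MANTISSA_BITS)
    # Python's round() rounds ties to even, matching IEEE round-to-nearest-even.
    quantized = min(round(magnitude / step) * step, E4M3_MAX)
    return math.copysign(quantized, x)
```

For example, 0.3 lands on 0.3125 because values in [0.25, 0.5) are spaced 1/32 apart, and anything beyond 448 saturates to 448.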

Bug Fixes and Improvements

ONNX
- Reduced CPU and GPU memory usage during sequential MSE
- Fixed automatic mixed precision (AMP) generating incompatible quantizer configurations
- Fixed AMP errors with dynamic Conv ops
- Aligned computation of symmetric encodings with aimet_torch
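
For readers unfamiliar with symmetric encodings, the sketch below shows one common textbook convention: the scale is derived from the largest absolute observed value and the zero point is fixed at 0 on a signed integer grid. This is an assumption for illustration only; these notes do not state the exact formula used by aimet_torch or aimet_onnx, and both function names are hypothetical.

```python
from typing import Tuple


def compute_symmetric_encoding(
    observed_min: float, observed_max: float, bitwidth: int = 8
) -> Tuple[float, int, int]:
    """Return (scale, qmin, qmax) for a signed symmetric grid (assumed convention)."""
    qmax = 2 ** (bitwidth - 1) - 1   # e.g. 127 for 8 bits
    qmin = -qmax - 1                 # e.g. -128 for 8 bits
    abs_max = max(abs(observed_min), abs(observed_max))
    # Guard against an all-zero range to avoid a zero scale.
    scale = abs_max / qmax if abs_max > 0 else 1.0
    return scale, qmin, qmax


def fake_quantize(x: float, scale: float, qmin: int, qmax: int) -> float:
    """Quantize to the integer grid, clamp, then dequantize back to float."""
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale
```

With an observed range of [-1.0, 2.0] at 8 bits, the scale is 2/127, so inputs above 2.0 clamp to roughly 2.0 and zero maps exactly to zero.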
PyTorch
- Fixed AttributeError when catching torch.onnx.export failures during QuantSim export
- Fixed errors raised when the deepspeed import fails
- Aligned input and output encodings for Resize layers
- Added supergroup fusion handling for LeakyRelu layers
- Docs: Updated LoRA user guide

Deprecations

ONNX
- Deprecated the use_cuda, device, rounding_mode, and use_symmetric_encodings arguments of QuantizationSimModel.__init__
