Version 2.6.0
What's Changed
New Features
- ONNX
  - Support for passing onnxruntime execution providers (EPs) directly to `QuantizationSimModel.__init__`
- PyTorch
  - Support for simulating float8 quantization
  - Experimental: Added `aimet_torch.onnx.export` API for exporting `QuantizationSimModel` to an ONNX QDQ graph
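
For readers unfamiliar with what "simulating float8 quantization" means numerically, here is a minimal pure-Python sketch. It rounds a value to the nearest number representable in an 8-bit float with 4 exponent bits and 3 mantissa bits, saturating at 448 (the maximum of the common E4M3 "FN" variant). The function name `fake_quantize_float8` and the choice of E4M3 are assumptions made for illustration; these notes do not say which float8 formats AIMET simulates or how it implements them.

```python
import math


def fake_quantize_float8(x, exponent_bits=4, mantissa_bits=3, max_value=448.0):
    """Round x to the nearest value on a small floating-point grid.

    Illustrative only: a hypothetical helper, not AIMET's implementation.
    Defaults assume the E4M3 layout with a saturating maximum of 448.
    """
    if x == 0.0 or math.isnan(x):
        return x

    sign = -1.0 if x < 0 else 1.0
    # Saturate instead of overflowing to infinity.
    magnitude = min(abs(x), max_value)

    # Exponent of the smallest normal number; below it the grid is subnormal.
    bias = 2 ** (exponent_bits - 1) - 1
    min_exponent = 1 - bias

    # frexp returns magnitude = m * 2**e with 0.5 <= m < 1,
    # so floor(log2(magnitude)) is e - 1.
    _, e = math.frexp(magnitude)
    exponent = max(e - 1, min_exponent)

    # Spacing between neighbouring representable values at this exponent.
    step = 2.0 ** (exponent - mantissa_bits)

    # Python's round() rounds halves to even, matching round-to-nearest-even.
    rounded = round(magnitude / step) * step
    return sign * min(rounded, max_value)


if __name__ == "__main__":
    for value in (0.3, 17.0, 19.0, 1000.0):
        print(value, "->", fake_quantize_float8(value))
```

Storing the result as an ordinary Python float while restricting it to the float8 grid is what makes this a simulation: downstream arithmetic still runs in full precision, but sees only values the narrow format could hold.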
Bug Fixes and Improvements
- ONNX
  - Reduced CPU and GPU memory usage during sequential MSE
  - Fixed AMP generating incompatible quantizer configurations
  - Fixed AMP errors with dynamic Conv ops
  - Aligned computation of symmetric encodings with `aimet_torch`
- PyTorch
  - Fixed AttributeError when catching `torch.onnx.export` failures during QuantSim export
  - Fixed errors being thrown when deepspeed import fails
  - Aligned input and output encodings for Resize layers
  - Added supergroup fusion handling for LeakyRelu layers
  - Docs: Updated LoRA user guide
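
As background for the symmetric-encoding alignment listed under ONNX above, the sketch below shows one common convention for computing a signed symmetric encoding and applying quantize-dequantize with it. Conventions differ between tools (for example, whether the scale divides by `2**(bw-1) - 1` or `2**(bw-1)`), which is the kind of mismatch an alignment fix addresses. The helper names and the specific convention here are assumptions for illustration, not a statement of what either AIMET package computes.

```python
def compute_symmetric_encoding(min_val, max_val, bitwidth=8):
    """Return (scale, qmin, qmax) for a signed symmetric grid with zero offset.

    Hypothetical helper: the scale maps the largest observed magnitude
    onto the largest positive integer level.
    """
    qmax = 2 ** (bitwidth - 1) - 1
    qmin = -qmax - 1
    abs_max = max(abs(min_val), abs(max_val))
    # Avoid a zero scale when the observed range is degenerate.
    scale = abs_max / qmax if abs_max > 0 else 1.0
    return scale, qmin, qmax


def quantize_dequantize(x, scale, qmin, qmax):
    """Snap x to the integer grid, clamp to [qmin, qmax], and map back."""
    q = min(max(round(x / scale), qmin), qmax)
    return q * scale


if __name__ == "__main__":
    scale, qmin, qmax = compute_symmetric_encoding(-50.0, 127.0)
    for value in (3.4, 200.0, -300.0):
        print(value, "->", quantize_dequantize(value, scale, qmin, qmax))
```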
Deprecations
- ONNX
  - Deprecated `use_cuda`, `device`, `rounding_mode`, and `use_symmetric_encodings` args to `QuantizationSimModel.__init__`
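
These notes do not spell out a migration path for the deprecated ONNX args. Assuming the new execution-provider support is the intended replacement for `use_cuda` and `device`, the sketch below translates the old pair into a provider list in the format onnxruntime itself accepts (provider names, optionally paired with an options dict). The helper `providers_from_legacy_args` is hypothetical, and the keyword under which `QuantizationSimModel.__init__` accepts the list should be checked against the API docs.

```python
def providers_from_legacy_args(use_cuda, device=0):
    """Build an onnxruntime-style provider list from legacy-style flags.

    Hypothetical migration helper, not part of AIMET. Providers are listed
    in priority order, with CPU as the fallback.
    """
    if use_cuda:
        return [
            ("CUDAExecutionProvider", {"device_id": device}),
            "CPUExecutionProvider",
        ]
    return ["CPUExecutionProvider"]


if __name__ == "__main__":
    providers = providers_from_legacy_args(use_cuda=True, device=0)
    print(providers)
    # The list would then be handed to the sim constructor. The keyword name
    # below is an assumption, so the call is left commented out:
    # sim = QuantizationSimModel(model, providers=providers)
```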