Version 2.6.0

aimetci released this 16 May 23:52

What's Changed

New Features

  • ONNX

    • Support for passing onnxruntime execution providers (EPs) directly to QuantizationSimModel.__init__
  • PyTorch

    • Support for simulating float8 quantization
    • Experimental: Added aimet_torch.onnx.export API for exporting a QuantizationSimModel to an ONNX QDQ graph
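
To illustrate what "simulating float8 quantization" generally means, here is a minimal pure-Python sketch that rounds a value to the nearest point on an E4M3-style grid and returns it as an ordinary float. This is not AIMET's implementation or API. The function name `fake_quantize_float8`, its parameters, and the choice of the E4M3 layout with a maximum of 448 are assumptions made for illustration only.

```python
import math


def fake_quantize_float8(x: float,
                         exponent_bits: int = 4,
                         mantissa_bits: int = 3,
                         max_value: float = 448.0) -> float:
    """Round x to the nearest value on a simplified float8 grid.

    Hypothetical helper for illustration; defaults assume an E4M3-style
    format (4 exponent bits, 3 mantissa bits, saturating at +/-448).
    """
    if x == 0.0 or math.isnan(x):
        return x

    # Saturate out-of-range inputs instead of producing infinities.
    x = max(-max_value, min(max_value, x))

    bias = 2 ** (exponent_bits - 1) - 1      # 7 for 4 exponent bits
    min_normal_exp = 1 - bias                # smallest normal exponent

    # frexp gives abs(x) = m * 2**e with 0.5 <= m < 1, so floor(log2|x|) = e - 1.
    _, e = math.frexp(abs(x))
    exp = max(e - 1, min_normal_exp)         # subnormals share the smallest exponent

    # Spacing between representable values at this exponent.
    step = 2.0 ** (exp - mantissa_bits)

    # Python's round() rounds ties to even, matching round-to-nearest-even.
    q = min(round(abs(x) / step) * step, max_value)
    return math.copysign(q, x)


print(fake_quantize_float8(0.3))     # nearest grid point to 0.3
print(fake_quantize_float8(1000.0))  # saturates at max_value
```

The output stays in floating point throughout, which is the usual idea behind quantization simulation: downstream computation sees the rounding error of the low-precision format without the tensor ever being stored in that format.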

Bug Fixes and Improvements

  • ONNX

    • Reduced CPU and GPU memory usage during sequential MSE
    • Fixed AMP generating incompatible quantizer configurations
    • Fixed AMP errors with dynamic Conv ops
    • Aligned computation of symmetric encodings with aimet_torch
  • PyTorch

    • Fixed AttributeError when catching torch.onnx.export failures during QuantSim export
    • Fixed errors raised when the deepspeed import fails
    • Aligned input and output encodings for Resize layers
    • Added supergroup fusion handling for LeakyRelu layers
    • Docs: Updated LoRA user guide
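
For readers unfamiliar with the term, the sketch below shows one common way to compute a symmetric encoding (a scale with a zero offset) from an observed min/max range. The notes above do not state which convention aimet_onnx and aimet_torch now share, so the formula here, along with the function names `symmetric_encoding` and `quantize_dequantize`, is an assumption for illustration and not the library's code.

```python
def symmetric_encoding(min_val: float, max_val: float, bitwidth: int = 8):
    """Return (scale, offset) for a signed symmetric grid.

    Assumed convention: the largest magnitude maps to the largest
    positive integer (127 for 8 bits) and the offset is always 0.
    """
    num_positive_steps = 2 ** (bitwidth - 1) - 1
    abs_max = max(abs(min_val), abs(max_val))
    # Fall back to a scale of 1.0 for an all-zero range to avoid dividing by zero.
    scale = abs_max / num_positive_steps if abs_max > 0 else 1.0
    return scale, 0


def quantize_dequantize(x: float, scale: float, bitwidth: int = 8) -> float:
    """Snap x to the symmetric integer grid, then map it back to a float."""
    qmin = -(2 ** (bitwidth - 1))
    qmax = 2 ** (bitwidth - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))
    return q * scale


scale, offset = symmetric_encoding(-127.0, 63.5)
print(scale, offset)
print(quantize_dequantize(3.4, scale))
```

Small differences in this formula, such as dividing by 127 versus 128, change every quantized value slightly, which is why two frameworks need to agree on it to produce matching encodings.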

Deprecations

  • ONNX
    • Deprecated the use_cuda, device, rounding_mode, and use_symmetric_encodings arguments to QuantizationSimModel.__init__