Version 2.29.0

aimetci released this 20 Apr 18:53
  • New Features

    • ONNX

      • Add support for Qwen 2.5 VL in aimet-onnx (f256686)
    • Torch

      • Support OOTB quantization of nn.MultiheadAttention (4d19f47)
      • Support OOTB quantization of Qwen 3.5 normalization layers (01b912f)
      • Support OOTB quantization of InternVL GELU (c5f65b7)
  • Bug Fixes and Improvements

    • Common

      • Make export_int32_bias default to True if encoding_version >= 2.0.0 (22876ca)
    • ONNX

      • Optimize QDQ latency for fp16 models (c817a17)
      • Support pattern matching LayerNormalization without bias (84f880a)
      • Make from_onnx_export ignore unloadable encodings by default (1b50727)
      • Enable loading models with redundant back-to-back QDQ using from_onnx_qdq (0f2be91)
      • Skip folding BatchNormalization when the Conv layer has shared weights (8f552b7)
      • Fix bug in standalone BatchNormalization fold with shared tensors (eb7ae4b)
    • Torch

      • Disable activation quantizers for re-used stateless nn.Modules (8f552b7)