v0.2.1

v-dziuba released this 20 May 15:27

Release Notes


This release expands the available quantization methods, extends blockwise quantization to more ops and algorithms, and improves constant tensor handling, stability, and performance.

Key Highlights:

  • Expanded Quantization Methods:
    • Introduced OCTAV as an alternative uniform quantization method.
    • Implemented Hadamard rotation quantization, including support for 3D tensors and performance optimizations.
  • Enhanced Blockwise Quantization:
    • Added support for FullyConnected and EmbeddingLookup ops.
    • Enabled blockwise quantization across various uniform algorithms, including OCTAV.
  • Improved Constant Tensor Handling:
    • Added robust support for constant tensors with shared buffers but different quantization parameters.
    • Optimized constant tensor handling to avoid unnecessary duplication while keeping transformations correct.
  • Core Infrastructure & Stability:
    • Added support for calibrating composite decompositions.
    • Addressed memory allocation issues and zero-size array crashes.
    • Improved handling of graph input/output indices, custom ops, and dynamic shapes.
    • Adjusted the buffer size to support quantizing larger models.
    • General code refactoring and test improvements for better maintainability and reliability.
  • New Recipes & Features:
    • Added a new dynamic_wi4_afp32 recipe.
    • Added support for integer inputs in dataset creation.
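
For readers unfamiliar with Hadamard rotation quantization, the general idea can be sketched as follows. This is a minimal, standard-library illustration and not this project's implementation: the function names (`fwht`, `quantize_symmetric`, `hadamard_quantize_roundtrip`) are hypothetical, and it assumes a 1D input whose length is a power of two and symmetric uniform quantization. The values are rotated with a normalized Hadamard transform, quantized in the rotated domain, then dequantized and rotated back.

```python
import math


def fwht(values):
    """Normalized fast Walsh-Hadamard transform (length must be a power of two).

    With the 1/sqrt(n) normalization the transform is orthogonal and is its
    own inverse, so applying it twice returns the input.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    out = list(values)
    h = 1
    while h < n:
        # Butterfly step: combine pairs that are h positions apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in out]


def quantize_symmetric(values, num_bits=8):
    """Symmetric uniform quantization with a single scale for all values."""
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def hadamard_quantize_roundtrip(values, num_bits=4):
    """Rotate, quantize, dequantize, and rotate back (illustration only)."""
    rotated = fwht(values)
    quantized, scale = quantize_symmetric(rotated, num_bits)
    dequantized = [q * scale for q in quantized]
    return fwht(dequantized)


if __name__ == "__main__":
    weights = [0.1, -0.2, 0.05, 8.0, 0.15, -0.1, 0.2, -0.05]
    print(hadamard_quantize_roundtrip(weights, num_bits=4))
```

The rotation spreads a single large outlier across all positions, so the quantization grid is not determined by one element alone. Whether this lowers the overall error depends on the data and bit width.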
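
As a rough illustration of what blockwise quantization means, the sketch below splits one row of weights into fixed-size blocks and computes a separate scale per block. It is an assumption-laden example, not the library's code: `quantize_blockwise` and `dequantize_blockwise` are hypothetical names, the block size and bit width are arbitrary, and symmetric uniform quantization is assumed.

```python
def quantize_blockwise(row, block_size=32, num_bits=4):
    """Quantize a list of floats with one symmetric scale per block."""
    if block_size <= 0 or len(row) % block_size:
        raise ValueError("row length must be a positive multiple of block_size")
    qmax = 2 ** (num_bits - 1) - 1
    quantized, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        # Each block gets its own scale, so an outlier only affects its block.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        quantized.extend(
            max(-qmax, min(qmax, round(v / scale))) for v in block
        )
    return quantized, scales


def dequantize_blockwise(quantized, scales, block_size):
    """Map integers back to floats using the scale of the owning block."""
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


if __name__ == "__main__":
    row = [0.1, -0.2, 0.3, -0.4, 10.0, -5.0, 2.5, 0.0]
    q, scales = quantize_blockwise(row, block_size=4, num_bits=4)
    print(q, scales)
    print(dequantize_blockwise(q, scales, block_size=4))
```

Compared with a single per-tensor or per-channel scale, per-block scales cost extra storage (one scale per block) in exchange for finer resolution in blocks that contain only small values.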