v0.2.1
Release Notes
This release significantly improves quantization capabilities, with a focus on advanced use cases and performance.
Key Highlights:
- Expanded Quantization Methods:
  - Introduced OCTAV as an alternative uniform quantization method.
  - Implemented Hadamard rotation quantization, including support for 3D tensors and performance optimizations.
- Enhanced Blockwise Quantization:
  - Added support for FullyConnected and EmbeddingLookup ops.
  - Enabled blockwise quantization across various uniform algorithms, including OCTAV.
- Improved Constant Tensor Handling:
  - Added robust support for constant tensors that share a buffer but have different quantization parameters.
  - Optimized the duplication of constant tensors to avoid unnecessary copies and ensure they are transformed correctly.
- Core Infrastructure & Stability:
  - Added support for calibrating composite decompositions.
  - Addressed memory allocation issues and crashes on zero-size arrays.
  - Improved handling of graph input/output indices, custom ops, and dynamic shapes.
  - Adjusted the buffer size to support quantizing larger models.
  - Refactored code and improved tests for better maintainability and reliability.
- New Recipes & Features:
  - Added a new dynamic_wi4_afp32 recipe.
  - Added support for integer inputs in dataset creation.
  - Added a new