Release v0.4.0 · google-ai-edge/ai-edge-quantizer

🚀 Release Notes: v0.4.0

New Operator Support (AEQ): Added int8/int16 support for the following operations in AI Edge Quantizer (AEQ):
- PADV2 (Pull #246)
- REDUCE_MIN (Pull #313)
- EQUAL (Pull #316)
- MIRROR_PAD (Pull #323)
- SPACE_TO_DEPTH (int8 only) (Pull #332)
MSE Quantization Support: Added Mean Squared Error (MSE) quantization support for:
- FullyConnected and EmbeddingLookup (Pull #320)
- Conv, TransposeConv, and DepthWise Conv (Pull #336)
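As a rough illustration of what MSE-based quantization means, the sketch below grid-searches a symmetric per-tensor scale that minimizes reconstruction error instead of taking the min/max scale directly. The function names, the candidate grid, and the per-tensor granularity are assumptions made for illustration; this is not the AEQ implementation.

```python
import numpy as np


def quantization_mse(weights, scale, num_bits=8):
    """Mean squared error after a symmetric quantize/dequantize round trip."""
    qmax = 2 ** (num_bits - 1) - 1
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return float(np.mean((q * scale - weights) ** 2))


def mse_symmetric_scale(weights, num_bits=8, num_candidates=100):
    """Hypothetical helper: pick the candidate scale with the lowest MSE."""
    weights = np.asarray(weights, dtype=np.float64)
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = float(np.max(np.abs(weights)))
    if max_abs == 0.0:
        return 1.0  # any scale represents an all-zero tensor exactly
    best_scale, best_err = max_abs / qmax, np.inf
    # Candidates shrink the min/max scale by a factor between 0.5 and 1.0.
    for frac in np.linspace(0.5, 1.0, num_candidates):
        scale = frac * max_abs / qmax
        err = quantization_mse(weights, scale, num_bits)
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

Because the min/max scale is one of the candidates, the searched scale never does worse than it in MSE terms. With a heavy outlier in the weights, the search tends to pick a smaller scale, clipping the outlier in exchange for finer resolution elsewhere.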
New Validation Metrics: Added new metrics for model validation:
- Cosine Similarity (Pull #328)
- KL Divergence (Pull #333)
- Signal-to-Noise Ratio (SNR) (Pull #345)
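For readers unfamiliar with these metrics, here is a minimal NumPy sketch of how each one can be computed between a float reference tensor and a quantized model's output. The function names, the decibel convention for SNR, the epsilon smoothing in the KL divergence, and the value returned for empty tensors are all assumptions; the validator's own definitions may differ.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two flattened tensors of equal size."""
    a = np.ravel(a).astype(np.float64)
    b = np.ravel(b).astype(np.float64)
    if a.size == 0:
        return 1.0  # assumption: two empty tensors count as identical
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0.0:
        # Assumption: zero vs zero is a perfect match, zero vs nonzero is not.
        return 1.0 if not a.any() and not b.any() else 0.0
    return float(np.dot(a, b) / norm)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for non-negative tensors, normalized to distributions."""
    p = np.ravel(p).astype(np.float64) + eps
    q = np.ravel(q).astype(np.float64) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))


def snr_db(reference, actual):
    """Signal-to-noise ratio in decibels, treating the difference as noise."""
    reference = np.ravel(reference).astype(np.float64)
    noise = reference - np.ravel(actual).astype(np.float64)
    noise_power = float(np.mean(noise ** 2))
    if noise_power == 0.0:
        return float("inf")
    return float(10.0 * np.log10(np.mean(reference ** 2) / noise_power))
```

Cosine similarity ignores overall magnitude, KL divergence suits probability-like outputs such as softmax, and SNR reports error relative to signal power, so the three complement each other.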
Improved Quantization Precision: The minimum bound for the quantization scale was reduced from $10^{-4}$ to $10^{-9}$ to support finer precision levels (e.g., $10^{-6}$ for int8, $10^{-8}$ for int16). (Pull #344)
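To see why the lower bound matters, consider a symmetric scale computed as `max_abs / qmax` and then clamped from below. The helper is hypothetical, and how AEQ applies the bound internally is an assumption here.

```python
def symmetric_scale(max_abs, num_bits, min_scale):
    """Hypothetical helper: min/max symmetric scale with a lower bound."""
    qmax = 2 ** (num_bits - 1) - 1
    return max(max_abs / qmax, min_scale)


# An int8 tensor whose largest magnitude is 1e-4.
old_scale = symmetric_scale(1e-4, num_bits=8, min_scale=1e-4)
new_scale = symmetric_scale(1e-4, num_bits=8, min_scale=1e-9)
```

With the old floor, the scale is pinned at $10^{-4}$, so every value in this tensor quantizes to -1, 0, or 1. With the new floor, the scale is about $7.9 \times 10^{-7}$ and the full int8 range is usable.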
Blockwise Quantization: Added support for blockwise dequantization in AEQ. (Pull #314)
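Blockwise dequantization can be pictured as follows: each row of a quantized tensor is split into fixed-size blocks along the quantized dimension, and each block is multiplied by its own scale. The sketch assumes a 2-D tensor, symmetric quantization (no zero point), and a hypothetical function name; it also rejects block sizes that do not divide the dimension evenly.

```python
import numpy as np


def blockwise_dequantize(q, scales, block_size):
    """Dequantize q (rows, cols) using scales (rows, cols // block_size)."""
    q = np.asarray(q)
    scales = np.asarray(scales, dtype=np.float32)
    rows, cols = q.shape
    if cols % block_size != 0:
        raise ValueError(
            f"Quantized dimension {cols} is not divisible by "
            f"block size {block_size}."
        )
    # Group each row into blocks, then broadcast one scale per block.
    blocks = q.reshape(rows, cols // block_size, block_size)
    dequantized = blocks.astype(np.float32) * scales[:, :, None]
    return dequantized.reshape(rows, cols)
```

For example, a single row `[1, 2, 3, 4]` with block size 2 and scales `[0.5, 2.0]` dequantizes to `[0.5, 1.0, 6.0, 8.0]`.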
Weight Only Quantization: Added Embedding Lookup to supported subchannel operations. (Pull #310)

Bias Quantization Enhancements:
- Quantized bias now uses 32-bit values, stored in a 64-bit type when activations are 16-bit. (Pull #308)
- Added numerical checks for bias quantization. (Pull #311)
- Increased the error tolerance for bias quantization. (Pull #338)
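The bias changes above can be sketched roughly as follows, assuming the common convention that the bias scale is the product of the input and weight scales with a zero point of 0. The function name and the `activation_bits` parameter are hypothetical, and the numerical checks and tolerances mentioned above are not modeled.

```python
import numpy as np


def quantize_bias(bias, input_scale, weight_scale, activation_bits=8):
    """Hypothetical helper: quantize bias to the int32 range."""
    # Assumed convention: bias_scale = input_scale * weight_scale.
    bias_scale = np.asarray(input_scale * weight_scale, dtype=np.float64)
    q = np.round(np.asarray(bias, dtype=np.float64) / bias_scale)
    # Values are limited to what fits in 32 bits in both cases.
    info = np.iinfo(np.int32)
    q = np.clip(q, info.min, info.max)
    # With 16-bit activations, the same values live in a 64-bit container.
    storage = np.int64 if activation_bits == 16 else np.int32
    return q.astype(storage), bias_scale
```

For instance, a bias of 0.5 with an input scale of 0.1 and a weight scale of 0.05 quantizes to 100.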
Hadamard Quantization:
- Removed channel-wise constraints in Hadamard quantization. (Pull #306)
- Decomposed Hadamard rotation as a FullyConnected operation. (Pull #318)
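Expressing a Hadamard rotation as a FullyConnected operation amounts to a matrix multiply whose weight is a normalized Hadamard matrix. The sketch below uses the Sylvester construction and assumes a power-of-two feature size; the function names are hypothetical and do not reflect how AEQ builds the op.

```python
import numpy as np


def hadamard_matrix(n):
    """Normalized n x n Hadamard matrix (Sylvester construction)."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)  # scaling makes the matrix orthogonal


def hadamard_rotate_as_fc(x):
    """Apply the rotation the way a FullyConnected layer would: x @ W.T."""
    x = np.asarray(x, dtype=np.float64)
    weight = hadamard_matrix(x.shape[-1])
    return x @ weight.T
```

Because the normalized Sylvester matrix is both symmetric and orthogonal, applying the rotation twice recovers the original input, and vector norms are preserved.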
Quantization Scope & Granularity:
- Removed dynamically quantized FullyConnected and Conv2D from latest operations. (Pull #294)
- Extended the list of quantizable composites. (Pull #303)
- Refactored blockwise quantization granularity. (Pull #341)
- Removed EMBEDDING_LOOKUP from the static quantization allowlist. (Pull #324)
Validator & SRQ Fixes:
- Added validator support for pre-quantized models. (Pull #331)
- Corrected signature defs when dequantization precedes graph output during Static Range Quantization (SRQ). (Pull #327)
- Fixed a bug to allow the float_casting algorithm in add_weight_only_config. (Pull #322)
- Handled zero-length tensors in the cosine_similarity metric. (Pull #330)

Documentation Update: The README now includes detailed explanations of dynamic, weight-only, and static quantization, including their characteristics, pros, and cons. (Pull #295)
Refactoring: Extracted the constrained op list generation to a utility function. (Pull #301)
Fixes & Stability:
- Fixed failing getting_started.ipynb in nightly Colab. (Pull #300)
- Fixed AEQ notebooks to run correctly in Google Colab. (Pull #315)
- Updated dependencies to use tf-nightly. (Pull #298, Pull #305, Pull #210)
- Added a note to the Colab notebook warning that quantizer.validate may cause an "out of memory" error. (Pull #321)
- Added a check that raises an error if the quantized dimension is not divisible by the block size. (Pull #307)

Full Changelog: v0.3.0...v0.4.0
