
v0.4.0


v-dziuba released this on 11 Nov at 22:26 (143 commits to main since this release)

🚀 Release Notes: v0.4.0


✨ New Features & Operator Support

  • New Operator Support: Added int8/int16 support in AEQ for the following operations:
    • PADV2 (Pull #246)
    • REDUCE_MIN (Pull #313)
    • EQUAL (Pull #316)
    • MIRROR_PAD (Pull #323)
    • SPACE_TO_DEPTH (int8 only) (Pull #332)
  • MSE Quantization Support: Added Mean Squared Error (MSE) quantization support for:
    • FullyConnected and EmbeddingLookup (Pull #320)
    • Conv, TransposeConv, and DepthWise Conv (Pull #336)
  • New Validation Metrics: Added new metrics for model validation:
    • Cosine Similarity (Pull #328)
    • KL Divergence (Pull #333)
    • Signal-to-Noise Ratio (SNR) (Pull #345)
  • Improved Quantization Precision: The lower bound on the quantization scale was reduced from $10^{-4}$ to $10^{-9}$, allowing finer scales (e.g., $10^{-6}$ for int8 and $10^{-8}$ for int16). (Pull #344)
  • Blockwise Quantization: Added support for blockwise dequantization in AEQ. (Pull #314)
  • Weight-Only Quantization: Added EmbeddingLookup to the supported subchannel operations. (Pull #310)
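MSE quantization (Pull #320, Pull #336) commonly means choosing quantization parameters that minimize reconstruction error instead of taking the raw min/max range. As a rough illustration of that idea, the sketch below grid-searches symmetric int8 clipping thresholds and keeps the scale whose quantize-dequantize round trip has the lowest mean squared error. It is a generic sketch, not AEQ's implementation; the function names, the 100-candidate grid, and the all-zero fallback are assumptions made for illustration.

```python
import math


def quantize_dequantize(values, scale, qmin=-127, qmax=127):
    """Symmetric fake quantization: round to the integer grid, clamp, map back to float."""
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mse_scale_search(values, qmax=127, num_candidates=100):
    """Grid-search clipping thresholds in (0, max|v|] and return the lowest-MSE scale."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 1.0  # assumption: any positive scale works for an all-zero tensor
    best_scale, best_err = max_abs / qmax, math.inf
    for i in range(1, num_candidates + 1):
        clip = max_abs * i / num_candidates  # candidate clipping threshold
        scale = clip / qmax
        err = mse(values, quantize_dequantize(values, scale, -qmax, qmax))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale
```

Because the plain min/max threshold is one of the candidates, the returned scale never does worse (in MSE) than min/max; a smaller clip wins when a few outliers would otherwise waste most of the integer range.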
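The three new validation metrics (Pull #328, Pull #333, Pull #345) have standard definitions: cosine similarity $\langle a, b\rangle / (\lVert a\rVert \lVert b\rVert)$, KL divergence $\mathrm{KL}(P \Vert Q) = \sum_i p_i \log(p_i / q_i)$, and SNR in decibels $10 \log_{10}\left(\sum_i s_i^2 / \sum_i (s_i - \hat{s}_i)^2\right)$. The sketch below computes them on plain Python lists, for example to compare a float model's outputs with the quantized model's. It is not AEQ's validator API: the function names, the `eps` guard, and the assumption that KL inputs are already probability distributions are illustrative choices.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = <a, b> / (|a| |b|); 1.0 means identical direction."""
    if len(a) == 0 or len(a) != len(b):
        raise ValueError("inputs must be non-empty and the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def kl_divergence(p, q, eps=1e-12):
    """KL(P || Q) in nats; p and q are assumed to be probability distributions."""
    # Terms with p_i == 0 contribute 0; eps guards against dividing by q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def snr_db(reference, test):
    """SNR in dB: 10 * log10(signal power / noise power), noise = reference - test."""
    signal = sum(r * r for r in reference)
    noise = sum((r - t) ** 2 for r, t in zip(reference, test))
    if noise == 0.0:
        return math.inf  # identical outputs: no quantization noise
    return 10.0 * math.log10(signal / noise)
```

For example, `snr_db([3.0, 4.0], [3.0, 4.5])` has signal power 25 and noise power 0.25, so the ratio is 100, or 20 dB.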
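Blockwise quantization (Pull #314, Pull #341) gives each fixed-size block along the quantized dimension its own scale. The sketch below shows symmetric int8 blockwise quantization and dequantization on a flat list. It also raises an error when the length is not divisible by the block size (the check added in Pull #307) and floors each scale at $10^{-9}$, the new lower bound from Pull #344. The function names and the symmetric scheme are assumptions for illustration, not AEQ's code.

```python
MIN_SCALE = 1e-9  # the scale lower bound described in Pull #344


def blockwise_quantize(weights, block_size, qmax=127):
    """Symmetric int8 quantization with one scale per block of `block_size` values."""
    if len(weights) % block_size != 0:
        raise ValueError(
            f"quantized dimension {len(weights)} is not divisible by block size {block_size}"
        )
    q, scales = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        scale = max(max(abs(w) for w in block) / qmax, MIN_SCALE)  # floor tiny scales
        scales.append(scale)
        q.extend(max(-qmax, min(qmax, round(w / scale))) for w in block)
    return q, scales


def blockwise_dequantize(q, scales, block_size):
    """Map each integer back to float using its block's scale."""
    return [v * scales[i // block_size] for i, v in enumerate(q)]
```

With per-block scales, a large-magnitude block (e.g. values near 10.0) no longer forces a coarse scale onto a neighboring block of small values; each element's reconstruction error stays within half of its own block's scale.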

βš™οΈ Improvements & Refinements

  • Bias Quantization Enhancements:
    • Quantized bias is now 32-bit, and it is stored as 64-bit when activations are 16-bit. (Pull #308)
    • Added numerical checks for bias quantization. (Pull #311)
    • Increased the error tolerance for bias quantization. (Pull #338)
  • Hadamard Quantization:
    • Removed channel-wise constraints in Hadamard quantization. (Pull #306)
    • Decomposed the Hadamard rotation into a FullyConnected operation. (Pull #318)
  • Quantization Scope & Granularity:
    • Removed dynamically quantized FullyConnected and Conv2D from latest operations. (Pull #294)
    • Extended the list of quantizable composites. (Pull #303)
    • Refactored blockwise quantization granularity. (Pull #341)
    • Removed EMBEDDING_LOOKUP from the static quantization allowlist. (Pull #324)
  • Validator & SRQ Fixes:
    • Added validator support for pre-quantized models. (Pull #331)
    • Corrected signature defs when dequantization precedes graph output during Static Range Quantization (SRQ). (Pull #327)
    • Fixed a bug that prevented the float_casting algorithm from being used in add_weight_only_config. (Pull #322)
    • Handled zero-length tensors in the cosine_similarity metric. (Pull #330)
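Bias is conventionally quantized with zero point 0 and a scale equal to the input scale times the weight scale, which is why its quantized values need 32 bits. The sketch below follows that common TFLite convention per output channel, checks that each value fits in int32, and uses a 64-bit container when activations are 16-bit, mirroring the storage change in Pull #308. It is an illustration under those assumptions: `quantize_bias` and `store_bias` are hypothetical names, and the numerical checks AEQ actually performs (Pull #311, Pull #338) may differ.

```python
from array import array

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_bias(bias, input_scale, weight_scales):
    """Per-channel bias quantization: scale = input_scale * weight_scale[c], zero point 0."""
    quantized = []
    for c, (b, w_scale) in enumerate(zip(bias, weight_scales)):
        q = round(b / (input_scale * w_scale))
        # Numerical check: the quantized bias must fit in 32 bits.
        if not INT32_MIN <= q <= INT32_MAX:
            raise OverflowError(f"bias for channel {c} does not fit in int32: {q}")
        quantized.append(q)
    return quantized


def store_bias(quantized, activation_bits):
    """Store the 32-bit values as int64 ('q') for 16-bit activations, else as C int ('i')."""
    return array("q" if activation_bits == 16 else "i", quantized)
```

For instance, with an input scale of 0.01 and weight scales of 0.1 and 0.2, biases of 0.5 and -1.0 quantize to 500 and -500; a bias whose quantized value exceeds the int32 range is rejected rather than silently wrapped.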
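Expressing the Hadamard rotation as a FullyConnected operation (Pull #318) is possible because the normalized rotation $x \mapsto H x / \sqrt{n}$ is just a matrix multiply. The sketch below builds a Sylvester Hadamard matrix, uses it as the weight of a plain fully connected layer (output[o] = sum over i of x[i] * W[o][i], the usual FULLY_CONNECTED weight layout), and checks that the rotation preserves the vector norm and undoes itself. All names are hypothetical; this illustrates the math, not how AEQ emits the op.

```python
import math


def hadamard(n):
    """Sylvester construction: H_2n = [[H, H], [H, -H]]; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def fully_connected(x, weights, bias=None):
    """y[o] = sum_i x[i] * weights[o][i] (+ bias[o]); weights are [out, in]."""
    out = [sum(xi * wi for xi, wi in zip(x, row)) for row in weights]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def hadamard_rotation_weights(n):
    """Orthonormal rotation H / sqrt(n), ready to use as FullyConnected weights."""
    s = 1.0 / math.sqrt(n)
    return [[s * v for v in row] for row in hadamard(n)]
```

Since $H$ is symmetric and $H H = nI$, applying the normalized weights twice returns the original input, which makes the decomposition easy to sanity-check.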

📚 Documentation, Refactoring & Stability

  • Documentation Update: The README now includes detailed explanations of dynamic, weight-only, and static quantization, including their characteristics, pros, and cons. (Pull #295)
  • Refactoring: Extracted the constrained op list generation to a utility function. (Pull #301)
  • Fixes & Stability:
    • Fixed the failing getting_started.ipynb notebook in the nightly Colab run. (Pull #300)
    • Fixed AEQ notebooks to run correctly in Google Colab. (Pull #315)
    • Updated dependencies to use tf-nightly. (Pull #298, Pull #305, Pull #210)
    • Added a note to the Colab notebook warning that quantizer.validate may cause an "out of memory" error. (Pull #321)
    • Added a check that raises an error if the quantized dimension is not divisible by the block size. (Pull #307)

Full Changelog: v0.3.0...v0.4.0
