
v2.6.0

@KodiaqQ KodiaqQ released this 18 Sep 15:59

Post-training Quantization:

Features:

  • Added CPU_SPR device type support.
  • Added unification of quantizer scales.
  • Added quantization scheme for ReduceSum operation.
  • Added new types (ReduceL2, ReduceSum, Maximum) to the ignored scope for ModelType.Transformer.
  • (OpenVINO) Added SmoothQuant algorithm.
  • (OpenVINO) Added ChannelAlignment algorithm.
  • (OpenVINO) Added HyperparameterTuner algorithm.
  • (PyTorch) Added FastBiasCorrection algorithm support.
  • (OpenVINO, ONNX) Added embedding weights quantization.
  • (OpenVINO, PyTorch) Added new compress_weights method that provides data-free INT8 weights compression.

Fixes:

  • Fixed detection of decomposed post-processing in models.
  • Multiple fixes (new patterns, bugfixes, etc.) to solve issue #1936.
  • Fixed model reshaping during quantization to preserve the original model shape.
  • (OpenVINO) Added support for quantization of sequential models.
  • (OpenVINO) Fixed in-place statistics cast to support empty dimensions.
  • (OpenVINO, ONNX) Fixed quantization of the MatMul operation with weights rank > 2.
  • (OpenVINO, ONNX) Fixed BiasCorrection algorithm to enable CLIP model quantization.

Improvements:

  • Optimized the quantize(…) pipeline (up to 4.3x overall speedup).
  • Optimized the quantize_with_accuracy_control(…) pipeline (up to 8x speedup for the 122-quantizing-model-with-accuracy-control notebook).
  • Optimized general statistics collection (up to 1.2x speedup for the ONNX backend).
  • Separated ignored patterns from the fused patterns scheme (and added multiple new patterns).

Tutorials:

Compression-aware training:

Features:

  • Added shape pruning processor for BootstrapNAS algorithm.
  • Added KD loss for BootstrapNAS algorithm.
  • Added validate_scopes parameter for NNCF configuration.
  • (PyTorch) Added PyTorch 2.0 support.
  • (PyTorch) Added .strip() option to API.
  • (PyTorch) Enabled bfloat data type for quantization kernels.
  • (PyTorch) Quantized models can now be torch.jit.traced without calling .strip().
  • (PyTorch) Added support for overridden forward instance attribute on model objects passed into create_compressed_model.
  • (Tensorflow) Added Tensorflow 2.12 support.

Fixes:

  • (PyTorch) Fixed a padding adjustment issue in the elastic kernel so it works with different active kernel sizes.
  • (PyTorch) Fixed torch graph tracing for cases where tensors belonging to parallel edges are interleaved in the order of the tensor arguments.
  • (PyTorch) Fixed the recurrent node matching condition (LSTM, GRU cells) with a strict rule, to avoid adding unnecessary nodes to the ignored scope.
  • (PyTorch) Fixed the torch.jit.script wrapper so that user-side exception handling during torch.jit.script invocation does not cause NNCF to be permanently disabled.
  • (PyTorch, Tensorflow) Adjusted the quantizer propagation algorithm to check whether propagation would result in output quantization.
  • (PyTorch) Added a redefined __class__ method for ProxyModule to avoid errors when calling super() in the forward method.

Deprecations/Removals:

  • (PyTorch) Removed deprecated NNCFNetwork.__getattr__, NNCFNetwork.get_nncf_wrapped_model methods.

Requirements:

  • Updated PyTorch version (2.0.1).
  • Updated Tensorflow version (2.12.0).