Releases: openvinotoolkit/nncf

v2.13.0

19 Sep 10:24

Post-training Quantization:

Features:

  • (OpenVINO) Added support for combining GPTQ with the AWQ and Scale Estimation (SE) algorithms in nncf.compress_weights() for more accurate weight compression of LLMs. The following combinations with GPTQ are now supported: AWQ+GPTQ+SE, AWQ+GPTQ, GPTQ+SE, GPTQ.
  • (OpenVINO) Added the LoRA Correction algorithm to further improve the accuracy of int4-compressed models on top of the other algorithms, AWQ and Scale Estimation. It can be enabled via the optional lora_correction parameter of the nncf.compress_weights() API; a combined usage sketch follows this list. The algorithm increases compression time and incurs a negligible model size overhead. Refer to the accuracy/footprint trade-off for different int4 compression methods.
  • (PyTorch) Added implementation of the experimental Post-training Activation Pruning algorithm. Refer to Activation Sparsity for details.
  • Added a memory monitoring tool that logs the memory allocated by a piece of Python code or a script. Refer to NNCF tools for details.
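
The new algorithm flags are plain keyword arguments of nncf.compress_weights(). A minimal sketch, assuming an OpenVINO model ov_model and a calibration pipeline (calibration_items and transform_fn are placeholders):

```python
import nncf

# Calibration data is required for the data-aware algorithms
# (AWQ, Scale Estimation, GPTQ, LoRA Correction).
dataset = nncf.Dataset(calibration_items, transform_fn)  # placeholders

compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    dataset=dataset,
    awq=True,               # AWQ
    gptq=True,              # GPTQ
    scale_estimation=True,  # Scale Estimation: the AWQ+GPTQ+SE combination
)

# LoRA Correction is enabled the same way, on top of AWQ and Scale Estimation:
# nncf.compress_weights(ov_model, mode=nncf.CompressWeightsMode.INT4_SYM,
#                       dataset=dataset, awq=True, scale_estimation=True,
#                       lora_correction=True)
```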

Fixes:

  • (OpenVINO) Fixed the quantization of Convolution and LSTMSequence operations in cases where some inputs are part of a ShapeOf subgraph.
  • (OpenVINO) Fixed an issue with FakeConvert duplication for FP8.
  • Fixed a SmoothQuant algorithm issue that occurred with incorrect shapes.
  • Fixed non-deterministic layer-wise scheduling.

Improvements:

  • (OpenVINO) Increased hardware-fused pattern coverage.
  • Improved progress bar logic during weights compression for more accurate remaining time estimation.
  • Extended the Scale Estimation bitness range support for nncf.compress_weights().
  • Removed extra logging for algorithm-generated ignored scopes.

Compression-aware training:

Fixes:

  • (PyTorch) Fixed some scenarios of NNCF patching interfering with torch.compile.

Requirements:

  • Updated PyTorch (2.4.0) and Torchvision (0.19.0) versions.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rk119

v2.12.0

31 Jul 12:28

Post-training Quantization:

Features:

  • (OpenVINO, PyTorch, ONNX) Excluded comparison operators from the quantization scope for nncf.ModelType.TRANSFORMER.
  • (OpenVINO, PyTorch) Changed the representation of symmetrically quantized weights from an unsigned integer with a fixed zero point to a signed data type without a zero point in the nncf.compress_weights() method.
  • (OpenVINO) Extended pattern support of the AWQ algorithm as part of nncf.compress_weights(). This allows applying AWQ to a wider range of models.
  • (OpenVINO) Introduced the nncf.CompressWeightsMode.E2M1 mode option of nncf.compress_weights() as the new MXFP4 precision (experimental); see the sketch after this list.
  • (OpenVINO) Added support for models with BF16 precision in the nncf.quantize() method.
  • (PyTorch) Added quantization support for torch.addmm.
  • (PyTorch) Added quantization support for torch.nn.functional.scaled_dot_product_attention.
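
A minimal sketch of the experimental E2M1 mode, assuming an OpenVINO model ov_model (a placeholder); the group_size value is illustrative:

```python
import nncf

# Experimental MXFP4 precision: weights are stored as E2M1 values
# with shared group scales.
compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.E2M1,
    group_size=32,  # illustrative group size
)
```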

Fixes:

  • (OpenVINO, PyTorch, ONNX) Fixed the Fast-/BiasCorrection algorithms to correctly support transposed MatMul layers.
  • (OpenVINO) Fixed nncf.IgnoredScope() functionality for models with the If operation.
  • (OpenVINO) Fixed patterns with PReLU operations.
  • Fixed a runtime error when importing NNCF without the Matplotlib package.

Improvements:

  • Reduced the amount of memory required for applying nncf.compress_weights() to OpenVINO models.
  • Improved logging for a non-empty nncf.IgnoredScope().

Compression-aware training:

Fixes:

  • (PyTorch) Fixed an issue with wrapping of operators without a patched state.

Requirements:

  • Updated TensorFlow (2.15) version. This version requires Python 3.9-3.11.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@Lars-Codes

v2.11.0

17 Jun 11:02

Post-training Quantization:

Features:

  • (OpenVINO) Added the Scale Estimation algorithm for 4-bit data-aware weights compression. The optional scale_estimation parameter was introduced in nncf.compress_weights() and can be used to minimize accuracy degradation of compressed models (note that this algorithm increases compression time); a usage sketch follows this list.
  • (OpenVINO) Added the GPTQ algorithm for 8/4-bit data-aware weights compression, supporting INT8, INT4, and NF4 data types. The optional gptq parameter was introduced in nncf.compress_weights() to enable the GPTQ algorithm.
  • (OpenVINO) Added support for models with BF16 weights in the weights compression method, nncf.compress_weights().
  • (PyTorch) Added support for quantization and weight compression of custom modules.
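
Both new algorithms are data-aware and are toggled via optional flags; a sketch, with ov_model and dataset (an nncf.Dataset built from calibration samples) as placeholders:

```python
import nncf

compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    dataset=dataset,        # calibration data for the data-aware algorithm
    scale_estimation=True,  # minimize accuracy degradation (slower compression)
)

# GPTQ is enabled analogously via the optional `gptq` flag:
# nncf.compress_weights(ov_model, mode=nncf.CompressWeightsMode.INT4_ASYM,
#                       dataset=dataset, gptq=True)
```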

Fixes:

  • (OpenVINO) Fixed incorrect determination of nodes with bias in the Fast-/BiasCorrection and ChannelAlignment algorithms.
  • (OpenVINO, PyTorch) Fixed incorrect behaviour of nncf.compress_weights() when an already compressed model is passed as input.
  • (OpenVINO, PyTorch) Fixed the SmoothQuant algorithm to work correctly with Split ports.

Improvements:

  • (OpenVINO) Aligned the resulting compression subgraphs of nncf.compress_weights() across different FP precisions.
  • Aligned the 8-bit quantization scheme for the NPU target device with the CPU one.

Examples:

  • (OpenVINO, ONNX) Updated the ignored scope for the YOLOv8 examples to utilize the subgraphs approach.

Compression-aware training:

Features:

  • (PyTorch) The nncf.quantize method is now the recommended path for quantization initialization in Quantization-Aware Training.
  • (PyTorch) The placement of compression modules in the model can now be serialized and restored with the new API functions: compressed_model.nncf.get_config() and nncf.torch.load_from_config; see the sketch after this list. Documentation for saving/loading of a quantized model is available, and the ResNet18 example was updated to use the new API.
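
A minimal save/restore sketch using the new functions; the exact load_from_config argument order (model, config, example_input) is an assumption:

```python
import torch
import nncf.torch

# After quantization, capture where NNCF placed its compression modules.
config = quantized_model.nncf.get_config()
torch.save(config, "nncf_config.pt")

# Later: rebuild a fresh (uncompressed) model instance and restore the
# compression modules onto it.
config = torch.load("nncf_config.pt")
restored_model = nncf.torch.load_from_config(fresh_model, config, example_input)
```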

Fixes:

  • (PyTorch) Fixed compatibility with torch.compile.

Improvements:

  • (PyTorch) Extended the base parameters for the EvolutionOptimizer (part of the LeGR algorithm).
  • (PyTorch) Improved wrapping for parameters that are not tensors.

Examples:

  • (PyTorch) Added an example for the STFPM model from Anomalib.

Deprecations/Removals:

  • Removed the extra backend-installation dependencies from setup.py (such as [torch], [tf], [onnx], and [openvino]).
  • Removed openvino-dev dependency.

Requirements:

  • Updated PyTorch (2.3.0) and Torchvision (0.18.0) versions.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@DaniAffCH
@UsingtcNower
@anzr299
@AdiKsOnDev
@Viditagarwal7479
@truhinnm

v2.10.0

25 Apr 12:01

Post-training Quantization:

Features:

  • Introduced subgraph-defining functionality for the nncf.IgnoredScope() option; see the sketch after this list.
  • Introduced limited support for batch sizes greater than 1. The MobileNetV2 PyTorch example was updated with batch support.
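
A sketch of the subgraph-defining option, assuming an nncf.Subgraph entry with inputs/outputs node-name lists (the node names are placeholders):

```python
import nncf

# Exclude everything between the named boundary nodes from quantization.
ignored_scope = nncf.IgnoredScope(
    subgraphs=[
        nncf.Subgraph(
            inputs=["MatMul_100"],  # placeholder node names
            outputs=["Add_142"],
        )
    ]
)

quantized_model = nncf.quantize(model, calibration_dataset,
                                ignored_scope=ignored_scope)
```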

Fixes:

  • Fixed an issue with the absence of the nncf.OverflowFix parameter in some scenarios.
  • Aligned the list of correctable layers for the FastBiasCorrection algorithm across the PyTorch, OpenVINO, and ONNX backends.
  • Fixed an issue with combinations of nncf.QuantizationMode parameters.
  • Fixed the MobileNetV2 (PyTorch, ONNX, OpenVINO) examples for the Windows platform.
  • (OpenVINO) Fixed the Anomaly Classification example for the Windows platform.
  • (PyTorch) Fixed bias shift magnitude calculation for fused layers.
  • (OpenVINO) Fixed removal of the ShapeOf subgraph, which led to an error in the nncf.quantize_with_accuracy_control() method.

Improvements:

  • OverflowFix, AdvancedSmoothQuantParameters, and AdvancedBiasCorrectionParameters were exposed in the nncf.* namespace.
  • (OpenVINO, PyTorch) Introduced FP16 scale compression for weights in the nncf.compress_weights() method, regardless of the model weights precision.
  • (PyTorch) Modules inserted by NNCF are now excluded from parameter tracing.
  • (OpenVINO) Extended the list of correctable layers for the BiasCorrection algorithm.
  • (ONNX) Aligned BiasCorrection algorithm behaviour with OpenVINO in specific cases.

Compression-aware training:

Features:

  • (PyTorch) The nncf.quantize method may now be used as quantization initialization for Quantization-Aware Training. Added a ResNet18-based example with the transition from Post-Training Quantization to Quantization-Aware Training; a minimal sketch of this flow follows this list.
  • (PyTorch) Introduced extractors for the fused Convolution, Batch-/GroupNorm, and Linear functions.
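
A minimal PTQ-to-QAT sketch with a standard PyTorch training loop (the data loaders and transform_fn are placeholders):

```python
import nncf
import torch

# 1) Initialize quantization with Post-Training Quantization.
calibration_dataset = nncf.Dataset(calibration_loader, transform_fn)
quantized_model = nncf.quantize(model, calibration_dataset)

# 2) Fine-tune the quantized model as usual (Quantization-Aware Training).
optimizer = torch.optim.SGD(quantized_model.parameters(), lr=1e-4)
for inputs, targets in train_loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(quantized_model(inputs), targets)
    loss.backward()
    optimizer.step()
```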

Fixes:

  • (PyTorch) Fixed an issue with the apply_args_defaults function.
  • (PyTorch) Fixed dtype handling for the compressed torch.nn.Parameter.
  • (PyTorch) Fixed is_shared parameter propagation.

Improvements:

  • (PyTorch) Updated command-creation behaviour to reduce the number of adapters.
  • (PyTorch) Added the option to insert points into models wrapped with replace_modules=False.

Deprecations/Removals:

  • (PyTorch) Removed the binarization algorithm.
  • NNCF installation via pip install nncf[] option is now deprecated.

Requirements:

  • Updated PyTorch (2.2.1) and CUDA (12.1) versions.
  • Updated ONNX (1.16.0) and ONNXRuntime (1.17.1) versions.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@Candyzorua
@clinty
@UsingtcNower
@DaniAffCH

v2.9.0

06 Mar 11:39

Post-training Quantization:

Features:

  • (OpenVINO) Added a modified AWQ algorithm for 4-bit data-aware weights compression. This algorithm is applied only to MatMul->Multiply->MatMul patterns. For this, the optional awq parameter has been added to nncf.compress_weights() and can be used to minimize accuracy degradation of compressed models (note that this option increases compression time); see the sketch after this list.
  • (ONNX) Introduced support for the ONNX backend in the nncf.quantize_with_accuracy_control() method. Users can now perform quantization with accuracy control for onnx.ModelProto, enhancing the accuracy of quantized models while minimizing the performance impact.
  • (ONNX) Added an example based on the YOLOv8n-seg model for demonstrating the usage of quantization with accuracy control for the ONNX backend.
  • (PyTorch) Added the SmoothQuant algorithm for the PyTorch backend in nncf.quantize().
  • (OpenVINO) Added an example with hyperparameter tuning for the TinyLlama model.
  • Introduced nncf.AdvancedAccuracyRestorerParameters.
  • Introduced the subset_size option for nncf.compress_weights().
  • Introduced TargetDevice.NPU as the replacement for TargetDevice.VPU.
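
A sketch combining the new awq flag with the subset_size option (ov_model and dataset are placeholders):

```python
import nncf

compressed_model = nncf.compress_weights(
    ov_model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    dataset=dataset,  # calibration data for the data-aware AWQ algorithm
    awq=True,         # modified AWQ for MatMul->Multiply->MatMul patterns
    subset_size=64,   # number of calibration samples to use (illustrative)
)
```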

Fixes:

  • Fixed an API Enums serialization/deserialization issue.
  • Fixed an issue with required arguments of the revert_operations_to_floating_point_precision method.

Improvements:

  • (ONNX) Aligned statistics collection with OpenVINO and PyTorch backends.
  • Extended nncf.compress_weights() with Convolution and Embedding compression to reduce the memory footprint.

Deprecations/Removals:

  • (OpenVINO) Removed outdated examples with nncf.quantize() for BERT and YOLOv5 models.
  • (OpenVINO) Removed outdated example with nncf.quantize_with_accuracy_control() for SSD MobileNetV1 FPN model.
  • (PyTorch) Deprecated the binarization algorithm.
  • Removed Post-training Optimization Tool as OpenVINO backend.
  • Removed Dockerfiles.
  • TargetDevice.VPU was replaced by TargetDevice.NPU.

Compression-aware training:

Fixes:

  • (PyTorch) Fixed an issue with missed arguments in NNCFNetworkInterface.get_clean_shallow_copy.

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@AishwaryaDekhane
@UsingtcNower
@Om-Doiphode

v2.8.1

09 Feb 09:45

Post-training Quantization:

Bugfixes:

  • (Common) Fixed issue with nncf.compress_weights() to avoid overflows on 32-bit Windows systems.
  • (Common) Fixed performance issue with nncf.compress_weights() on LLama models.
  • (Common) Fixed the nncf.quantize_with_accuracy_control pipeline with the tune_hyperparams=True option enabled.
  • (OpenVINO) Fixed an issue for stateful LLM models and added state restoring after inference for them.
  • (PyTorch) Fixed an issue with nncf.compress_weights() for LLM models related to executing is_floating_point during tracing.

v2.8.0

24 Jan 13:06

Post-training Quantization:

Breaking changes:

  • The nncf.quantize signature has been changed to add mode: Optional[nncf.QuantizationMode] = None as its third argument, between the original calibration_dataset and preset arguments.
  • (Common) nncf.common.quantization.structs.QuantizationMode has been renamed to nncf.common.quantization.structs.QuantizationScheme.

General:

  • (OpenVINO) Changed default OpenVINO opset from 9 to 13.

Features:

  • (OpenVINO) Added 4-bit data-aware weights compression. For this, the optional dataset parameter has been added to nncf.compress_weights() and can be used to minimize accuracy degradation of compressed models (note that this option increases compression time).
  • (PyTorch) Added support for PyTorch models with shared weights and custom PyTorch modules in nncf.compress_weights(). The weights compression algorithm for PyTorch models is now based on tracing the model graph. The dataset parameter is now required in nncf.compress_weights() for the compression of PyTorch models.
  • (Common) Renamed nncf.CompressWeightsMode.INT8 to nncf.CompressWeightsMode.INT8_ASYM and introduced nncf.CompressWeightsMode.INT8_SYM, which can be used efficiently with dynamic 8-bit quantization of activations.
    The original nncf.CompressWeightsMode.INT8 enum value is now deprecated.
  • (OpenVINO) Added support for quantizing the ScaledDotProductAttention operation from OpenVINO opset 13.
  • (OpenVINO) Added FP8 quantization support via the nncf.QuantizationMode.FP8_E4M3 and nncf.QuantizationMode.FP8_E5M2 enum values, invoked by passing one of these values as the optional mode argument to nncf.quantize; see the sketch after this list. Currently, OpenVINO supports inference of FP8-quantized models in reference mode with no performance benefits, which can be used for accuracy projections.
  • (Common) Post-training Quantization with Accuracy Control - nncf.quantize_with_accuracy_control() has been extended with the optional restore_mode parameter to revert weights to int8 instead of the original precision.
    This parameter helps to reduce the size of the quantized model and improves its performance.
    By default, it is disabled, and model weights are reverted to the original precision in nncf.quantize_with_accuracy_control().
  • (Common) Added an all_layers: Optional[bool] = None argument to nncf.compress_weights to indicate whether the embeddings and last layers of the model should be compressed to the primary precision. This is relevant to 4-bit quantization only.
  • (Common) Added a sensitivity_metric: Optional[nncf.parameters.SensitivityMetric] = None argument to nncf.compress_weights for finer control over the sensitivity metric used for assigning quantization precision to layers.
    It defaults to the weight quantization error if a dataset is not provided for weight compression, and to the maximum variance of the layers' inputs multiplied by the inverted 8-bit quantization noise if a dataset is provided.
    By default, the backup precision is assigned to the embeddings and last layers.
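
A sketch of the new mode argument for FP8 quantization (model and calibration_dataset are placeholders):

```python
import nncf

# `mode` is the new third argument of nncf.quantize (see the breaking
# change above).
fp8_model = nncf.quantize(
    model,
    calibration_dataset,
    mode=nncf.QuantizationMode.FP8_E4M3,  # or nncf.QuantizationMode.FP8_E5M2
)
```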

Fixes:

  • (OpenVINO) Models with embeddings (e.g. gpt-2, stable-diffusion-v1-5, stable-diffusion-v2-1, opt-6.7b, falcon-7b, bloomz-7b1) are now more accurately quantized.
  • (PyTorch) nncf.strip(..., do_copy=True) now actually returns a deepcopy (stripped) of the model object.
  • (PyTorch) Post-hooks can now be set up on operations that return torch.return_type (such as torch.max).
  • (PyTorch) Improved dynamic graph tracing for various tensor operations from torch namespace.
  • (PyTorch) More robust handling of models with disjoint traced graphs when applying PTQ.

Improvements:

  • Reformatted the tutorials section in the top-level README.md for better readability.

Deprecations/Removals:

  • (Common) The original nncf.CompressWeightsMode.INT8 enum value is now deprecated.
  • (PyTorch) The Git patch for integration with HuggingFace transformers repository is marked as deprecated and will be removed in a future release.
    Developers are advised to use optimum-intel instead.
  • Dockerfiles in the NNCF Git repository are deprecated and will be removed in a future release.

v2.7.0

16 Nov 14:59

Post-training Quantization:

Features:

  • (OpenVINO) Added support for data-free 4-bit weights compression through NF4 and INT4 data types (compress_weights(…) pipeline).
  • (OpenVINO) Added support for IF operation quantization.
  • (OpenVINO) Added dump_intermediate_model parameter support for AccuracyAwareAlgorithm (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added support for SmoothQuant and ChannelAlignment algorithms for HyperparameterTuner algorithm (quantize_with_tune_hyperparams(…) pipeline).
  • (PyTorch) Post-training Quantization is now supported with the quantize(…) pipeline and the common implementation of quantization algorithms. The create_compressed_model() method is deprecated for Post-training Quantization.
  • Added new types (AvgPool, GroupNorm, LayerNorm) to the ignored scope for ModelType.Transformer scheme.
  • QuantizationPreset.Mixed was set as the default for ModelType.Transformer scheme.

Fixes:

  • (OpenVINO, ONNX, PyTorch) Aligned/added patterns between backends (SE block, MVN layer, multiple activations, etc.) to restore performance/metrics.
  • Fixed patterns for ModelType.Transformer to align with the quantization scheme.

Improvements:

  • Improved UX with the new progress bar for pipeline, new exceptions, and .dot graph visualization updates.
  • (OpenVINO) Optimized the WeightsCompression algorithm (compress_weights(…) pipeline) execution time for LLM quantization; added ignored scope support.
  • (OpenVINO) Optimized the AccuracyAwareQuantization algorithm execution time with a multi-threaded approach to calculating the ranking score (quantize_with_accuracy_control(…) pipeline).
  • (OpenVINO) Added extract_ov_subgraph tool for large IR subgraph extraction.
  • (ONNX) Optimized quantization pipeline (up to 1.15x speed up).

Known issues:

  • (ONNX) The quantize(...) method can generate inaccurate int8 results for models with a BatchNormalization layer that contains biases. To get the best accuracy, use the do_constant_folding=True option during export from PyTorch to ONNX (see the sketch below).
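
A sketch of the suggested export, using the standard torch.onnx.export API (the model and input shape are placeholders):

```python
import torch

# Constant folding merges BatchNormalization biases into the preceding
# weights at export time, avoiding the inaccurate int8 results above.
torch.onnx.export(
    model,
    torch.randn(1, 3, 224, 224),  # placeholder example input
    "model.onnx",
    do_constant_folding=True,
)
```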

Compression-aware training:

Fixes:

  • (PyTorch) Fixed the Hessian trace calculation to solve issue #2155.

Requirements:

  • Updated PyTorch version (2.1.0).
  • Updated numpy version (<1.27).

Deprecations/Removals:

  • (PyTorch) Removed legacy external quantizer storage names.
  • (PyTorch) Removed torch < 2.0 version support.

v2.6.0

18 Sep 15:59

Post-training Quantization:

Features:

  • Added CPU_SPR device type support.
  • Added quantizer scale unification.
  • Added quantization scheme for ReduceSum operation.
  • Added new types (ReduceL2, ReduceSum, Maximum) to the ignored scope for ModelType.Transformer.
  • (OpenVINO) Added SmoothQuant algorithm.
  • (OpenVINO) Added ChannelAlignment algorithm.
  • (OpenVINO) Added HyperparameterTuner algorithm.
  • (PyTorch) Added FastBiasCorrection algorithm support.
  • (OpenVINO, ONNX) Added embedding weights quantization.
  • (OpenVINO, PyTorch) Added the new compress_weights method that provides data-free INT8 weights compression; see the sketch after this list.
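
The new method needs no calibration data for INT8; a minimal sketch (model is a placeholder for an OpenVINO or PyTorch model):

```python
import nncf

# Data-free INT8 weights compression: no dataset argument is required.
compressed_model = nncf.compress_weights(model)
```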

Fixes:

  • Fixed detection of decomposed post-processing in models.
  • Multiple fixes (new patterns, bugfixes, etc.) to solve #1936 issue.
  • Fixed model reshaping during quantization to keep the original model shape.
  • (OpenVINO) Added support for quantization of sequential models.
  • (OpenVINO) Fixed in-place statistics cast to support empty dimensions.
  • (OpenVINO, ONNX) Fixed quantization of the MatMul operation with weights rank > 2.
  • (OpenVINO, ONNX) Fixed BiasCorrection algorithm to enable CLIP model quantization.

Improvements:

  • Optimized quantize(…) pipeline (up to 4.3x speed up in total).
  • Optimized the quantize_with_accuracy_control(…) pipeline (up to 8x speed up for the 122-quantizing-model-with-accuracy-control notebook).
  • Optimized general statistics collection (up to 1.2x speed up for ONNX backend).
  • Separated ignored patterns from the fused patterns scheme (with multiple pattern additions).

Compression-aware training:

Features:

  • Added shape pruning processor for BootstrapNAS algorithm.
  • Added KD loss for BootstrapNAS algorithm.
  • Added validate_scopes parameter for NNCF configuration.
  • (PyTorch) Added PyTorch 2.0 support.
  • (PyTorch) Added .strip() option to API.
  • (PyTorch) Enabled bfloat data type for quantization kernels.
  • (PyTorch) Quantized models can now be torch.jit.traced without calling .strip().
  • (PyTorch) Added support for overridden forward instance attribute on model objects passed into create_compressed_model.
  • (Tensorflow) Added Tensorflow 2.12 support.

Fixes:

  • (PyTorch) Fixed a padding adjustment issue in the elastic kernel to work with different active kernel sizes.
  • (PyTorch) Fixed torch graph tracing in the case where tensors belonging to parallel edges are interleaved in the order of the tensor argument.
  • (PyTorch) Fixed recurrent node matching (LSTM, GRU cells) with a strict rule to avoid adding unnecessary nodes to the ignored scope.
  • (PyTorch) Fixed the torch.jit.script wrapper so that user-side exception handling during torch.jit.script invocation does not cause NNCF to be permanently disabled.
  • (PyTorch, TensorFlow) Adjusted the quantizer propagation algorithm to check whether quantizer propagation will result in output quantization.
  • (PyTorch) Added a redefined __class__ method for ProxyModule that avoids an error when calling super() in the forward method.

Deprecations/Removals:

  • (PyTorch) Removed deprecated NNCFNetwork.__getattr__, NNCFNetwork.get_nncf_wrapped_model methods.

Requirements:

  • Updated PyTorch version (2.0.1).
  • Updated TensorFlow version (2.12.0).

v2.5.0

06 Jun 16:40

Post-training Quantization:

Features:

  • Official release of OpenVINO framework support.
    • Ported NNCF OpenVINO backend to use the nGraph representation of OpenVINO models.
    • Changed dependencies of the NNCF OpenVINO backend. It now depends on the openvino package and not on the openvino-dev package.
    • Added GRU/LSTM quantization support.
    • Added quantizer scales unification.
    • Added support for models with 3D and 5D Depthwise convolution.
    • Added FP16 OpenVINO models support.
  • Added "overflow_fix" parameter (for quantize(...) & quantize_with_accuracy_control(...) methods) support & functionality. It improves accuracy for optimized model for affected devices. More details in Quantization section.
  • (OpenVINO) Added support for in-place statistics collection (reduce memory footprint during optimization).
  • (OpenVINO) Added Quantization with accuracy control algorithm.
  • (OpenVINO) Added YOLOv8 examples for quantize(...) & quantize_with_accuracy_control(...) methods.
  • (PyTorch) Added min-max quantization algorithm as experimental.
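
A sketch of enabling the overflow fix via advanced quantization parameters; the import path and the OverflowFix.FIRST_LAYER enum value are assumptions (later releases expose these classes directly in the nncf.* namespace):

```python
import nncf
from nncf.quantization.advanced_parameters import (
    AdvancedQuantizationParameters,
    OverflowFix,
)

# Apply the overflow fix to the first layer only (enum value is an
# assumption); model and calibration_dataset are placeholders.
quantized_model = nncf.quantize(
    model,
    calibration_dataset,
    advanced_parameters=AdvancedQuantizationParameters(
        overflow_fix=OverflowFix.FIRST_LAYER,
    ),
)
```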

Fixes:

  • Fixed the ignored_scope attribute behaviour for weights. Now, weighted layers are correctly excluded from the optimization scope.
  • (ONNX) Added a check of the ONNX opset version in nncf.quantize(...). Now, models with opset < 13 are correctly optimized with per-tensor quantization.

Improvements:

  • Improved the statistics collection process (weight statistics are collected only once).
  • (PyTorch, OpenVINO, ONNX) Introduced unified quantizer parameters calculation.

Known issues:

  • The quantize(...) method can generate inaccurate int8 results for models with a DenseNet-like architecture. Use quantize_with_accuracy_control(...) in such cases.
  • The quantize(...) method can hang on models with a transformer architecture when the fast_bias_correction optional parameter is set to False. Don't set it to False, or use quantize_with_accuracy_control(...) in such cases.
  • The quantize(...) method can generate inaccurate int8 results for models with a MobileNet-like architecture on non-VNNI machines.

Compression-aware training:

New Features:

  • Introduced an automated structured pruning algorithm for JPQD with support for BERT, Wav2Vec2, Swin, ViT, DistilBERT, CLIP, and MobileBERT models.
  • Added nncf.common.utils.patcher.Patcher - this class can be used to patch methods on live PyTorch model objects with wrappers such as nncf.torch.dynamic_graph.context.no_nncf_trace when doing so in the model code is not possible (e.g. if the model comes from an external library package).
  • Compression controllers of the nncf.api.compression.CompressionAlgorithmController class now have a .strip() method that returns the compressed model object with as many custom NNCF additions removed as possible while preserving the functioning of the model object as a compressed model; a sketch follows this list.
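
A sketch of the new .strip() call; compression_ctrl is the controller returned by create_compressed_model(...) (controller creation and the example input are elided placeholders):

```python
import torch

# .strip() returns the model object with as many custom NNCF additions
# removed as possible, while it keeps functioning as a compressed model.
stripped_model = compression_ctrl.strip()

# The stripped model is suitable for downstream use, e.g. tracing:
traced = torch.jit.trace(stripped_model, example_input)
```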

Fixes:

  • Fixed statistics computation for pruned layers.
  • (PyTorch) Fixed traced tensors to support YOLOv8 from Ultralytics.

Improvements:

  • Extended attributes (transpose/permute/getitem) for the pruning node selector.
  • NNCFNetwork was refactored from a wrapper approach to a mixin-like approach.
  • Added average-pool 3D-like ops to the pruning mask.
  • Added Conv3d to the overflow fix.
  • nncf.set_log_file(...) can now be used to set location of the NNCF log file.
  • (PyTorch) Added support for pruning of torch.nn.functional.pad operation.
  • (PyTorch) Added torch.baddbmm as an alias for the matmul metatype for quantization purposes.
  • (PyTorch) Added config file for ResNet18 accuracy-aware pruning + quantization on CIFAR10.
  • (PyTorch) Fixed JIT-traceable PyTorch models with internal patching.
  • (PyTorch) Added __matmul__ magic functions to the list of patched ops (for SwinTransformer by Microsoft).

Requirements:

  • Updated ONNX version (1.13).
  • Updated TensorFlow version (2.11).

General changes:

  • Added Windows support for NNCF.