Quantization Operation coverage

Quantized tensors support a limited subset of the data manipulation methods of regular full-precision tensors. For NN operators included in PyTorch, we restrict support to:

  1. 8 bit weights (data_type = qint8)
  2. 8 bit activations (data_type = quint8)

Note that operator implementations currently support per-channel quantization only for the weights of the conv and linear operators. Furthermore, the minimum and the maximum of the input data are mapped linearly to the minimum and the maximum of the quantized data type, in such a way that zero is represented with no quantization error.
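The linear mapping described above can be sketched in plain Python. This is a minimal illustration, not PyTorch's implementation: the helper names `choose_qparams`, `quantize` and `dequantize` are hypothetical, the 0..255 range corresponds to quint8, and extending the observed range to include 0.0 is one common way (assumed here) to guarantee that zero maps to an exact integer.

```python
def choose_qparams(x_min, x_max, qmin=0, qmax=255):
    """Pick (scale, zero_point) for an affine mapping onto [qmin, qmax].

    Hypothetical helper: the range is first extended to contain 0.0 so
    that real zero lands exactly on an integer (the zero_point).
    """
    x_min = min(x_min, 0.0)
    x_max = max(x_max, 0.0)
    if x_max == x_min:
        # Degenerate case (all inputs are zero): any positive scale works.
        return 1.0, qmin
    scale = (x_max - x_min) / (qmax - qmin)
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer in [qmin, qmax], saturating at the ends."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Map a quantized integer back to an approximate real value."""
    return (q - zero_point) * scale


scale, zero_point = choose_qparams(-1.0, 3.0)
zero_roundtrip = dequantize(quantize(0.0, scale, zero_point), scale, zero_point)
```

Because `quantize(0.0, ...)` returns exactly `zero_point`, the round trip of zero has no error. Other values are recovered only to within about half a `scale` step.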

Additional data types and quantization schemes can be implemented through the custom operator mechanism.

Many operations for quantized tensors are available under the same API as their full-precision float versions in torch or torch.nn. Quantized versions of NN modules that perform re-quantization are available in torch.nn.quantized. Those operations explicitly take the output quantization parameters (scale and zero_point) in the operation signature.
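To illustrate why a re-quantizing operation needs explicit output parameters, here is a minimal plain-Python sketch of adding two quantized values. The function `quantized_add` is hypothetical and is not a PyTorch API; real kernels work on whole tensors and typically use integer arithmetic, whereas this sketch goes through floating point for clarity.

```python
def quantized_add(qa, a_scale, a_zp, qb, b_scale, b_zp,
                  out_scale, out_zp, qmin=0, qmax=255):
    """Add two quantized scalars and re-quantize the sum.

    Hypothetical helper: each input carries its own (scale, zero_point),
    so the sum has no natural quantization of its own. The caller must
    say which (out_scale, out_zp) the result should be expressed in.
    """
    # Recover the real values represented by each input.
    real_sum = (qa - a_zp) * a_scale + (qb - b_zp) * b_scale
    # Re-quantize into the caller-supplied output parameters, saturating.
    q = int(round(real_sum / out_scale)) + out_zp
    return max(qmin, min(qmax, q))


# 10 represents 5.0 and 20 represents 4.0; their sum 9.0 is stored as 28
# when the output uses scale 0.5 and zero_point 10.
result = quantized_add(10, 0.5, 0, 20, 0.25, 4, out_scale=0.5, out_zp=10)
```

If the chosen output scale is too small for the result, the value saturates at the end of the quantized range, which is one reason the output parameters are usually calibrated from representative data.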

In addition, we support fused versions of modules corresponding to common fusion patterns that impact quantization; these are available in torch.nn.intrinsic.quantized.

For quantization-aware training, we support modules prepared for it in torch.nn.qat and torch.nn.intrinsic.qat.

The following operation list is sufficient to cover typical CNN and RNN models.

Quantized torch.Tensor operations

Operations that are available from the torch namespace or as methods on Tensor for quantized tensors:

torch.nn.functional

Basic activations are supported.

torch.nn.intrinsic

Fused modules are provided for common patterns in CNNs. Combining several operations (such as convolution and ReLU) into one module allows for better quantization accuracy.
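One plausible reason fusion helps accuracy, offered here as an assumption rather than a statement from this document, is that a fused convolution and ReLU quantizes its output once over the non-negative range only, so all 256 levels are spent on values that survive the ReLU. The plain-Python sketch below compares the two cases on made-up numbers; `qparams` and `fake_quant` are hypothetical helpers, not PyTorch functions.

```python
def qparams(lo, hi, qmin=0, qmax=255):
    """Hypothetical helper: affine (scale, zero_point) covering [lo, hi] and 0.0."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        return 1.0, qmin
    scale = (hi - lo) / (qmax - qmin)
    zp = max(qmin, min(qmax, int(round(qmin - lo / scale))))
    return scale, zp


def fake_quant(x, scale, zp, qmin=0, qmax=255):
    """Quantize then dequantize, returning the value the integer represents."""
    q = max(qmin, min(qmax, int(round(x / scale)) + zp))
    return (q - zp) * scale


def total_abs_error(approx, exact):
    return sum(abs(a - e) for a, e in zip(approx, exact))


# Made-up convolution outputs (illustrative values only).
conv_out = [-4.0, -1.3, 0.2, 0.77, 2.5, 3.9]
reference = [max(v, 0.0) for v in conv_out]  # exact float ReLU

# Unfused: quantize the conv output over its full range, then apply ReLU.
s_u, z_u = qparams(min(conv_out), max(conv_out))
unfused = [max(fake_quant(v, s_u, z_u), 0.0) for v in conv_out]

# Fused: apply ReLU first, then quantize once over [0, max].
s_f, z_f = qparams(0.0, max(reference))
fused = [fake_quant(v, s_f, z_f) for v in reference]

unfused_err = total_abs_error(unfused, reference)
fused_err = total_abs_error(fused, reference)
```

On this data the fused path uses a step size roughly half that of the unfused path, and its total error is correspondingly smaller. The exact gain depends on how much of the pre-activation range is negative.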

torch.nn.qat

Layers for quantization-aware training.

torch.quantization

torch.nn.quantized

Quantized versions of standard NN layers.

torch.nn.quantized.dynamic

Layers used in dynamically quantized models (that is, models in which only the weights are quantized ahead of time).
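The idea of quantizing only the weights can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not how torch.nn.quantized.dynamic is implemented: `quantize_weights` and `dynamic_linear` are hypothetical names, the weights use a symmetric qint8 range (-128..127) with a single scale, and the activations stay in floating point throughout.

```python
def quantize_weights(weights, qmin=-128, qmax=127):
    """Hypothetical helper: symmetric 8-bit quantization of a weight vector.

    Returns the integer weights and the single scale they share.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(qmin, min(qmax, int(round(w / scale)))) for w in weights]
    return q, scale


def dynamic_linear(x, q_weights, w_scale, bias=0.0):
    """Dot product of float activations with stored integer weights.

    The weights are dequantized on the fly; the output is a float.
    """
    return sum(xi * (qw * w_scale) for xi, qw in zip(x, q_weights)) + bias


float_weights = [0.5, -1.0, 0.25]
q_weights, w_scale = quantize_weights(float_weights)  # done once, ahead of time

x = [1.0, 2.0, 3.0]  # float activations seen at inference time
y = dynamic_linear(x, q_weights, w_scale)
y_float = sum(xi * wi for xi, wi in zip(x, float_weights))
```

The quantized result `y` tracks the float result `y_float` to within the weight rounding error, while the stored weights take one byte each instead of four.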

torch.nn.quantized.functional

Functional versions of quantized NN layers (many of them accept explicit output quantization parameters).

Quantized dtypes and quantization schemes