[quant][doc] Adding a table comparing eager and fx graph mode #50413

Closed · wants to merge 10 commits
42 changes: 39 additions & 3 deletions docs/source/quantization.rst
@@ -84,7 +84,42 @@ PyTorch provides two different modes of quantization: Eager Mode Quantization an

Eager Mode Quantization is a beta feature. Users need to do fusion and specify where quantization and dequantization happen manually; it also supports only modules, not functionals.

FX Graph Mode Quantization is a new automated quantization framework in PyTorch, and currently it's a prototype feature. It improves upon Eager Mode Quantization by adding support for functionals and automating the quantization process. Although people might need to refactor the model a bit to make the model compatible with FX Graph Mode Quantization (symbolically traceable with torch.fx).
FX Graph Mode Quantization is a new automated quantization framework in PyTorch, and currently it's a prototype feature. It improves upon Eager Mode Quantization by adding support for functionals and automating the quantization process, although users might need to refactor the model to make it compatible with FX Graph Mode Quantization (symbolically traceable with ``torch.fx``).

The following table compares Eager Mode Quantization and FX Graph Mode Quantization:

+-----------------+-------------------+-------------------+
| |Eager Mode |FX Graph |
| |Quantization |Mode |
| | |Quantization |
+=================+===================+===================+
|Release |beta |prototype |
|Status | | |
+-----------------+-------------------+-------------------+
|Operator |Manual |Automatic |
|Fusion | | |
+-----------------+-------------------+-------------------+
|Quant/DeQuant |Manual |Automatic |
|Placement | | |
+-----------------+-------------------+-------------------+
|Quantizing |Supported |Supported |
|Modules | | |
+-----------------+-------------------+-------------------+
|Quantizing |Need Refactor |Automatic |
|Functionals/Torch|(FloatFunctional) | |
|Ops | | |
+-----------------+-------------------+-------------------+
|Support for |Limited Support |Fully |
|Customization | |Supported |
+-----------------+-------------------+-------------------+
|Input/Output |``torch.nn.Module``|``torch.nn.Module``|
|Model Type | |(May need some |
| | |refactors to make |
| | |the model |
| | |compatible with FX |
| | |Graph Mode |
| | |Quantization) |
+-----------------+-------------------+-------------------+

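The "Quant/DeQuant Placement: Manual" and "Operator Fusion: Manual" rows of the table can be made concrete with a short sketch. This is a minimal, hedged example of eager-mode post-training static quantization (not taken from this PR); the module ``M`` and shapes are illustrative, and the qconfig is picked from whatever quantized engine the local build supports:

```python
import torch
import torch.nn as nn

class M(nn.Module):
    """Illustrative module: the user manually places quant/dequant stubs."""
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()    # manual: start of quantized region
        self.conv = nn.Conv2d(1, 1, 1)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # manual: end of quantized region
    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv(x))
        return self.dequant(x)

m = M().eval()
# Manual operator fusion: the user must name the modules to fuse.
m_fused = torch.quantization.fuse_modules(m, [["conv", "relu"]])
m_fused.qconfig = torch.quantization.get_default_qconfig(torch.backends.quantized.engine)
m_prepared = torch.quantization.prepare(m_fused)
m_prepared(torch.randn(1, 1, 4, 4))  # calibration pass with representative data
m_quantized = torch.quantization.convert(m_prepared)
out = m_quantized(torch.randn(1, 1, 4, 4))
```

Every step above that FX Graph Mode automates (stub placement, fusion) is written out by hand here, which is what the table's "Manual" entries refer to.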
Eager Mode Quantization
^^^^^^^^^^^^^^^^^^^^^^^
@@ -357,6 +392,7 @@ Quantization types supported by FX Graph Mode can be classified in two ways:
These two ways of classification are independent, so theoretically we can have 6 different types of quantization.

The supported quantization types in FX Graph Mode Quantization are:

- Post Training Quantization

- Weight Only Quantization
@@ -424,8 +460,8 @@ API Example::
model_fused = quantize_fx.fuse_fx(model_to_quantize)

Please see the following tutorials for more information about FX Graph Mode Quantization:
- FX Graph Mode Post Training Static Quantization (TODO: link)
- FX Graph Mode Post Training Dynamic Quantization (TODO: link)
- `FX Graph Mode Post Training Static Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_static_tutorial.html>`_
- `FX Graph Mode Post Training Dynamic Quantization <https://pytorch.org/tutorials/prototype/fx_graph_mode_ptq_dynamic_tutorial.html>`_
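For contrast with the eager-mode workflow, here is a minimal, hedged sketch of the FX Graph Mode flow on a plain ``nn.Module`` (no stubs, no manual fusion). This is a prototype API, so the ``prepare_fx`` signature has changed across releases (newer builds require ``example_inputs``); the sketch tries both forms, and the model/shapes are illustrative:

```python
import torch
import torch.nn as nn
from torch.quantization import quantize_fx  # prototype API; module path may change

# A plain module: no QuantStub/DeQuantStub, no manual fusion list.
model = nn.Sequential(nn.Conv2d(1, 1, 1), nn.ReLU()).eval()
qconfig_dict = {"": torch.quantization.get_default_qconfig(torch.backends.quantized.engine)}
example = torch.randn(1, 1, 4, 4)

# prepare_fx symbolically traces the model and inserts observers automatically.
try:
    prepared = quantize_fx.prepare_fx(model, qconfig_dict, example_inputs=(example,))
except TypeError:
    # older prototype builds take only (model, qconfig_dict)
    prepared = quantize_fx.prepare_fx(model, qconfig_dict)

prepared(example)  # calibration pass with representative data
quantized = quantize_fx.convert_fx(prepared)
out = quantized(example)
```

Fusion and quant/dequant placement happen inside ``prepare_fx``/``convert_fx``, which is what the table's "Automatic" entries refer to.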

Quantized Tensors
---------------------------------------