From 892177a665f297c946a6004b44c56f5a87d9ed2f Mon Sep 17 00:00:00 2001
From: Jerry Zhang
Date: Wed, 20 Jan 2021 16:34:00 -0800
Subject: [PATCH] [quant][doc] Adding a table comparing eager and fx graph mode

Summary:

Test Plan: .

Reviewers:

Subscribers:

Tasks:

Tags:

ghstack-source-id: 979c0ab5fa8cd04f8736e43595862b523dd7b85e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50413
---
 docs/source/quantization.rst | 55 ++++++++++++++++++++++++++++++++++--
 1 file changed, 52 insertions(+), 3 deletions(-)

diff --git a/docs/source/quantization.rst b/docs/source/quantization.rst
index 1cac90ffab86..9cb6191cabf8 100644
--- a/docs/source/quantization.rst
+++ b/docs/source/quantization.rst
@@ -84,7 +84,54 @@ PyTorch provides two different modes of quantization: Eager Mode Quantization an
 Eager Mode Quantization is a beta feature. User needs to do fusion and specify where quantization and dequantization happens manually, also it only supports modules and not functionals.
-FX Graph Mode Quantization is a new automated quantization framework in PyTorch, and currently it's a prototype feature. It improves upon Eager Mode Quantization by adding support for functionals and automating the quantization process. Although people might need to refactor the model a bit to make the model compatible with FX Graph Mode Quantization (symbolically traceable with torch.fx).
+FX Graph Mode Quantization is a new automated quantization framework in PyTorch, and currently it's a prototype feature. It improves upon Eager Mode Quantization by adding support for functionals and automating the quantization process, although users may need to refactor the model to make it compatible with FX Graph Mode Quantization (symbolically traceable with ``torch.fx``).
 
 Note that FX Graph Mode Quantization is not expected to work on arbitrary models since the model might not be symbolically traceable, we will integrate it into domain libraries like torchvision and users will be able to quantize models similar to the ones in supported domain libraries with FX Graph Mode Quantization. For arbitrary models we'll provide general guidelines, but to actually make it work, users might need to be familiar with ``torch.fx``, especially on how to make a model symbolically traceable.
+
+New users of quantization are encouraged to try out FX Graph Mode Quantization first; if it does not work, users may follow the guidelines in `using FX Graph Mode Quantization `_ or fall back to eager mode quantization.
+
+The following table compares Eager Mode Quantization and FX Graph Mode Quantization:
+
++-----------------+-------------------+-------------------+
+|                 |Eager Mode         |FX Graph           |
+|                 |Quantization       |Mode               |
+|                 |                   |Quantization       |
++-----------------+-------------------+-------------------+
+|Release          |beta               |prototype          |
+|Status           |                   |                   |
++-----------------+-------------------+-------------------+
+|Operator         |Manual             |Automatic          |
+|Fusion           |                   |                   |
++-----------------+-------------------+-------------------+
+|Quant/DeQuant    |Manual             |Automatic          |
+|Placement        |                   |                   |
++-----------------+-------------------+-------------------+
+|Quantizing       |Supported          |Supported          |
+|Modules          |                   |                   |
++-----------------+-------------------+-------------------+
+|Quantizing       |Manual             |Automatic          |
+|Functionals/Torch|                   |                   |
+|Ops              |                   |                   |
++-----------------+-------------------+-------------------+
+|Support for      |Limited Support    |Fully              |
+|Customization    |                   |Supported          |
++-----------------+-------------------+-------------------+
+|Quantization Mode|Post Training      |Post Training      |
+|Support          |Quantization:      |Quantization:      |
+|                 |Static, Dynamic,   |Static, Dynamic,   |
+|                 |Weight Only        |Weight Only        |
+|                 |                   |                   |
+|                 |Quantization Aware |Quantization Aware |
+|                 |Training:          |Training:          |
+|                 |Static             |Static             |
++-----------------+-------------------+-------------------+
+|Input/Output     |``torch.nn.Module``|``torch.nn.Module``|
+|Model Type       |                   |(May need some     |
+|                 |                   |refactors to make  |
+|                 |                   |the model          |
+|                 |                   |compatible with FX |
+|                 |                   |Graph Mode         |
+|                 |                   |Quantization)      |
++-----------------+-------------------+-------------------+
+
 
 Eager Mode Quantization
 ^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -357,6 +404,7 @@ Quantization types supported by FX Graph Mode can be classified in two ways:
 
 These two ways of classification are independent, so theoretically we can have 6 different types of quantization.
 
 The supported quantization types in FX Graph Mode Quantization are:
+
 - Post Training Quantization
 - Weight Only Quantization
 
@@ -424,8 +472,9 @@ API Example::
     model_fused = quantize_fx.fuse_fx(model_to_quantize)
 
 Please see the following tutorials for more information about FX Graph Mode Quantization:
-- FX Graph Mode Post Training Static Quantization (TODO: link)
-- FX Graph Mode Post Training Dynamic Quantization (TODO: link)
+- `User Guide on Using FX Graph Mode Quantization `_
+- `FX Graph Mode Post Training Static Quantization `_
+- `FX Graph Mode Post Training Dynamic Quantization `_
 
 Quantized Tensors
 ---------------------------------------
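The prerequisite the added text keeps returning to is symbolic traceability with ``torch.fx``. As a quick illustration of what that means (a minimal sketch; the module here is made up for the example, not taken from the PR):

```python
import torch
import torch.fx


class M(torch.nn.Module):
    def forward(self, x):
        # Functional calls such as torch.relu become nodes in the traced
        # graph, which is what lets FX Graph Mode Quantization handle
        # functionals automatically instead of requiring module rewrites.
        return torch.relu(x) + 1.0


# symbolic_trace captures the forward pass as a Graph; it raises an error on
# data-dependent control flow, which is when manual refactoring is needed.
traced = torch.fx.symbolic_trace(M())
print(traced.code)  # generated Python for the captured graph
```

A model that branches on tensor values is not traceable as written, which is why the patch warns that arbitrary models may need refactoring before FX Graph Mode Quantization can be applied.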
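Beyond the ``fuse_fx`` call in the patch's API example, the automated workflow the new table describes (automatic fusion, observer insertion, and quant/dequant placement) can be sketched end to end. This is a hedged sketch rather than the tutorial code: the model and the ``fbgemm`` backend choice are illustrative, and it assumes a recent PyTorch where ``prepare_fx`` takes ``example_inputs`` (the signature has changed since this 2021 patch):

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Illustrative float model; it must be symbolically traceable.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()).eval()

example_inputs = (torch.randn(2, 4),)
qconfig_mapping = get_default_qconfig_mapping("fbgemm")  # x86 server backend

# prepare_fx inserts observers automatically, with no manual
# QuantStub/DeQuantStub placement as eager mode would require.
prepared = prepare_fx(model, qconfig_mapping, example_inputs)
prepared(*example_inputs)         # calibration pass with representative data
quantized = convert_fx(prepared)  # quantized GraphModule

out = quantized(*example_inputs)
```

Note the contrast with the eager mode column of the table: here fusion and quant/dequant placement happen inside ``prepare_fx``/``convert_fx`` rather than by hand.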