This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@anmarques (Member)

Changes in quantization:

  1. Created flag to remove Q/DQ from output activations for conv layers. Set to True by default.
  2. Changed default to True for flag to remove Q/DQ from output activations for linear layers.
  3. Created flag to change the number of bits used for weight quantization (similar to the already existing flag for activation bits).
  4. Created flag to avoid quantization of BatchNorm layers when fusing is not used. Set to True by default.
  5. Created flag to enable quantization compatible with TensorRT. Set to False by default (a hedged qconfig sketch follows this list).
    • Sets activation quantization to symmetric and UINT8.
    • Sets default fusing function to 'no_fuse'.
  6. Created a wrapper module (BNWrapper) that allows one to freeze BatchNorm statistics when BN layers are not fused; the PyTorch default only works when BN layers are fused (see the second sketch after this list).
  7. Created a wrapper module (_AddReLU) for the FloatFunctional class that performs the Add + ReLU operation in ResNet. This wrapper enables the module to be wrapped by a QATWrapper when quantized, and is set to quantize only the first input (see the third sketch after this list).
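
The TensorRT-compatible settings described in item 5 (symmetric, UINT8 activations) roughly correspond to a PyTorch QConfig like the hedged sketch below. This is an illustration, not the code added in this PR; the names `tensorrt_activation_fake_quant`, `tensorrt_weight_fake_quant`, and `tensorrt_qconfig` are made up, and the weight settings are shown as per-tensor symmetric INT8 only for simplicity.

```python
import torch
from torch.quantization import FakeQuantize, MovingAverageMinMaxObserver, QConfig

# Hypothetical activation fake-quant matching the described behavior:
# symmetric quantization on an unsigned 8-bit (quint8) data type.
tensorrt_activation_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0,
    quant_max=255,
    dtype=torch.quint8,
    qscheme=torch.per_tensor_symmetric,
)

# Weights shown with per-tensor symmetric INT8 settings for simplicity;
# the PR's actual weight defaults may differ.
tensorrt_weight_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-128,
    quant_max=127,
    dtype=torch.qint8,
    qscheme=torch.per_tensor_symmetric,
)

tensorrt_qconfig = QConfig(
    activation=tensorrt_activation_fake_quant,
    weight=tensorrt_weight_fake_quant,
)
```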
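
For item 6: per the description, PyTorch's built-in BN-stat freezing only applies when BN layers are fused into the preceding conv, so an unfused BN keeps updating its running statistics during QAT. A minimal sketch of what a wrapper like BNWrapper could look like is below; the class name, the `freeze_bn_stats` method, and the `_stats_frozen` flag are illustrative, not necessarily the PR's exact implementation.

```python
import torch
from torch import nn


class BNWrapper(nn.Module):
    """Illustrative wrapper that can freeze the running statistics of an
    unfused BatchNorm layer while leaving its affine parameters trainable."""

    def __init__(self, bn: nn.BatchNorm2d):
        super().__init__()
        self.bn = bn
        self._stats_frozen = False

    def freeze_bn_stats(self):
        # Stop updating running_mean / running_var from here on.
        self._stats_frozen = True
        self.bn.eval()

    def train(self, mode: bool = True):
        super().train(mode)
        if self._stats_frozen:
            # Keep the wrapped BN in eval mode so .train() calls on the parent
            # model do not re-enable statistic updates.
            self.bn.eval()
        return self

    def forward(self, x):
        return self.bn(x)
```

A QAT recipe could then call `freeze_bn_stats()` on these wrappers partway through training, mirroring what PyTorch offers for fused Conv-BN QAT modules.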
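
For item 7: the residual Add + ReLU at the end of each ResNet block is expressed through `torch.nn.quantized.FloatFunctional` so that observers can attach to the operation and the fused `add_relu` kernel can be used after conversion. Below is a hypothetical sketch of such a wrapper; the class name and the way "quantize only the first input" is signaled to the QATWrapper are assumptions.

```python
import torch
from torch import nn


class _AddReLU(nn.Module):
    """Illustrative wrapper around FloatFunctional for the residual
    Add + ReLU in ResNet blocks."""

    def __init__(self):
        super().__init__()
        self.functional = nn.quantized.FloatFunctional()
        # Hypothetical marker a QATWrapper could inspect so that only the
        # first input (the main branch) gets a fake-quant, not the identity.
        self.quantize_first_input_only = True

    def forward(self, x: torch.Tensor, identity: torch.Tensor) -> torch.Tensor:
        # Maps to a single fused quantized add_relu kernel after convert.
        return self.functional.add_relu(x, identity)
```

In a ResNet basic block this would replace the usual `out += identity; out = self.relu(out)` with `out = self.add_relu(out, identity)`.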

anmarques added 30 commits March 9, 2022 12:14
@bfineran (Contributor) commented Apr 8, 2022

merged in fixes for testing and additional tests from #685

@bfineran dismissed their stale review April 8, 2022 18:20
markurtz previously approved these changes Apr 8, 2022
@bfineran requested review from a team, KSGulin and rahul-tuli and removed request for a team April 8, 2022 18:38
spacemanidol previously approved these changes Apr 8, 2022
@bfineran dismissed stale reviews from spacemanidol and markurtz via 5f74e31 April 8, 2022 19:12
markurtz previously approved these changes Apr 8, 2022
rahul-tuli previously approved these changes Apr 8, 2022
@bfineran dismissed stale reviews from rahul-tuli and markurtz via 91243ff April 8, 2022 19:31
@anmarques merged commit 1db70be into main Apr 8, 2022
@anmarques deleted the quantization-refactor branch April 8, 2022 19:44
@spacemanidol (Contributor) left a comment

look good

dbogunowicz pushed a commit that referenced this pull request Apr 11, 2022
* Removed output quantization from conv layers

* Added _Add_ReLU module that enables QATWrapper for quantization.

* Removed quantization of output for linear and conv layers by default. Removed fusing of BN and ReLU by default.

* Minor fixes. Style and quality fixes.

* Added support for freezing BN stats.

* Added mode argument to wrapping of train function in BNWrapper

* Set BN fusing back as default.

* Fixed custom freeze_bn_stats.

* Temporary files for evaluating changes to graphs.

* Added support for tensorrt flag. Moved the computation of quantization range to get_qat_config_config where it has full information about data type.

* Added support for TensorRT quantization

* Included check to account for when weight_qconfig_kwargs is None.

* Modified argument names for backwards compatibility.

* Updated documentation to reflect changes.

* Fixed default weights data type.

* Style and quality fixes.

* Removed unused method

* Removed testing files

* Style and quality fixes.

* Changed call to get_qat_qconfig to not specify symmetry and data type arguments for default case.

* Changed default number of activation and weight bits from None to 8.

* Revert "Changed default number of activation and weight bits from None to 8."

This reverts commit 95e966ed929fa3512331a73667d5ba2ac3d594b1.

* Revert "Changed call to get_qat_qconfig to not specify symmetry and data type arguments for default case."

This reverts commit a675813.

* Lumped qconfig properties into a dataclass.

* Resetting conv and linear activation flags to True.

* Renamed class BNWrapper as _BNWrapper.

* Added logging messages for when tensorrt forces overriding of configs.

* Style and quality fixes.

* ConvInteger quantization conversion for quant refactor (#644)

* [quantization-refactor] mark/propagate conv export mode (#672)

* batch norm fold with existing bias param bug fix

* Quantization Refactor Tests (#685)

* rebase import fix

* update manager serialization test cases for new quantization params

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: spacemanidol <dcampos3@illinois.edu>
Co-authored-by: Benjamin <ben@neuralmagic.com>

Labels

0.12 release: Pull request pending for 0.12 release.

7 participants