This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@bfineran (Contributor) commented Mar 25, 2022

To land after the quantization-refactor branch lands (will remove DRAFT tag after). This PR changes the main pathway for converting quantizable convs to ConvInteger. The advantage of this update is that activation ranges of Convs no longer need to be observed during training under the ConvInteger spec, which helps reduce quantization loss without sacrificing inference performance.
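The conversion can be pictured as a small graph rewrite. The sketch below uses a toy node type to show how a DequantizeLinear feeding a Conv can be folded into a single ConvInteger-style node; it is schematic only, and the helper and type names are ours, not sparseml's actual export pass:

```python
from dataclasses import dataclass

@dataclass
class Node:
    op_type: str
    inputs: list
    outputs: list

def convert_qdq_conv_to_conv_integer(nodes):
    """Toy rewrite: fold DequantizeLinear -> Conv pairs into a single
    ConvInteger-style node (illustrative, not the real sparseml pass)."""
    by_output = {o: n for n in nodes for o in n.outputs}
    result = []
    removed = set()
    for node in nodes:
        if id(node) in removed:
            continue
        if node.op_type == "Conv":
            dq = by_output.get(node.inputs[0])
            if dq is not None and dq.op_type == "DequantizeLinear":
                # ConvInteger consumes the raw INT8 tensor directly,
                # so the dequantize step is dropped from the graph.
                removed.add(id(dq))
                result = [n for n in result if id(n) != id(dq)]
                result.append(Node("ConvInteger",
                                   [dq.inputs[0]] + node.inputs[1:],
                                   node.outputs))
                continue
        result.append(node)
    return result
```

For example, `[DequantizeLinear(x_q -> x), Conv(x, w_q -> y)]` collapses to a single `ConvInteger(x_q, w_q -> y)` node.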

Additionally, a check was added so that QDQ blocks immediately preceding Add nodes are not removed. This gives the deepsparse engine the option of representing parts of a quantizable Add operation in INT8.
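That Add-input check can be sketched as a simple pattern test (schematic only; the node type and helper name are ours, not the actual sparseml code):

```python
from collections import namedtuple

Node = namedtuple("Node", ["op_type", "inputs", "outputs"])

def qdq_feeds_add(nodes, dq):
    """True when the DequantizeLinear output is consumed by an Add node;
    in that case the QDQ block is kept so the engine can choose an INT8
    representation for part of the Add."""
    consumers = [n for n in nodes if dq.outputs[0] in n.inputs]
    return any(n.op_type == "Add" for n in consumers)

# Residual-style pattern: the dequantized tensor feeds an Add.
graph = [
    Node("DequantizeLinear", ["a_q", "scale", "zp"], ["a"]),
    Node("Add", ["a", "b"], ["c"]),
]
```

Here `qdq_feeds_add(graph, graph[0])` is true, so the QDQ block would be preserved.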

Expected quantizable conv block prior to conversion:
(screenshot: Screen Shot 2022-03-25 at 4.23.33 PM)

Result after conversion:
(screenshot: Screen Shot 2022-03-25 at 4.23.50 PM)

Residual add with 1 QDQ input:
(screenshot: Screen Shot 2022-03-25 at 4.24.10 PM)

Tested sample RN50 models from the research team against the ONNX checker, and validated accuracy against an ImageNet subset.
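For reference, the arithmetic ConvInteger performs (per the ONNX operator spec) is an integer convolution of zero-point-adjusted inputs with a wide (INT32) accumulator; the 1-D helper below is ours, written only to make that concrete:

```python
def conv_integer_1d(x, w, x_zero_point=0, w_zero_point=0):
    """Reference 1-D ConvInteger: subtract the zero points, then convolve
    in integer arithmetic (ONNX accumulates the result in INT32)."""
    out_len = len(x) - len(w) + 1
    return [
        sum((x[i + k] - x_zero_point) * (w[k] - w_zero_point)
            for k in range(len(w)))
        for i in range(out_len)
    ]

# INT8 activations [1, 2, 3, 4] with zero point 1, weights [1, 1]:
conv_integer_1d([1, 2, 3, 4], [1, 1], x_zero_point=1)  # -> [1, 3, 5]
```

Because the scale factors cancel out of the integer path, only the zero points are needed at inference time, which is why activation-range observation during training becomes unnecessary under this spec.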

@bfineran bfineran requested review from a team and anmarques March 25, 2022 20:25
@bfineran bfineran self-assigned this Mar 25, 2022
@bfineran bfineran requested review from KSGulin and natuan and removed request for a team March 25, 2022 20:25
@bfineran bfineran force-pushed the quant-refactor-conversion branch from 8b1b8b9 to e36e707 Compare March 25, 2022 20:27
@bfineran bfineran marked this pull request as draft March 25, 2022 21:57
@bfineran bfineran force-pushed the quant-refactor-conversion branch from e36e707 to 32b5c19 Compare April 5, 2022 02:33
@bfineran bfineran changed the base branch from main to quantization-refactor April 5, 2022 02:33
@bfineran bfineran force-pushed the quant-refactor-conversion branch from 32b5c19 to c7b8d0f Compare April 5, 2022 02:36
@bfineran bfineran force-pushed the quant-refactor-conversion branch from c7b8d0f to 47ea29e Compare April 5, 2022 02:40
@bfineran bfineran marked this pull request as ready for review April 5, 2022 02:40
@dbogunowicz dbogunowicz added the 0.12 release Pull request pending for 0.12 release. label Apr 8, 2022
@bfineran bfineran merged commit 1a57909 into quantization-refactor Apr 8, 2022
@bfineran bfineran deleted the quant-refactor-conversion branch April 8, 2022 16:51
@bfineran bfineran mentioned this pull request Apr 8, 2022
anmarques added a commit that referenced this pull request Apr 8, 2022
* Removed output quantization from conv layers

* Added _Add_ReLU module that enables QATWrapper for quantization.

* Removed quantization of output for linear and conv layers by default. Removed fusing of BN and ReLU by default.

* Minor fixes. Style and quality fixes.

* Added support for freezing bn stats.

* Added mode argument to wrapping of train function in BNWrapper

* Set BN fusing back as default.

* Fixed custom freeze_bn_stats.

* Temporary files for evaluating changes to graphs.

* Added support for tensorrt flag. Moved the computation of quantization range to get_qat_config_config where it has full information about data type.

* Added support for TensorRT quantization

* Included check to account for when weight_qconfig_kwargs is None.

* Modified argument names for backwards compatibility.

* Updated documentation to reflect changes.

* Fixed default weights data type.

* Style and quality fixes.

* Removed unused method

* Removed testing files

* Style and quality fixes.

* Changed call to get_qat_qconfig to not specify symmetry and data type arguments for default case.

* Changed default number of activation and weight bits from None to 8.

* Revert "Changed default number of activation and weight bits from None to 8."

This reverts commit 95e966ed929fa3512331a73667d5ba2ac3d594b1.

* Revert "Changed call to get_qat_qconfig to not specify symmetry and data type arguments for default case."

This reverts commit a675813.

* Lumped qconfig properties into a dataclass.

* Resetting conv and linear activation flags to True.

* Renamed class BNWrapper as _BNWrapper.

* Added logging messages for when tensorrt forces overriding of configs.

* Style and quality fixes.

* ConvInteger quantization conversion for quant refactor (#644)

* ConvInteger quantization conversion for quant refactor

* [quantization-refactor] mark/propagate conv export mode (#672)

* batch norm fold with existing bias param bug fix

* Quantization Refactor Tests (#685)

* rebase import fix

* update manager serialization test cases for new quantization params

Co-authored-by: Benjamin Fineran <bfineran@users.noreply.github.com>
Co-authored-by: spacemanidol <dcampos3@illinois.edu>
Co-authored-by: Benjamin <ben@neuralmagic.com>
dbogunowicz pushed a commit that referenced this pull request Apr 11, 2022