
Add QDQBert model and quantization examples of SQUAD task #14066

Merged
42 commits merged Nov 19, 2021

Conversation

@shangz-ai (Contributor) commented Oct 19, 2021

What does this PR do?

This PR includes:

  1. Adds support for a Q/DQ BERT model based on the HF BERT model
    (src/transformers/models/qdqbert/).

The QDQBERT model adds fake quantization operations (pairs of QuantizeLinear/DequantizeLinear ops) to:

  • linear layer inputs and weights
  • matmul inputs
  • residual add inputs

in the BERT model.

The QDQBERT model can be loaded from any checkpoint of an HF BERT model and can perform Quantization Aware Training/Post Training Quantization with support from the PyTorch-Quantization toolkit.
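For illustration, a minimal sketch of loading the model from an existing BERT checkpoint with default quantizer settings (this assumes pytorch-quantization is installed and uses the QDQBertModel class added by this PR; the checkpoint name is just an example):

import pytorch_quantization.nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

from transformers import QDQBertModel

# Set default quantizer behavior before instantiating the model:
# 8-bit max calibration for activations, 8-bit per-channel (axis 0) for weights.
quant_nn.QuantLinear.set_default_quant_desc_input(QuantDescriptor(num_bits=8, calib_method="max"))
quant_nn.QuantLinear.set_default_quant_desc_weight(QuantDescriptor(num_bits=8, axis=(0,)))

# Any HF BERT checkpoint can be used; fake-quant nodes are added around the BERT ops.
model = QDQBertModel.from_pretrained("bert-base-uncased")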

  2. Adds an example of the SQuAD task fine-tuned with the QDQBERT model and run with TensorRT inference
    (transformers/examples/research_projects/quantization-qdqbert/).

In the example, we use the QDQBERT model to do Quantization Aware Training from a pretrained HF BERT model on the SQuAD task. TensorRT can then run inference on the generated ONNX model for optimal INT8 performance out of the box.
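As a rough sketch of the export step (assuming pytorch-quantization's fake-quant ONNX export switch; qat_model is a placeholder for the QAT-finetuned QDQBERT question-answering model, and the input/output names are illustrative):

import torch
import pytorch_quantization.nn as quant_nn

# Export fake-quant ops as QuantizeLinear/DequantizeLinear ONNX nodes so TensorRT can build INT8 engines.
quant_nn.TensorQuantizer.use_fb_fake_quant = True

qat_model.eval()
qat_model.config.return_dict = False  # export plain tensors instead of a ModelOutput

input_ids = torch.ones(1, 128, dtype=torch.long)
attention_mask = torch.ones(1, 128, dtype=torch.long)
token_type_ids = torch.zeros(1, 128, dtype=torch.long)

torch.onnx.export(
    qat_model,
    (input_ids, attention_mask, token_type_ids),
    "qdqbert_squad.onnx",
    opset_version=13,
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["start_logits", "end_logits"],
)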

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.

A related discussion on this topic: issue #10639.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@shangz-ai marked this pull request as ready for review October 19, 2021 17:23
@shangz-ai (Contributor, Author):

@LysandreJik @sgugger Thanks!

@shangz-ai (Contributor, Author):

Some CIs failed since the QDQBERT model needs the PyTorch Quantization Toolkit as a dependency (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization). The dependency can be installed with a simple one-line command:
pip install pytorch-quantization --extra-index-url https://pypi.ngc.nvidia.com

I'm thinking of either adding the one-line installation to CI or adding the quantization toolkit installation to the transformers installation (or any other suggestion that is smooth and neat for the HF community) if we want to upstream the model. @LysandreJik @sgugger

Thanks!

@sgugger (Collaborator) left a comment

Sorry for the long time without review, I had somehow missed the notification for this PR.

Thanks a lot for all your work! The new example you add is very cool! There is a little bit of work to polish the PR before we can merge it but this is already in great shape! I have left a few comments, the two main things are:

  • don't define a new tokenizer class since we can re-use the existing BERT tokenizers (I know some old models have subclasses but we don't do this anymore)
  • use "Copied from" statements in your modeling file to keep copies from the BERT modeling files up to date (you can see examples in the RoBERTa modeling file for instance)

One last point is the new dependency added, as you mentioned. Since one cannot import the model without it, you should add a new function that checks whether the necessary modules are installed (see for instance is_scatter_available used for the TAPAS model) and you should only conditionally import in the main init, like it is done for TAPAS. Running make fix-copies will then create the appropriate file with dummy classes so we can still import something called QDQBertModel in the init when the dependency is not there.
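A rough sketch of what such an availability check can look like (the helper name is_pytorch_quantization_available and its exact location in transformers are illustrative here):

import importlib.util

def is_pytorch_quantization_available():
    # True only if the optional pytorch-quantization toolkit is installed;
    # the QDQBERT modeling code is imported in the main init only when this
    # returns True, otherwise dummy classes take its place.
    return importlib.util.find_spec("pytorch_quantization") is not None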

And for proper testing and docs, the pip install line you mention should probably be added to the config.yml file that drives CircleCI.

Let us know if you need help with any of those steps!

Resolved review threads (outdated): docs/source/model_doc/qdqbert.rst (5), src/transformers/models/qdqbert/modeling_qdqbert.py (1)
Collaborator comment on the new tokenization file (diff @@ -0,0 +1,54 @@):
You don't need to define a new tokenizer if you just re-use the BERT one. In the tokenization auto module, just set a line in the auto mapping that maps "qdqbert" to ("BertTokenizer", "BertTokenizerFast") (there is already an example with ibert using the RoBERTa tokenizer, for instance).
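For illustration, the auto-mapping entry would look roughly like this (a sketch of the pattern in src/transformers/models/auto/tokenization_auto.py; the surrounding structure may differ slightly between versions):

from collections import OrderedDict

TOKENIZER_MAPPING_NAMES = OrderedDict(
    [
        # ... other model types ...
        ("qdqbert", ("BertTokenizer", "BertTokenizerFast")),
        # ... other model types ...
    ]
)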

@shangz-ai (Contributor, Author):
addressed in commit: 873d352

@shangz-ai (Contributor, Author) commented Nov 5, 2021:

As for the CI failure of check_code_quality: import pycuda.autoinit is needed, even if not used, in order to initialize the CUDA environment. Any suggestions for resolving this?

For the other two check failures, I'm not sure what the root cause is. I'd be glad to get insights on how to fix them.

@LysandreJik (Member) left a comment

This is in really good shape, thanks a lot for all the effort! Before merging, I'd like to understand if we can't make the code examples in the docstrings a bit clearer by leveraging the QuantDescriptor as you use in your integration test. Doing so would significantly reduce friction for users trying to use the model.

Thank you!

Resolved review threads (outdated): src/transformers/__init__.py (1), src/transformers/models/qdqbert/configuration_qdqbert.py (4)
return hidden_states


class QDQBertAttention(nn.Module):
Member:
That sounds good, thank you @shangz-ai!

Comment on lines 519 to 535
# Override
def test_feed_forward_chunking(self):
pass
Member:
Could you mention in the comment why it is overridden?

@shangz-ai (Contributor, Author) replied Nov 10, 2021:
It should be due to the fact that when using feed_forward_chunking, the tensors change shape because of the chunking. Quantizing those tensors per channel/tensor then changes the scaling factors during calibration, so one cannot obtain identical results before and after chunking.

As a result, I will also remove the chunk_size_feed_forward feature in QDQBert, as IBert did. Does that make sense? Addressed in f5188b7

Member:
Yes that makes sense, thank you!

Comment on lines 530 to 549
import pytorch_quantization.nn as quant_nn
from pytorch_quantization.tensor_quant import QuantDescriptor

model = QDQBertForMaskedLM.from_pretrained("bert-base-uncased")
input_ids = torch.tensor([[0, 345, 232, 328, 740, 140, 1695, 69, 6078, 1588, 2]])

input_desc = QuantDescriptor(num_bits=8, calib_method="max")
weight_desc = QuantDescriptor(num_bits=8, axis=((0,)))
quant_nn.QuantLinear.set_default_quant_desc_input(input_desc)
quant_nn.QuantLinear.set_default_quant_desc_weight(weight_desc)

output = model(input_ids)[0]

expected_shape = torch.Size((1, 11, 768))
self.assertEqual(output.shape, expected_shape)

expected_slice = torch.tensor(
[[[-0.0483, 0.1188, -0.0313], [-0.0606, 0.1435, 0.0199], [-0.0235, 0.1519, 0.0175]]]
)
self.assertTrue(torch.allclose(output[:, :3, :3], expected_slice, atol=1e-4))
Member:
If this approach is the canonical approach to using the model, I would favor updating the model docstring examples to reflect that, alongside with inline comments explaining in a few words what is happening. I think this would go a long way to improve usability of the model. Thank you!

@shangz-ai (Contributor, Author):
addressed in 69da5d6

@LysandreJik (Member):

Thanks for working on this! It seems the code quality is not yet passing, could you run the quality scripts? You can do so with the following, from the root of your clone:

pip install -e ".[quality]"
make fixup

@shangz-ai (Contributor, Author) commented Nov 11, 2021:

> Thanks for working on this! It seems the code quality is not yet passing, could you run the quality scripts? You can do so with the following, from the root of your clone:
>
> pip install -e ".[quality]"
> make fixup

Thanks for the comments! This is actually somewhere I want to check.

The code quality failure is from the TensorRT inference script's import pycuda.autoinit. This line of code is needed, but not used, to initialize the CUDA environment. Is there a way I can keep this line in the script and still pass the code quality test? @patrickvonplaten @LysandreJik @sgugger

@shangz-ai changed the title from "Add QDQBert model and QAT example of SQUAD task" to "Add QDQBert model and quantization examples of SQUAD task" on Nov 12, 2021
@patrickvonplaten (Contributor) left a comment

Very clean PR! Thanks for adding the model

@patrickvonplaten (Contributor):

Think there is just one small clean-up left to do:

examples/research_projects/quantization-qdqbert/evaluate-hf-trt-qa.py:29:1: F401 'pycuda.autoinit' imported but unused

@patrickvonplaten (Contributor):

Think there is another line to clean up :-) examples/research_projects/quantization-qdqbert/evaluate-hf-trt-qa.py:29:1: F401 'pycuda.autoinit' imported but unused

@shangz-ai (Contributor, Author):

> Think there is another line to clean up :-) examples/research_projects/quantization-qdqbert/evaluate-hf-trt-qa.py:29:1: F401 'pycuda.autoinit' imported but unused

@patrickvonplaten Is there a workaround for it? pycuda.autoinit is imported to set up the CUDA environment, so it is needed in the script. Thanks!

@sgugger (Collaborator) commented Nov 18, 2021:

You can add a comment at the end of the import line, # noqa: F401, to have it ignored by our styler. To check locally whether the test will pass, just run make quality.
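Concretely, the import line in evaluate-hf-trt-qa.py would look something like this (a sketch; the explanatory comment wording is illustrative):

# pycuda.autoinit is imported only for its side effect of initializing the CUDA context;
# the noqa marker tells flake8 to ignore the "imported but unused" (F401) warning.
import pycuda.autoinit  # noqa: F401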

Note that with the merge of #14431, you will need to rebase your PR on master and replace the lines

        self.init_weights()

by

        # Initialize weights and apply final processing
        self.post_init()

Let us know if you need any help!

@shangz-ai (Contributor, Author):

I rebased the PR, but I'm not sure why the model templates runner CI is failing now.

@@ -571,8 +597,6 @@
_import_structure["generation_utils"] = ["top_k_top_p_filtering"]
_import_structure["modeling_outputs"] = []
_import_structure["modeling_utils"] = ["Conv1D", "PreTrainedModel", "apply_chunking_to_forward", "prune_layer"]

# PyTorch models structure
Member:
The templates test is failing because this line was deleted!

@sgugger (Collaborator) commented Nov 19, 2021:

Thanks again for all your work on this!

@sgugger merged commit a59e7c1 into huggingface:master Nov 19, 2021
@shangz-ai deleted the add-qdqbert-model branch November 19, 2021 18:58
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
…e#14066)

* clean up branch for add-qdqbert-model

* README update for QAT example; update docstrings in modeling_qdqbert.py

* Update qdqbert.rst

* Update README.md

* Update README.md

* calibration data using training set; QAT example runs in fp32

* re-use BERTtokenizer for qdqbert

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/qdqbert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* remove qdqbert tokenizer

* Update qdqbert.rst

* update evaluate-hf-trt-qa.py

* update configuration_qdqbert.py

* update modeling_qdqbert.py: add copied statement; replace assert with ValueError

* update copied from statement

* add is_quantization_available; run make fix-copies

* unittest add require_quantization

* add backend dependency to qdqbert model

* update README; update evaluate script; make style

* lint

* docs qdqbert update

* circleci build_doc add pytorch-quantization for qdqbert

* update README

* update example readme with instructions to upgrade TensorRT to 8.2

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/qdqbert/configuration_qdqbert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* change quantization to pytorch_quantization for backend requirement

* feed_forward_chunking not supported in QDQBert

* make style

* update model docstrings and comments in testing scripts

* rename example to quantization-qdqbert; rename example scripts from qat to quant

* Update src/transformers/models/qdqbert/modeling_qdqbert.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* rm experimental functions in quant_trainer

* qa cleanup

* make fix-copies for docs index.rst

* fix doctree; use post_init() for qdqbert

* fix early device assignment for qdqbert

* fix CI:Model templates runner

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>