Add QDQBert model and QAT example of SQUAD task #14057
Conversation
enable save_onnx for QA trainer
Hi, this PR includes both support for the QDQBert model and a QAT example that uses QDQBert for the SQuAD task.
Thanks for your PR. Note that it's hard to review because it includes changes from other commits on master (bad rebase?), so it would be better if you could re-open a clean PR from your branch.
Concerning the examples:
- I don't think the QAT example should go in the examples maintained by the team, given it introduces a lot of new code that no one on the team wrote or will be able to maintain properly. It should go in a research project.
- The classic QA example should not be touched by this PR. In general, any new functionality should be added to all examples at the same time, which could be done in a separate PR. It's also my understanding that the ONNX conversion won't work for many of the models, but maybe I'm wrong on this.
Thanks for the comments! I'm opening up a new PR here: #14066
What does this PR do?
This PR includes:
(src/transformers/models/qdqbert/)
The QDQBERT model adds fake quantization operations (pairs of QuantizeLinear/DequantizeLinear ops) to (i) linear layer inputs and weights, (ii) matmul inputs, and (iii) residual add inputs in the BERT model.
The QDQBERT model can load from any checkpoint of an HF BERT model and perform Quantization Aware Training / Post Training Quantization with support from the PyTorch-Quantization toolkit.
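For illustration, here is a minimal sketch (not code from this PR) of the fake quantization pattern the model relies on, using the PyTorch-Quantization toolkit; the fixed `amax` values are a stand-in for real calibration:

```python
import torch
import torch.nn.functional as F
from pytorch_quantization.nn import TensorQuantizer
from pytorch_quantization.tensor_quant import QuantDescriptor

# Each TensorQuantizer performs a QuantizeLinear/DequantizeLinear pair in one
# step, so the model trains against INT8 rounding error while staying in
# floating point. The amax values are illustrative; calibration would set them.
input_quantizer = TensorQuantizer(QuantDescriptor(num_bits=8, amax=4.0))
weight_quantizer = TensorQuantizer(QuantDescriptor(num_bits=8, amax=1.0))

x = torch.randn(2, 768)    # hidden states entering a linear layer
w = torch.randn(768, 768)  # the layer's weight

# Fake-quantize both operands before the matmul, mirroring what QDQBERT
# does for linear inputs/weights, matmul inputs, and residual adds.
y = F.linear(input_quantizer(x), weight_quantizer(w))
```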
(examples/pytorch/question-answering/QAT-qdqbert/)
In the example, we use the QDQBERT model to do Quantization Aware Training from a pretrained HF BERT model on the SQuAD task. TensorRT can then run inference on the generated ONNX model for optimal INT8 performance out-of-the-box.
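A hedged sketch of that flow, assuming the QDQBert classes introduced here plus the PyTorch-Quantization toolkit; the checkpoint name, output file name, and the fixed-`amax` loop (a stand-in for actual calibration/QAT on SQuAD) are illustrative only:

```python
import torch
from pytorch_quantization.nn import TensorQuantizer
from transformers import AutoTokenizer, QDQBertForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# QDQBERT mirrors BERT's architecture, so any HF BERT checkpoint loads directly.
model = QDQBertForQuestionAnswering.from_pretrained("bert-base-uncased").eval()
model.config.return_dict = False  # tuple outputs export more cleanly to ONNX

# Stand-in for calibration/QAT: give every quantizer a fixed range here.
# The example scripts calibrate and fine-tune on SQuAD instead.
for module in model.modules():
    if isinstance(module, TensorQuantizer):
        module.amax = 4.0

# Emit explicit QuantizeLinear/DequantizeLinear pairs in the exported graph;
# TensorRT consumes these to build an INT8 engine out-of-the-box.
TensorQuantizer.use_fb_fake_quant = True

enc = tokenizer("Who developed BERT?", "BERT was developed at Google.",
                return_tensors="pt")
torch.onnx.export(
    model,
    (enc["input_ids"], enc["attention_mask"], enc["token_type_ids"]),
    "qdqbert_squad.onnx",  # hypothetical file name
    input_names=["input_ids", "attention_mask", "token_type_ids"],
    output_names=["start_logits", "end_logits"],
    opset_version=13,
)
```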
Also added a module in (examples/pytorch/question-answering/run_qa.py, trainer_qa.py) for saving the SQuAD task-specific BERT model as ONNX files, as a consistency check against the QAT-qdqbert example.
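One possible shape for that consistency check, comparing logits from the FP32 baseline export and the QAT export with onnxruntime; both file names are hypothetical (the QAT one carried over from the sketch above):

```python
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = tokenizer("Who developed BERT?", "BERT was developed at Google.",
                return_tensors="np")
feeds = {k: enc[k] for k in ("input_ids", "attention_mask", "token_type_ids")}

# Run the same features through both exported graphs.
fp32 = ort.InferenceSession("bert_squad_fp32.onnx",  # baseline from run_qa.py
                            providers=["CPUExecutionProvider"])
int8 = ort.InferenceSession("qdqbert_squad.onnx",    # QAT-qdqbert export
                            providers=["CPUExecutionProvider"])
fp32_start, fp32_end = fp32.run(None, feeds)
int8_start, int8_end = int8.run(None, feeds)

# After QAT the logits should stay close to the FP32 baseline.
print("max |start_logits diff|:", np.abs(fp32_start - int8_start).max())
print("max |end_logits diff|:", np.abs(fp32_end - int8_end).max())
```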
Before submitting
A related discussion on this topic: Issue #10639
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.