Improved performance of decoders #354
Not related to the PR, but what do you think about unifying how quantization is applied to causal language models depending on whether the user provides a `torch.nn.Module` or an `OVBaseDecoderModel` (the number of generation steps is currently not the same)? We could also instantiate an `OVModel` in the `from_pretrained` method when the given model is a `PreTrainedModel`.
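The dispatch being suggested could look roughly like the sketch below. This is only an illustration of the idea, not the PR's actual code: the class names are stand-ins mirroring the types mentioned above, and the wrapping logic is an assumption about how normalization might work.

```python
class PreTrainedModel:
    """Stand-in for transformers.PreTrainedModel (hypothetical)."""
    pass


class OVBaseDecoderModel:
    """Stand-in for the OpenVINO decoder wrapper (hypothetical)."""

    def __init__(self, inner=None):
        self.inner = inner


def as_ov_decoder(model):
    """Normalize the input so quantization always sees one model type.

    If the user hands in a PyTorch/transformers model, wrap it into the
    OV decoder wrapper first, so the number of generation steps (and any
    other calibration behavior) is the same regardless of the input type.
    """
    if isinstance(model, OVBaseDecoderModel):
        # Already the uniform type; nothing to do.
        return model
    if isinstance(model, PreTrainedModel):
        # Wrap so downstream quantization runs on a single code path.
        return OVBaseDecoderModel(inner=model)
    raise TypeError(f"Unsupported model type: {type(model).__name__}")
```

A `from_pretrained`-style entry point could then call `as_ov_decoder` on whatever the user passes, instead of branching on the model type in the quantization logic itself.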
It is hard to accomplish this with the current NNCF PTQ API implementation we have for PyTorch. I think we should deprecate PTQ for PyTorch at some point because it also introduces ambiguity for the user about what workflow to use for quantization.
Is this modification added to reduce the accuracy degradation resulting from quantization? If so, what did you observe when varying this parameter?
Still in the process of checking this; I will update a bit later.