[Performance of INT8] Feature requested #31
Comments
You can find the performance comparison at subsection "Performance on INT8 without quantizing residual connection" in https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#encoder-performance-on-t4-and-tensorflow
This subsection shows the time and speedup, but it doesn't show the exact match / F1 score for INT8 like https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#performance-on-application-codes-of-tensorflow. Could you please tell me where to find the performance on application codes for INT8?
Thanks for your timely reply :)
The quantization in DeepLearningExamples/FasterTransformer/v3.0/bert-tf-quantization is fake quantization, which uses FP32 to calculate the quantized values. However, the speedup is tested using real 8-bit INT8 arithmetic. Are they the same? Or is there something I misunderstand? Looking forward to your reply.
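For context, "fake" quantization means the tensor values are rounded and clipped to the INT8 grid but the arithmetic itself stays in FP32, so the accuracy impact of INT8 can be measured without INT8 kernels. A minimal sketch (illustrative only, not FasterTransformer's actual implementation; `fake_quantize` and its per-tensor `amax` calibration are assumptions for this example):

```python
import numpy as np

def fake_quantize(x, amax, num_bits=8):
    """Simulate INT8 quantization in FP32 ("fake" quantization).

    Values are scaled to the signed int8 range, rounded, clipped, and
    immediately dequantized, so the math stays FP32 but the values carry
    INT8 rounding error. Illustrative sketch only.
    """
    qmax = 2 ** (num_bits - 1) - 1            # 127 for int8
    scale = amax / qmax                       # per-tensor scale from calibrated max
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return (q * scale).astype(np.float32)     # dequantize back to FP32

# Example: quantization error is bounded by half the quantization step.
x = np.array([0.5, -1.2, 3.14], dtype=np.float32)
xq = fake_quantize(x, amax=float(np.abs(x).max()))
```

So the accuracy numbers produced by fake quantization should match what a true INT8 kernel computes (up to kernel-level details), while the speedup must be measured separately with real INT8 GEMMs.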
Thanks for your reply. |
It's exciting to see the open-sourcing of FasterTransformer v3.0. However, I can't find the INT8 performance on application codes in the README.md, while FP32 and FP16 are both analyzed. Where can I find these results?