
[Performance of INT8] Feature requested #31

Closed
Rivendile opened this issue Sep 28, 2020 · 7 comments

Comments

Rivendile commented Sep 28, 2020

It's exciting to see FasterTransformer v3.0 open-sourced. However, I can't find INT8 performance numbers for the application codes in the README.md, while FP32 and FP16 are both analyzed. Where can I find these results?

byshiue (Collaborator) commented Sep 28, 2020

You can find the performance comparison in the "Performance on INT8 without quantizing residual connection" subsection of https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#encoder-performance-on-t4-and-tensorflow

byshiue closed this as completed Sep 28, 2020
Rivendile (Author) commented Sep 28, 2020

This subsection shows the time and speedup, but it doesn't show the exact match / F1 scores for INT8 the way https://github.com/NVIDIA/DeepLearningExamples/tree/master/FasterTransformer/v3.0#performance-on-application-codes-of-tensorflow does for FP32 and FP16. Could you please tell me where to find the performance on application codes for INT8?
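
For context, the exact match and F1 numbers in the application-codes table are the standard SQuAD metrics. A minimal sketch of how they are computed, following the usual SQuAD evaluation recipe (illustrative only, not code from the repo):

```python
import collections
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and articles."""
    text = "".join(ch for ch in text.lower() if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, ground_truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(ground_truth))

def f1_score(prediction, ground_truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred, gold = normalize(prediction).split(), normalize(ground_truth).split()
    overlap = sum((collections.Counter(pred) & collections.Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```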

Rivendile (Author) commented

Thanks for your timely reply :)

Rivendile (Author) commented

The quantization in DeepLearningExamples/FasterTransformer/v3.0/bert-tf-quantization is fake quantization, which uses FP32 to compute the quantized values. However, the speedup is measured with real 8-bit INT8 kernels. Are they the same? Or is there something I have misunderstood? Looking forward to your reply.
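
To clarify the term: "fake quantization" means each tensor is quantized and immediately dequantized, so every value stays in FP32 but carries exactly the rounding error an INT8 kernel would introduce. A minimal sketch of symmetric per-tensor fake quantization; `amax` is a hypothetical calibrated range here, not the repo's actual variable name:

```python
import tensorflow as tf

def fake_quantize(x, amax, num_bits=8):
    """Quantize-dequantize in FP32, simulating INT8 rounding error.

    `amax` is a hypothetical calibrated absolute-max of the tensor;
    the real repo derives its ranges from QAT calibration.
    """
    bound = 2.0 ** (num_bits - 1) - 1.0                        # 127 for INT8
    scale = amax / bound
    q = tf.clip_by_value(tf.round(x / scale), -bound, bound)   # the INT8 value
    return q * scale                                           # immediately back to FP32
```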

hxbai commented Sep 28, 2020

bert-tf-quantization is only for training. You should train a checkpoint and then import it with the FT TensorFlow op; the FasterTransformer op does inference in INT8 precision. The whole workflow is described in the "Evaluate the accuracy of FasterTransformer under INT8" part of the README.
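
To make that split concrete, a hedged sketch of the workflow; the library path and checkpoint name below are placeholders, and the authoritative commands live in the README section cited above:

```python
import tensorflow as tf

# Step 1: bert-tf-quantization trains in FP32 with fake quantization and
# produces a QAT checkpoint holding the weights plus calibrated ranges.
ckpt_path = "models/bert_qat.ckpt"  # placeholder name

# Step 2: load the FasterTransformer custom TensorFlow op library; the op
# replaces the stock encoder layers and runs its math in INT8 internally.
ft_module = tf.load_op_library("lib/libtf_fastertransformer.so")  # placeholder path

# Step 3: build the inference graph with the FT op, restore `ckpt_path`
# into it, and evaluate; accuracy is then measured on the same INT8 path
# that the speedup benchmarks use.
```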

Rivendile (Author) commented

Thanks for your reply.
Also, I would appreciate it if the INT8 mechanism and the optimizations applied were explained more clearly in the README.

byshiue transferred this issue from NVIDIA/DeepLearningExamples Apr 5, 2021