[RFC] Ampere/TF32 defaults for transformers #14450

Closed
stas00 opened this issue Nov 18, 2021 · 0 comments · Fixed by #14606
stas00 commented Nov 18, 2021

Background

It's possible to use the new TF32 format automatically for fp32 processing and get a ~3x speed-up, without making any changes to the code other than flipping the switch on. But the speed-up may come at the cost of accuracy. You can see the differences between the formats in the following image:

[Figure: tf32-Mantissa-chart — sign, exponent (range) and mantissa (precision) bit layouts of FP32 vs. TF32]

You can see that TF32 and FP32 have the same dynamic range (the magnitude of numbers they can represent), but the former has much lower precision, which, depending on the situation, may or may not impact the final outcome.
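As a rough back-of-the-envelope comparison (not a benchmark), here is what the published bit widths imply for relative precision, assuming TF32's 10-bit mantissa and FP32's 23-bit mantissa:

```python
# Both formats use 8 exponent bits, so the dynamic range is identical,
# but the mantissa width (and therefore the relative precision) differs.
fp32_mantissa_bits = 23   # IEEE-754 single precision
tf32_mantissa_bits = 10   # NVIDIA TF32 (same mantissa width as fp16)

print(f"fp32 relative precision ~ {2 ** -fp32_mantissa_bits:.1e}")  # ~1.2e-07
print(f"tf32 relative precision ~ {2 ** -tf32_mantissa_bits:.1e}")  # ~9.8e-04
```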

Emerging Need

As Ampere hardware is becoming widespread and automatic TF32 enabling seems to be heading toward being disabled by default (probably starting from pt-1.11?), as discussed in pytorch/pytorch#67384, we need to communicate to our users how to turn it on/off and what the impact on speed and accuracy might be.

Having it on could bring a ~3x speed improvement, and according to the NVIDIA engineers the training quality most likely shouldn't be impacted. But we don't yet have first-hand experience to base pragmatic recommendations on.

Available Guides

The on/off machinery is explained here: https://pytorch.org/docs/master/notes/cuda.html#tf32-on-ampere
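For quick reference, the machinery boils down to two global flags in PyTorch (a minimal sketch based on the doc linked above; the default values may change between PyTorch releases):

```python
import torch

# On Ampere (and later) GPUs these switches let fp32 matmuls and cuDNN
# convolutions run on TF32 tensor cores.
torch.backends.cuda.matmul.allow_tf32 = True   # matrix multiplications
torch.backends.cudnn.allow_tf32 = True         # cuDNN convolutions

# To force full fp32 precision instead, flip both back to False:
# torch.backends.cuda.matmul.allow_tf32 = False
# torch.backends.cudnn.allow_tf32 = False
```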

The other crucial document is: https://pytorch.org/docs/master/notes/numerical_accuracy.html

TF32 Educational blog post: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/

Plan of action

So this issue is both an RFC and a record of the need to update: https://huggingface.co/transformers/performance.html

I trust that the commentary will emerge once you start experimenting with the new hardware.

@sgugger, @LysandreJik, @patil-suraj, @patrickvonplaten
