Background
It's possible to use the new TF32 format automatically when doing fp32 processing for a ~3x speed-up, without making any changes to the code other than flipping the switch on. But the speed-up may come at the cost of accuracy. You can see the differences between the formats in the following image:
You can see that both TF32 and FP32 have the same dynamic range (the magnitude of numbers they can represent), but the former has much lower precision, which, depending on the situation, may or may not impact the final outcome.
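To make the precision gap concrete, here is a minimal sketch (not how the hardware actually does it) that emulates TF32 by keeping only the top 10 of FP32's 23 mantissa bits; real Tensor Cores round rather than truncate, so this slightly overstates the error:

```python
import struct

def emulate_tf32(x: float) -> float:
    """Truncate an FP32 value's 23-bit mantissa down to TF32's 10 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero out the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(emulate_tf32(0.1))  # 0.0999755859375 — same magnitude, fewer significant digits
```

The exponent bits are untouched, which is why the dynamic range stays identical to FP32.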
Emerging Need
As Ampere hardware is emerging and automatic TF32 seems to be headed toward being disabled by default (probably starting from pt-1.11?), as discussed here: pytorch/pytorch#67384, we need to communicate to our users how to turn it on and off and what the impact on speed and accuracy might be.
Having it on could bring a ~3x speed improvement, and according to NVIDIA engineers the training quality most likely shouldn't be impacted. But we don't yet have first-hand experience to back pragmatic recommendations.
Available Guides
The on/off machinery is explained here: https://pytorch.org/docs/master/notes/cuda.html#tf32-on-ampere
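For reference, the on/off machinery described in that doc boils down to two flags (they only have an effect on Ampere or newer GPUs):

```python
import torch

# Allow TF32 for matrix multiplications and cuDNN convolutions:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# ...or force full fp32 precision instead:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```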
The other crucial document is: https://pytorch.org/docs/master/notes/numerical_accuracy.html
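One way to get a first-hand feel for the accuracy trade-off, along the lines of the numerical-accuracy notes above, is to run the same fp32 matmul with TF32 off and on and compare; this is a sketch, and the matrix size is an arbitrary choice:

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

torch.backends.cuda.matmul.allow_tf32 = False
exact = a @ b  # full fp32 path

torch.backends.cuda.matmul.allow_tf32 = True
fast = a @ b   # TF32 Tensor Cores on Ampere+, identical to fp32 elsewhere

rel_err = ((fast - exact).abs() / exact.abs().clamp_min(1e-6)).mean()
print(f"mean relative error with TF32: {rel_err.item():.2e}")
```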
TF32 Educational blog post: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/
Plan of action
So this issue is both an RFC and a note documenting the need to update: https://huggingface.co/transformers/performance.html
I trust that the commentary will emerge once you start experimenting with the new hardware.
@sgugger, @LysandreJik, @patil-suraj, @patrickvonplaten