Background
It's possible to use the new TF32 format automatically when doing fp32 processing for a ~3x speed-up, without making any changes to the code other than flipping the switch on. But the speed-up may come at the cost of accuracy. You can see the differences between the formats in the following image:
You can see that both TF32 and FP32 have the same dynamic range (the magnitude of numbers they can represent), but the former has much lower precision, which, depending on the situation, may or may not impact the final outcome.
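To make the precision gap concrete, here is a minimal sketch (not how the hardware actually does it) that emulates TF32 by keeping only the top 10 of FP32's 23 mantissa bits; real Tensor Cores round rather than truncate, so this slightly overstates the error:

```python
import struct

def emulate_tf32(x: float) -> float:
    """Truncate an FP32 value's 23-bit mantissa down to TF32's 10 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= ~((1 << 13) - 1)  # zero out the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

print(emulate_tf32(0.1))  # 0.0999755859375 — same magnitude, fewer significant digits
```

The exponent bits are untouched, which is why the dynamic range stays identical to FP32.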
Emerging Need
As Ampere hardware is emerging and automatic TF32 seems to be headed toward being disabled by default (probably starting from pt-1.11?), as discussed here: pytorch/pytorch#67384, we need to communicate to our users how to turn it on and off and what the impact on speed and accuracy might be.
Having it on could bring a ~3x speed improvement, and according to NVIDIA engineers the training quality most likely shouldn't be impacted. But we don't yet have first-hand experience to back pragmatic recommendations.
Available Guides
The on/off machinery is explained here: https://pytorch.org/docs/master/notes/cuda.html#tf32-on-ampere
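For reference, the on/off machinery described in that doc boils down to two flags (they only have an effect on Ampere or newer GPUs):

```python
import torch

# Allow TF32 for matrix multiplications and cuDNN convolutions:
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# ...or force full fp32 precision instead:
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```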
The other crucial document is: https://pytorch.org/docs/master/notes/numerical_accuracy.html
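One way to get a first-hand feel for the accuracy trade-off, along the lines of the numerical-accuracy notes above, is to run the same fp32 matmul with TF32 off and on and compare; this is a sketch, and the matrix size is an arbitrary choice:

```python
import torch

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")

torch.backends.cuda.matmul.allow_tf32 = False
exact = a @ b  # full fp32 path

torch.backends.cuda.matmul.allow_tf32 = True
fast = a @ b   # TF32 Tensor Cores on Ampere+, identical to fp32 elsewhere

rel_err = ((fast - exact).abs() / exact.abs().clamp_min(1e-6)).mean()
print(f"mean relative error with TF32: {rel_err.item():.2e}")
```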
TF32 Educational blog post: https://blogs.nvidia.com/blog/2020/05/14/tensorfloat-32-precision-format/
Plan of action
So this issue is both an RFC and a note documenting the need to update: https://huggingface.co/transformers/performance.html
I trust that the commentary will emerge once you start experimenting with the new hardware.
@sgugger, @LysandreJik, @patil-suraj, @patrickvonplaten