Thank you for releasing code for these inspiring works!
I tried using bfloat16 for the model parameters, and manually converted images and labels from float32 to bfloat16 before feeding them in for training, but training slowed down by a factor of about 3 and the final performance became noticeably worse. Is it wrong to use bfloat16 in this way?
Thank you very much for your help.
Do you mean the step time is 3x slower, or that it takes 3x longer to reach the target accuracy? The first would be unexpected, but I couldn't say why that happens without examining the training run with a profiler.
In general, you cannot expect the same performance when going from float32 to bfloat16. With ViTs we found that the first Adam moment can safely be kept in bfloat16 (example config), but the second moment and the model weights need to be kept in float32.
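To see why the second moment in particular suffers in low precision, note that it is a slowly growing accumulator to which tiny per-step updates are added; once the accumulator is much larger than the update, the update falls below the dtype's mantissa resolution and is rounded away entirely. A minimal sketch of this effect, using NumPy's float16 as a stand-in since NumPy has no native bfloat16 (bfloat16 has even fewer mantissa bits, 7 vs. 10, so the effect there is stronger):

```python
import numpy as np

def accumulate(dtype, steps=1000, inc=1e-4):
    """Repeatedly add a small increment to an accumulator in `dtype`,
    mimicking the running-moment updates in Adam."""
    acc = dtype(1.0)
    step = dtype(inc)
    for _ in range(steps):
        # In a narrow dtype, 1.0 + 1e-4 rounds back to 1.0,
        # so the small updates are silently lost.
        acc = dtype(acc + step)
    return float(acc)

print(accumulate(np.float32))  # accumulates to roughly 1.1
print(accumulate(np.float16))  # stays at 1.0: updates below half-precision resolution vanish
```

This is why the weights and the second moment, which both receive relatively tiny updates, are kept in float32, while the first moment (whose updates are of comparable magnitude to its value) tolerates bfloat16.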