Thank you for releasing code for these inspiring works!
I tried using bfloat16 for the model parameters, and manually converted images and labels from float32 to bfloat16 before feeding them in for training, but training slowed down by a factor of about 3 and the final performance became noticeably worse. Is it wrong to use bfloat16 in this way?
Thank you very much for your help.
Do you mean the step time is 3x slower, or that it takes 3x longer to reach the target accuracy? The first would be unexpected, but I couldn't say why that happens without examining the training run with a profiler.
In general, you cannot expect the same performance when going from float32 to bfloat16. With ViTs we found that the first Adam moment can safely be kept in bfloat16 (example config), but the second moment and the model weights need to be kept in float32.
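To see why the second moment in particular suffers in low precision, note that it is a slowly growing accumulator to which tiny per-step updates are added; once the accumulator is much larger than the update, the update falls below the dtype's mantissa resolution and is rounded away entirely. A minimal sketch of this effect, using NumPy's float16 as a stand-in since NumPy has no native bfloat16 (bfloat16 has even fewer mantissa bits, 7 vs. 10, so the effect there is stronger):

```python
import numpy as np

def accumulate(dtype, steps=1000, inc=1e-4):
    """Repeatedly add a small increment to an accumulator in `dtype`,
    mimicking the running-moment updates in Adam."""
    acc = dtype(1.0)
    step = dtype(inc)
    for _ in range(steps):
        # In a narrow dtype, 1.0 + 1e-4 rounds back to 1.0,
        # so the small updates are silently lost.
        acc = dtype(acc + step)
    return float(acc)

print(accumulate(np.float32))  # accumulates to roughly 1.1
print(accumulate(np.float16))  # stays at 1.0: updates below half-precision resolution vanish
```

This is why the weights and the second moment, which both receive relatively tiny updates, are kept in float32, while the first moment (whose updates are of comparable magnitude to its value) tolerates bfloat16.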