bfloat16 Training #41

Open
LeoXinhaoLee opened this issue Sep 2, 2023 · 1 comment

@LeoXinhaoLee

Thank you for releasing code for these inspiring works!

I tried using bfloat16 for the model parameters and manually converted images and labels from float32 to bfloat16 before feeding them in for training, but I noticed that training slowed down by about 3x and the performance became noticeably worse. Is it wrong to use bfloat16 in this way?
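
For reference, roughly what I did, as a minimal sketch (assuming a JAX/jnp setup; the parameter tree and batch shapes below are placeholders):

```python
import jax
import jax.numpy as jnp

# Placeholder parameter tree and batch; the real ones come from the model and input pipeline.
params = {"w": jnp.zeros((768, 768), jnp.float32)}
images = jnp.zeros((32, 224, 224, 3), jnp.float32)
labels = jnp.zeros((32, 1000), jnp.float32)

# Cast the model parameters and the input batch from float32 to bfloat16.
params_bf16 = jax.tree_util.tree_map(lambda p: p.astype(jnp.bfloat16), params)
images_bf16 = images.astype(jnp.bfloat16)
labels_bf16 = labels.astype(jnp.bfloat16)
```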

Thank you very much for your help.

@andsteing
Collaborator

Do you mean that the step time is 3x slower, or that it takes 3x longer to reach the target accuracy? The first would be unexpected, but I couldn't say why without examining the training run with a profiler.

In general, you should not expect the same performance when going from float32 to bfloat16. With ViTs we found that the first Adam moment can safely be kept in bfloat16 (example config), but the second moment and the model weights need to be kept in float32.
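
A minimal sketch of that setup with optax (the `mu_dtype` argument is the relevant part; the learning rate and b1/b2 values here are illustrative):

```python
import jax.numpy as jnp
import optax

# Adam with the first moment (mu) stored in bfloat16; the second moment (nu)
# and the model parameters remain in float32.
tx = optax.adam(
    learning_rate=1e-3,  # illustrative
    b1=0.9,
    b2=0.999,
    mu_dtype=jnp.bfloat16,
)

# opt_state = tx.init(params)  # params are created and kept in float32
```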

@lucasb-eyer added the "more info needed" label (Issue/question is stuck on needing more info from OP) on Nov 7, 2023