First of all, thank you for sharing this great work!
I was wondering how you would recommend choosing optimal hyperparameters for a large batch size.
For example, if I train the ELECTRA-Large model on a v3-128 TPU, a batch size of 4096 is affordable. In this case, what learning rate and number of training steps would you suggest? As for the data, I'm planning to train the model on my own dataset, which is about 300 GB of TFRecords.
Do you have any rough ideas?
Thank you