Sorry to disturb you, but could you answer the following questions?
Why does the TinyBERT training pipeline in `general_distill.py` not wrap the student model with DDP, and instead only initialize the teacher model that way? And why is there no synchronization of the normalization layers?
Also, when mixed precision is enabled, where can I find the `backward` function on the `optimizer`?
Thanks
mexiQQ changed the title from "#TinyBert" to "#TinyBert Training Pipeline Problems" on Nov 8, 2021.
Hi,
This code does not support fp16 or DDP training, so the relevant parts are redundant.
Please refer to the AutoTinyBERT code, which supports both fp16 and DDP training.
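For readers who want to add these paths themselves, here is a minimal, hedged sketch of what a DDP + mixed-precision distillation step could look like, using `torch.nn.parallel.DistributedDataParallel` and `torch.cuda.amp` in place of the apex-style `optimizer.backward(loss)` call. The models and loss below are illustrative stand-ins, not the actual TinyBERT classes. Note two points implied by the question: the teacher can stay un-wrapped because it is frozen and receives no gradients, and BERT uses LayerNorm (per-sample statistics), so no cross-GPU normalization sync (e.g. SyncBatchNorm) is needed.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def distill_step():
    # Single-process group for illustration; a real run launches one
    # process per GPU via torchrun, with rank/world_size from the env.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    student = nn.Linear(16, 16).to(device)  # stand-in for the student model
    teacher = nn.Linear(16, 16).to(device)  # stand-in for the teacher model
    teacher.eval()  # frozen teacher: no gradients, so no DDP wrapper needed

    # DDP synchronizes the *student's* gradients across ranks.
    student = DDP(student)

    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
    # GradScaler + autocast replace the apex FP16_Optimizer pattern
    # (where backward lives on the optimizer); disabled on CPU.
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

    x = torch.randn(4, 16, device=device)
    with torch.no_grad():
        target = teacher(x)  # teacher outputs used as distillation targets
    with torch.cuda.amp.autocast(enabled=use_cuda):
        loss = nn.functional.mse_loss(student(x), target)

    scaler.scale(loss).backward()  # scaled backward instead of optimizer.backward(loss)
    scaler.step(opt)
    scaler.update()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    print(distill_step())
```

Launched with `torchrun --nproc_per_node=<num_gpus>`, the same code runs one process per GPU with gradient all-reduce handled inside DDP.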