Sorry to disturb you, but could you answer the following questions?
Why does the TinyBERT training pipeline in `general_distill.py` not wrap the student model with DDP, and instead only initialize the teacher model that way? And why is there no synchronization of the normalization layers?
Also, when mixed precision is enabled, where can I find the `backward` function on the `optimizer`?
Thanks
mexiQQ changed the title from "#TinyBert" to "#TinyBert Training Pipeline Problems" on Nov 8, 2021.
Hi,
This code does not support fp16 or DDP training, so the relevant parts are redundant.
Please refer to the AutoTinyBERT code, which supports both fp16 and DDP training.
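For readers who want to add these paths themselves, here is a minimal, hedged sketch of what a DDP + mixed-precision distillation step could look like, using `torch.nn.parallel.DistributedDataParallel` and `torch.cuda.amp` in place of the apex-style `optimizer.backward(loss)` call. The models and loss below are illustrative stand-ins, not the actual TinyBERT classes. Note two points implied by the question: the teacher can stay un-wrapped because it is frozen and receives no gradients, and BERT uses LayerNorm (per-sample statistics), so no cross-GPU normalization sync (e.g. SyncBatchNorm) is needed.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def distill_step():
    # Single-process group for illustration; a real run launches one
    # process per GPU via torchrun, with rank/world_size from the env.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    student = nn.Linear(16, 16).to(device)  # stand-in for the student model
    teacher = nn.Linear(16, 16).to(device)  # stand-in for the teacher model
    teacher.eval()  # frozen teacher: no gradients, so no DDP wrapper needed

    # DDP synchronizes the *student's* gradients across ranks.
    student = DDP(student)

    opt = torch.optim.AdamW(student.parameters(), lr=1e-4)
    # GradScaler + autocast replace the apex FP16_Optimizer pattern
    # (where backward lives on the optimizer); disabled on CPU.
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

    x = torch.randn(4, 16, device=device)
    with torch.no_grad():
        target = teacher(x)  # teacher outputs used as distillation targets
    with torch.cuda.amp.autocast(enabled=use_cuda):
        loss = nn.functional.mse_loss(student(x), target)

    scaler.scale(loss).backward()  # scaled backward instead of optimizer.backward(loss)
    scaler.step(opt)
    scaler.update()

    dist.destroy_process_group()
    return loss.item()

if __name__ == "__main__":
    print(distill_step())
```

Launched with `torchrun --nproc_per_node=<num_gpus>`, the same code runs one process per GPU with gradient all-reduce handled inside DDP.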