
num_train_steps for further pretraining #44

Closed
DayuanJiang opened this issue Apr 18, 2020 · 1 comment

Comments

@DayuanJiang

Hello, I am trying to further pretrain the base and large models on a domain-specific corpus. The documentation says that when continuing pre-training from the released small ELECTRA checkpoints, we should:

Setting num_train_steps by (for example) adding "num_train_steps": 4010000 to the --hparams. This will continue training the small model for 10000 more steps (it has already been trained for 4e6 steps).

But Table 6 of the paper shows that the small ELECTRA model is trained for 1M steps. Which value should we set?

If 4e6 is correct, how many steps have the base and large models been trained for?
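
The way I read that instruction, num_train_steps is an absolute total rather than an increment, so the value has to cover whatever the checkpoint has already seen plus the extra steps. A minimal sketch of that arithmetic (numbers taken from the quoted README line):

```python
import json

# num_train_steps is the total step count, not an increment: to continue
# training, add the extra steps on top of what the checkpoint already has.
already_trained = 4_000_000   # "4e6 steps" figure from the quoted README line
extra_steps = 10_000          # how much further to pretrain
hparams = {"num_train_steps": already_trained + extra_steps}
print(json.dumps(hparams))    # {"num_train_steps": 4010000}, passed via --hparams
```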

@pnhuy

pnhuy commented Apr 24, 2020

For ELECTRA-Small, the number of steps already trained appears to be 4e6.

When I tested with num_train_steps <= 4e6, the model was not trained further (it had already reached that number of steps), and training only started again with num_train_steps >= 4000001.
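
A more direct check would be to read the step counter stored in the released checkpoint rather than probing it by launching training. A sketch, assuming the checkpoint lives in electra_small/ and the counter uses the usual TF1 variable name global_step (the ELECTRA release may store it under a different name):

```python
import tensorflow as tf

# Load the released checkpoint and read its stored step counter.
# The path and the variable name ("global_step") are assumptions,
# not verified against the ELECTRA release.
ckpt = tf.train.latest_checkpoint("electra_small/")
reader = tf.train.load_checkpoint(ckpt)
print(reader.get_tensor("global_step"))   # ~4e6 expected for ELECTRA-Small
```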
