Could you publish pre-trained models? #4

Closed
thdusdl1219 opened this issue Jul 15, 2021 · 5 comments

Comments

thdusdl1219 commented Jul 15, 2021

Hi. I was trying to train the models with the given scripts and dataset, but it took more time than I expected.
So, if you don't mind, could you share your pre-trained models? I probably do not have enough GPUs to achieve a reasonable training time.

thdusdl1219 changed the title from "Could you publish pre-trained model?" to "Could you publish pre-trained models?" on Jul 15, 2021

Owner

pdlan commented Jul 15, 2021

Sorry, I may not be able to publish the checkpoints, as my internship at Microsoft has ended. If training takes too long, you can reduce max_len (in both the preprocessing script and the training script) or use a smaller model (by changing ENCODER_LAYERS and SMALLBERT_ENCODER_LAYERS).

Owner

pdlan commented Jul 15, 2021

Also, you can use a smaller dataset by changing line 6 of process-pretrain-data/process.sh (rate="1 1 1 1 1 1 1 0.3 1 0.2 1") to smaller numbers, adjusting the batch size and training steps at the same time.
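As a rough illustration of the suggestion above, the rate string could be scaled down uniformly before pasting it back into process.sh. This is only a sketch: scale_rates is a hypothetical helper that does not exist in the repository, and it assumes each entry of rate is a per-corpus sampling fraction, which the thread does not confirm.

```python
def scale_rates(rate: str, factor: float) -> str:
    """Multiply every space-separated value in a rate string by `factor`.

    Hypothetical helper for illustration; it assumes each entry is a
    sampling fraction for one corpus (an assumption, not stated in the thread).
    """
    values = [float(x) * factor for x in rate.split()]
    # ":g" drops trailing zeros so the result stays compact for a shell variable.
    return " ".join(f"{v:g}" for v in values)


original = "1 1 1 1 1 1 1 0.3 1 0.2 1"  # value quoted in the comment above
print(scale_rates(original, 0.5))
# prints: 0.5 0.5 0.5 0.5 0.5 0.5 0.5 0.15 0.5 0.1 0.5
```

If the dataset is halved this way, the batch size or number of training steps would presumably need a matching reduction, as the comment notes.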

Author

thdusdl1219 commented

Thanks for your advice. Could you share the memory size of the GPUs you used and how long training took?

Owner

pdlan commented Jul 15, 2021

With max_seq_len=255, max_sentences=2, update_freq=4, and eight 32GB V100s, it took about a week. The memory used by each GPU should be slightly less than 16GB.
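To put those settings in perspective, the arithmetic below estimates how many sequences contribute to each optimizer update. It is a sketch that assumes max_sentences is the per-GPU batch size and update_freq is the number of gradient-accumulation steps (the usual fairseq meaning of these names, which the thread does not state explicitly); the function names are made up for illustration.

```python
def sequences_per_update(max_sentences: int, update_freq: int, num_gpus: int) -> int:
    """Sequences contributing to one optimizer update.

    Assumes max_sentences is the per-GPU batch size and update_freq is the
    number of accumulated forward/backward passes (an assumption).
    """
    return max_sentences * update_freq * num_gpus


def token_upper_bound(max_seq_len: int, sequences: int) -> int:
    """Upper bound on tokens per update if every sequence had full length."""
    return max_seq_len * sequences


seqs = sequences_per_update(max_sentences=2, update_freq=4, num_gpus=8)
print(seqs)                          # prints: 64
print(token_upper_bound(255, seqs))  # prints: 16320
```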

Author

thdusdl1219 commented

Thanks for your comments! I tried increasing max_sentences to improve performance, but it leads to OOM... I'll check the other options you mentioned to improve performance.
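One option that avoids raising max_sentences is to hold it at a value that fits in memory and raise update_freq instead. The sketch below picks an update_freq for a target number of sequences per update; it assumes update_freq behaves as gradient accumulation (so per-GPU memory is driven mainly by max_sentences), and update_freq_for is a hypothetical helper, not something from the repository.

```python
import math


def update_freq_for(target_sequences: int, max_sentences: int, num_gpus: int) -> int:
    """Smallest update_freq that reaches at least `target_sequences` per update.

    Assumes update_freq means gradient accumulation, so increasing it trades
    wall-clock time for batch size without raising per-GPU memory (an assumption).
    """
    per_step = max_sentences * num_gpus  # sequences processed in one pass
    return math.ceil(target_sequences / per_step)


# Matching the 64 sequences per update implied by 8 GPUs, max_sentences=2,
# update_freq=4, but on smaller machines:
print(update_freq_for(64, max_sentences=2, num_gpus=8))  # prints: 4
print(update_freq_for(64, max_sentences=2, num_gpus=1))  # prints: 32
```

Accumulating more steps does not shorten training; it only keeps the effective batch size comparable when fewer or smaller GPUs are available.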
