Could you publish pre-trained models? #4

thdusdl1219 · 2021-07-15T18:21:39Z

Hi. I was trying to train the models with the given scripts and dataset, but it took more time than I expected.
So, if you don't mind, could you share your pre-trained models? I probably do not have enough GPUs to achieve a reasonable training time.

pdlan · 2021-07-15T19:08:25Z

Sorry, I may not be able to publish the checkpoints because my internship at Microsoft has ended. If training takes too long, you can reduce max_len (in both the preprocessing script and the training script) or use a smaller model (by changing ENCODER_LAYERS and SMALLBERT_ENCODER_LAYERS).

pdlan · 2021-07-15T19:17:18Z

Also, you can use a smaller dataset by changing line 6 of process-pretrain-data/process.sh (rate="1 1 1 1 1 1 1 0.3 1 0.2 1") to smaller numbers, adjusting the batch size and training steps at the same time.
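As a rough sketch of that change, the snippet below scales every entry of the `rate` string by a common factor so the result can be pasted back into process.sh. The helper name `scale_rates` and the factor 0.5 are illustrative assumptions, not part of the repository, and the sketch assumes each entry controls how much of one corpus is kept.

```python
def scale_rates(rate: str, factor: float) -> str:
    """Multiply each whitespace-separated rate by `factor` (hypothetical helper)."""
    if not 0 < factor <= 1:
        raise ValueError("factor must be in (0, 1]")
    scaled = (float(r) * factor for r in rate.split())
    # ":g" drops trailing zeros so the string stays compact.
    return " ".join(f"{value:g}" for value in scaled)


original = "1 1 1 1 1 1 1 0.3 1 0.2 1"  # value quoted from process.sh above
print(f'rate="{scale_rates(original, 0.5)}"')
```

Scaling all entries by the same factor keeps the relative mix of corpora unchanged; the batch size and number of training steps would still need to be reduced by hand to match.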

thdusdl1219 · 2021-07-15T19:27:36Z

Thanks for your advice. Could you share the memory size of the GPUs you used and how long training took?

pdlan · 2021-07-15T19:49:37Z

With max_seq_len=255, max_sentences=2, update_freq=4 and 8 32GB V100s it took about a week. The memory used by each GPU should be slightly less than 16GB.
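For anyone reproducing this on fewer GPUs, here is a minimal sketch of the arithmetic behind those settings. It assumes fairseq-style semantics, where max_sentences is the per-GPU batch size and update_freq is the number of batches accumulated before each optimizer step; the function names are hypothetical.

```python
import math


def sequences_per_update(max_sentences: int, update_freq: int, num_gpus: int) -> int:
    """Sequences contributing to one optimizer step (assumed semantics)."""
    return max_sentences * update_freq * num_gpus


def update_freq_for(target: int, max_sentences: int, num_gpus: int) -> int:
    """Smallest update_freq that reaches at least `target` sequences per step."""
    return math.ceil(target / (max_sentences * num_gpus))


# Settings quoted above: max_sentences=2, update_freq=4, 8 GPUs.
baseline = sequences_per_update(max_sentences=2, update_freq=4, num_gpus=8)
print("sequences per update:", baseline)

# Keeping the same effective batch on a single GPU without raising max_sentences.
print("update_freq on 1 GPU:", update_freq_for(baseline, max_sentences=2, num_gpus=1))
```

Raising update_freq instead of max_sentences keeps per-GPU memory roughly constant, at the cost of proportionally more wall-clock time per optimizer step.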

thdusdl1219 · 2021-07-15T20:33:19Z

Thanks for your comments! I was trying to increase max_sentences to improve performance, but it leads to OOM... I'll check the other options you mentioned to improve performance.

thdusdl1219 changed the title ~~Could you publish pre-trained model?~~ Could you publish pre-trained models? Jul 15, 2021

thdusdl1219 closed this as completed Jul 16, 2021
