Could you publish pre-trained models? #4
Hi. I was trying to train the models with the given scripts and dataset, but it took more time than I expected. So, if you don't mind, could you share your pre-trained models? I probably do not have enough GPU resources to achieve a reasonable training time.

Comments
Sorry, I may not be able to publish the checkpoints, as my internship at Microsoft has ended. If the training time is too long, you can reduce max_len (in both the preprocessing script and the training script) or use a smaller model (by changing ENCODER_LAYERS and SMALLBERT_ENCODER_LAYERS).

Also, you can use a smaller dataset by changing line 6 of the file.
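For concreteness, here is a minimal sketch of the kind of size reduction being suggested. The key names mirror the variables mentioned above, but the default values and the helper function are assumptions for illustration, not the repo's actual settings:

```python
# Illustrative sketch only: the names mirror the variables mentioned in this
# thread (max_len, ENCODER_LAYERS, SMALLBERT_ENCODER_LAYERS); the defaults
# below are assumptions, not the repo's actual values.

config = {
    "max_len": 512,                 # sequence length; must match in preprocessing AND training
    "encoder_layers": 12,           # ENCODER_LAYERS in the training script (assumed default)
    "smallbert_encoder_layers": 6,  # SMALLBERT_ENCODER_LAYERS (assumed default)
}

def shrink(cfg):
    """Halve the sequence length and layer counts to cut training time (illustrative)."""
    return {k: max(1, v // 2) for k, v in cfg.items()}

print(shrink(config))
# {'max_len': 256, 'encoder_layers': 6, 'smallbert_encoder_layers': 3}
```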
Thanks for your advice. Could you share the memory size of the GPUs you used and how long the training took?
With max_seq_len=255, max_sentences=2, update_freq=4, and 8 32GB V100s, it took about a week. The memory used by each GPU should be slightly less than 16GB.
Thanks for your comments! I was trying to increase max_sentences to improve performance, but it leads to OOM ... I'll check the other options you mentioned to improve performance.
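For anyone tuning these knobs: in fairseq-style trainers the three settings above multiply into the effective batch size, and raising update_freq is the usual way to grow it without the per-GPU memory cost that makes max_sentences run out of memory. A quick back-of-the-envelope check, using the numbers reported above (the formula is the standard gradient-accumulation one, stated here as an assumption about this repo's trainer):

```python
# Effective batch size = sentences per GPU * gradient-accumulation steps * GPU count.
max_sentences = 2   # sentences per GPU per forward pass
update_freq = 4     # gradient steps accumulated before each optimizer update
num_gpus = 8        # 32GB V100s

effective_batch = max_sentences * update_freq * num_gpus
print(effective_batch)  # 64 sentences per parameter update

# Doubling update_freq (rather than max_sentences) doubles the effective batch
# at the same per-GPU memory, which is why it is the usual workaround for OOM.
```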