Inquiry about Mosaic-BERT and BERT-Base Sequence Lengths #407
Comments
Apologies if I haven't totally understood your question. From the blog post: to fully pretrain a model with a 512 sequence length, you'll just need to follow our guide, but change the … Because of ALiBi, you can also start with a model trained with sequence length 128 and change …
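For readers following along, the kind of edit being described might look something like this in the pretraining config. The `max_seq_len` field name is an assumption based on the repo's yamls; treat this as a sketch, not the official recipe:

```yaml
# Hypothetical edit to the pretraining yaml (field name assumed):
# raise the sequence length used for tokenization and the dataloaders.
max_seq_len: 512   # was 128
```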
Thank you!
Hi, I have one follow-up question: what do I have to consider regarding `global_train_batch_size` and `device_train_microbatch_size` if I want to train with a sequence length of 512 instead of 128 tokens? If I leave everything as in the yamls/main/hf-bert-base-uncased.yaml file, I will probably run into memory problems. Do you have any tips in this regard? Or, even better, do you have a YAML for this case? I train on 8x 80 GB Nvidia A100s. Trial and error unfortunately works badly for me, because I always have to wait quite a long time until I get onto the GPU; hence the question. Thanks a lot!
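One common rule of thumb (an assumption on my part, not an official MosaicML recipe): keep `global_train_batch_size` fixed so the optimization recipe is unchanged, and shrink `device_train_microbatch_size` so the number of tokens per microbatch stays roughly constant; gradient accumulation absorbs the difference. Attention memory grows faster than linearly with sequence length, so you may need to go lower still:

```python
# Rough starting point for rescaling the per-device microbatch when
# increasing the sequence length. Keeps tokens-per-microbatch roughly
# constant; actual memory use may require a smaller value.

def scale_microbatch(microbatch: int, old_len: int = 128, new_len: int = 512) -> int:
    """Return a microbatch size for new_len given one tuned for old_len."""
    scaled = microbatch * old_len // new_len
    return max(scaled, 1)  # never go below one sequence per device

# Example: a microbatch of 128 sequences at length 128 -> 32 at length 512.
print(scale_microbatch(128))  # -> 32
```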
Perfect, thank you for your quick response!
I ran into another issue, sorry... As mosaic-bert is not finetunable, I use hf-bert. I follow the approach of the original BERT paper: train 90% of the steps with a sequence length of 128 and the remaining 10% of the steps with a sequence length of 512. To accomplish this with your code, I run the "main" script for pretraining twice. The first run completes without any issue. However, in the second run, when I load the previous checkpoint with `load_path` and change the sequence length to 512, I get the following error: `ValueError: Reused local directory: ['/mnt/data/train'] vs ['/mnt/data/train']. Provide a different one.` The data is stored locally. Do you have any idea why this error occurs? Thank you very much!
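For context, a second-phase run like the one described above might override the config along these lines. Every field name and path here is an assumption for illustration (the `local` field follows the streaming dataset config style used in the repo's yamls); the error message suggests the streaming cache directory cannot be reused across runs, so pointing `local` at a fresh directory is one plausible workaround:

```yaml
# Hypothetical phase-2 overrides (all names and paths are placeholders):
max_seq_len: 512
load_path: /mnt/checkpoints/phase1/latest-rank0.pt  # phase-1 checkpoint

train_loader:
  dataset:
    local: /mnt/data/train_seqlen512  # fresh cache dir, since reusing
                                      # /mnt/data/train raises the
                                      # "Reused local directory" ValueError
```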
Hi @FinTexIFB, what is your …
Hi @karan6181, thank you for your response. Yes, setting … However, I found a workaround by simply creating a new container from the same Mosaic Docker image and installing all dependencies. Now it works, but only once: when I try to continue pre-training from an existing checkpoint afterwards, I get the error again. Maybe that is a bug.
@FinTexIFB, mosaic-bert is finetunable, as can be seen in this yaml. Does this work for your use case? |
I have been exploring the Mosaic-BERT model and I noticed that it is trained on a sequence length of 128. It's my understanding that this length can be easily extrapolated at inference time due to Attention with Linear Biases (ALiBi). However, in one of your blog posts, you compared the Mosaic-BERT model with the Hugging Face BERT base model, and I'm unclear about the sequence length used for training the BERT-Base model.
Specifically, I would like to know if the BERT-Base model, which is used as a benchmark for the mosaic-bert model for example in the appended figure, is trained with a sequence length of 128 or 512? If it is trained with a sequence length of 128, I would like to inquire about the necessary steps to obtain a Mosaic-BERT model that matches the performance of the BERT-Base model with a sequence length of 512.
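To make the extrapolation point concrete, here is a minimal sketch of the ALiBi idea (not the repo's implementation): each attention head adds a fixed linear penalty proportional to query-key distance, so no learned position embeddings are tied to the training length, and the same bias formula applies at any sequence length. The symmetric `|q - k|` form below is the bidirectional variant; the slope schedule follows the ALiBi paper for a power-of-two head count:

```python
# Toy ALiBi bias computation, pure Python for clarity (a real model
# would build these as tensors and add them to attention scores).

def alibi_slopes(n_heads: int) -> list[float]:
    """Geometric slope per head: 2^(-8/n), 2^(-16/n), ... (power-of-two heads)."""
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(n_heads: int, seq_len: int) -> list[list[list[float]]]:
    """bias[h][q][k] = -slope_h * |q - k|; seq_len is free, hence extrapolation."""
    slopes = alibi_slopes(n_heads)
    return [[[-s * abs(q - k) for k in range(seq_len)] for q in range(seq_len)]
            for s in slopes]

# The bias grid can be built for any length, e.g. 4 here, 512 at inference.
bias = alibi_bias(n_heads=8, seq_len=4)
```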
Thank you for your attention to this matter. I look forward to your response and clarification.