Pretraining BART language model #18030
Comments
I don't think we currently have plans to add code for BART pretraining, as it's quite a time-costly endeavor. Maybe we should revisit this decision at some point for some models (BART, T5, Wav2Vec2), as the community is asking for it more and more. Wdyt @sgugger @LysandreJik?
It's not something anyone on the team has the time to focus on right now, but we would welcome a contribution for such an example.
I have attempted to reproduce BART pretraining for Swedish using fairseq here, with some instructions in the README. We trained a custom tokenizer with Huggingface. We were unfamiliar with fairseq prior to doing this and based the entire reproduction on reading the paper closely and trying to match its reported experimental settings against fairseq args using the documentation and the source code. The finished model was uploaded to Huggingface.

However, we learned quite a few lessons along the way that aren't documented there. The first is the process of using a custom Huggingface tokenizer with fairseq, which I described above. A second important one is that you should generate several epochs' worth of sharded and shuffled data, not just shards for a single epoch: fairseq pretraining will crash once you reach the end of your listed shards and won't allow you to reuse or relist the same shard filename. I guess this sort of thing may be obvious if you have the chance to ask your Facebook engineer colleagues questions over lunch, but to us it wasn't obvious what the best practices were, since they aren't described in the docs.

Anyway, the pretraining task, along with the training args I researched while trying to replicate it, may be of interest to you: https://github.com/kb-labb/kb_bart/blob/main/train_bart_args.sh. I cannot guarantee that it is identical to the actual BART pretraining settings, but it was my best attempt based on reading the paper and the source code.

Edit: It seems a BART author replied in the issue thread with the actual args they used in pretraining.
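For the tokenizer step, here is a minimal sketch (not the kb-labb code) of training a byte-level BPE tokenizer with the Hugging Face `tokenizers` library before handing data over to fairseq. The file paths, vocabulary size, and special tokens are illustrative assumptions, and converting the resulting vocabulary into the dictionary format fairseq expects is a separate step not shown here.

```python
# Sketch only: train a byte-level BPE tokenizer on plain-text files.
# Paths, vocab size, and special tokens are assumptions, not the kb-labb settings.
from tokenizers import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()

# Train on raw text files, one document (or sentence) per line.
tokenizer.train(
    files=["data/swedish_corpus_part1.txt", "data/swedish_corpus_part2.txt"],
    vocab_size=50_265,  # roughly matching BART's ~50k vocabulary
    min_frequency=2,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

# Writes vocab.json and merges.txt, which can later be loaded with
# transformers.BartTokenizerFast(vocab_file=..., merges_file=...).
tokenizer.save_model("tokenizer_out")
```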
@sajastu I'm not an author of this repo, but @patrickvonplaten is one of them. That's it!
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
We've released nanoT5, which reproduces pre-training of the T5 model (similar to BART). Take a look! Any suggestions are more than welcome.
Feature request
Hi,
I'm looking into the BART docs. It seems the provided examples cover fine-tuning BART on Seq2Seq summarization tasks. I'm wondering if there is any example of pretraining BART's "Language Model" itself, with the pre-training objectives (token infilling, token masking, etc.) mentioned in the original paper. I looked into this a couple of months ago and found this thread: #4151, and a relevant issue in fairseq: facebookresearch/fairseq#1899. I've now decided to ask directly here to see whether there has been any update so far.
Thanks, @patrickvonplaten @patil-suraj,
Motivation
Making BART available for further pre-training (of its language model).
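To make the objectives referenced above concrete, here is a minimal sketch of text infilling as described in the BART paper: spans are sampled with Poisson(λ=3) lengths and each span is replaced by a single mask token. The checkpoint name and the 30% masking ratio are assumptions, and this is an illustration of the objective, not the original fairseq implementation.

```python
# Sketch of BART-style text infilling: replace sampled spans with a single <mask>.
# Not the fairseq implementation; model name and mask_ratio are assumptions.
import numpy as np
from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")

def text_infilling(token_ids, mask_ratio=0.3, poisson_lambda=3.0, rng=None):
    """Corrupt a token sequence by collapsing sampled spans into single <mask> tokens."""
    rng = rng or np.random.default_rng()
    ids = list(token_ids)
    num_to_mask = int(round(len(ids) * mask_ratio))
    out, i, masked = [], 0, 0
    while i < len(ids):
        if masked < num_to_mask and rng.random() < mask_ratio:
            span = int(rng.poisson(poisson_lambda))  # span length, may be 0 (pure insertion)
            out.append(tokenizer.mask_token_id)      # one <mask> per span, regardless of length
            i += span
            masked += max(span, 1)
        else:
            out.append(ids[i])
            i += 1
    return out

text = "BART is pretrained by corrupting text and learning to reconstruct it."
ids = tokenizer(text, add_special_tokens=False)["input_ids"]
noised = text_infilling(ids)
print(tokenizer.decode(noised))  # corrupted encoder input
print(tokenizer.decode(ids))     # uncorrupted reconstruction target
```

In a denoising setup like this, the corrupted ids would feed the encoder and the original, uncorrupted ids would serve as the decoder's reconstruction target.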