[XLM-R] by Facebook AI Research #1769
Comments
cc @aconneau 😬
Is there any update on the XLM-R model?
Let me know if you need some help porting the XLM-R models to HF.
I think that's maybe not the correct way, but I adjusted the […]. Then I used the […]. So I guess it is not as simple as this first attempt 😅 Gist for the conversion script is here. The […]
Hi @stefan-it, do you have any update on your attempt?
The final models have been released today 😍 https://github.com/pytorch/fairseq/tree/master/examples/xlmr So I'm going to try the conversion with these models tomorrow or in the next days :)
I think the model conversion is done correctly. But: the […] Update: I could achieve 90.41% on CoNLL-2003 (English); the paper reports 92.74 (using Flair).
Btw, I was using the XLM-R v0 checkpoints in a project I'm working on, and the v0 checkpoints worked slightly better than the checkpoints added today. Is it possible to also add the older checkpoints?
I think the best solution is to offer both checkpoint versions! In my opinion, the ideal case is that, as with other models in Transformers, you can select which version of the XLM-R checkpoints to use, e.g.:
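The suggestion above could look like the following with the Transformers `from_pretrained` API. Note that `xlm-roberta-base` is the real released checkpoint name, while `xlm-roberta-base-v0` is a hypothetical identifier used here only to illustrate how two checkpoint versions might be exposed:

```python
# Sketch: selecting between XLM-R checkpoint versions by name.
# "xlm-roberta-base" is the released checkpoint; "xlm-roberta-base-v0"
# is a hypothetical name for the older v0 weights.
CHECKPOINTS = {
    "v1": "xlm-roberta-base",
    "v0": "xlm-roberta-base-v0",  # hypothetical identifier
}

def checkpoint_name(version: str) -> str:
    """Map a version tag to a model identifier for from_pretrained()."""
    return CHECKPOINTS[version]

# Intended usage (downloads weights, so commented out here):
# from transformers import XLMRobertaModel
# model = XLMRobertaModel.from_pretrained(checkpoint_name("v0"))

print(checkpoint_name("v1"))
```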
Btw, using XLM-R I encountered this issue: […] This is really annoying and makes it hard to use the model.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@ricardorei Did you happen to successfully use the XLM-R model? I'm trying to see how this model can be used as a pretrained step for NMT tasks. I tried the raw version from the XLM Facebook repo and ran into multiple OOM issues. The best suggestion I got so far is to try the smaller version of fairseq XLM-R (base) on a p3dn.24xlarge instance, or the Google TPU PyTorch route. Thanks!
I am using the base model, which runs well on a 12GB GPU with a batch size of 8. Depending on your implementation and task you can run even bigger batches (16 or 24, for example). I am also using the version directly from fairseq, because you can load the v0 checkpoint. The variability in my predictions with different batch sizes is something I could never figure out; probably some floating-point precision issue going on under the hood. It doesn't change overall performance, but it is annoying...
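The batch-size variability mentioned here is plausibly just floating-point non-associativity: reducing the same values in a different order (as different batch sizes do) can accumulate rounding error differently. A minimal illustration with plain Python floats, no model involved:

```python
# Floating-point addition is not associative, so the same numbers
# summed in different groupings (e.g. different batch sizes) can
# produce slightly different results.
vals = [0.1] * 10

left_to_right = sum(vals)                   # one accumulation order
pairwise = sum(vals[:5]) + sum(vals[5:])    # another order: two "half-batches"

print(left_to_right == pairwise)  # → False
print(left_to_right, pairwise)
```

The discrepancy is tiny (on the order of 1e-16), which matches the observation that overall performance is unchanged while individual predictions wobble.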
BTW, I am using the TF variant from https://huggingface.co/jplu/tf-xlm-roberta-base and https://huggingface.co/jplu/tf-xlm-roberta-large. I have successfully finetuned even the large model on a 16GB GPU, and it performed substantially better than the base model (on Czech Q&A).
@ricardorei Could you share the prepare and train commands you are using? It would be easier than digging deep into every fairseq hyperparameter. Thanks!
@foxik Is the TF variant more suitable for fine-tuning? Were there any particular preprocessing steps you carried out for fine-tuning? If you can share them, I can map the same for the NMT task. Thanks!
@mohammedayub44 Yes, I was using it for classification/regression. In your case you need both the encoder and decoder parts, which would take a lot more space. I would suggest sharing parameters between your encoder and decoder. I know that, with the right hyperparameters, you can achieve good results by sharing parameters between encoder and decoder -> A Simple and Effective Approach to Automatic Post-Editing. The hyperparameters I am using are very simple: I freeze the encoder for 1 epoch while fine-tuning the classification head, and then I fine-tune the entire model. My classification head has a learning rate of 0.00003 while XLM-R has 0.00001. The optimizer is standard Adam. This combination of gradual unfreezing with discriminative learning rates works well in my task.
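The recipe above (freeze the encoder for the first epoch, then unfreeze, with a smaller learning rate for the encoder than for the head) can be sketched in PyTorch. The tiny `encoder` and `head` modules below are stand-ins for XLM-R and the classification head; only the two learning rates come from the comment:

```python
import torch
from torch import nn

# Stand-ins for XLM-R and the classification head (hypothetical sizes).
encoder = nn.Linear(16, 16)
head = nn.Linear(16, 2)

# Discriminative learning rates via Adam parameter groups:
# 3e-5 for the classification head, 1e-5 for the encoder.
optimizer = torch.optim.Adam([
    {"params": head.parameters(), "lr": 3e-5},
    {"params": encoder.parameters(), "lr": 1e-5},
])

def set_encoder_frozen(frozen: bool) -> None:
    """Gradual unfreezing: disable/enable gradients for the encoder."""
    for p in encoder.parameters():
        p.requires_grad = not frozen

# Epoch 1: train only the head while the encoder stays frozen.
set_encoder_frozen(True)
# ... training loop for one epoch ...

# Epoch 2 onwards: fine-tune the entire model.
set_encoder_frozen(False)
# ... continue training ...
```

Keeping the encoder in its own parameter group means the same optimizer handles both phases; freezing is just a matter of toggling `requires_grad`.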
@ricardorei Thanks for sharing the paper. Some interesting results there.
🌟 New model addition
Model description
Yesterday, Facebook released and open-sourced its new model XLM-R (XLM-RoBERTa), described on arXiv. This model uses self-supervised training techniques to achieve state-of-the-art performance in cross-lingual understanding, a task in which a model is trained in one language and then used with other languages without additional training data. "Our model improves upon previous multilingual approaches by incorporating more training data and languages, including so-called low-resource languages, which lack extensive labeled and unlabeled data sets."
Open Source status
Additional context
Facebook says the following about this new model in their blog: