
[XLM-R] by Facebook AI Research #1769

Closed
TheEdoardo93 opened this issue Nov 8, 2019 · 18 comments

@TheEdoardo93 commented Nov 8, 2019

🌟 New model addition

Model description

Yesterday, Facebook AI Research open-sourced its new cross-lingual language model, XLM-R (XLM-RoBERTa), announced on arXiv. The model uses self-supervised training techniques to achieve state-of-the-art performance in cross-lingual understanding, a task in which a model is trained in one language and then used with other languages without additional training data. According to Facebook, the model improves upon previous multilingual approaches by incorporating more training data and languages, including so-called low-resource languages, which lack extensive labeled and unlabeled data sets.

Open Source status

  • the model implementation is available: here, under the XLMRModel Python class (line 198)
  • the model weights are available: yes, more details here
  • who are the authors: Facebook AI Research (Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov)

Additional context

Facebook says the following about this new model in their blog post:

> XLM-R represents an important step toward our vision of providing the best possible experience on our platforms for everyone, regardless of what language they speak.

> We hope to improve the performance of multilingual models created by the research community, particularly systems that use self-supervised training methods to better understand low-resource languages.

XLM-R has been trained on 2.5 TB of data across 100 languages, filtered from Common Crawl.

@julien-c (Member) commented Nov 8, 2019

cc @aconneau 😬

@ricardorei

Is there any update on the XLM-R model?

@ngoyal2707

Let me know if you need some help porting the XLM-R models to HF.

@stefan-it (Collaborator) commented Dec 4, 2019

This is maybe not the correct way, but I adjusted the convert_roberta_original_pytorch_checkpoint_to_pytorch.py script to convert the fairseq model into a Transformers-compatible model file. I used the sentencepiece BPE loader and adjusted the vocab size.

Then I used the CamemBERT model class to perform some evaluations on NER, but the results are not really good (I tried to replicate the CoNLL-2003 results for English).

So I guess it is not as simple as this first attempt 😅

Gist for the conversion script is here.

The CamemBERT model configuration looks pretty much the same as XLM-R large?!
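A rough sketch of the sanity check behind such a conversion attempt (all paths and the converted checkpoint directory are hypothetical): run the same sentence through the original fairseq checkpoint and through the converted model loaded with the CamemBERT classes, and compare the activations.

```python
# Minimal sketch, assuming the fairseq XLM-R archive has been unpacked to
# /path/to/xlmr.large and a converted checkpoint exists (both hypothetical).
import torch
from fairseq.models.roberta import XLMRModel
from transformers import CamembertModel

SENTENCE = 'Hello world!'

# Reference: the original fairseq checkpoint.
fairseq_xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large', checkpoint_file='model.pt')
fairseq_xlmr.eval()
tokens = fairseq_xlmr.encode(SENTENCE)                      # 1-D tensor of subword ids
with torch.no_grad():
    reference = fairseq_xlmr.extract_features(tokens)       # (1, seq_len, hidden)

# Candidate: the checkpoint produced by the adjusted conversion script,
# loaded with the CamemBERT model class as described above.
converted = CamembertModel.from_pretrained('/path/to/converted-xlmr').eval()
with torch.no_grad():
    candidate = converted(tokens.unsqueeze(0))[0]            # feed the same input ids

print('max abs diff:', (reference - candidate).abs().max().item())
```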

@CZWin32768 commented Dec 10, 2019

> This is maybe not the correct way, but I adjusted the convert_roberta_original_pytorch_checkpoint_to_pytorch.py script to convert the fairseq model into a Transformers-compatible model file. I used the sentencepiece BPE loader and adjusted the vocab size.
>
> Then I used the CamemBERT model class to perform some evaluations on NER, but the results are not really good (I tried to replicate the CoNLL-2003 results for English).
>
> So I guess it is not as simple as this first attempt 😅
>
> Gist for the conversion script is here.
>
> The CamemBERT model configuration looks pretty much the same as XLM-R large?!

Hi @stefan-it, do you have any update on your attempt?

@stefan-it (Collaborator) commented Dec 11, 2019

The final models have been released today 😍

https://github.com/pytorch/fairseq/tree/master/examples/xlmr

So I'm going to try the conversion with these models tomorrow/in the next days :)
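For context, a minimal sketch of loading the newly released checkpoints straight through the fairseq hub interface (following the fairseq XLM-R README), before any conversion to Transformers:

```python
# Pull the released XLM-R large checkpoint via torch.hub and extract features.
import torch

xlmr = torch.hub.load('pytorch/fairseq', 'xlmr.large')
xlmr.eval()

tokens = xlmr.encode('Hello world!')        # sentencepiece pieces mapped to dictionary ids
features = xlmr.extract_features(tokens)    # (1, seq_len, 1024) for the large model
print(tokens)
print(features.shape)
```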

@stefan-it (Collaborator) commented Dec 11, 2019

I think the model conversion is done correctly. But the CamembertTokenizer implementation can't be used as-is, because it adds some special tokens; I had to modify the tokenizer to match the output of the fairseq tokenization/.encode() method :) I'll report back some results on NER later.

Update: I could achieve 90.41% on CoNLL-2003 (English); the paper reports 92.74 (using Flair).
Update 2: Using the run_ner.py example (incl. some hours of tokenization debugging...): 96.22 (dev) and 91.91 (test).
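A sketch of the tokenizer check described above (paths are hypothetical): a converted model is only usable if the Transformers-side tokenizer reproduces fairseq's .encode() output id-for-id, and the unmodified CamembertTokenizer is expected to fail this check because of its extra special tokens.

```python
# Compare fairseq's reference encoding against a candidate transformers tokenizer
# built on the same sentencepiece model that ships with the checkpoint.
from fairseq.models.roberta import XLMRModel
from transformers import CamembertTokenizer

SENTENCE = 'Hello world!'

# Reference ids from the fairseq checkpoint (hypothetical path).
xlmr = XLMRModel.from_pretrained('/path/to/xlmr.large', checkpoint_file='model.pt')
reference = xlmr.encode(SENTENCE).tolist()

# Candidate ids from the transformers tokenizer under test.
tokenizer = CamembertTokenizer('/path/to/xlmr.large/sentencepiece.bpe.model')
candidate = tokenizer.encode(SENTENCE)

print('fairseq     :', reference)
print('transformers:', candidate)
print('match       :', candidate == reference)
```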

@ricardorei

Btw I was using the XLM-R v0 checkpoints in a project I'm working on and the v0 checkpoints worked slightly better than the checkpoints added today. Is it possible to also add the older checkpoints?

@TheEdoardo93 (Author)

I think the best solution is to offer both checkpoint versions! In my opinion, the ideal case is that, as with other models in Transformers, you can select which version of the XLM-R checkpoints to use, e.g.

```python
from transformers import XLMRModel

base_model = XLMRModel.from_pretrained('xlmr-base')    # 250M parameters
large_model = XLMRModel.from_pretrained('xlmr-large')  # 560M parameters
```

> Btw I was using the XLM-R v0 checkpoints in a project I'm working on and the v0 checkpoints worked slightly better than the checkpoints added today. Is it possible to also add the older checkpoints?
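For reference, a hedged sketch of the interface that eventually shipped in Transformers; the released classes and checkpoints use XLM-RoBERTa naming rather than the names proposed above, but both sizes are selectable exactly as suggested.

```python
# The class and checkpoint names below are from the released library,
# not from this thread's proposal.
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
base_model = XLMRobertaModel.from_pretrained('xlm-roberta-base')
large_model = XLMRobertaModel.from_pretrained('xlm-roberta-large')
```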

@ricardorei

Btw, using XLM-R I encountered this issue:
Batch size affecting output. #2401

This is really annoying and makes the model hard to use.

stale bot commented Mar 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Mar 9, 2020
stale bot closed this as completed Mar 16, 2020
@mohammedayub44

@ricardorei Did you happen to successfully use the XLM-R model?

I'm trying to see how this model can be used as a pretraining step for NMT tasks. I tried the raw version from the Facebook XLM repo and ran into multiple OOM issues.

The best suggestion I've gotten so far is to try the smaller fairseq XLM-R (base) on a p3dn.24xlarge instance, or the Google TPU PyTorch route.

Thanks!

@ricardorei commented May 6, 2020

@mohammedayub44

I am using the base model, which runs well on a 12GB GPU with a batch size of 8. Depending on your implementation and task you can run even bigger batches (16 or 24, for example).

I am also using the version directly from fairseq, because you can load the v0 checkpoint there.

I could never figure out the variability in my predictions with different batch sizes. It's probably some floating-point precision issue under the hood. It doesn't change overall performance, but it is annoying...
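A quick way to observe the effect described here, sketched with the Transformers XLM-R classes rather than the fairseq v0 checkpoint in use above: run the same sentence alone and inside a padded batch, then look at how far the activations drift. Any difference should be on the order of float32 rounding noise.

```python
# Compare single-sentence output against the same sentence inside a padded batch.
import torch
from transformers import XLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = XLMRobertaModel.from_pretrained('xlm-roberta-base').eval()

sentence = 'Hello world!'
filler = 'A much longer sentence that forces padding on the short one in the batch.'

with torch.no_grad():
    single = tokenizer(sentence, return_tensors='pt')
    out_single = model(**single)[0][0]                        # (seq_len, hidden)

    batch = tokenizer([sentence, filler], padding=True, return_tensors='pt')
    out_batch = model(**batch)[0][0][: out_single.size(0)]    # same sentence, batched

print('max abs diff:', (out_single - out_batch).abs().max().item())
```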

@foxik commented May 6, 2020

BTW, I am using the TF variant from https://huggingface.co/jplu/tf-xlm-roberta-base and https://huggingface.co/jplu/tf-xlm-roberta-large . I have successfully finetuned even the large model on a 16GB GPU and it was performing substantially better than the base model (on Czech Q&A).
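A hedged sketch of how the community TF checkpoints mentioned above can be loaded through Transformers; TFXLMRobertaModel is the TF2/Keras counterpart of the PyTorch class, and the tokenizer here is taken from the official checkpoint since it is the same sentencepiece model.

```python
# Load the community TF weights and run a Czech sentence through the encoder.
from transformers import TFXLMRobertaModel, XLMRobertaTokenizer

tokenizer = XLMRobertaTokenizer.from_pretrained('xlm-roberta-base')
model = TFXLMRobertaModel.from_pretrained('jplu/tf-xlm-roberta-base')

inputs = tokenizer('Ahoj světe!', return_tensors='tf')
outputs = model(inputs)       # last hidden states in outputs[0]
print(outputs[0].shape)       # (1, seq_len, 768) for the base model
```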

@mohammedayub44

@ricardorei
Thanks for the confirmation. I'm okay with the v0 checkpoints; I just need to check whether the model can be fine-tuned for NMT. I'm guessing you're fine-tuning for classification tasks.

If you could share the preprocessing and training commands you are using, it would be easier than digging into every fairseq hyperparameter.

Thanks!

@mohammedayub44 commented May 6, 2020

@foxik Is the TF variant more suitable for fine-tuning? Are there any particular preprocessing steps you carried out for fine-tuning? If you can share them, I can map the same to the NMT task.

Thanks!

@ricardorei commented May 7, 2020

@mohammedayub44 Yes, I was using it for classification/regression. In your case, you need both the encoder and the decoder part, which would take a lot more space. I would suggest that you share parameters between your encoder and decoder.

I know that, with the right hyperparameters, you can achieve good results by sharing the parameters between your encoder and decoder: see "A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning".

In terms of hyperparameters, mine are very simple. I freeze the encoder for 1 epoch while fine-tuning the classification head, and then I fine-tune the entire model. My classification head has a learning rate of 0.00003 while XLM-R has 0.00001. The optimizer is standard Adam. This combination of gradual unfreezing with discriminative learning rates works well on my task.
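A minimal sketch of that recipe (names and the head are illustrative, not the actual training code): freeze the XLM-R encoder while the classification head warms up for one epoch, then unfreeze and train everything, with per-group learning rates under Adam.

```python
# Gradual unfreezing with discriminative learning rates, as described above.
import torch
from torch import nn
from transformers import XLMRobertaModel

encoder = XLMRobertaModel.from_pretrained('xlm-roberta-base')
head = nn.Linear(encoder.config.hidden_size, 1)   # e.g. a regression head

optimizer = torch.optim.Adam([
    {'params': head.parameters(), 'lr': 3e-5},     # classification head
    {'params': encoder.parameters(), 'lr': 1e-5},  # XLM-R encoder
])

def set_encoder_trainable(trainable: bool) -> None:
    for p in encoder.parameters():
        p.requires_grad = trainable

# Epoch 1: only the head is updated.
set_encoder_trainable(False)
# ... train for one epoch ...

# Remaining epochs: unfreeze; each group keeps its own learning rate.
set_encoder_trainable(True)
# ... continue training ...
```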

@mohammedayub44 commented May 7, 2020

@ricardorei Thanks for sharing the paper, some interesting results there.
Any hints on how I can set up both the encoder and decoder of XLM-R and share their parameters using the HuggingFace library? I could only find LM fine-tuning examples and a notebook, nothing on NMT-based fine-tuning.
