Layer Norm in XLM-R XL and XXL #3600

Closed
stefan-it opened this issue Jun 8, 2021 · 8 comments

@stefan-it
Contributor

Hi :)

I'm currently trying to convert the recently released XLM-R XL and XXL models into Transformers-compatible weights.

I'm using the latest fairseq master version (with commit 2fd9d8a) and there's something strange with the layer norm parameter.

For debugging, here are the parameter names (shortened) for the XLM-R Base model:

encoder.sentence_encoder.layernorm_embedding.weight        
encoder.sentence_encoder.layernorm_embedding.bias

The parameter name is layernorm_embedding. However, for the new XL models, the output is:

encoder.sentence_encoder.layer_norm.weight
encoder.sentence_encoder.layer_norm.bias

So the parameter name is layer_norm. When loading the model with the fairseq library, like:

from fairseq.models.roberta import RobertaModel as FairseqRobertaModel


xlmr = FairseqRobertaModel.from_pretrained(roberta_checkpoint_path)
xlmr.eval()  # disable dropout
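
For reference, a minimal sketch of how these parameter names can be dumped (the checkpoint path below is a placeholder for a local directory containing model.pt and the accompanying dictionary/sentencepiece files):

from fairseq.models.roberta import RobertaModel as FairseqRobertaModel

# Placeholder path to the downloaded checkpoint directory
xlmr = FairseqRobertaModel.from_pretrained("/path/to/xlmr.xl")
xlmr.eval()

# Print every parameter name and shape so the Base and XL checkpoints can be diffed
for name, param in xlmr.model.named_parameters():
    print(name, tuple(param.shape))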

The (shortened) module list for XLM-R Base shows:

RobertaHubInterface(                                                                                 
  (model): RobertaModel(                                                                                  
    (encoder): RobertaEncoder(                                                
      (sentence_encoder): TransformerEncoder(                               
        (dropout_module): FairseqDropout()                                                               
        (embed_tokens): Embedding(250002, 768, padding_idx=1)               
        (embed_positions): LearnedPositionalEmbedding(514, 768, padding_idx=1)                           
        (layernorm_embedding): FusedLayerNorm(torch.Size([768]), eps=1e-05, elementwise_affine=True)

whereas the module list for the XL model shows:

RobertaHubInterface(                                                                                      
  (model): RobertaModel(                                                                              
    (encoder): RobertaEncoder(                                                                            
      (sentence_encoder): TransformerEncoder(                                                             
        (dropout_module): FairseqDropout()
        (embed_tokens): Embedding(250880, 2560, padding_idx=1)
        (embed_positions): LearnedPositionalEmbedding(514, 2560, padding_idx=1)

So a layer norm is missing in the XL model 🤔

Side note: I've updated the conversion script in the Transformers library to be compatible with the latest fairseq master. At the end, the script compares a (forward) pass of the original fairseq model and the converted model to check for differences. For the old XLM-R Base model the outputs are identical, whereas for XLM-R XL the difference is very high. The script can be found here.
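
Roughly, that comparison looks like this (a simplified sketch; the paths are placeholders and the actual script differs in the details):

import torch
from fairseq.models.roberta import RobertaModel as FairseqRobertaModel
from transformers import XLMRobertaForMaskedLM

# Placeholder paths for the original checkpoint and the converted model
fairseq_model = FairseqRobertaModel.from_pretrained("/path/to/xlmr.xl")
fairseq_model.eval()
hf_model = XLMRobertaForMaskedLM.from_pretrained("/path/to/converted-xlmr-xl")
hf_model.eval()

input_ids = torch.tensor([[0, 31414, 232, 2]])  # arbitrary token ids

with torch.no_grad():
    their_output = fairseq_model.model(input_ids)[0]  # fairseq LM head output
    our_output = hf_model(input_ids).logits           # converted model output

max_diff = (their_output - our_output).abs().max().item()
print(f"max absolute difference: {max_diff:.3e}")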

Thanks for your help!

@ngoyal2707
Copy link
Contributor

@stefan-it XLM-R Base and Large use the post-layernorm setting of the transformer, while XL and XXL use the pre-layernorm setting.

In the pre-LN setting the embeddings are usually not normalized, and there is a layer norm at the start of each transformer block, plus an extra layer norm at the end of the transformer.
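
For illustration only (this is not fairseq's actual code), the two orderings look roughly like this in PyTorch:

import torch.nn as nn

class PostLNBlock(nn.Module):
    """Post-LN (XLM-R Base/Large): normalize after each residual connection."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x):
        x = self.ln1(x + self.attn(x, x, x)[0])
        x = self.ln2(x + self.ffn(x))
        return x

class PreLNBlock(nn.Module):
    """Pre-LN (XLM-R XL/XXL): normalize before each sublayer; the encoder then
    applies one final layer norm after the last block (the extra layer_norm)."""
    def __init__(self, dim, heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x):
        h = self.ln1(x)
        x = x + self.attn(h, h, h)[0]
        x = x + self.ffn(self.ln2(x))
        return x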

@ngoyal2707
Contributor

You will need to build the HF transformer in the same way to get the same output.

@ricardorei

@ngoyal2707 Independently of those changes between Base/Large and XL/XXL, I can't load the new XL and XXL models with any fairseq version (without making changes to the state_dict).

If I use version 0.9.0, I get a bunch of unexpected keys because the "decoder" was renamed to "encoder".
If I use version >=0.10, I get unexpected keys for the emb_layer_norm, which I assume was renamed to layer_norm:

RuntimeError: Error(s) in loading state_dict for RobertaModel:
        Missing key(s) in state_dict: "encoder.sentence_encoder.emb_layer_norm.weight", "encoder.sentence_encoder.emb_layer_norm.bias".
        Unexpected key(s) in state_dict: "encoder.sentence_encoder.layer_norm.weight", "encoder.sentence_encoder.layer_norm.bias", "encoder.sentence_encoder.version".

In any case, those checkpoints seem impossible to load without hacking around.
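
As a hack (the key names below are taken only from the error above, and this does not address the pre-LN architecture difference, just the naming mismatch), the keys can be renamed before loading:

import torch

# Load the raw fairseq checkpoint and rename the mismatched keys.
# Placeholder paths; fairseq checkpoints keep the weights under "model".
ckpt = torch.load("/path/to/model.pt", map_location="cpu")
state = ckpt["model"]

renamed = {
    key.replace("encoder.sentence_encoder.layer_norm.",
                "encoder.sentence_encoder.emb_layer_norm."): value
    for key, value in state.items()
}
renamed.pop("encoder.sentence_encoder.version", None)  # also reported as unexpected

ckpt["model"] = renamed
torch.save(ckpt, "/path/to/model_renamed.pt")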

@stefan-it
Contributor Author

stefan-it commented Jun 9, 2021

@ricardorei I installed fairseq via pip3 install git+https://github.com/pytorch/fairseq.git, as I've also seen different error messages for various fairseq versions. But with the latest master I could load the new, larger models 🤗

@stefan-it
Contributor Author

@ngoyal2707 Thanks for your explanation 👍 I could see the changes in 54423d3, so we're currently adjusting the RoBERTa model in Transformers to support the new models :)

@Soonhwan-Kwon

Soonhwan-Kwon commented Jun 9, 2021

I encountered the same error, and it seems that a layer_norm needs to be added to TransformerSentenceEncoder: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/transformer_sentence_encoder.py
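
A rough sketch of that kind of change (attribute and key names are taken from the error messages above, and this is not the actual fairseq patch; the forward pass would additionally need to apply self.layer_norm after the last transformer layer):

from fairseq.modules import LayerNorm, TransformerSentenceEncoder

class TransformerSentenceEncoderWithFinalLN(TransformerSentenceEncoder):
    """Sketch: an encoder that also owns the final layer norm the XL/XXL
    checkpoints store under encoder.sentence_encoder.layer_norm.*"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Final layer norm for the pre-LN setting described above
        self.layer_norm = LayerNorm(self.embedding_dim)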

@stale

stale bot commented Apr 17, 2022

This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!

@stale stale bot added the stale label Apr 17, 2022
@stale
Copy link

stale bot commented Apr 29, 2022

Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!

@stale stale bot closed this as completed Apr 29, 2022