Layer Norm in XLM-R XL and XXL #3600
Comments
@stefan-it XLM-R Base and Large use the post-layer-norm transformer setting, while XL and XXL use the pre-layer-norm setting. In the pre-LN setting the embeddings are usually not normalized and there is a layer norm at the start of each transformer block, though there is an extra layer norm at the end of the transformer.
You will need to build the HF transformer in the same way to get the same output.
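To make the distinction concrete, here is a minimal PyTorch sketch of the two block orderings. It is only an illustration of post-LN vs. pre-LN (class names and arguments are made up), not fairseq's actual implementation:

```python
import torch.nn as nn

class Block(nn.Module):
    """Toy self-attention block showing post-LN vs. pre-LN ordering."""

    def __init__(self, dim, heads, pre_ln=False):
        super().__init__()
        self.pre_ln = pre_ln
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.ln1 = nn.LayerNorm(dim)
        self.ln2 = nn.LayerNorm(dim)

    def forward(self, x):
        if self.pre_ln:
            # pre-LN (XL/XXL style): normalize before each sub-layer; the residual
            # stream stays unnormalized, so the stack needs a final layer norm.
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.ffn(self.ln2(x))
        else:
            # post-LN (Base/Large style): normalize after adding each residual.
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.ffn(x))
        return x
```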
@ngoyal2707 Independently of those changes between Base/Large and XL/XXL, I can't load the new XL and XXL models with any fairseq version without modifying the state_dict. With version 0.9.0 I get a bunch of unexpected keys because the "decoder" prefix was renamed to "encoder".
In any case, those checkpoints seem impossible to load without hacking around.
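As a hedged workaround for the renamed keys, one could rewrite the checkpoint's state_dict before loading. The snippet below is only a sketch; the file names and the assumption that the weights live under a "model" key are mine, not an officially supported fix:

```python
import torch

# Load the released checkpoint on CPU (path is a placeholder).
ckpt = torch.load("model.pt", map_location="cpu")

# Assumption: the fairseq checkpoint keeps its weights under the "model" key.
state = ckpt["model"]

# Rename the prefix so old/new key names line up (direction depends on which
# fairseq version you are loading with).
ckpt["model"] = {k.replace("decoder.", "encoder.", 1): v for k, v in state.items()}

torch.save(ckpt, "model_renamed.pt")
```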
@ricardorei I installed
@ngoyal2707 Thanks for your explanation 👍 I could see the changes in 54423d3, so we're currently adjusting the RoBERTa model in Transformers to support the new models :)
I encountered the same error, and it seems that a layer_norm needs to be added in TransformerSentenceEncoder: https://github.com/pytorch/fairseq/blob/master/fairseq/modules/transformer_sentence_encoder.py.
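As a rough illustration of that suggestion (a hypothetical sketch, not the actual fairseq code or diff), a pre-LN encoder stack would carry the extra trailing norm roughly like this:

```python
import torch.nn as nn

class SentenceEncoderWithFinalLN(nn.Module):
    """Hypothetical sketch: a pre-LN encoder stack that also normalizes its final
    output, mirroring the extra `layer_norm` weights in the XL/XXL checkpoints."""

    def __init__(self, layers, embed_dim, pre_ln=True):
        super().__init__()
        self.layers = nn.ModuleList(layers)
        # Only pre-LN stacks carry this trailing norm; post-LN (Base/Large) do not.
        self.layer_norm = nn.LayerNorm(embed_dim) if pre_ln else None

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        if self.layer_norm is not None:
            x = self.layer_norm(x)
        return x
```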
This issue has been automatically marked as stale. If this issue is still affecting you, please leave any comment (for example, "bump"), and we'll keep it open. We are sorry that we haven't been able to prioritize it yet. If you have any new additional information, please include it with your comment!
Closing this issue after a prolonged period of inactivity. If this issue is still present in the latest release, please create a new issue with up-to-date information. Thank you!
Hi :)
I'm currently trying to convert the recently released XLM-R XL and XXL models into Transformers-compatible weights.
I'm using the latest `fairseq` master version (with commit 2fd9d8a) and there's something strange with the layer norm parameters.
For debugging, I dumped the (shortened) parameter names for the XLM-R Base model: the embedding layer norm is called `layernorm_embedding`. However, for the new XL models the dump shows `layer_norm` instead.
When loading the models with the `fairseq` library and comparing the (shortened) module lists, the XLM-R Base model includes a layer norm that does not appear for the XL model.
So a layer norm is missing in the XL model 🤔
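For reference, a parameter dump like the one described above can be produced roughly as follows; this is only a sketch with placeholder paths, not the exact commands behind the dumps:

```python
from fairseq.models.roberta import XLMRModel

# Sketch: load a released checkpoint (path is a placeholder) and list every
# parameter whose name mentions a norm, to compare Base vs. XL naming.
xlmr = XLMRModel.from_pretrained("/path/to/xlmr.base", checkpoint_file="model.pt")
for name, param in xlmr.model.named_parameters():
    if "norm" in name:
        print(name, tuple(param.shape))
```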
Side note: I've updated the conversion script in the Transformers library to be compatible with the latest `fairseq` master. At the end, the script compares a model (forward) pass between the original `fairseq` model and the converted model to check for differences. For the old XLM-R Base model the outputs are identical, whereas for XLM-R XL the difference is very high. The script can be found here.
Thanks for your help!
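A comparison along those lines could be sketched like this (paths are placeholders and the tolerance is arbitrary; the actual check lives in the script linked above):

```python
import torch
from fairseq.models.roberta import XLMRModel
from transformers import XLMRobertaModel

# Sketch of a forward-pass comparison between the two implementations.
fairseq_model = XLMRModel.from_pretrained("/path/to/xlmr.xl", checkpoint_file="model.pt")
fairseq_model.eval()

hf_model = XLMRobertaModel.from_pretrained("/path/to/converted-xlmr-xl")
hf_model.eval()

# Use fairseq's own encoding so both models see identical token ids.
tokens = fairseq_model.encode("Hello world!").unsqueeze(0)

with torch.no_grad():
    fairseq_out = fairseq_model.extract_features(tokens)  # (batch, seq, hidden)
    hf_out = hf_model(tokens).last_hidden_state           # (batch, seq, hidden)

max_diff = (fairseq_out - hf_out).abs().max().item()
print(f"max absolute difference: {max_diff:.6f}")
print("outputs match:", torch.allclose(fairseq_out, hf_out, atol=1e-3))
```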