Converting fairseq NMT to transformers misses model weight #10298
Pinging @stas00 here.
Thank you for the ping, @NielsRogge.

@tagucci, when you file an issue you will find a list of who to tag for what topic, so please use it to tag the right people; otherwise it's hard for everybody to try to follow all issues. Also, when you link to a line of code on GitHub, always hit `y` first so the link becomes a permanent one.

OK, could you first clarify where you get "decoder.embed_out weight is missing" - the command line and the backtrace, please. Also a dump of the model (i.e. its state dict keys).

Now to the guesswork. Does your model miss `decoder.embed_out`? The context is this: fairseq has different versions of their code, and some have keys renamed or added, which is why they have all that logic. You can see that it's a simple alias - i.e. in fsmt the decoder embedding and output are always shared.
So if it's missing, you can assign it in the conversion script - add the assignment at this line:

transformers/src/transformers/models/fsmt/convert_fsmt_original_pytorch_checkpoint_to_pytorch.py, line 247 in 461e8ca

But again, I could have guessed wrong and will need to see the model dump to tell you more. You can see the dump of the original model I converted from here: https://github.com/stas00/porting/blob/master/transformers/fairseq-wmt19/nbs/config.ipynb
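A minimal sketch of what that check-and-copy could look like; the variable name `model_state_dict` and the exact key layout are assumptions on my part, not verified against the conversion script:

```python
# Guessed fix, not the verified one: checkpoints trained with
# --share-decoder-input-output-embed or --share-all-embeddings carry no
# separate decoder.embed_out, so alias the decoder input embedding to it
# before the converted weights are loaded.
if "decoder.embed_out" not in model_state_dict:
    model_state_dict["decoder.embed_out"] = model_state_dict["decoder.embed_tokens.weight"]
```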
@NielsRogge As you said, fsmt does not have decoder embed and output separately. Here is my fairseq model dump:

```python
import torch
from pprint import pprint

# load the fairseq checkpoint and dump its training args and state dict keys
chkpt = torch.load("model/checkpoint_best.pt")
model = chkpt["model"]
pprint(vars(chkpt["args"]))
print("\n".join(model.keys()))
```
Thank you for the model dump, so my guess was correct - it's missing `decoder.embed_out`.

I still don't know what error you get, when, and the backtrace, but perhaps my guessed solution is all you need. But no, you don't need to re-train.

If it works, could you adapt the script to check whether the checkpoint being loaded is missing this key and, if so, to copy it as I suggested?
@stas00 With the official wmt19-en-de checkpoints, fairseq and the converted transformers model produce matching output:

```python
from transformers import FSMTForConditionalGeneration, FSMTTokenizer, TranslationPipeline
import torch

input_text = "Machine learning is great!"

# fairseq
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de',
                       checkpoint_file='model1.pt:model2.pt:model3.pt:model4.pt',
                       tokenizer='moses', bpe='fastbpe')
fairseq_res = en2de.translate(input_text)

# transformers
fsmt_path = "./fairseq2hf/data/wmt19-en-de/"
tokenizer = FSMTTokenizer.from_pretrained(fsmt_path)
model = FSMTForConditionalGeneration.from_pretrained(fsmt_path)
nlp = TranslationPipeline(model=model, tokenizer=tokenizer)
fsmt_res = nlp(input_text)[0]["translation_text"]

print("fairseq: {}".format(fairseq_res))
print("transformers: {}".format(fsmt_res))
print("match: {}".format(fairseq_res == fsmt_res))
"""
fairseq: Maschinelles Lernen ist großartig!
transformers: Maschinelles Lernen ist großartig!
match: True
"""
```

However, my own fairseq model and the converted HF model give different results with the same parameters (beam_size=5). Do you have any idea how to debug why the translation results are different?

fairseq result:

transformers result:

```python
import torch
from transformers import FSMTForConditionalGeneration

encoded_token = torch.tensor([[5269, 2069, 5, 1154, 9, 4, 1823, 3382, 5, 3128, 116, 167, 1582, 7, 2192,
                               914, 63, 6, 1823, 2807, 124, 1219, 1106, 8, 53, 2175, 2007, 483, 4, 660,
                               708, 5229, 33, 44, 4, 6049, 1430, 5, 1806, 2050, 2282, 1908, 4, 334, 3229,
                               4808, 6102, 5, 5031, 11, 5, 291, 4214, 6485, 10, 5784, 1908, 23, 1765,
                               4916, 6, 2]])
fsmt = FSMTForConditionalGeneration.from_pretrained("./fairseq2HF/")
hypo = fsmt.generate(encoded_token, num_beams=5)
print(hypo)
# tensor([[ 2, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 2]])
```
I'm a bit lost - we were discussing a missing state dict key, and now we are discussing an invalid translation. Did my suggestion help resolve the problem of the missing key, and you're now presenting the next issue?

W.r.t. your transformers result with your model: do you get any better behavior if you encode the tokens via the transformers tokenizer and then feed them to generate? Perhaps the dict has somehow changed? Though a repeated 21 is suspiciously bad.
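A sketch of that check, reusing the model path from the earlier snippet; whether `FSMTTokenizer` can load anything sensible from that directory is an assumption, given the custom vocab discussed below:

```python
# hypothetical sanity check: encode with the transformers tokenizer rather
# than with pre-computed fairseq ids, to rule out a vocabulary mismatch
from transformers import FSMTForConditionalGeneration, FSMTTokenizer

path = "./fairseq2HF/"  # path from the snippet above
tokenizer = FSMTTokenizer.from_pretrained(path)
model = FSMTForConditionalGeneration.from_pretrained(path)

inputs = tokenizer("Machine learning is great!", return_tensors="pt")
hypo = model.generate(inputs["input_ids"], num_beams=5)
print(tokenizer.decode(hypo[0], skip_special_tokens=True))
```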
Yes, thanks for the helpful comments.

I do not use the transformers tokenizer because my fairseq model has a different vocab size, and it's impossible to encode/decode with a single tokenizer model. Converting tokens to ids is done with fairseq's dictionary.

Thanks for the big help!
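A sketch of that fairseq-side encoding, assuming a standard fairseq dictionary file; the dictionary path and the pre-tokenized input string are made up for illustration:

```python
# hypothetical: encode with the fairseq checkpoint's own dictionary, since
# the model's vocab does not match any single transformers tokenizer
from fairseq.data import Dictionary

src_dict = Dictionary.load("model/dict.en.txt")  # assumed dictionary path
# input must already be tokenized/BPE-split the same way as at training time
ids = src_dict.encode_line("machine learning is great !",
                           add_if_not_exist=False, append_eos=True)
print(ids)
```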
Thank you for clarifying that your original issue has been resolved. Please feel free to close this issue when you feel it's working for you.

Based on your comments, I'm concerned about 2 things:
Hi there, a question about fairseq NMT model (FSMT) conversion.

I tried to convert my own fairseq-nmt model (`transformer_wmt_en_de`) based on this conversion script. However, the `decoder.embed_out` weight is missing after converting the fairseq model to a transformers FSMT model. This parameter exists when not specifying `--share-all-embeddings` or `--share-decoder-input-output-embed`, while the official fairseq WMT models do not have a `decoder.embed_out` weight because they specify `--share-all-embeddings` (facebookresearch/fairseq#2537).

Are there any solutions or tips for converting one's own fairseq model?