System Info
transformers version: 4.24.0 (latest)
When generating samples with mBART, I encounter an overflow problem. Looking into the code, I found that it stems from the beam score being added to next_token_scores here:
transformers/src/transformers/generation_utils.py, line 2566 (commit bc21aac)
The initial value of beam_scores is 0, but with a temperature such as 0.5 the accumulated score is also divided by the temperature inside logits_warper, so its magnitude grows at every step and eventually next_token_scores overflows.
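To make the compounding concrete, here is a toy sketch of the recurrence (a made-up constant per-step log-probability of -1.0, not numbers from the actual run):

# With the current ordering, the warped sum (beam_score + log_prob) / temperature
# is what gets carried forward as the next beam score.
score = 0.0
temperature = 0.5
log_prob = -1.0  # hypothetical constant per-step log-probability
for step in range(20):
    score = (score + log_prob) / temperature
print(score)  # about -2.1e6 after 20 steps; past the fp16 range (~6.5e4) by step 16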
Who can help?
@patrickvonplaten @Narsil @gante
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
Here is a short snippet that reproduces the issue.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/mbart-large-50-many-to-many-mmt")
model = model.cuda()

# German source sentence to translate into English
src = 'In einem Notruf erzählte Professor Shannon Lamb mit einer etwas zittrigen Stimme der Polizei, dass er seine Freundin erschossen habe und dass die Beamten zu seinem Haus kommen müssten.'
encoded_hi = tokenizer(src, return_tensors="pt", padding=True).to('cuda')

# Beam sampling (do_sample=True with num_beams > 1) plus a temperature
# below 1.0 triggers the overflow
generated_tokens = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id['en_XX'], temperature=0.5, do_sample=True, num_beams=10, num_return_sequences=10)
tgt_txt = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
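For anyone triaging this, a hypothetical diagnostic (not part of the original report; max_new_tokens=10 is my own cap so generation finishes before it crashes) that inspects the per-step scores generate returns:

import torch

# Return the processed per-step scores and watch their magnitude grow;
# with temperature=0.5 it roughly doubles at every decoding step.
out = model.generate(**encoded_hi, forced_bos_token_id=tokenizer.lang_code_to_id['en_XX'],
                     temperature=0.5, do_sample=True, num_beams=10, num_return_sequences=10,
                     max_new_tokens=10, output_scores=True, return_dict_in_generate=True)
for step, scores in enumerate(out.scores):
    finite = scores[torch.isfinite(scores)]
    print(step, finite.abs().max().item())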
Expected behavior
I think this should be fixed, but I'm not sure about the effect a fix would have on beam_scores.
Great! Looking forward to your solution.

For now, I just swap these two lines (L2566 and L2567) and the error disappears, but I'm not sure whether what I did is correct (a toy sketch of the difference is below).

Are you using half or full precision here? Also, inf values are not necessarily the sign of a bug; it might also be that mBART has some default logit processor settings that zero out values, which then lead to inf (cc @gante)
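A minimal sketch of the difference between the two orderings (toy tensors, not the actual beam_sample loop; the division by temperature stands in for what TemperatureLogitsWarper does inside logits_warper):

import torch

torch.manual_seed(0)
log_probs = torch.log_softmax(torch.randn(2, 5), dim=-1)  # fresh per-token log-probs
beam_scores = torch.tensor([[-40.0], [-40.0]])            # accumulated beam scores
temperature = 0.5

# Current ordering (add first, then warp): the accumulated beam score is
# divided by the temperature too, so its magnitude doubles every step.
add_then_warp = (log_probs + beam_scores) / temperature

# Swapped ordering (warp first, then add): only the fresh log-probs are
# sharpened; the accumulated score keeps its original scale.
warp_then_add = log_probs / temperature + beam_scores

print(add_then_warp.min().item(), warp_then_add.min().item())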