Description
I'm experiencing consistent text generation failures with both pretrained google/mt5-base and custom fine-tuned MT5 models across multiple environments (local machines, Google Colab). The models produce nonsensical outputs containing <extra_id_0> and random tokens despite correct task prefixes and parameters.
Affected Models:
- google/mt5-base
- Custom MT5 variants (cointegrated/rut5-base)
- cointegrated/rut5-base fine-tuned for summarization (a reproduction sketch follows this list)
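For the summarization case, a reproduction along these lines fails in the same way. The checkpoint path and task prefix below are placeholders for the custom fine-tuned model; the base cointegrated/rut5-base shows the same behavior:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# "cointegrated/rut5-base" is the public base checkpoint; substitute the path
# of the custom fine-tuned summarization model to reproduce the fine-tuned case.
model_name = "cointegrated/rut5-base"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Task prefix is whatever was used during fine-tuning (placeholder here)
text = "summarize: " + "(source article text here)"
inputs = tokenizer(text, return_tensors="pt")

output = model.generate(**inputs, max_new_tokens=60, num_beams=4)
print(tokenizer.decode(output[0], skip_special_tokens=True))
# Observed: <extra_id_0>-style fragments instead of a summary
```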
Steps to Reproduce
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

# Tokenize the input with a standard T5-style task prefix
inputs = tokenizer(
    "translate English to Russian: Hello world!",
    return_tensors="pt",
)

# Beam-search generation
output = model.generate(
    inputs.input_ids,
    max_new_tokens=50,
    num_beams=5,
    early_stopping=True,
)

print(tokenizer.decode(output[0], skip_special_tokens=True))
Expected Behavior
Expected Russian translation: "Привет, мир!"
Actual output: <extra_id_0> Hello world! or similarly nonsensical text
Environment
- Transformers 4.50.0 (also checked 4.48.3 and 4.30.0)
- PyTorch 2.0.1+cu118
- Python 3.10.12
- Both CPU and CUDA environments affected
- Reproducible in Google Colab (T4 GPU)
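The versions listed above were collected with a snippet along these lines:

```python
import sys
import torch
import transformers

print("Python:", sys.version)
print("PyTorch:", torch.__version__)
print("Transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```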
Additional Context
- Issue persists across multiple task formats (translation, summarization)
- Verified correct model loading: model.config shows expected architecture
- Tokenization of the prompt appears correct when inspected (see the sketch below):
  # Output: ['▁translate', '▁English', '▁to', '▁Russian', ':', '▁Hello', '▁world', '!']
- Tried multiple generation strategies (greedy, beam search, sampling) without success; a sketch of what was tried follows this list
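Roughly what was run for the last two points (the exact calls may have differed slightly, but the outputs were equivalent):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
inputs = tokenizer("translate English to Russian: Hello world!", return_tensors="pt")

# Inspect how the prompt is tokenized
print(tokenizer.convert_ids_to_tokens(inputs.input_ids[0]))

# Greedy decoding
out = model.generate(inputs.input_ids, max_new_tokens=50)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Beam search
out = model.generate(inputs.input_ids, max_new_tokens=50, num_beams=5, early_stopping=True)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Sampling
out = model.generate(inputs.input_ids, max_new_tokens=50, do_sample=True, top_p=0.95, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# All strategies return outputs containing <extra_id_0> / unrelated tokens
```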
I am happy to provide additional code and information.