Persistent generation issues with MT5 models (base and fine-tuned) across environments

I'm seeing consistent text generation failures with both the pretrained google/mt5-base checkpoint and custom fine-tuned MT5 models across multiple environments (local machines and Google Colab). The models produce nonsensical output containing `<extra_id_0>` and random tokens, even with correct task prefixes and generation parameters.

**Affected Models:**
- google/mt5-base
- Custom MT5 variants (cointegrated/rut5-base)
- cointegrated/rut5-base fine-tuned for a summarization task

**Steps to Reproduce**
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the pretrained checkpoint and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

# Tokenize the prompt with a T5-style task prefix
inputs = tokenizer(
    "translate English to Russian: Hello world!",
    return_tensors="pt",
)

# Beam search with 5 beams, capped at 50 new tokens
output = model.generate(
    inputs.input_ids,
    max_new_tokens=50,
    num_beams=5,
    early_stopping=True,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

**Expected Behavior**

_Expected Russian translation:_ "Привет, мир!"

_Actual output:_ `<extra_id_0> Hello world!` or similar garbage

**Environment**

- Transformers 4.50.0 (also checked 4.48.3 and 4.30.0)
- PyTorch 2.0.1+cu118
- Python 3.10.12
- Both CPU and CUDA environments affected
- Reproducible in Google Colab (T4 GPU)
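
For anyone trying to reproduce this in another environment, the versions listed above can be collected with a short script. This is a minimal sketch: `collect_versions` is a hypothetical helper name, and the package list (including `tokenizers` and `sentencepiece`) is an assumption about what is relevant here, not something taken from the report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages=("transformers", "torch", "tokenizers", "sentencepiece")):
    """Return a dict mapping 'python' and each package name to its version string."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution metadata without importing the package
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


# Print one "name: version" line per entry for pasting into a report
for name, value in collect_versions().items():
    print(f"{name}: {value}")
```

Reading the distribution metadata avoids importing heavy packages such as PyTorch just to get a version string.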

**Additional Context**

- Issue persists across multiple task formats (translation, summarization)
- Verified correct model loading: model.config shows expected architecture
- Tokenization appears correct when inspected:
```python
print(tokenizer.tokenize("translate English to Russian: Hello world!"))
# Output: ['▁translate', '▁English', '▁to', '▁Russian', ':', '▁Hello', '▁world', '!']
```

- Tried multiple generation strategies (greedy, beam, sampling) without success
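
The strategy comparison mentioned in the last bullet can be sketched roughly as follows. This is an illustration under assumptions, not the exact code that was run: `STRATEGIES` and `compare_strategies` are hypothetical names, and the sampling values (`top_p=0.95`, `temperature=0.8`) are placeholders. Only the `generate()` keyword arguments themselves are standard Transformers options.

```python
# Hypothetical helper for comparing decoding strategies on one prompt.
# The keyword arguments are standard generate() options; the specific
# sampling values below are illustrative assumptions.
STRATEGIES = {
    "greedy": {"do_sample": False, "num_beams": 1},
    "beam": {"do_sample": False, "num_beams": 5, "early_stopping": True},
    "sampling": {"do_sample": True, "num_beams": 1, "top_p": 0.95, "temperature": 0.8},
}


def compare_strategies(model, tokenizer, text, max_new_tokens=50):
    """Generate from `text` once per strategy and return {strategy_name: decoded_text}."""
    # Passing the full tokenizer output forwards attention_mask as well as input_ids
    inputs = tokenizer(text, return_tensors="pt")
    results = {}
    for name, kwargs in STRATEGIES.items():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, **kwargs)
        results[name] = tokenizer.decode(output[0], skip_special_tokens=True)
    return results
```

With the `model` and `tokenizer` from the reproduction steps loaded, `compare_strategies(model, tokenizer, "translate English to Russian: Hello world!")` returns one decoded string per strategy, which makes it easy to paste all three outputs side by side.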

I am happy to provide additional code and information.

Persistent generation issues with MT5 models (base and fine-tuned) across environments #37048
