Persistent generation issues with MT5 models (base and fine-tuned) across environments #37048

@Elpharran

Description

I'm seeing consistent text generation failures with both the pretrained google/mt5-base model and custom fine-tuned MT5 models across multiple environments (local machines and Google Colab). The models produce nonsensical output containing <extra_id_0> and seemingly random tokens, even with correct task prefixes and generation parameters.

Affected Models:

  • google/mt5-base (pretrained)
  • cointegrated/rut5-base (custom MT5 variant)
  • cointegrated/rut5-base fine-tuned for summarization

Steps to Reproduce

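from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the pretrained checkpoint and its matching tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-base")
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")

# Encode the prompt with a T5-style task prefix
inputs = tokenizer(
    "translate English to Russian: Hello world!",
    return_tensors="pt"
)

# Beam search, capped at 50 newly generated tokens
output = model.generate(
    inputs.input_ids,
    max_new_tokens=50,
    num_beams=5,
    early_stopping=True
)
print(tokenizer.decode(output[0], skip_special_tokens=True))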

Expected Behavior

Expected output: a Russian translation such as "Привет, мир!"
Actual output: "<extra_id_0> Hello world!" or similarly garbled text
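
To make the failure easier to check across many prompts, decoded outputs can be scanned for sentinel tokens. The following is a minimal sketch, not part of the original report: the helper names `find_sentinels` and `looks_degenerate` are hypothetical, and it assumes sentinels always take the `<extra_id_N>` form seen in the output above.

```python
import re

# Assumption: sentinel tokens follow the "<extra_id_N>" form shown in the report.
SENTINEL_PATTERN = re.compile(r"<extra_id_\d+>")


def find_sentinels(text: str) -> list[str]:
    """Return every sentinel token found in a decoded generation."""
    return SENTINEL_PATTERN.findall(text)


def looks_degenerate(text: str) -> bool:
    """Flag a decoded output that still contains at least one sentinel token."""
    return bool(find_sentinels(text))


# The failing output from this report versus the expected translation.
print(find_sentinels("<extra_id_0> Hello world!"))
print(looks_degenerate("Привет, мир!"))
```

Applying `looks_degenerate` to each decoded string gives a quick yes/no signal per prompt without reading every output by hand.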

Environment

  • Transformers 4.50.0 (also checked 4.48.3 and 4.30.0)
  • PyTorch 2.0.1+cu118
  • Python 3.10.12
  • Both CPU and CUDA environments affected
  • Reproducible in Google Colab (T4 GPU)
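
For anyone trying to reproduce this, the environment details above can be collected with a short script. This is a sketch using only the standard library; the helper names are hypothetical, and including `sentencepiece` in the report is an assumption (it is not listed in the original environment).

```python
import platform
from importlib import metadata


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report() -> dict[str, str]:
    """Collect the versions relevant to this report."""
    return {
        "python": platform.python_version(),
        "transformers": package_version("transformers"),
        "torch": package_version("torch"),
        # Assumption: sentencepiece is worth reporting alongside the tokenizer.
        "sentencepiece": package_version("sentencepiece"),
    }


for name, version in environment_report().items():
    print(f"{name}: {version}")
```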

Additional Context

  • Issue persists across multiple task formats (translation, summarization)
  • Verified correct model loading: model.config shows expected architecture
  • Tokenization appears correct when inspected:
    # Output: ['▁translate', '▁English', '▁to', '▁Russian', ':', '▁Hello', '▁world', '!']
  • Tried multiple generation strategies (greedy, beam, sampling) without success
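
The comparison of generation strategies mentioned above could be run along these lines. This is a sketch under stated assumptions rather than the exact code used for the report: `GENERATION_STRATEGIES` and `compare_strategies` are hypothetical names, and the `top_p=0.9` sampling setting is an illustrative value not taken from the report.

```python
# Keyword arguments passed to model.generate() for each strategy.
# num_beams=5 and early_stopping=True come from the reproduction above;
# top_p=0.9 is an assumed value chosen for illustration.
GENERATION_STRATEGIES = {
    "greedy": {"do_sample": False, "num_beams": 1},
    "beam": {"do_sample": False, "num_beams": 5, "early_stopping": True},
    "sampling": {"do_sample": True, "num_beams": 1, "top_p": 0.9},
}


def compare_strategies(model, tokenizer, prompt, max_new_tokens=50):
    """Generate from one prompt with each strategy and return the decoded texts."""
    inputs = tokenizer(prompt, return_tensors="pt")
    results = {}
    for name, kwargs in GENERATION_STRATEGIES.items():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens, **kwargs)
        results[name] = tokenizer.decode(output[0], skip_special_tokens=True)
    return results
```

With the `model` and `tokenizer` from the reproduction steps loaded, `compare_strategies(model, tokenizer, "translate English to Russian: Hello world!")` returns one decoded string per strategy, which makes it easy to see whether the garbled output depends on the decoding method.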

I am happy to provide additional code and information.
