Description
System Info
- transformers version: 4.29.2
- Platform: Linux ubt-4090 5.15.0-75-generic
- Python version: 3.9.5
- PyTorch version (GPU?): 1.12.1+cu113 (True)
- Tensorflow version (GPU?): not installed (NA)
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
@ArthurZucker @younesbelkada @patrickvonplaten
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
When I followed the official Hugging Face documentation example for mask filling, I got the expected output.
```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")

# de_DE is the language symbol id <LID> for German
TXT = "</s> Meine Freunde sind <mask> nett aber sie essen zu viel Kuchen. </s> de_DE"
input_ids = tokenizer([TXT], add_special_tokens=False, return_tensors="pt")["input_ids"]
logits = model(input_ids).logits

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
tokenizer.decode(predictions).split()
```
Output:

```
['nett', 'sehr', 'ganz', 'nicht', 'so']
```
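For reference, a quick way to double-check how the prompt is tokenized (a small sanity check I added; it only uses the standard tokenizer API) is to print the token pieces, including the <LID> token at the end:

```python
# Inspect the exact token pieces the prompt was split into (including <mask> and de_DE).
print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))
```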
But when I changed the text to be filled in to Chinese, something unexpected happened.

```python
from transformers import AutoTokenizer, MBartForConditionalGeneration

model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")
tokenizer = AutoTokenizer.from_pretrained("facebook/mbart-large-cc25")

# zh_ZH is intended as the language symbol id <LID> for Chinese
TXT = "</s> 今天<mask>真好,我准备去公园打羽毛球. </s> zh_ZH"
input_ids = tokenizer([TXT], add_special_tokens=False, return_tensors="pt")["input_ids"]
logits = model(input_ids).logits

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
tokenizer.decode(predictions).split()
```

Output:

```
[',·:.']
```
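While writing this up I noticed one thing worth checking (my own diagnostic, not from the documentation): mbart-large-cc25's language-code list uses zh_CN for Chinese, so zh_ZH may not exist in the vocabulary as a single <LID> token and would then be split into ordinary subword pieces:

```python
# A real <LID> token maps to a single vocabulary id; an unknown code gets split into pieces.
print(tokenizer.convert_tokens_to_ids("de_DE"))  # one id, a real language code
print(tokenizer.convert_tokens_to_ids("zh_CN"))  # one id, the actual Chinese language code
print(tokenizer.tokenize("zh_ZH"))               # presumably several subword pieces
```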
After that, I tried to have mBART restore a sentence with multiple masks for me, and the result was even worse.
```python
from transformers import MBartTokenizer, DataCollatorForLanguageModeling, MBartForConditionalGeneration

tokenizer = MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
model = MBartForConditionalGeneration.from_pretrained("facebook/mbart-large-cc25")

TXT_input = "<s>The weather is so nice today, I am going to play badminton in the park</s>en_xx"
inputs = tokenizer([TXT_input], add_special_tokens=False, return_tensors="pt", max_length=32, padding="max_length")
masked_inputs_and_labels = data_collator([inputs])
input_ids = masked_inputs_and_labels["input_ids"][0]
attention_mask = masked_inputs_and_labels["attention_mask"][0]
labels = masked_inputs_and_labels["labels"][0]
masked_inputs = {key: value[0] for key, value in masked_inputs_and_labels.items()}

outputs = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
logits = outputs.logits
print(f'after mask: {tokenizer.decode(masked_inputs["input_ids"][0])}')
predictions = outputs.logits.argmax(dim=-1)
print(f"Predicted sentence: {tokenizer.decode(predictions[0])}")
```
Output:

```
after mask: <s> The weather is so nice today, I am going tosähkö badminton in the park</s> en_xx<pad><pad><pad><pad><pad><pad><pad><mask><pad><pad><pad>
Predicted sentence: <s>นยยยยยนนนนนน badmintonนนนap<s><s><s><s><s><s><s><s><s><s><s><s><s><s>
```

Excuse me, is there something wrong with my usage? If so, how can I use mBART correctly to fill in the masks?
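One alternative I considered (a sketch based on my own assumptions, not on documented mask-filling behavior for this checkpoint) is to let the decoder regenerate the corrupted sentence autoregressively, which is closer to mBART's denoising pretraining than taking the argmax of teacher-forced logits. It assumes the proper en_XX language code; note that my script above uses en_xx, which, like zh_ZH, may not be a real <LID> token:

```python
# Sketch: decode autoregressively instead of reading per-position logits.
# Following the mBART convention, generation starts from the target language id.
generated = model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    decoder_start_token_id=tokenizer.lang_code_to_id["en_XX"],
    max_length=32,
)
print(f"Generated sentence: {tokenizer.decode(generated[0], skip_special_tokens=True)}")
```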
Expected behavior
I would expect at least one Chinese token among mBART's five highest-probability predictions, or for the masked sentence to be restored for me.

Such as: ['天气', '心情', ...]

Or: Predicted sentence: "The weather is so nice today, I am going to play badminton in the park en_xx"