['encoder.version', 'decoder.version'] are unexpected when loading a pretrained BART model #6652

@stas00

Description

Using an example from the bart doc:
https://huggingface.co/transformers/model_doc/bart.html#bartforconditionalgeneration

from transformers import BartTokenizer, BartForConditionalGeneration
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large')
TXT = "My friends are <mask> but they eat too many carbs."

model = BartForConditionalGeneration.from_pretrained('facebook/bart-large')
input_ids = tokenizer([TXT], return_tensors='pt')['input_ids']
logits = model(input_ids)[0]

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)

print(tokenizer.decode(predictions).split())

gives:

Some weights of the model checkpoint at facebook/bart-large were not used 
when initializing BartForConditionalGeneration: 
['encoder.version', 'decoder.version']

- This IS expected if you are initializing BartForConditionalGeneration from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForConditionalGeneration from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
test:9: UserWarning: This overload of nonzero is deprecated:
        nonzero()
Consider using one of the following signatures instead:
        nonzero(*, bool as_tuple) (Triggered internally at  /opt/conda/conda-bld/pytorch_1597302504919/work/torch/csrc/utils/python_arg_parser.cpp:864.)
  masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
['good', 'great', 'all', 'really', 'very']

Well, there is one more issue here: the example uses a deprecated nonzero() invocation. Since PyTorch 1.5 it is expected to be called with the as_tuple argument, a requirement that is poorly documented (see pytorch/pytorch#43425).
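
For reference, a minimal non-deprecated form of that call, assuming there is exactly one <mask> token in the input, would be (just a sketch; either variant silences the warning):

masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=False).item()
# or, using the tuple form:
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0].item()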

We already have authorized_missing_keys:

authorized_missing_keys = [r"final_logits_bias", r"encoder\.version", r"decoder\.version"]

https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_bart.py#L942

which correctly filters missing_keys. Should there also be an authorized_unexpected_keys that would clean up unexpected_keys in the same way?
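
For illustration, here is a sketch of how from_pretrained could filter unexpected_keys, mirroring the existing missing_keys handling (the attribute name authorized_unexpected_keys is my assumption here, not an existing API):

import re

# hypothetical class attribute, analogous to authorized_missing_keys
authorized_unexpected_keys = [r"encoder\.version", r"decoder\.version"]

# inside from_pretrained, after unexpected_keys has been computed:
if cls.authorized_unexpected_keys is not None:
    for pat in cls.authorized_unexpected_keys:
        unexpected_keys = [k for k in unexpected_keys if re.search(pat, k) is None]

Since re.search matches anywhere in the key, the same patterns would also cover the prefixed 'model.encoder.version' / 'model.decoder.version' variety shown below.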

(note: I re-edited this issue once I understood it better, to save readers' time; the history is there if someone needs it)

I also found another variety of it, for ['model.encoder.version', 'model.decoder.version']:

tests/test_modeling_bart.py::BartModelIntegrationTests::test_mnli_inference Some weights of the model checkpoint at facebook/bart-large-mnli were not used when initializing BartForSequenceClassification: ['model.encoder.version', 'model.decoder.version']
- This IS expected if you are initializing BartForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing BartForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
PASSED
