T5ForConditionalGeneration does not require resize_position_embeddings when input sequence length is longer than 512? #17557

Closed
mshen2 opened this issue Jun 4, 2022 · 3 comments

Comments

mshen2 commented Jun 4, 2022

Hi, thanks in advance! I am looking at run_summarization.py under examples/pytorch/summarization/, specifically the following code snippet, because I want to set max_source_length to a value larger than 512, which is the maximum length T5 was pre-trained on:

if (
    hasattr(model.config, "max_position_embeddings")
    and model.config.max_position_embeddings < data_args.max_source_length
):
    if model_args.resize_position_embeddings is None:
        logger.warning(
            "Increasing the model's number of position embedding vectors from"
            f" {model.config.max_position_embeddings} to {data_args.max_source_length}."
        )
        model.resize_position_embeddings(data_args.max_source_length)
    elif model_args.resize_position_embeddings:
        model.resize_position_embeddings(data_args.max_source_length)
    else:
        raise ValueError(
            f"`--max_source_length` is set to {data_args.max_source_length}, but the model only has"
            f" {model.config.max_position_embeddings} position encodings. Consider either reducing"
            f" `--max_source_length` to {model.config.max_position_embeddings} or to automatically resize the"
            " model's position encodings by passing `--resize_position_embeddings`."
        )

My questions are:

  1. I remember T5Config used to have a max_position_embeddings parameter (it was 512). Why has it been removed? (See the quick config check after my questions.)
  2. In the script, the default max_source_length is set to 1024. Since that is bigger than 512, why is it no longer required to call the resize_position_embeddings method, as it was in this earlier issue: T5 Model : What is maximum sequence length that can be used with pretrained T5 (3b model) checkpoint? #5204 (comment)?
  3. BART also uses relative position embeddings like T5, but BartConfig keeps max_position_embeddings at 1024, and when max_source_length is set longer than 1024, the snippet above does require calling resize_position_embeddings. Is this because BART and T5 use different kinds of relative position embeddings?

I think I must be misunderstanding something; I would appreciate an explanation here. Thanks!!
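
For reference, here is a minimal sketch of a config check (assuming a recent transformers release; the exact set of config attributes may differ between versions). It illustrates why the snippet above skips the resize branch for T5: T5Config does not define max_position_embeddings at all, while BartConfig does.

    # Minimal sketch: inspect the default configs (no weights are downloaded).
    # NOTE: attribute availability assumed for recent transformers versions.
    from transformers import BartConfig, T5Config

    t5_config = T5Config()      # T5 relies on relative position biases
    bart_config = BartConfig()  # BART uses learned absolute position embeddings

    print(hasattr(t5_config, "max_position_embeddings"))    # expected: False -> resize branch is skipped
    print(hasattr(bart_config, "max_position_embeddings"))  # expected: True
    print(bart_config.max_position_embeddings)              # expected: 1024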

@LysandreJik (Member) commented:

cc @patrickvonplaten

@patrickvonplaten (Contributor) commented:

Hey @mshen2,

I don't think BART uses relative position embeddings, but rather "fixed" position embeddings ("fixed" in the sense that if a sequence longer than 1024 tokens is provided, the model raises an index error).

Could you maybe look into this line of code in BART:

self.embed_positions = BartLearnedPositionalEmbedding(...)

It shows that the position embeddings are a fixed-size matrix.
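
To make the contrast concrete, here is a rough sketch (models built from the default configs, so they are randomly initialized and nothing is downloaded; the attribute paths and the legacy offset of 2 are assumptions based on recent transformers source and may differ between versions):

    # Rough sketch: compare BART's fixed-size learned position embeddings with
    # T5's bucketed relative-attention bias. Assumes recent transformers.
    from transformers import (
        BartConfig,
        BartForConditionalGeneration,
        T5Config,
        T5ForConditionalGeneration,
    )

    # Shrink the number of layers to keep the random models small.
    bart = BartForConditionalGeneration(BartConfig(encoder_layers=1, decoder_layers=1))
    t5 = T5ForConditionalGeneration(T5Config(num_layers=1, num_decoder_layers=1))

    # BART: a learned embedding matrix of fixed size (max_position_embeddings plus a
    # legacy offset of 2), so any position beyond that range raises an index error.
    print(bart.model.encoder.embed_positions.weight.shape)  # expected: torch.Size([1026, 1024])

    # T5: no position-embedding matrix; the first self-attention layer holds a small
    # bucketed relative-attention bias whose size does not depend on sequence length.
    bias = t5.encoder.block[0].layer[0].SelfAttention.relative_attention_bias
    print(bias.weight.shape)  # expected: torch.Size([32, 8]) = (num_buckets, num_heads)

Because T5's bias only depends on bucketed relative distances between tokens, longer inputs simply fall into the existing buckets, which is presumably why the script has nothing to resize for T5.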

Also cc @patil-suraj here

github-actions bot commented Jul 5, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
