T5ForConditionalGeneration does not require resize_position_embeddings when input sequence length is longer than 512? #17557

Closed
mshen2 opened this issue Jun 4, 2022 · 3 comments

Comments

mshen2 commented Jun 4, 2022

Hi, thanks in advance! I am looking at run_summarization.py under examples/pytorch/summarization/, specifically the following code snippet, because I want to set max_source_length to a value larger than 512, which is the maximum length T5 was pre-trained on:

if (
    hasattr(model.config, "max_position_embeddings")
    and model.config.max_position_embeddings < data_args.max_source_length
):
    if model_args.resize_position_embeddings is None:
        logger.warning(
            "Increasing the model's number of position embedding vectors from"
            f" {model.config.max_position_embeddings} to {data_args.max_source_length}."
        )
        model.resize_position_embeddings(data_args.max_source_length)
    elif model_args.resize_position_embeddings:
        model.resize_position_embeddings(data_args.max_source_length)
    else:
        raise ValueError(
            f"`--max_source_length` is set to {data_args.max_source_length}, but the model only has"
            f" {model.config.max_position_embeddings} position encodings. Consider either reducing"
            f" `--max_source_length` to {model.config.max_position_embeddings} or to automatically resize the"
            " model's position encodings by passing `--resize_position_embeddings`."
        )

My questions are:

  1. I remember T5Config used to have a max_position_embeddings parameter (it was 512). Why has it been removed? (See the quick config check after my questions.)
  2. In the script, the default max_source_length is set to 1024. Since that is bigger than 512, why is it no longer required to call the resize_position_embeddings method, as it was in this earlier issue: T5 Model : What is maximum sequence length that can be used with pretrained T5 (3b model) checkpoint? #5204 (comment)?
  3. BART also uses relative position embeddings like T5, but BartConfig keeps max_position_embeddings at 1024, and when max_source_length is set longer than 1024, the snippet above does require calling resize_position_embeddings. Is this because BART and T5 use different kinds of relative position embeddings?

I think I must be misunderstanding something; I would appreciate an explanation here. Thanks!!
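
For reference, here is a minimal sketch of a config check (assuming a recent transformers release; the exact set of config attributes may differ between versions). It illustrates why the snippet above skips the resize branch for T5: T5Config does not define max_position_embeddings at all, while BartConfig does.

    # Minimal sketch: inspect the default configs (no weights are downloaded).
    # NOTE: attribute availability assumed for recent transformers versions.
    from transformers import BartConfig, T5Config

    t5_config = T5Config()      # T5 relies on relative position biases
    bart_config = BartConfig()  # BART uses learned absolute position embeddings

    print(hasattr(t5_config, "max_position_embeddings"))    # expected: False -> resize branch is skipped
    print(hasattr(bart_config, "max_position_embeddings"))  # expected: True
    print(bart_config.max_position_embeddings)              # expected: 1024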

@LysandreJik (Member) commented:

cc @patrickvonplaten

@patrickvonplaten (Contributor) commented:

Hey @mshen2,

I don't think BART uses relative position embeddings, but rather "fixed" position embeddings ("fixed" in the sense that if a sequence longer than 1024 tokens is provided, the model raises an index error).

Could you maybe look into this line of code in BART:

self.embed_positions = BartLearnedPositionalEmbedding(...)

It shows that the position embeddings are a fixed-size matrix.
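
To make the contrast concrete, here is a rough sketch (models built from the default configs, so they are randomly initialized and nothing is downloaded; the attribute paths and the legacy offset of 2 are assumptions based on recent transformers source and may differ between versions):

    # Rough sketch: compare BART's fixed-size learned position embeddings with
    # T5's bucketed relative-attention bias. Assumes recent transformers.
    from transformers import (
        BartConfig,
        BartForConditionalGeneration,
        T5Config,
        T5ForConditionalGeneration,
    )

    # Shrink the number of layers to keep the random models small.
    bart = BartForConditionalGeneration(BartConfig(encoder_layers=1, decoder_layers=1))
    t5 = T5ForConditionalGeneration(T5Config(num_layers=1, num_decoder_layers=1))

    # BART: a learned embedding matrix of fixed size (max_position_embeddings plus a
    # legacy offset of 2), so any position beyond that range raises an index error.
    print(bart.model.encoder.embed_positions.weight.shape)  # expected: torch.Size([1026, 1024])

    # T5: no position-embedding matrix; the first self-attention layer holds a small
    # bucketed relative-attention bias whose size does not depend on sequence length.
    bias = t5.encoder.block[0].layer[0].SelfAttention.relative_attention_bias
    print(bias.weight.shape)  # expected: torch.Size([32, 8]) = (num_buckets, num_heads)

Because T5's bias only depends on bucketed relative distances between tokens, longer inputs simply fall into the existing buckets, which is presumably why the script has nothing to resize for T5.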

Also cc @patil-suraj here

github-actions bot commented Jul 5, 2022

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
