
T5 Model : What is maximum sequence length that can be used with pretrained T5 (3b model) checkpoint? #5204

Closed
shamanez opened this issue Jun 23, 2020 · 12 comments

@shamanez
Contributor

As described in the paper, T5 uses a relative attention mechanism, and the answer to this issue says that T5 can use any sequence length, the only constraint being memory.

According to this, can I use T5 to summarize inputs that have more than 512 tokens in a sequence?

@patrickvonplaten
Contributor

Yes you can, but you should be aware that memory requirements quadruple when you double the input sequence length for "normal" self-attention (as in T5): the attention scores form an n × n matrix per head, so doubling n quadruples its size.

So you will quickly run out of memory.

Here is a snippet that shows that you can run input ids longer than config.max_position_embeddings:

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.config.max_position_embeddings  # 512
input_ids = torch.tensor([600 * [0]])  # shape (1, 600)
model(input_ids, decoder_input_ids=input_ids)  # => no error

For more memory-efficient models, you should take a look at Reformer and Longformer.
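For long-input summarization specifically, the Longformer encoder-decoder variant (LED) was later added to transformers; a minimal sketch, assuming the allenai/led-base-16384 checkpoint:

import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

long_document = "..."  # placeholder for several thousand tokens of input text
inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")

# LED uses sparse local attention; the first token is typically given global attention
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(inputs["input_ids"],
                             global_attention_mask=global_attention_mask,
                             num_beams=4,
                             max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))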

@patrickvonplaten
Contributor

I hope we will soon have these models ready for summarization

@shamanez
Contributor Author

shamanez commented Jun 23, 2020

Thanks for the quick help.

So basically, the T5 model in Hugging Face can handle arbitrary input sequence lengths, right?
And the second line (model.config.max_position_embeddings) basically shows the default maximum input sequence length, right?

What do you think of the following code (here I simply increase the tokenizer's max_length)?

from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained('t5-small')
tokenizer = T5Tokenizer.from_pretrained('t5-small')

# some_preprocess_text is the raw article text to be summarized
t5_prepared_Text = "summarize: " + some_preprocess_text
tokenized_text = tokenizer.encode(t5_prepared_Text, max_length=1024, truncation=True, return_tensors="pt")

summary_ids = model.generate(tokenized_text,
                             num_beams=4,
                             no_repeat_ngram_size=2,
                             min_length=30,
                             max_length=100,
                             early_stopping=True)

summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)


@shamanez
Contributor Author

Hi, I checked two summary outputs of T5 after using 1024 and 512 as the input sequence length. I do not see any difference in the generated summaries. Any idea why this happens?

@mars997

mars997 commented Feb 15, 2021

Hi, I checked two summary outputs of T5 after using 1024 and 512 as the input sequence length. I do not see any difference in the generated summaries. Any idea why this happens?

Hi, I have the same question. Did you happen to figure out why?

@shamanez
Contributor Author

shamanez commented Feb 15, 2021 via email

Hi, those days I haven't had much of an idea about the Hugging Face models. Since we can pass any length as input, the main parameter should be the minimum generation length. Try changing it.

@mars997

mars997 commented Feb 15, 2021

Hi, those days I haven't had much of an idea about the Hugging Face models. Since we can pass any length as input, the main parameter should be the minimum generation length. Try changing it.

I am still very new to Hugging Face. I have a pretty long text of about 1,500 words. The issue I was having is that when I set max_length=512 or 1024, they return more or less the same summary. Do you know why?

@shamanez
Contributor Author

shamanez commented Feb 15, 2021 via email

I think it is because the minimum length is unchanged. Regardless of the input, the algorithm tries to generate text until it produces the EOS (end-of-sequence) token, so it is common to get the same summary even if you add a few more sentences to the original input.
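To make that concrete, a minimal sketch (reusing model, tokenizer, and tokenized_text from the snippet earlier in this thread; the values are only illustrative) showing that the generation-side length arguments, not the input-side max_length, govern how long the summary is:

# the input-side max_length only caps the source text; the length of the
# generated summary is governed by min_length / max_length passed to generate()
summary_ids = model.generate(tokenized_text,
                             num_beams=4,
                             no_repeat_ngram_size=2,
                             min_length=100,   # force a noticeably longer summary
                             max_length=300,
                             early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))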

@PastelBelem8

Hi, do we have to fine-tune the model when changing the model.config.max_position_embeddings?

@shamanez
Contributor Author

shamanez commented Feb 7, 2022

Not really, because T5 uses relative positional embeddings.
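As a quick check, a minimal sketch (t5-small is just an example checkpoint): T5 stores no absolute position-embedding table, only bucketed relative attention biases, so longer inputs need no new weights:

import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-small")
print(model.config.relative_attention_num_buckets)  # 32 by default

long_input = torch.zeros((1, 1024), dtype=torch.long)  # dummy ids, length > 512
out = model(input_ids=long_input, decoder_input_ids=long_input[:, :10])
print(out.logits.shape)  # runs without resizing or retraining any weights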

@RenzeLou

RenzeLou commented Jan 3, 2023

I think it is because the minimum length is unchanged. Regardless of the input, the algorithm tries to generate text until it produces the EOS (end-of-sequence) token, so it is common to get the same summary even if you add a few more sentences to the original input.


Personally, I think there is another reason:

First, if you use the off-the-shelf T5-base model to summarize directly (i.e., without fine-tuning), a longer input tends to produce the same output as the 512-token input, because the T5-base model was pre-trained with max_source_length == 512, so the tokens beyond 512 may not be attended to effectively by the T5Attention layers.

But after fine-tuning the T5-base model with a longer max_source_length, an input with a longer max_source_length may give you a different output than with 512.
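For anyone who wants to try that, a minimal sketch of the data-preparation side of such fine-tuning (the field names "document" and "summary" and the value 1024 are only assumptions for illustration):

from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")
max_source_length = 1024   # longer than the 512 used during pre-training
max_target_length = 128

def preprocess(example):
    # example is assumed to be a dict with raw "document" and "summary" strings
    model_inputs = tokenizer("summarize: " + example["document"],
                             max_length=max_source_length,
                             truncation=True)
    labels = tokenizer(example["summary"],
                       max_length=max_target_length,
                       truncation=True)
    model_inputs["labels"] = labels["input_ids"]  # targets for the decoder
    return model_inputs

# e.g. tokenized = dataset.map(preprocess) for a datasets.Dataset of long articles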

@shanto-Rahman

What is the maximum sequence length for the T5-large?
