T5 Model: What is the maximum sequence length that can be used with a pretrained T5 (3b model) checkpoint? #5204
Comments
Yes you can, but you should be aware that memory requirements quadruple when doubling the input sequence length for "normal" self-attention (as in T5), so you will quickly run out of memory. Here is a snippet showing that you can run input IDs longer than 512:

```python
import torch
from transformers import T5ForConditionalGeneration

model = T5ForConditionalGeneration.from_pretrained("t5-base")
model.config.max_position_embeddings  # 512

input_ids = torch.tensor([600 * [0]])  # shape (1, 600)
model(input_ids, decoder_input_ids=input_ids)  # => no error
```

For more memory-efficient models, you should take a look at
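To make the quadratic claim concrete, here is a rough back-of-the-envelope sketch (mine, not part of the original comment) counting only the fp32 attention score matrices, assuming T5-base-like dimensions of 12 layers and 12 heads:

```python
# Memory for the (seq_len x seq_len) attention score matrices alone,
# assuming 12 layers, 12 heads, fp32 (T5-base-like encoder sizes).
def attention_scores_bytes(seq_len, num_layers=12, num_heads=12, bytes_per_el=4):
    return num_layers * num_heads * seq_len * seq_len * bytes_per_el

for n in (512, 1024, 2048):
    print(f"{n:5d} tokens -> {attention_scores_bytes(n) / 2**20:6.0f} MiB")
# 512 -> 144 MiB, 1024 -> 576 MiB, 2048 -> 2304 MiB: doubling the
# sequence length quadruples this term.
```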
I hope we will soon have these models ready for summarization.
Thanks for the quick help. So basically, the T5 model in Hugging Face can handle arbitrary sequence lengths, right? What do you think of the following code? (Here I simply modify the tokenizer's max_length.)
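A minimal sketch of what modifying the tokenizer's max_length might look like (the t5-base model and the 1024 value are illustrative assumptions; this is not the original poster's code):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

long_document = " ".join(["The quick brown fox jumps over the lazy dog."] * 120)

# Raise the truncation limit past the usual 512 tokens.
inputs = tokenizer("summarize: " + long_document,
                   max_length=1024, truncation=True, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], max_length=150)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```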
Hi, I checked two summary outputs of T5, after using 1024 and 512 sequence lengths. I do not see any difference in the generated summaries. Any idea why this happens?
Hi, I have the same question. Did you happen to figure out why?
Hi,
These days I haven't had much insight into Hugging Face models. Since we can pass input of any length, the main parameter should be the minimum generation length. Try changing it.
I am still very new to Hugging Face. I have a pretty long text, about 1500 words. The issue I was having is that when I set max_length=512 or 1024, they return pretty much the same summary. Do you know why?
I think it is because the minimum length is unchanged. Regardless of the input, the algorithm tries to generate text until it emits the EOS (end-of-sequence) token, so it is common to get a summary of the same length even if you add a few more sentences to the original input.
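To illustrate the suggestion above (a sketch, not code from the thread): min_length and max_length are generate() arguments, so the summary length is bounded by them rather than by the input length:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

inputs = tokenizer("summarize: " + " ".join(["The cat sat on the mat."] * 100),
                   max_length=1024, truncation=True, return_tensors="pt")

# Same input, different bounds: generation stops at EOS (or at max_length),
# and min_length suppresses EOS until that many tokens have been produced.
short = model.generate(inputs["input_ids"], min_length=10, max_length=60)
longer = model.generate(inputs["input_ids"], min_length=80, max_length=150)
print(short.shape[-1], longer.shape[-1])
```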
Hi, do we have to fine-tune the model when changing the maximum sequence length?
Not really, because T5 uses relative positional embeddings.
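One way to see this in code (a sketch; relative_attention_num_buckets is the relevant attribute in the transformers T5Config): T5 stores bucketed relative-attention biases rather than a learned absolute position-embedding table, so there is no fixed table to outgrow:

```python
from transformers import T5Config

config = T5Config.from_pretrained("t5-base")
# Relative position biases are bucketed by pairwise distance; no absolute
# position-embedding table imposes a hard maximum input length.
print(config.relative_attention_num_buckets)  # 32
```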
Personally, I think there is another reason. First, if you use the off-the-shelf T5-base model to summarize directly (i.e., no fine-tuning), a longer input can result in the same output as the original input, because the T5-base model was pre-trained with inputs of at most 512 tokens. But after fine-tuning the T5-base model with a longer input length, it should learn to make use of the extra context.
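A minimal sketch of a single fine-tuning step with a longer source window (all texts and hyperparameters below are illustrative placeholders, not a recipe from this thread):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One training step with a 1024-token source window.
source = tokenizer("summarize: " + " ".join(["Some long document text."] * 250),
                   max_length=1024, truncation=True, return_tensors="pt")
target = tokenizer("a short reference summary",
                   max_length=64, truncation=True, return_tensors="pt")

loss = model(input_ids=source["input_ids"],
             attention_mask=source["attention_mask"],
             labels=target["input_ids"]).loss
loss.backward()
optimizer.step()
```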
What is the maximum sequence length for T5-large?
As the paper describes, T5 uses a relative attention mechanism, and as the answer to this issue says, T5 can use any sequence length, where the only constraint is memory.
According to this, can I use T5 to summarize inputs that have more than 512 tokens in a sequence?