Hi, thanks in advance! I am looking at run_summarization.py under examples/pytorch/summarization/, specifically the following code snippet, because I want to set max_source_length bigger than 512, which is the maximum length T5 was pre-trained on:
if (
    hasattr(model.config, "max_position_embeddings")
    and model.config.max_position_embeddings < data_args.max_source_length
):
    if model_args.resize_position_embeddings is None:
        logger.warning(
            "Increasing the model's number of position embedding vectors from"
            f" {model.config.max_position_embeddings} to {data_args.max_source_length}."
        )
        model.resize_position_embeddings(data_args.max_source_length)
    elif model_args.resize_position_embeddings:
        model.resize_position_embeddings(data_args.max_source_length)
    else:
        raise ValueError(
            f"`--max_source_length` is set to {data_args.max_source_length}, but the model only has"
            f" {model.config.max_position_embeddings} position encodings. Consider either reducing"
            f" `--max_source_length` to {model.config.max_position_embeddings} or to automatically resize the"
            " model's position encodings by passing `--resize_position_embeddings`."
        )
My questions are:
1. I remember T5Config used to have a max_position_embeddings parameter (it was 512); why has it been removed? max_sequence_length is set to 1024, and since that is bigger than 512, why is it not required to call the resize_position_embeddings method like before in this issue: T5 Model : What is maximum sequence length that can be used with pretrained T5 (3b model) checkpoint? #5204 (comment)?
2. BART also uses relative position embeddings like T5, but BartConfig's max_position_embeddings is kept at 1024, and when max_source_length is set longer than 1024 the snippet above does require calling resize_position_embeddings. Is this because BART and T5 use different relative position embeddings?
I think I must be misunderstanding something; I would appreciate an explanation here. Thanks!!
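For reference, this is what I see when I poke at the two configs directly (a quick check on a recent transformers 4.x install; exact defaults may differ between versions):

```python
# Check which configs expose max_position_embeddings, since that attribute is what
# the snippet above keys on.
from transformers import BartConfig, T5Config

t5_cfg = T5Config()
bart_cfg = BartConfig()

# T5Config no longer defines max_position_embeddings (T5 uses relative position buckets),
# so the hasattr() guard in run_summarization.py skips the resize logic entirely for T5.
print(hasattr(t5_cfg, "max_position_embeddings"))    # False

# BartConfig still defines it (default 1024), so max_source_length > 1024 hits the
# warning / resize / ValueError branches shown above.
print(hasattr(bart_cfg, "max_position_embeddings"))  # True
print(bart_cfg.max_position_embeddings)              # 1024
```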
I don't think BART uses relative position embeddings, but rather "fixed" position embeddings ("fixed" in the sense that the model gives an index error if seq_len > 1024 is provided).
Could you maybe look into this line of code in BART:
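Roughly, the difference looks like this. This is a simplified sketch, not the real modeling_bart.py / modeling_t5.py code; names like bart_style_pos and t5_style_bias are made up for illustration, and real T5 buckets distances on a log scale rather than with a clamp:

```python
import torch
import torch.nn as nn

# BART learns one embedding vector per absolute position, so the table has a hard size.
max_positions, d_model = 1024, 16
bart_style_pos = nn.Embedding(max_positions, d_model)

bart_style_pos(torch.arange(1024))      # fine: positions 0..1023 all exist in the table
try:
    bart_style_pos(torch.arange(1025))  # position 1024 does not -> index error
except IndexError as err:
    print("lookup past the table size fails:", err)

# T5 instead learns a bias indexed by the *relative* distance between query and key
# positions, bucketed into a small table, so any sequence length can be encoded
# (quality just degrades far beyond the lengths seen in pre-training).
num_buckets = 32
t5_style_bias = nn.Embedding(num_buckets, 1)
pos = torch.arange(2048)                # longer than 512 is not a problem
rel = pos[None, :] - pos[:, None]       # pairwise relative distances
buckets = rel.clamp(-num_buckets // 2, num_buckets // 2 - 1) + num_buckets // 2
print(t5_style_bias(buckets).shape)     # torch.Size([2048, 2048, 1])
```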
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.