This repository was archived by the owner on Jun 3, 2025. It is now read-only.

Conversation

@dbogunowicz (Contributor) commented Nov 28, 2023

PR #1826 added new functionality so that, when no sequence_length is specified, we default to the maximum sequence length from the config. Unfortunately, as discussed in the comments on #1826, an HF config is not guaranteed to have a max_position_embeddings attribute, and this information can also appear under a different key.
E.g.
For the model TinyLlama-1.1B-Chat-v0.3 we look at config.max_position_embeddings
For the model zoo:mpt-7b-mpt_chat_mpt_pretrain-pruned80_quantized we look at config.max_seq_len

This PR:

  1. Adds max_seq_len to the set of config attributes that may be used to infer the default sequence_length
  2. Raises a ValueError if the default sequence_length cannot be inferred (the user should then specify it manually); see the sketch below
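A minimal sketch of the fallback logic described above, assuming a Hugging Face PretrainedConfig-like object; the helper name infer_sequence_length is hypothetical and not taken from the actual diff:

from typing import Optional


def infer_sequence_length(config, sequence_length: Optional[int] = None) -> int:
    """Return sequence_length, falling back to known config attributes."""
    if sequence_length is not None:
        return sequence_length
    # check max_position_embeddings first (e.g. TinyLlama), then max_seq_len (e.g. MPT)
    for attribute in ("max_position_embeddings", "max_seq_len"):
        if hasattr(config, attribute):
            return getattr(config, attribute)
    raise ValueError(
        "Could not infer a default sequence_length from the model config; "
        "please specify sequence_length explicitly"
    )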

)

if sequence_length is None:
    if hasattr(config, "max_position_embeddings"):


Could a config have both attributes? If so, is max_position_embeddings preferred over max_seq_len, as coded here?

@dbogunowicz (Contributor, Author)


Good catch. Regarding both keys being simultaneously present: honestly, I do not know; I would need to go through more example configs. But preferring max_position_embeddings over max_seq_len sounds reasonable to me.
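For illustration, a quick check of that preference order, using the hypothetical infer_sequence_length helper sketched above and a stand-in config object (not from the actual PR):

from types import SimpleNamespace

# a minimal stand-in for an HF config that defines both attributes
config = SimpleNamespace(max_position_embeddings=2048, max_seq_len=4096)
assert infer_sequence_length(config) == 2048  # max_position_embeddings wins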

@bfineran bfineran merged commit 8f2296f into main Nov 29, 2023
@bfineran bfineran deleted the feature/damian/fix_default_sequence_length branch November 29, 2023 15:17
