max_seq_len
What is the max_seq_len (or max_position_embeddings) of Mistral-7B-v0.1 during training?

The official code says it is 128_000 (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).
The config file on Hugging Face says it is 32768 (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).
And the official blog mentions 16k.
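
For reference, the published config can be checked directly (a minimal sketch, assuming `transformers` is installed and huggingface.co is reachable):

```python
# Minimal sketch: read the published Mistral-7B-v0.1 config from the Hub.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")
print(cfg.max_position_embeddings)  # 32768, per config.json
print(cfg.sliding_window)           # 4096, the sliding-window attention size
```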
The paper, meanwhile, claims an attention span of 131K tokens (Section 2, "Architectural details" → "Sliding Window Attention").
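
For what it's worth, the 131K figure is not a fourth independent number: the paper derives it from the sliding window, since each of the 32 layers lets information flow back another window's worth of tokens. A quick check of the arithmetic:

```python
# Effective attention span from the paper's sliding-window argument:
# information can reach `window_size` tokens further back at each layer.
window_size = 4096   # sliding_window in config.json
num_layers = 32      # num_hidden_layers in config.json
print(window_size * num_layers)  # 131072, i.e. the paper's ~131K tokens
```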
The Hugging Face config sets max_position_embeddings=32768. The 128_000 in the official code is not a training length: it is the end argument passed to precompute_freqs_cis, which only controls how many positions of rotary (RoPE) frequencies are precomputed and cached up front.
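
Roughly, that function just builds the complex rotary-embedding table for `end` positions; here is a paraphrased sketch of the llama-style implementation used in mistral-src (not a verbatim copy of the repo):

```python
import torch

def precompute_freqs_cis(dim: int, end: int, theta: float = 10000.0) -> torch.Tensor:
    # Per-dimension-pair rotation frequencies: theta^(-2i/dim) for i in [0, dim/2).
    freqs = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    # Outer product with positions 0..end-1 gives the rotation angle
    # for every (position, dimension-pair).
    t = torch.arange(end, dtype=torch.float32)
    angles = torch.outer(t, freqs)
    # Complex exponentials e^{i*angle}, cached once and indexed at runtime.
    return torch.polar(torch.ones_like(angles), angles)

# The call site in model.py passes end=128_000, so the cache simply covers far
# more positions than any sequence the model sees; it says nothing about the
# training sequence length.
freqs_cis = precompute_freqs_cis(128, 128_000)  # head_dim=128 for Mistral-7B
```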