
What is the max_seq_len in Mistral? #53

Open
ParadoxZW opened this issue Oct 23, 2023 · 1 comment

Comments

@ParadoxZW

What is the max_seq_len (or max_position_embeddings) of Mistral-7B-v0.1 when training?

The official code says it is 128_000 (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).

The config file on Hugging Face says it is 32768 (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).

And the official blog mentions 16k.
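
For reference, the two values published in the Hugging Face config can be checked directly; a minimal sketch using transformers.AutoConfig (assuming a transformers version recent enough to include Mistral support):

```python
from transformers import AutoConfig

# Load the published config for Mistral-7B-v0.1 from the Hugging Face Hub
config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

print(config.max_position_embeddings)  # 32768, per config.json
print(config.sliding_window)           # 4096, the sliding-window attention size
```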

@keyboardAnt commented Oct 27, 2023

> What is the max_seq_len (or max_position_embeddings) of Mistral-7B-v0.1 when training?
>
> The official code says it is 128_000 (https://github.com/mistralai/mistral-src/blob/147c4e68279b90eb61b19bdea44e16f5539d5a5d/mistral/model.py#L201C69-L201C69).
>
> The config file on Hugging Face says it is 32768 (https://huggingface.co/mistralai/Mistral-7B-v0.1/blob/main/config.json).
>
> And the official blog mentions 16k.

And the paper claims an attention span of 131K tokens (Section 2 on "Architectural details" → "Sliding Window Attention").
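
For context, the paper's 131K figure is the theoretical receptive field of stacked sliding-window attention, not a positional-embedding limit: each layer can look back one window, so information can propagate roughly window_size × n_layers positions through the stack. A quick check of that arithmetic, using the window size (4096) and layer count (32) reported for Mistral-7B-v0.1:

```python
# Theoretical attention span per the paper's sliding-window argument:
# each transformer layer extends the receptive field by one window of W tokens.
window_size = 4096   # W, the sliding_window value in the HF config
n_layers = 32        # transformer layers in Mistral-7B-v0.1
print(window_size * n_layers)  # 131072, i.e. the ~131K tokens the paper cites
```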
