
Document use of Mistral #521

Closed
borisdayma opened this issue Mar 14, 2024 · 6 comments

Comments

borisdayma commented Mar 14, 2024

It looks like you already support Mistral, though maybe missing sliding window attention.

Would be great to:

borisdayma commented Mar 18, 2024

Looks like this is actually available: https://github.com/google/maxtext/blob/main/end_to_end/test_mistral.sh

The only thing I had to do was replace tokenizer.mistral with tokenizer.model (is that a typo, or did you rename the file in your bucket?).
I also chose to convert the bfloat16 weights to float32 rather than float16, since float16 could introduce some imprecision.
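A minimal sketch of why float32 is the safer up-cast target here: bfloat16 shares float32's 8-bit exponent, so the cast to float32 is exact, whereas float16's 5-bit exponent overflows for magnitudes above ~65504. This uses `ml_dtypes` (a JAX dependency) for the bfloat16 NumPy dtype; the values are illustrative, not from the actual checkpoint.

```python
import numpy as np
import ml_dtypes  # provides the bfloat16 NumPy dtype

# Illustrative "weights": one small value, one large enough to break float16.
weights = np.array([1.5, 3.0e5], dtype=ml_dtypes.bfloat16)

as_f32 = weights.astype(np.float32)  # lossless: every bfloat16 value is a float32
as_f16 = weights.astype(np.float16)  # 3.0e5 exceeds float16's max finite value (~65504)

print(np.isinf(as_f32))  # both entries stay finite
print(np.isinf(as_f16))  # the large entry overflows to inf
```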

versae commented Mar 18, 2024

Can I ask what kind of TPU you are using for the test, @borisdayma? I have a v4-32 available that I'd like to use for continued pre-training of Llama2/Mistral 7B, but other frameworks have seemed sub-optimal to me so far.

borisdayma commented Mar 18, 2024

It should work on a v3-8.
You can also try the decode.py script; for me it worked on the 7B models (Gemma and Mistral).
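For reference, a decode invocation along the lines of the repo's end_to_end tests might look like the sketch below. The bucket path is a placeholder and the specific flag values are assumptions, not tested settings; check the config files in the repo for the authoritative option names.

```shell
# Hedged sketch: MaxText options are passed as key=value overrides on top of
# a base config. The checkpoint path and values here are placeholders.
python3 MaxText/decode.py MaxText/configs/base.yml \
  model_name=mistral-7b \
  tokenizer_path=assets/tokenizer.model \
  load_parameters_path=gs://your-bucket/mistral-7b/checkpoint \
  per_device_batch_size=1 \
  run_name=mistral_decode_test \
  prompt="I love to"
```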

rwitten (Collaborator) commented Mar 26, 2024

Amazing, @borisdayma! We don't actually officially support Mistral (we do support Llama and Gemma), but we're thrilled it's working for you!

@borisdayma (Author)

Yeah, your Mistral inference test is correct. I compared with the transformers output and got the same result.

@borisdayma (Author)

I'm closing this issue since, after further testing, Mistral seems to work well already.
