
Document use of Mistral #521

Closed
borisdayma opened this issue Mar 14, 2024 · 6 comments

Comments

borisdayma commented Mar 14, 2024

It looks like you already support Mistral, though maybe missing sliding window attention.

Would be great to:

borisdayma commented Mar 18, 2024

Looks like this is actually available: https://github.com/google/maxtext/blob/main/end_to_end/test_mistral.sh

The only thing I had to do was replace tokenizer.mistral with tokenizer.model (is that a typo, or did you rename the file in your bucket?).
I also chose to convert the bfloat16 weights to float32 rather than float16, since float16 could introduce some imprecision.
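A minimal sketch of why float32 is the safer up-cast target here: bfloat16 shares float32's 8-bit exponent, so the cast to float32 is exact, whereas float16's 5-bit exponent overflows for magnitudes above ~65504. This uses `ml_dtypes` (a JAX dependency) for the bfloat16 NumPy dtype; the values are illustrative, not from the actual checkpoint.

```python
import numpy as np
import ml_dtypes  # provides the bfloat16 NumPy dtype

# Illustrative "weights": one small value, one large enough to break float16.
weights = np.array([1.5, 3.0e5], dtype=ml_dtypes.bfloat16)

as_f32 = weights.astype(np.float32)  # lossless: every bfloat16 value is a float32
as_f16 = weights.astype(np.float16)  # 3.0e5 exceeds float16's max finite value (~65504)

print(np.isinf(as_f32))  # both entries stay finite
print(np.isinf(as_f16))  # the large entry overflows to inf
```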

versae commented Mar 18, 2024

Can I ask what kind of TPU you are using for the test, @borisdayma? I have a v4-32 available that I'd like to use for continued pre-training of Llama2/Mistral 7B, but other frameworks have seemed sub-optimal to me so far.

borisdayma commented Mar 18, 2024

It should work on a v3-8.
You can also try the decode.py script; for me it worked on the 7B models (Gemma and Mistral).
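For reference, a decode invocation along the lines of the repo's end_to_end tests might look like the sketch below. The bucket path is a placeholder and the specific flag values are assumptions, not tested settings; check the config files in the repo for the authoritative option names.

```shell
# Hedged sketch: MaxText options are passed as key=value overrides on top of
# a base config. The checkpoint path and values here are placeholders.
python3 MaxText/decode.py MaxText/configs/base.yml \
  model_name=mistral-7b \
  tokenizer_path=assets/tokenizer.model \
  load_parameters_path=gs://your-bucket/mistral-7b/checkpoint \
  per_device_batch_size=1 \
  run_name=mistral_decode_test \
  prompt="I love to"
```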

rwitten (Collaborator) commented Mar 26, 2024

Amazing, @borisdayma! We don't actually officially support Mistral (we do support Llama and Gemma), but we're thrilled it's working for you!

@borisdayma (Author)

Yeah, your Mistral inference test is correct. I compared with the transformers output and got the same result.

@borisdayma (Author)

I'm closing this issue since, after further testing, Mistral seems to work well already.
