Document use of Mistral #521
Comments
Looks like this is actually available: https://github.com/google/maxtext/blob/main/end_to_end/test_mistral.sh The only thing I had to do was replace
Can I ask what kind of TPU you are using for the test, @borisdayma? I have a v4-32 available that I'd like to use for continued pre-training of Llama2/Mistral 7B, but other frameworks have seemed sub-optimal to me so far.
It should work on a v3-8.
Amazing @borisdayma! We don't actually officially support Mistral (we do support Llama and Gemma), but we're thrilled things are working for you!
Yeah, your inference test of Mistral is correct. I compared with
I'm closing this issue because Mistral seems to already work well after further testing. |
It looks like you already support Mistral, though maybe missing sliding window attention.
Would be great to:
decode.py
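For context on the sliding window attention mentioned above: Mistral restricts each token to attending only to the most recent W tokens instead of the full causal prefix. A minimal sketch of the resulting attention mask is below; this is an illustration of the general technique, not MaxText's actual implementation, and the function name and window size are hypothetical.

```python
import numpy as np

def sliding_window_causal_mask(seq_len: int, window: int) -> np.ndarray:
    """Boolean mask: entry [i, j] is True iff query i may attend to key j.

    A position attends causally (j <= i) but only within the last
    `window` positions (i - j < window). Hypothetical helper for
    illustration only.
    """
    i = np.arange(seq_len)[:, None]  # query positions, column vector
    j = np.arange(seq_len)[None, :]  # key positions, row vector
    return (j <= i) & (i - j < window)

# With window=3, position 5 sees keys 3, 4, 5 but not 0, 1, 2.
mask = sliding_window_causal_mask(seq_len=6, window=3)
```

With a full causal mask, memory for the attention scores grows quadratically in sequence length; a sliding window caps each row at W entries, which is why omitting it changes behavior for sequences longer than the window.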