Add Mistral support 💨 #54
Conversation
Imported from transformers sha1: a2ede6667 (current main branch). This allows using the recent static cache support. The only changes are:
- fixed the import paths,
- added a workaround to avoid having to import SlidingWindowCache or having to modify the file too much.
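A minimal sketch of the kind of guarded import such a workaround could use (the exact code in the PR may differ): `SlidingWindowCache` only exists in recent transformers releases, so older ones fall back to a stub instead of requiring edits to the imported modeling file.

```python
# Hypothetical guarded import; the PR's actual workaround may be shaped differently.
try:
    from transformers.cache_utils import SlidingWindowCache
except ImportError:
    # Older transformers: no sliding-window cache available, disable that path.
    SlidingWindowCache = None
```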
This will allow using the same example for other models, such as mistralai/Mistral-7B-v0.3.
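For illustration, a plain transformers generation snippet pointed at that model id; this is a generic sketch, not the repo's actual example script:

```python
# Generic sketch: only the model id changes when switching models.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.3"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```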
There is no point in using code to sync multiple TPUs when using only one.
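One way to express the idea, assuming a torch_xla setup; the helper name `maybe_sync` is hypothetical, and the PR may gate the sync differently:

```python
import torch_xla.core.xla_model as xm

def maybe_sync(tag: str) -> None:
    # A rendezvous blocks until all replicas arrive; with a single TPU
    # there is nothing to wait for, so skip the collective entirely.
    if xm.xrt_world_size() > 1:
        xm.rendezvous(tag)
```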
This prevents needlessly downloading consolidated weights, as found in the Mistral repo.
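For instance, with `huggingface_hub` one can skip the consolidated checkpoint when fetching the repo; the ignore pattern below is an assumption based on the Mistral repo layout:

```python
from huggingface_hub import snapshot_download

# Skip the monolithic consolidated checkpoint; the sharded
# safetensors files are enough for inference.
local_dir = snapshot_download(
    "mistralai/Mistral-7B-v0.3",
    ignore_patterns=["consolidated*"],
)
```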
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@tengomucho I successfully ran the TGI server with Mistral-7B-Instruct-v0.2 on a TPU v5litepod-8. Does optimum-tpu utilize all 8 cores?
@Bihan yes it does. I filtered the debug messages so that only rank 0 emits them, to avoid spamming 😄 See optimum-tpu/optimum/tpu/xla_logger.py, lines 18 to 20 at eb1d7c9.
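For context, a rough approximation of that rank-0 filter (the actual helper in `xla_logger.py` may be shaped differently):

```python
import logging

import torch_xla.core.xla_model as xm

logger = logging.getLogger(__name__)

def debug(message: str) -> None:
    # Only ordinal 0 logs, so eight cores do not print eight copies.
    if xm.get_ordinal() == 0:
        logger.debug(message)
```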
@tengomucho Does this mean models with […]
@tengomucho I have a question regarding the […] Here is the command for reference: […]
What does this PR do?
Added support for inference on Mistral 7B models. Tested with `Mistral-7B-v0.3`.
Before submitting