
Add Mistral support 💨 #54

Merged: 9 commits from mistral into main on Jun 17, 2024
Conversation

@tengomucho (Collaborator) commented on Jun 13, 2024

What does this PR do?

Added support for inference on Mistral 7B models. Tested with Mistral-7B-v0.3.

Before submitting

  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Imported from transformers sha1: a2ede6667 (current main branch).
This allows using the recent static cache support.
The only changes are:

- fixed the import paths,
- added a workaround to avoid having to import SlidingWindowCache or having to modify the file too much (a rough sketch of this kind of workaround is shown just after this list).
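The diff itself is not quoted in this thread, so the following is only a sketch of what such an import workaround can look like; the transformers.cache_utils module path is real, but the fallback aliasing is an assumption, not the actual change in this PR.

# Hypothetical sketch, not the PR's actual diff: guard the SlidingWindowCache
# import and alias it so isinstance() checks in the copied modeling code still work.
from transformers.cache_utils import Cache, StaticCache

try:
    from transformers.cache_utils import SlidingWindowCache
except ImportError:
    SlidingWindowCache = StaticCache  # assumed fallback, for illustration only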
This will allow using the same example for other models, such as mistralai/Mistral-7B-v0.3.
There is no point in using code to sync multiple TPUs when using only one.
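As a loose illustration of the kind of guard that commit message describes (the torch_xla calls are real, but this is an assumed sketch rather than the PR's code):

import torch_xla.core.xla_model as xm

# Only synchronize across replicas when more than one TPU device participates.
if xm.xrt_world_size() > 1:
    xm.rendezvous("post_load_sync")  # tag name is illustrative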
This will prevent needlessly downloading consolidated weights, as is the case for the Mistral repo.
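For context, a hedged example of how consolidated weights can be skipped when fetching a Mistral checkpoint with huggingface_hub; the pattern list is an assumption, not necessarily what optimum-tpu does internally.

from huggingface_hub import snapshot_download

# Skip the consolidated checkpoint that Mistral repos ship alongside the
# sharded safetensors weights, so only the shards get downloaded.
local_path = snapshot_download(
    "mistralai/Mistral-7B-v0.3",
    ignore_patterns=["consolidated*"],  # illustrative pattern
)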
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@tengomucho merged commit 3900bd7 into main on Jun 17, 2024
3 checks passed
@tengomucho deleted the mistral branch on June 17, 2024 08:23
@Bihan commented on Jul 11, 2024

@tengomucho I successfully ran the TGI server with Mistral-7B-Instruct-v0.2 on a TPU v5litepod-8. Does optimum-tpu utilize all 8 cores? In the DEBUG log below, it only prints for rank 0.
2024-07-11T12:48:00.752321Z DEBUG text_generation_launcher: Rank 0 on xla:0 real device ['TPU:0'] ordinal 0 world size 8

@tengomucho (Collaborator, Author)
@Bihan yes, it does. I filter the debug messages so only rank 0 logs, to avoid spamming 😄 See here:

import torch_xla.core.xla_model as xm  # xm.get_ordinal() gives this process's rank
from loguru import logger

def debug(message: str):
    # Only the rank 0 process emits debug logs; the other ranks stay silent.
    if xm.get_ordinal() == 0:
        logger.opt(depth=1).debug(message)

@Bihan commented on Jul 14, 2024

@tengomucho Does this mean models with pickle files may cause issues?

@Bihan commented on Jul 15, 2024

@tengomucho I have a question regarding the --trust-remote-code flag in the text-generation-launcher command. If this flag is ignored, does it mean that optimum-tpu TGI assumes all remote code is safe to execute? Specifically, I'm concerned about potential security implications, such as the risks associated with using pickle for deserialization.

Here is the command for reference:
text-generation-launcher --port 8000 --trust-remote-code
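As a hedged illustration of the pickle side of this concern: when loading weights directly with transformers, safetensors can be required so that no pickle-based deserialization takes place. The flag usage below is a sketch, not a description of optimum-tpu's internal behavior.

from transformers import AutoModelForCausalLM

# Require safetensors weights; this avoids pickle-based torch.load entirely
# and raises if the repo only provides pickled .bin weights.
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    use_safetensors=True,
)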
