
Add support for mistral type Model to use Mistral and Zephyr #1553

Closed
manjunathshiva opened this issue Nov 27, 2023 · 8 comments

@manjunathshiva

Feature request

Using AirLLM to run a mistral type model on a 4GB GPU gives me the error below:

File "C:\model.py", line 5, in
model = AirLLMLlama2("./modles/zephyr-7b-beta")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\LLM\venv\Lib\site-packages\airllm\airllm.py", line 184, in init
self.init_model()
File "C:\LLM\venv\Lib\site-packages\airllm\airllm.py", line 197, in init_model
self.model = BetterTransformer.transform(self.model) # enable flash attention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\305031856\AppData\Local\Programs\Python\Python311\Lib\contextlib.py", line 81, in inner
return func(*args, **kwds)
^^^^^^^^^^^^^^^^^^^
File "C:\LLM\venv\Lib\site-packages\optimum\bettertransformer\transformation.py", line 228, in transform
raise NotImplementedError(
NotImplementedError: The model type mistral is not yet supported to be used with BetterTransformer. Feel free to open an issue at https://github.com/huggingface/optimum/issues if you would like this model type to be supported. Currently supported models are: dict_keys(['albert', 'bark', 'bart', 'bert', 'bert-generation', 'blenderbot', 'bloom', 'camembert', 'blip-2', 'clip', 'codegen', 'data2vec-text', 'deit', 'distilbert', 'electra', 'ernie', 'fsmt', 'falcon', 'gpt2', 'gpt_bigcode', 'gptj', 'gpt_neo', 'gpt_neox', 'hubert', 'layoutlm', 'llama', 'm2m_100', 'marian', 'markuplm', 'mbart', 'opt', 'pegasus', 'rembert', 'prophetnet', 'roberta', 'roc_bert', 'roformer', 'splinter', 'tapas', 't5', 'vilt', 'vit', 'vit_mae', 'vit_msn', 'wav2vec2', 'whisper', 'xlm-roberta', 'yolos']).
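
A possible interim workaround, sketched under the assumption that you can patch the call site shown in the traceback (the helper name maybe_to_bettertransformer is hypothetical, not part of airllm or optimum): catch the NotImplementedError and fall back to the unmodified model instead of crashing.

```python
# Hedged workaround sketch: skip the BetterTransformer step for architectures
# it does not support yet (such as mistral), instead of letting the error propagate.
from optimum.bettertransformer import BetterTransformer


def maybe_to_bettertransformer(model):
    """Return a BetterTransformer-optimized model when possible, else the original model."""
    try:
        # Works for supported architectures such as llama, falcon, gpt2, ...
        return BetterTransformer.transform(model)  # enable flash attention
    except NotImplementedError:
        # mistral (and therefore Zephyr) is not in the supported list yet.
        return model
```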

Motivation

Zephyr is currently one of the leading models on Hugging Face, so support is very much needed!

Your contribution

Yes, I can help if any help is needed! I am a Senior Software Engineer with 17 years of industry experience.

@Govind-S-B

I tried looking through the various related issues as well, and it seems this has been unaddressed for more than a month. I was thinking of finally adding support for the Mistral architecture on my own, even though I don't know much about it.
Found this resource in the docs which might help: https://huggingface.co/docs/optimum/bettertransformer/tutorials/contribute.
I am also trying to get AirLLM working with Mistral; good to see others are working on the same thing.

@manjunathshiva
Author

Thank you very much! Mistral 7B is a top model that outperforms Llama 13B in a few cases. Zephyr-7b-beta from Hugging Face, which is fine-tuned from Mistral, is the best of them and even beats Llama 70B in a few cases. Adding support for Mistral will open up both the Mistral and Zephyr models. Thanks for the link to the contribution guide.

@Govind-S-B

Btw, I don't think pursuing performance improvements with AirLLM is worth it. I tried it with a 34B-parameter model and it is really, really slow on my 8GB card; the bottleneck is going to be the processing power. A quantized model loaded straight onto the card is better, imo. A sketch of that approach follows.
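
For comparison, a minimal sketch of that alternative: loading a 4-bit quantized Zephyr directly onto the GPU with bitsandbytes via Transformers, rather than streaming layers from disk. The model id and prompt are just examples, and this assumes transformers, accelerate, and bitsandbytes are installed with a CUDA GPU available.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "HuggingFaceH4/zephyr-7b-beta"  # example; any Mistral-family checkpoint works

# 4-bit NF4 quantization keeps a 7B model small enough for an 8GB card.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # place the quantized weights on the GPU
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```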

@manjunathshiva
Author

Thanks for the update! In that case, this change may not be needed until the model runs faster!

@fxmarty
Collaborator

fxmarty commented Dec 13, 2023

Hi @manjunathshiva, in the Transformers 4.36 release we started adding native torch.nn.functional.scaled_dot_product_attention support for decoder models (see https://github.com/huggingface/transformers/releases/tag/v4.36.0 & https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-and-memory-efficient-attention-through-pytorchs-scaleddotproductattention).

Since for decoder models we do not use nested tensors and simply rely on SDPA, let's add this directly in Transformers.

I opened the issue huggingface/transformers#28005 in Transformers to track the support. Please continue the discussion there!
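
For reference, the PyTorch op mentioned above can be called directly; here is a tiny self-contained example (toy shapes, not code from this issue) of the fused attention call that Transformers dispatches to for decoder models:

```python
import torch
import torch.nn.functional as F

# Toy tensors shaped (batch, num_heads, seq_len, head_dim); float16 on GPU lets
# PyTorch select a fused backend (flash / memory-efficient attention) when available.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32
q = torch.randn(1, 8, 16, 64, device=device, dtype=dtype)
k = torch.randn(1, 8, 16, 64, device=device, dtype=dtype)
v = torch.randn(1, 8, 16, 64, device=device, dtype=dtype)

# One call replaces the manual softmax(Q K^T / sqrt(d)) V attention computation.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```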

@fxmarty fxmarty closed this as completed Dec 13, 2023
@jesulo

jesulo commented Jan 19, 2024

Hi, does BetterTransformer support Mistral? Or Solar (Mistral-based)? Regards

@pradeepdev-1995

Any updates on this? Does BetterTransformer support Mistral?

@fxmarty
Collaborator

fxmarty commented Jan 22, 2024

Hi @jesulo @pradeepdev-1995, BetterTransformer optimization for Mistral (which in our case is simply calling PyTorch's SDPA op instead of manual attention) has been integrated in Transformers natively, see https://huggingface.co/docs/transformers/v4.37.0/en/perf_infer_gpu_one#bettertransformer and https://huggingface.co/docs/transformers/v4.37.0/en/perf_infer_gpu_one#pytorch-scaled-dot-product-attention, as long as you use torch>=2.1.1.
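
Putting the linked docs together, a minimal sketch (the model id and prompt are illustrative) of loading a Mistral-family model so that it uses PyTorch SDPA natively, with no BetterTransformer.transform() call needed, assuming transformers >= 4.36 and torch >= 2.1.1:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceH4/zephyr-7b-beta"  # or any other Mistral-based checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="sdpa",  # explicit here; used by default when available
    device_map="auto",
)

inputs = tokenizer("Mistral 7B is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0], skip_special_tokens=True))
```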
