
Support for Mistral models #4111

Closed
kalomaze opened this issue Sep 27, 2023 · 18 comments
Labels: enhancement (New feature or request), stale

Comments

@kalomaze
Contributor

kalomaze commented Sep 27, 2023

A new set of 7B foundation models that are claimed to beat all 13B Llama 2 models on benchmarks.

https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

I am unsure whether anything special about the tokenizer, or about how the context extension is designed, makes this a unique challenge, since this is not a typical Llama 2 training run.

@kalomaze added the enhancement (New feature or request) label on Sep 27, 2023
@SrVill

SrVill commented Sep 27, 2023

And please don't forget to add the Mistral instruction template.

@rahimnathwani

rahimnathwani commented Sep 28, 2023

The requirements for the instruction template seem to be:

  • <s> at the beginning of the conversation, but not later.
  • [INST] before the user message
  • [/INST] before the bot message
  • Bot ends each message with </s>

I'm using Mistral right now with the settings in the attached screenshot. I put the <s> part in the 'Command for chat-instruct mode' box, but that is not saved as part of the instruction template.

[screenshot: chat settings with the Mistral instruction template]

I'm not sure how to handle the </s> part, but it doesn't show up as part of the bot's responses, at least in chat-instruct mode with the cai-chat theme.

That sentence-end character also doesn't show up in the notebook view, e.g.

<s>[INST]Tell me the name of Mary J Blige's first album[/INST] The name of Mary J. Blige's first studio album is "What's the 411?" It was released on August 26, 1992, by Puffy Records and became her debut solo album after previously recording with the group Children of the Corn.[INST]Tell me more about that group[/INST] Children of the Corn were an American hip hop group composed of Mary J. Blige, K-Ci and K-Poke, DJ Clue, and Redman. They formed in New York City during the mid-1980s and gained popularity with their debut single "I Can't Quit You Baby" which became a dance club hit in 1987. The group continued to release music throughout the 1990s, including several successful albums, before disbanding in 2000.
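
Putting the requirements above together, here is a minimal sketch of how the raw prompt could be assembled (this is not the webui's code; the helper name and the turns structure are invented for illustration):

# Sketch: assemble a Mistral-Instruct prompt from the rules listed above.
def build_mistral_prompt(turns):
    """turns: list of (user_message, bot_message_or_None) pairs, oldest first."""
    prompt = "<s>"  # BOS appears once, at the start of the conversation
    for user_msg, bot_msg in turns:
        prompt += f"[INST] {user_msg} [/INST]"
        if bot_msg is not None:
            prompt += f" {bot_msg}</s>"  # each completed bot turn ends with EOS
    return prompt

print(build_mistral_prompt([
    ("Tell me the name of Mary J Blige's first album", '"What\'s the 411?"'),
    ("Tell me more about that group", None),  # last turn: awaiting the model's reply
]))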

@travco

travco commented Sep 29, 2023

This should work as a YAML instruction template for Mistral-Instruct. Mistral also appears to work out of the box if you use a GGUF file with llama.cpp as the loader. I'm not 100% sure whether llama.cpp is properly doing sliding-window and grouped-query attention (I haven't kept up with the state of that as much as I'd like), but it will take a larger (8K+) context window without breaking.

user: '[INST] '
bot: ' [/INST]'
turn_template: <s><|user|><|user-message|><|bot|>\n<|bot-message|></s>\n

The sentence-end character won't show up on the first "turn" if you export the chat to notebook view, but will be placed before the next chat "turn" when you send it.
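
For reference, substituting the user and bot strings into that turn_template produces a rendered turn like the one below (a sketch of the placeholder substitution; the example messages are invented):

# Sketch: expand the turn_template above by literal placeholder substitution.
template = "<s><|user|><|user-message|><|bot|>\n<|bot-message|></s>\n"
user_prefix, bot_prefix = "[INST] ", " [/INST]"

rendered = (template
            .replace("<|user|>", user_prefix)
            .replace("<|bot|>", bot_prefix)
            .replace("<|user-message|>", "Tell me about Mistral 7B.")
            .replace("<|bot-message|>", "Mistral 7B is a 7-billion-parameter model..."))
print(rendered)
# <s>[INST] Tell me about Mistral 7B. [/INST]
# Mistral 7B is a 7-billion-parameter model...</s>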

@fat-tire

fat-tire commented Oct 1, 2023

Just to pipe in here-- TheBloke/Mistral-7B-Instruct-v0.1-AWQ seems to work alright with ooba. There is some occasional discontinuity between the question I asked and the answer. Sometimes it seems to answer questions from earlier and sometimes it gets answers factually wrong... but it works.

@DeSinc

DeSinc commented Oct 3, 2023

> Just to pipe in here-- TheBloke/Mistral-7B-Instruct-v0.1-AWQ seems to work alright with ooba. There is some occasional discontinuity between the question I asked and the answer. Sometimes it seems to answer questions from earlier and sometimes it gets answers factually wrong... but it works.

How did you get it to work? I tried TheBloke/dolphin-2.0-mistral-7B-AWQ and it gives the explicit message that it's not supported:

TypeError: mistral isn't supported yet.

I updated ooba and tried that specific model too, and got the same error.

@fat-tire

fat-tire commented Oct 3, 2023

Apply PR #3999 and it should work. I use n_batch set to 1 and check no_inject_fused_attention to avoid memory errors.
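
For context, no_inject_fused_attention corresponds (as far as I can tell) to loading the AWQ model without fused layers. Loading the same model directly with the AutoAWQ Python API would look roughly like this (a sketch outside the webui; the model path is only an example):

# Sketch: load an AWQ-quantized Mistral without fused attention layers,
# which is roughly what checking no_inject_fused_attention does.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "TheBloke/Mistral-7B-Instruct-v0.1-AWQ"  # example path
model = AutoAWQForCausalLM.from_quantized(model_path, fuse_layers=False)
tokenizer = AutoTokenizer.from_pretrained(model_path)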

@lijunle

lijunle commented Oct 4, 2023

TheBloke just released the GPTQ version.

https://huggingface.co/TheBloke/Mistralic-7B-1-GPTQ

@anammari

anammari commented Oct 5, 2023

None of the GPTQ models have loaded successfully for me.

I've tried mistral_7b_instruct_v0.1_gptq and mistralic_7b_1_gptq and tried all available loaders.

I then applied PR #3999 and downloaded the mistral_7b_instruct_v0.1_awq model, which loaded successfully with the AutoAWQ loader. However, the LLM stays unresponsive in both chat and chat-instruct modes.

The conda console also throws an error after I submit a prompt in chat:

Traceback (most recent call last):
  File "C:\Users\Ahmad\github\text-generation-webui\modules\callbacks.py", line 56, in gentask
    ret = self.mfunc(callback=_callback, *args, **self.kwargs)
  File "C:\Users\Ahmad\github\text-generation-webui\modules\text_generation.py", line 347, in generate_with_callback
    shared.model.generate(**kwargs)
  File "C:\Users\Ahmad\.conda\envs\textgen\lib\site-packages\awq\models\base.py", line 36, in generate
    return self.model.generate(*args, **kwargs)
  File "C:\Users\Ahmad\.conda\envs\textgen\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\Ahmad\.conda\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 1652, in generate
    return self.sample(
  File "C:\Users\Ahmad\.conda\envs\textgen\lib\site-packages\transformers\generation\utils.py", line 2770, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
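
That error means the sampler received NaN/inf probabilities from the model. A quick, generic way to check whether the loaded model is emitting bad logits (a diagnostic sketch of my own, not specific to this webui; it assumes model and tokenizer are already loaded as Hugging Face-style objects on the GPU):

# Diagnostic sketch: look for NaN/inf in the model's logits, which is what
# makes torch.multinomial fail as in the traceback above.
import torch

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits
print("any NaN:", torch.isnan(logits).any().item())
print("any inf:", torch.isinf(logits).any().item())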

@Pozaza

Pozaza commented Oct 6, 2023

Solution 1 (speed will be about 2x slower):

I got it to work by modifying two files in the installed packages:

  • venv\Lib\site-packages\auto_gptq\modeling\_const.py
  • venv\Lib\site-packages\auto_gptq\modeling\auto.py

modified files: https://gist.github.com/Pozaza/c8335bbcbbd4a73dd3bec1a9644b6865

Solution 2:

  1. Activate your virtual environment
  2. pip install git+https://github.com/huggingface/transformers (a quick check that this worked is sketched right after this list)
  3. Start the webui and choose any ExLlama model loader
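
After step 2, this is one way to confirm that the installed transformers build actually includes the Mistral architecture (a simple check of my own, not from the thread):

# If this import fails, the installed transformers version predates Mistral support
# and the webui will keep reporting that mistral isn't supported.
import transformers
print(transformers.__version__)
from transformers import MistralForCausalLM  # present from transformers 4.34 onward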

@Thireus
Contributor

Thireus commented Oct 6, 2023

Mistral GPTQ models give me the following error when loaded with exllamav2:

RuntimeError: q_weight and gptq_qzeros have incompatible shapes

Anyone facing the same issue?

@InfernalWraith

TheBloke/Mistral-7B-OpenOrca-GPTQ worked for me with the ExLlamav2_HF model loader out of the box. It also worked with the Transformers model loader, though I got the following error and had to tick the disable_exllama option to get it to work:

ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.
You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object

Using an RTX 3070, I get about 11.5 tokens/s with ExLlamav2_HF, whereas with Transformers I get about 4.5.
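
For reference, the disable_exllama option corresponds to the Transformers GPTQ quantization config mentioned in the error. Loading the model directly that way would look roughly like this (a sketch based on the Transformers API at the time; the model id is only an example):

# Sketch: load a GPTQ model with the ExLlama kernel disabled, which is what
# ticking disable_exllama maps to in the quantization config object.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "TheBloke/Mistral-7B-OpenOrca-GPTQ"  # example
quant_config = GPTQConfig(bits=4, disable_exllama=True)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)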

@yquemener

For those who want to try @InfernalWraith's solution: if you don't have disable_exllama as an option, "auto-devices" does the trick as well.

@github-actions github-actions bot added the stale label Dec 11, 2023

This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.

@controldev

> Mistral GPTQ models give me the following error when loaded with exllamav2:
>
> RuntimeError: q_weight and gptq_qzeros have incompatible shapes
>
> Anyone facing the same issue?

This is still a problem.

@oliverban

I'm getting this as well? :(

@oldmanjk

Same problem here:

Traceback (most recent call last):
  File "/home/j/text-generation-webui/modules/ui_model_menu.py", line 245, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/j/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/j/text-generation-webui/modules/models.py", line 380, in ExLlamav2_HF_loader
    return Exllamav2HF.from_pretrained(model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/j/text-generation-webui/modules/exllamav2_hf.py", line 181, in from_pretrained
    return Exllamav2HF(config)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/j/text-generation-webui/modules/exllamav2_hf.py", line 50, in __init__
    self.ex_model.load(split)
  File "/home/j/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 266, in load
    for item in f: x = item
  File "/home/j/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/model.py", line 284, in load_gen
    module.load()
  File "/home/j/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/attn.py", line 189, in load
    self.q_proj.load()
  File "/home/j/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/linear.py", line 55, in load
    self.q_handle = ext.make_q_matrix(w, self.temp_dq)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/j/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2/ext.py", line 236, in make_q_matrix
    return ext_c.make_q_matrix(w["qweight"],
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: q_weight and gptq_qzeros have incompatible shapes

@bryankruman

I'm also getting the same problem as many users above...
RuntimeError: q_weight and gptq_qzeros have incompatible shapes

@Ragnarok700

Also unable to load with error:
RuntimeError: q_weight and gptq_qzeros have incompatible shapes
