Support for Mistral models #4111
A new set of 7B foundation models that claim to beat all 13B Llama 2 models in benchmarks:

https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1

I am unsure whether anything special about the tokenizer, or about how the context extension is designed, makes this a unique challenge, as this is not a typical Llama 2 train.
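For anyone who wants to sanity-check the instruct model outside the webui, here is a minimal sketch using plain transformers. This is my own illustration, not something from this thread; it assumes transformers >= 4.34, the first release with Mistral support:

```python
# Minimal sketch: trying Mistral-7B-Instruct with plain transformers.
# Assumes transformers >= 4.34 (first release with Mistral support)
# and accelerate installed for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The tokenizer prepends <s> (BOS) itself, so the prompt starts at [INST].
prompt = "[INST] Name the planets in the solar system. [/INST]"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```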
Comments
And please don't forget to add the Mistral instruction template.
The requirements for the instruction template seem to be:
I'm using Mistral right now with the settings in the attached screenshot. I put the `<s>` part in the 'Command for chat-instruct mode' box, but that is not saved as part of the instruction template. I'm not sure how to handle the `</s>` part, but it doesn't show up as part of the bot's responses, at least in chat-instruct mode with the cai-chat theme. That sentence-end character also doesn't show in the notebook view, e.g.
This should work as a YAML instruction template for Mistral-Instruct. Also, Mistral appears to work out of the box if you use a GGUF file with llama.cpp as the loader. I'm not 100% sure whether llama.cpp is properly doing sliding-window and grouped-query attention (I haven't kept up with the state of that as much as I'd like), but it will take a larger (8K+) context window without breaking.

```yaml
user: '[INST] '
bot: ' [/INST]'
turn_template: <s><|user|><|user-message|><|bot|>\n<|bot-message|></s>\n
```

The sentence-end character won't show up on the first "turn" if you export the chat to notebook view, but will be placed before the next chat "turn" when you send it.
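To make the placeholder substitution concrete, here is a small sketch (my own illustration, not webui code) of how that turn template expands into the prompt string:

```python
# Illustrative only: expanding the YAML turn template above by hand,
# roughly the way the webui's chat templating substitutes placeholders.
user = "[INST] "
bot = " [/INST]"
turn_template = "<s><|user|><|user-message|><|bot|>\n<|bot-message|></s>\n"

def render_turn(user_message: str, bot_message: str) -> str:
    # Each <|...|> placeholder is replaced with its configured value.
    return (turn_template
            .replace("<|user|>", user)
            .replace("<|bot|>", bot)
            .replace("<|user-message|>", user_message)
            .replace("<|bot-message|>", bot_message))

print(repr(render_turn("What is the capital of France?", "Paris.")))
# '<s>[INST] What is the capital of France? [/INST]\nParis.</s>\n'
```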
Just to pipe in here: TheBloke/Mistral-7B-Instruct-v0.1-AWQ seems to work all right with ooba. There is some occasional discontinuity between the question I asked and the answer: sometimes it seems to answer questions from earlier, and sometimes it gets answers factually wrong... but it works.
How did you get it to work? I tried TheBloke/dolphin-2.0-mistral-7B-AWQ and it gives the explicit message that it's not supported:
Updated ooba and tried that specific model too; same error.
Apply PR #3999 and it should work. I use
TheBloke just released the GPTQ version.
None of the GPTQ models has loaded successfully for me. I then applied PR #3999 and downloaded the model again. Moreover, the conda console also throws an error after submitting the prompt in chat:
Solution 1 (speed will be about 2x slower): I got it to work by modifying two files in the installed packages.

Modified files: https://gist.github.com/Pozaza/c8335bbcbbd4a73dd3bec1a9644b6865

Solution 2:
Mistral GPTQ models give me the following error when loaded with exllamav2:
Is anyone facing the same issue?
TheBloke/Mistral-7B-OpenOrca-GPTQ worked for me with the
Using an RTX 3070, with
For those who want to try @InfernalWraith's solution: if you don't have `disable_exllama` as an option, `auto-devices` does the trick as well.
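For context, the same workaround can be expressed outside the webui directly in transformers. This is a hedged sketch of what I assume the `disable_exllama` option maps to; it is not code from this thread, and the webui's internals may differ:

```python
# Sketch: loading a Mistral GPTQ model with the ExLlama kernel disabled
# via transformers' GPTQConfig (my assumption about what the webui's
# disable_exllama checkbox corresponds to).
# Requires auto-gptq and optimum installed alongside transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "TheBloke/Mistral-7B-OpenOrca-GPTQ"  # model mentioned above
quant_config = GPTQConfig(bits=4, disable_exllama=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quant_config,
)
```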
This issue has been closed due to inactivity for 6 weeks. If you believe it is still relevant, please leave a comment below. You can tag a developer in your comment.
This is still a problem.
I'm getting this as well :(
Same problem here:
I'm also getting the same problem as many users above...
Also unable to load, with error: