Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL is not functional #3657
Conversation
Definitely need IQ4_XS: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 My only worry is that the enum values will change and bugger things up in the future. @jmorganca can you or another main dev here have a look at this and confirm the order is likely to stay the same even if this PR isn't used?
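For context, the ordering worry above is about ordinal stability: GGUF file types are plain integers, so if they are declared as sequential constants, inserting a new one in the middle shifts every later value and mis-labels existing models. A minimal Go sketch of that pattern (names are hypothetical, not ollama's actual declarations):

```go
package main

import "fmt"

// fileType stands in for the GGUF "general.file_type" metadata value. Because
// each constant's numeric value comes from its declaration order (iota), new
// types must only ever be appended; inserting one in the middle would renumber
// everything below it and mis-label existing models.
type fileType uint32

const (
	fileTypeF32  fileType = iota // 0
	fileTypeF16                  // 1
	fileTypeQ4_0                 // 2
	fileTypeQ4_1                 // 3
	// ... later entries omitted ...
)

func main() {
	fmt.Println(fileTypeQ4_0) // 2 today; would silently change if a new type were inserted above it
}
```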
Works perfectly 👍
Works, nice!
WizardLM 2 8x22B IQ3_S on MacBook Pro M2 Max (96GB)
- 56 GB RAM (with 16K context)
- time to first token: 3.72s
- speed: 11.06 tok/s
I have updated it to fix IQ4_NL
This is great. Any specific reason IQ3_M isn't included?
I think there's still something wrong: you can quantize with it, but you can't run the result with the main release. If you want to keep an eye out for anything new coming up and give a heads-up: check these constants on the latest release; if there's something new, we can add it once that release is included in ollama.
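For reference, the constants in question are llama.cpp's `llama_ftype` values for the IQ quant family. The sketch below is a rough Go mirror of how they were numbered around this time; the exact values are an assumption and should be verified against the llama.cpp release actually vendored by ollama:

```go
package main

import "fmt"

// Rough mirror of the llama_ftype values for the IQ quant family; re-check
// these numbers against the pinned llama.cpp release before relying on them.
const (
	ftypeIQ2_XXS uint32 = 19
	ftypeIQ2_XS  uint32 = 20
	ftypeIQ3_XS  uint32 = 22
	ftypeIQ3_XXS uint32 = 23
	ftypeIQ1_S   uint32 = 24
	ftypeIQ4_NL  uint32 = 25
	ftypeIQ3_S   uint32 = 26
	ftypeIQ3_M   uint32 = 27
	ftypeIQ2_S   uint32 = 28
	ftypeIQ2_M   uint32 = 29
	ftypeIQ4_XS  uint32 = 30
)

func main() {
	fmt.Println("IQ4_XS file type value:", ftypeIQ4_XS)
}
```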
Any chance of getting IQ2M, IQ3XS, IQ3M, IQ4XS, IQ4 added? I really would like those.
@sammcj Do you have to approve it again?
Nah, it's just waiting on someone with contributor-level access to merge it.
Thanks for doing this @mann1x, this looks good. There's another ongoing PR that moves some of this stuff around (#3682) which is going in soon, so I'll get this merged once that is in to prevent conflicts. If you'd like you can use this branch I made to test the changes as a reference for how to rebase this branch once #3682 goes into main, otherwise I can just merge things through for you.
@BruceMacD Any update on getting this merged? Just went to create a model and was reminded these are missing.
Legend, thanks Bruce! |
Just wanted to say thanks again for this, it's really amazing being able to run 70B models on a single 24GB GPU at a decent speed without having them degrade in quality to the point where smaller models make more sense. For example, my single RTX 3090 server can now run Llama 3 70B with the now-supported IQ2_XS quant size and achieve 21 tok/s with Ollama 🎉
This patch adds support for IQ1_S, IQ3_S, IQ2_S, and IQ4_XS.
IQ4_NL uses a different format; I still have to investigate what the differences are.
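As a rough illustration of what this support involves on the ollama side, the sketch below maps the new file type values to display names. The function name and structure are hypothetical and are not taken from the actual patch; the numeric values assume llama.cpp's `llama_ftype` numbering:

```go
package main

import "fmt"

// quantName is a hypothetical helper sketching what "adding support" means
// here: recognizing the new file type values and mapping them to names.
func quantName(ft uint32) string {
	switch ft {
	case 24:
		return "IQ1_S"
	case 25:
		return "IQ4_NL" // quantizes, but per this thread it is not yet runnable
	case 26:
		return "IQ3_S"
	case 28:
		return "IQ2_S"
	case 30:
		return "IQ4_XS"
	default:
		return "unknown"
	}
}

func main() {
	fmt.Println(quantName(30)) // IQ4_XS
}
```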