Add support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS. IQ4_NL is not functional #3657

Closed
mann1x wants to merge 3 commits

Conversation

@mann1x (Contributor) commented Apr 15, 2024

This patch adds support for IQ1_S, IQ3_S, IQ2_S, IQ4_XS.

IQ4_NL uses a different format; I still have to investigate what the differences are.
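
For reference, a rough sketch of the kind of file-type-to-name mapping a patch like this has to extend in ollama. This is illustrative only, not the actual diff: the `fileType` name, the constants, and the numeric values are my assumptions (they mirror llama.cpp's `LLAMA_FTYPE_MOSTLY_*` enum as I understand it), so verify them against llama.cpp before relying on them.

```go
// Hypothetical sketch: map GGUF "general.file_type" values for the new IQ
// quantizations to the names ollama would report. Values are assumptions;
// confirm against llama.cpp's enum before use.
package main

import "fmt"

type fileType uint32

const (
	fileTypeIQ1_S  fileType = 24 // assumed LLAMA_FTYPE_MOSTLY_IQ1_S
	fileTypeIQ4_NL fileType = 25 // assumed LLAMA_FTYPE_MOSTLY_IQ4_NL
	fileTypeIQ3_S  fileType = 26 // assumed LLAMA_FTYPE_MOSTLY_IQ3_S
	fileTypeIQ2_S  fileType = 28 // assumed LLAMA_FTYPE_MOSTLY_IQ2_S
	fileTypeIQ4_XS fileType = 30 // assumed LLAMA_FTYPE_MOSTLY_IQ4_XS
)

// String renders a file type as the familiar quantization name.
func (t fileType) String() string {
	switch t {
	case fileTypeIQ1_S:
		return "IQ1_S"
	case fileTypeIQ4_NL:
		return "IQ4_NL"
	case fileTypeIQ3_S:
		return "IQ3_S"
	case fileTypeIQ2_S:
		return "IQ2_S"
	case fileTypeIQ4_XS:
		return "IQ4_XS"
	default:
		return fmt.Sprintf("unknown (%d)", uint32(t))
	}
}

func main() {
	fmt.Println(fileType(30)) // prints "IQ4_XS" under the assumed numbering
}
```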

@jukofyork

Definitely need IQ4_XS:

https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9

My only worry is the enum values will change and bugger things up in future.

@jmorganca, can you or another main dev have a look at this and confirm the order is likely to stay the same even if this PR isn't used?

@jukofyork

Works perfectly 👍

@sammcj (Contributor) left a comment

Works, nice!

WizardLM 2 8x22B IQ3_S on Macbook Pro M2 Max (96GB)

  • 56 GB RAM (with 16K context)
  • time to first token: 3.72s
  • speed: 11.06 tok/s

@mann1x (Contributor, Author) commented Apr 17, 2024

I have updated the patch to fix IQ4_NL.

@lowlyocean

This is great. Any specific reason IQ3_M isn't included?

@mann1x (Contributor, Author) commented Apr 19, 2024

> This is great. Any specific reason IQ3_M isn't included?

I think there's still something wrong with IQ3_M: you can quantize to it, but you can't run it with the main release.

If you want to keep an eye out for anything new and give a heads-up, watch these constants:
https://github.com/ggerganov/llama.cpp/blob/bca40e98149c7b673558ddd7a3ebeffef789349d/gguf-py/gguf/constants.py#L762

Check these constants against the latest release; if there's something new, we can add it once that release is included in ollama.
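
In case it helps, a rough sketch of one way to dump that enum for a given ref so two releases can be diffed. This is not part of ollama or llama.cpp, and the "scan until the class body ends" logic is a crude heuristic; the pinned commit hash is the one linked above, and you would swap in a release tag to compare.

```go
// Fetch gguf-py/gguf/constants.py at a given llama.cpp ref and print the
// body of the GGMLQuantizationType enum (heuristic: stop at the first blank
// or dedented line after the class header).
package main

import (
	"bufio"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	ref := "bca40e98149c7b673558ddd7a3ebeffef789349d" // or a release tag
	url := fmt.Sprintf(
		"https://raw.githubusercontent.com/ggerganov/llama.cpp/%s/gguf-py/gguf/constants.py", ref)

	resp, err := http.Get(url)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	inEnum := false
	scanner := bufio.NewScanner(resp.Body)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "class GGMLQuantizationType") {
			inEnum = true
			continue
		}
		if inEnum {
			// A blank or dedented line marks the end of the enum body.
			if strings.TrimSpace(line) == "" || !strings.HasPrefix(line, " ") {
				break
			}
			fmt.Println(strings.TrimSpace(line))
		}
	}
	if err := scanner.Err(); err != nil {
		panic(err)
	}
}
```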

@zedmango

Any chance of getting IQ2M, IQ3XS, IQ3M, IQ4XS, IQ4 added? I really would like those.

@mann1x (Contributor, Author) commented Apr 19, 2024

@sammcj Do you have to approve it again?

@sammcj (Contributor) commented Apr 19, 2024

Nah, it's just waiting on someone with contributor-level access to merge it.

@BruceMacD self-assigned this May 3, 2024
@BruceMacD (Contributor)

Thanks for doing this @mann1x, this looks good. There's another ongoing PR (#3682) that moves some of this code around and is going in soon, so I'll get this merged once that's in to prevent conflicts.

If you'd like, you can use this branch I made to test the changes as a reference for how to rebase this branch once #3682 goes into main; otherwise I can just merge things through for you:
d40497b

@mann1x (Contributor, Author) commented May 4, 2024

@BruceMacD
I'm a bit overloaded lately; if you can do the merge I'd really appreciate it! Thanks.

@sammcj (Contributor) commented May 10, 2024

Any update on getting this merged?

Just went to create a model and was reminded these are missing.

ollama create meta-llama-3-70b-instruct-bartowski:iq2_m -f Modelfile-llama3
transferring model data
Error: invalid file magic
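
For context on that error, a minimal sketch of what a GGUF magic check looks like: a GGUF file starts with the 4-byte magic "GGUF" followed by a little-endian version field. Whether that is exactly the check hit here is my assumption; the file path and the code below are illustrative only, not ollama's code path.

```go
// Illustrative only: read a GGUF header and report a bad magic, roughly the
// condition an "invalid file magic" error describes.
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"os"
)

func main() {
	f, err := os.Open("model.gguf") // hypothetical path
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var magic [4]byte
	if _, err := io.ReadFull(f, magic[:]); err != nil {
		panic(err)
	}
	if string(magic[:]) != "GGUF" {
		fmt.Printf("invalid file magic: %q\n", magic[:])
		return
	}

	var version uint32
	if err := binary.Read(f, binary.LittleEndian, &version); err != nil {
		panic(err)
	}
	fmt.Println("GGUF version:", version)
}
```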

@BruceMacD (Contributor)

@sammcj I've rebased these changes onto the new structure in main in #4322, hoping to get it merged for v0.1.36. Thanks for bringing this to our attention originally.

Closing this PR now to carry the commit in #4322.

@BruceMacD closed this May 10, 2024
@sammcj (Contributor) commented May 11, 2024

Legend, thanks Bruce!

@sammcj (Contributor) commented May 25, 2024

Just wanted to say thanks again for this; it's really amazing being able to run 70B models on a single 24GB GPU at a decent speed without having them degrade* in quality to the point where smaller models make more sense.

For example, my single RTX 3090 server can now run Llama 3 70B with the now-supported iq2_xs quant and achieve 21 tok/s with Ollama 🎉

ollama run meta-llama-3-70b-instruct-maziyarpanahi:iq2_xs tell me a short joke --verbose
Here's one:

Why did the computer go to the doctor?

It had a virus!

Hope that made you laugh!

total duration:       1.685537801s
load duration:        552.816µs
prompt eval count:    14 token(s)
prompt eval duration: 455.07ms
prompt eval rate:     30.76 tokens/s
eval count:           25 token(s)
eval duration:        1.188925s
eval rate:            21.03 tokens/s   <---

ollama ps
NAME                                           	ID          	SIZE 	PROCESSOR	UNTIL
meta-llama-3-70b-instruct-maziyarpanahi:iq2_xs	a5fe03111c70	23 GB	100% GPU 	43 minutes from now

(screenshot: SCR-20240525-pjab-2)

*source
