hqq JIT Quantization #147

Merged
merged 11 commits into main from hqq on Jan 22, 2024
Conversation

flozi00
Collaborator

@flozi00 flozi00 commented Dec 22, 2023

What does this PR do?

As always, not tested yet.
Blind coding at the moment.

@flozi00 flozi00 linked an issue Dec 22, 2023 that may be closed by this pull request
@flozi00
Collaborator Author

flozi00 commented Dec 22, 2023

@tgaddair could you build a Docker image or test it?

@tgaddair
Contributor

@flozi00 docker pull ghcr.io/predibase/lorax:hqq

@flozi00
Collaborator Author

flozi00 commented Jan 4, 2024

It's pretty slow: around 1 token per second on small cards. Not sure if it makes sense then.

An alternative for 2-bit quantization would be Quip#, but it's not data-free.
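For context, "data-free" means the scale and zero-point are derived from the weights alone, with no calibration dataset. A minimal per-group round-to-nearest sketch (purely illustrative, and much simpler than HQQ's actual half-quadratic optimization):

```python
def quantize_group(w, bits=2):
    """Data-free asymmetric quantization of one weight group:
    scale and zero-point come from the weights' min/max alone."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)
    # round-to-nearest, clamped to the representable range [0, qmax]
    q = [min(max(round(x / scale) + zero, 0), qmax) for x in w]
    return q, scale, zero

def dequantize(q, scale, zero):
    return [(x - zero) * scale for x in q]

w = [-1.5, -0.2, 0.1, 0.8, 2.5]
q, s, z = quantize_group(w, bits=2)
w_hat = dequantize(q, s, z)
# reconstruction error is bounded by half the quantization step
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

Methods like Quip# instead need sample activations to decide where quantization error matters most, which is what "not data-free" refers to.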

@flozi00
Collaborator Author

flozi00 commented Jan 9, 2024

@tgaddair what do you think?

@tgaddair
Contributor

tgaddair commented Jan 9, 2024

I'm fine closing this for now if the latency is prohibitive. I'm not familiar with Quip, but open to adding it in if it's useful. Curious if GPT-Q 3-bit would be worth exploring if we're looking for something lower than 4-bit?

@michaelfeil
Contributor

@flozi00 Maybe you want to avoid quantizing the lm head. Not sure if that's the bottleneck for performance. Also, the quantization is probably going to fail with bf16. I recently became aware of hqq in this PR and opened a ticket for hqq quantization in hf/transformers; perhaps it will become a first-class citizen there in a reasonable amount of time.
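A name-based filter is one common way to exclude the output head from quantization. The sketch below is purely illustrative; the `lm_head` name and the skip set are assumptions based on typical Hugging Face causal-LM layouts, not this PR's code:

```python
# Illustrative module-name filter for skipping layers during quantization.
# "lm_head" is an assumption based on common HF causal-LM layouts; real code
# would walk model.named_modules() and apply this check to each dotted path.
SKIP_MODULES = {"lm_head"}

def should_quantize(module_name: str, skip=SKIP_MODULES) -> bool:
    """Quantize everything except modules whose dotted path contains a skipped name."""
    return not any(part in skip for part in module_name.split("."))

print(should_quantize("model.layers.0.self_attn.q_proj"))  # True
print(should_quantize("lm_head"))                          # False
```

Keeping the head in full precision is a common trade-off because it is large, sits last in the forward pass, and its quantization error directly distorts the output logits.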

@flozi00 flozi00 requested a review from tgaddair January 17, 2024 14:58
@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

Did some more testing.
I think it could be a good addition for dev purposes, for example running larger architectures on an RTX 4090.

Contributor

@tgaddair tgaddair left a comment


@flozi00 I noticed that some of hqq's dependencies like functorch try to install torch<2.1, which overrides our current version of torch. Is this what you're seeing in your environment as well?

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

install_requires=['numpy>=1.24.4','tqdm>=4.64.1', 'huggingface_hub', 'accelerate', 'timm', 'transformers>=4.36.1', 'termcolor'],

from the hqq setup.py
maybe it's from the timm package?

I will take a closer look at that; in my environment I have not seen this.
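One way to find which installed distribution declares a torch requirement is to scan the installed package metadata. A small sketch using only the standard library (the exact output depends on the environment it runs in):

```python
import re
from importlib import metadata

def packages_requiring(target: str) -> list[str]:
    """Return installed distributions whose declared requirements mention `target`."""
    dependents = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # requirement strings look like "torch>=1.13.0" or "torch (>=2.0) ; extra == 'dev'"
            m = re.match(r"[A-Za-z0-9._-]+", req)
            if m and m.group(0).lower() == target.lower():
                dependents.add(dist.metadata["Name"])
    return sorted(dependents)

print(packages_requiring("torch"))
```

Running this inside the built Docker image would show every package that pins or floats torch, which is quicker than reading the build logs by hand.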

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

Okay, I checked the Docker build logs.
It's not from hqq; there is another package with the requirement torch>=1.13.0.
I haven't found which one yet.
In the Docker build it results in installing torch 2.1.2.

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

https://github.com/predibase/lorax/actions/runs/7559682310/job/20584021297#step:10:1021

Seems to be coming from the peft 0.4.0 requirement.

@tgaddair
Contributor

Interesting, let me try digging into the docker image and see if there are any package differences. If not, then it should be safe to merge.

@flozi00
Collaborator Author

flozi00 commented Jan 19, 2024

Should be fixed now by pinning torch to the latest version, 2.1.2; now pip won't reinstall torch.
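The fix amounts to pinning torch explicitly so pip's resolver has nothing left to change. A constraints file is one generic way to express such a pin (file names below are illustrative, not lorax's actual build setup):

```
# constraints.txt -- a pin that transitive requirements
# (e.g. peft's torch>=1.13.0) cannot override during resolution
torch==2.1.2
```

Installing with `pip install -r requirements.txt -c constraints.txt` then keeps torch at 2.1.2 even when a dependency's looser requirement would otherwise pull in a different build.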

Contributor

@tgaddair tgaddair left a comment


LGTM! Thanks for addressing the PyTorch version issue.

@tgaddair tgaddair merged commit c178802 into main Jan 22, 2024
1 check passed
@tgaddair tgaddair deleted the hqq branch January 22, 2024 18:14
Development

Successfully merging this pull request may close these issues.

HQQ just in time quantization
3 participants