hqq JIT Quantization #147

Merged
merged 11 commits into main from hqq on Jan 22, 2024
Conversation

flozi00
Collaborator

@flozi00 flozi00 commented Dec 22, 2023

What does this PR do?

As always, not tested yet.
Blind coding at the moment.

@flozi00 flozi00 linked an issue Dec 22, 2023 that may be closed by this pull request
@flozi00
Collaborator Author

flozi00 commented Dec 22, 2023

@tgaddair could you build a Docker image or test it?

@tgaddair
Contributor

@flozi00 docker pull ghcr.io/predibase/lorax:hqq

@flozi00
Collaborator Author

flozi00 commented Jan 4, 2024

It's pretty slow: around 1 token per second on small cards. Not sure if it makes sense then.

An alternative for 2-bit quantization would be Quip#, but it's not data-free.
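For context, "data-free" means the scale and zero-point are derived from the weights alone, with no calibration dataset. A minimal per-group round-to-nearest sketch (purely illustrative, and much simpler than HQQ's actual half-quadratic optimization):

```python
def quantize_group(w, bits=2):
    """Data-free asymmetric quantization of one weight group:
    scale and zero-point come from the weights' min/max alone."""
    qmax = 2 ** bits - 1
    lo, hi = min(w), max(w)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero = round(-lo / scale)
    # round-to-nearest, clamped to the representable range [0, qmax]
    q = [min(max(round(x / scale) + zero, 0), qmax) for x in w]
    return q, scale, zero

def dequantize(q, scale, zero):
    return [(x - zero) * scale for x in q]

w = [-1.5, -0.2, 0.1, 0.8, 2.5]
q, s, z = quantize_group(w, bits=2)
w_hat = dequantize(q, s, z)
# reconstruction error is bounded by half the quantization step
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```

Methods like Quip# instead need sample activations to decide where quantization error matters most, which is what "not data-free" refers to.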

@flozi00
Collaborator Author

flozi00 commented Jan 9, 2024

@tgaddair what do you think?

@tgaddair
Contributor

tgaddair commented Jan 9, 2024

I'm fine closing this for now if the latency is prohibitive. I'm not familiar with Quip, but open to adding it in if it's useful. Curious if GPT-Q 3-bit would be worth exploring if we're looking for something lower than 4-bit?

@michaelfeil
Contributor

@flozi00 Maybe you want to avoid quantizing the lm head. Not sure if that's the bottleneck for performance. Also, the quantization is probably going to fail with bf16. I recently became aware of hqq in this PR and opened a ticket for hqq quantization in hf/transformers; perhaps it will become a first-class citizen there in a reasonable amount of time.
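A name-based filter is one common way to exclude the output head from quantization. The sketch below is purely illustrative; the `lm_head` name and the skip set are assumptions based on typical Hugging Face causal-LM layouts, not this PR's code:

```python
# Illustrative module-name filter for skipping layers during quantization.
# "lm_head" is an assumption based on common HF causal-LM layouts; real code
# would walk model.named_modules() and apply this check to each dotted path.
SKIP_MODULES = {"lm_head"}

def should_quantize(module_name: str, skip=SKIP_MODULES) -> bool:
    """Quantize everything except modules whose dotted path contains a skipped name."""
    return not any(part in skip for part in module_name.split("."))

print(should_quantize("model.layers.0.self_attn.q_proj"))  # True
print(should_quantize("lm_head"))                          # False
```

Keeping the head in full precision is a common trade-off because it is large, sits last in the forward pass, and its quantization error directly distorts the output logits.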

@flozi00 flozi00 requested a review from tgaddair January 17, 2024 14:58
@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

Did some more testing.
I think it could be a good addition for dev purposes, for example running larger architectures on an RTX 4090.

Contributor

@tgaddair tgaddair left a comment


@flozi00 I noticed that some of hqq's dependencies like functorch try to install torch<2.1, which overrides our current version of torch. Is this what you're seeing in your environment as well?

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

install_requires=['numpy>=1.24.4','tqdm>=4.64.1', 'huggingface_hub', 'accelerate', 'timm', 'transformers>=4.36.1', 'termcolor'],

from the hqq setup.py
maybe it's from the timm package?

I will take a closer look at that; in my environment I have not seen this.
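One way to find which installed distribution declares a torch requirement is to scan the installed package metadata. A small sketch using only the standard library (the exact output depends on the environment it runs in):

```python
import re
from importlib import metadata

def packages_requiring(target: str) -> list[str]:
    """Return installed distributions whose declared requirements mention `target`."""
    dependents = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            # requirement strings look like "torch>=1.13.0" or "torch (>=2.0) ; extra == 'dev'"
            m = re.match(r"[A-Za-z0-9._-]+", req)
            if m and m.group(0).lower() == target.lower():
                dependents.add(dist.metadata["Name"])
    return sorted(dependents)

print(packages_requiring("torch"))
```

Running this inside the built Docker image would show every package that pins or floats torch, which is quicker than reading the build logs by hand.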

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

Okay, I checked the Docker build logs.
It's not from hqq; there is another package with the requirement torch>=1.13.0.
I haven't found which one yet.
In the Docker build it results in installing torch 2.1.2.

@flozi00
Collaborator Author

flozi00 commented Jan 17, 2024

https://github.com/predibase/lorax/actions/runs/7559682310/job/20584021297#step:10:1021

Seems to be coming from the peft 0.4.0 requirement.

@tgaddair
Contributor

Interesting, let me try digging into the docker image and see if there are any package differences. If not, then it should be safe to merge.

@flozi00
Collaborator Author

flozi00 commented Jan 19, 2024

Should be fixed now by pinning torch to the latest version, 2.1.2; now pip won't reinstall torch.
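The fix amounts to pinning torch explicitly so pip's resolver has nothing left to change. A constraints file is one generic way to express such a pin (file names below are illustrative, not lorax's actual build setup):

```
# constraints.txt -- a pin that transitive requirements
# (e.g. peft's torch>=1.13.0) cannot override during resolution
torch==2.1.2
```

Installing with `pip install -r requirements.txt -c constraints.txt` then keeps torch at 2.1.2 even when a dependency's looser requirement would otherwise pull in a different build.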

Contributor

@tgaddair tgaddair left a comment


LGTM! Thanks for addressing the PyTorch version issue.

@tgaddair tgaddair merged commit c178802 into main Jan 22, 2024
1 check passed
@tgaddair tgaddair deleted the hqq branch January 22, 2024 18:14
Development

Successfully merging this pull request may close these issues.

HQQ just in time quantization
3 participants