
GPT-J and Pygmalion-6b 4bit #521

Closed

Conversation

@mayaeary (Contributor) commented Mar 23, 2023

Support 4-bit GPTQ for GPT-J-6b and Pygmalion-6b.

You need my fork of GPTQ-for-LLaMa for it to work. It was forked from commit 468c47c01b4fe370616747b6d69a2d3f48bab5e4, so it should be compatible with the current version.

mkdir repositories
cd repositories
git clone https://github.com/mayaeary/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git checkout gptj
python setup_cuda.py install
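
If the build succeeds, a quick sanity check is to import the compiled extension; I'm assuming it is importable as quant_cuda, matching the wheel name mentioned further down:

# sanity check: the CUDA kernel built by setup_cuda.py should be importable
python -c "import quant_cuda; print('quant_cuda OK')"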

To quantize the model:

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev c4 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt

It seems to work, but can someone else test it?

UPD. https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt - quantized checkpoint for pygmalion-6b_dev
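
If you only want to try the prebuilt checkpoint, something like this fetches it next to your model folders (wget and the models/ destination are assumptions; any download method works):

# optional: download the quantized checkpoint into the webui models folder
wget https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt -P models/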

@Brawlence (Contributor)

Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?

I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows.

@mayaeary (Contributor, Author)

> Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?
>
> I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows.

You can install the Visual Studio Build Tools; it's only the compiler and libraries, without the IDE.

I attached this file, but I don't know if it'll work:
quant_cuda-0.0.0-py3.10-win-amd64.egg.zip

@Brawlence (Contributor) commented Mar 23, 2023

Whoa. Something wild is going on with gptj.py. It asked for an HF token (which I provided) and then it failed to quantize. BUT thanks to your egg file and the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main it actually worked.

Pyg-6b-q4 takes 4.5 GB in memory (I initially measured a little shy of 7 GB), as it presumably should.

The file structure I used is the classic one, mimicking the one for LLaMA:
📂pygmalion-6b-gptq:
┣━━ 📄config.json
┣━━ 📄merges.txt
┣━━ 📄README.md
┣━━ 📄special_tokens_map.json
┣━━ 📄tokenizer_config.json
┣━━ 📄vocab.json
┗━━ 📄added_tokens.json
📄pygmalion-6b-gptq-4bit.pt

Side question: does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and, as far as I can tell, correctly.

@mayaeary (Contributor, Author) commented Mar 23, 2023

> It asked for an HF token (which I provided) and then it failed to quantize.

The c4 dataset requires Hugging Face authorization; you can use wikitext2 or ptb instead. I'm not sure what the difference is, but I used c4 because the original GPTQ repo does.
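
For example, the same command from the head message should work with wikitext2 (only the dataset argument changes; I haven't compared the resulting quality):

# same as above, but with a calibration dataset that needs no HF token
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev wikitext2 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt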

> the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main

Somehow that .bin file is smaller than mine (I've uploaded mine to Hugging Face too, see the head message).

> does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly

But it shouldn't: the webui expects the 4-bit model to be named exactly like your main model folder plus -4bit.pt. Are you sure it loaded? On my GPU it takes 4.5 GB of VRAM, the 8-bit version takes 7.5 GB, and full 16-bit doesn't fit at all.
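
To illustrate the naming convention (folder and file names taken from the head message; adjust for your own model folder):

📂models:
┣━━ 📂pygmalion-6b_dev (main model folder: config.json, tokenizer files, ...)
┗━━ 📄pygmalion-6b_dev-4bit.pt (quantized checkpoint, named exactly like the folder + -4bit.pt)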

@Brawlence (Contributor) commented Mar 23, 2023

I'm pretty sure it works. Let me re-bench.

TEST

| State | VRAM |
| --- | --- |
| idle VRAM load | 1.2 GB |
| model is loaded | 4.9 GB |
| generation is triggered | 5.7 GB |

Yep, totally works. And you're correct, it's ~4.5 GB as of now; I don't know why total VRAM was peaking at 7 GB the last time.
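
For reference, one way to watch these numbers live (just polling nvidia-smi once a second while loading the model and generating; nothing specific to the webui):

# poll GPU memory usage every second
nvidia-smi --query-gpu=memory.used --format=csv -l 1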

@Ph0rk0z (Contributor) commented Mar 23, 2023

There are now GPT-NeoX, GPT-J, 4-bit LoRAs and GPT-Neo, all with different kernels :(

also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main)

@8WSR0hX commented Mar 27, 2023

Can anybody confirm if the Pygmalion-6b-4bit model works with the latest GPTQ repo and this one?

@Ph0rk0z (Contributor) commented Mar 27, 2023

It's for the old GPTQ... but it does work (in the old GPTQ).

@treshphilip

Is it possible to use Pygmalion-6b-4bit with the --gptq-pre-layer option?

@mayaeary (Contributor, Author)

#615 is the new version; this PR is outdated for now.

@mayaeary closed this on Mar 28, 2023