
GPT-J and Pygmalion-6b 4bit #521

Closed

Conversation

@mayaeary (Contributor) commented Mar 23, 2023

Support 4-bit GPTQ for GPT-J-6b and Pygmalion-6b.

You need my fork of GPTQ-for-LLaMa for it to work. It was forked from commit 468c47c01b4fe370616747b6d69a2d3f48bab5e4, so it should be compatible with the current version.

mkdir repositories
cd repositories
git clone https://github.com/mayaeary/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
git checkout gptj
python setup_cuda.py install
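
If the build succeeds, a quick sanity check is to import the compiled extension; I'm assuming it is importable as quant_cuda, matching the wheel name mentioned further down:

# sanity check: the CUDA kernel built by setup_cuda.py should be importable
python -c "import quant_cuda; print('quant_cuda OK')"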

To quantize the model:

# from repositories/GPTQ-for-LLaMa
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev c4 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt

It seems to work, but can someone else test it?

UPD. https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt - quantized checkpoint for pygmalion-6b_dev
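
If you only want to try the prebuilt checkpoint, something like this fetches it next to your model folders (wget and the models/ destination are assumptions; any download method works):

# optional: download the quantized checkpoint into the webui models folder
wget https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt -P models/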

@Brawlence (Contributor)

Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?

I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows.

@mayaeary (Contributor, Author)

> Do you have the compiled wheel for quant_cuda-0.0.0-cp310-cp310-win_amd64 quant kernel?
>
> I'd love to test Pyg-6B-q4 capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows.

You can install the Visual Studio Build Tools; it's only the compiler and libraries, without the IDE.

I attached this file, but I don't know if it'll work:
quant_cuda-0.0.0-py3.10-win-amd64.egg.zip

@Brawlence (Contributor) commented Mar 23, 2023

Whoa. Something wild is going on with gptj.py. It asked for an HF token (which I provided) and then it failed to quantize. BUT thanks to your egg file and the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main it actually worked.

Pyg-6b-q4 takes 4.5 GB in memory (I initially measured a little shy of 7 GB), as it presumably should.

The file structure I used is the classic one, mimicking the one for LLaMA:
📂pygmalion-6b-gptq:
┣━━ 📄config.json
┣━━ 📄merges.txt
┣━━ 📄README.md
┣━━ 📄special_tokens_map.json
┣━━ 📄tokenizer_config.json
┣━━ 📄vocab.json
┗━━ 📄added_tokens.json
📄pygmalion-6b-gptq-4bit.pt

Side question: does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and, as far as I can tell, correctly.

@mayaeary (Contributor, Author) commented Mar 23, 2023

> It asked for an HF token (which I provided) and then it failed to quantize.

The c4 dataset requires Hugging Face authorization; you can use wikitext2 or ptb instead. I'm not sure what the difference is, but I used c4 because the original GPTQ repo does.
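
For example, the same command from the head message should work with wikitext2 (only the dataset argument changes; I haven't compared the resulting quality):

# same as above, but with a calibration dataset that needs no HF token
CUDA_VISIBLE_DEVICES=0 python gptj.py ../../models/pygmalion-6b_dev wikitext2 --wbits 4 --save ../../models/pygmalion-6b_dev-4bit.pt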

> the generous soul at https://huggingface.co/OccamRazor/pygmalion-6b-gptq-4bit/tree/main

Somehow that .bin file is smaller than mine (I've uploaded mine to Hugging Face too, see the head message).

> does it matter if I use pygmalion-6b-gptq-4bit and not pygmalion-6b_dev-4bit? It works and as far as I can tell, correctly

But it shouldn't: the webui expects the 4-bit model to be named exactly like your main model folder plus -4bit.pt. Are you sure it loaded? On my GPU it takes 4.5 GB of VRAM, the 8-bit version takes 7.5 GB, and full 16-bit doesn't fit at all.
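
To illustrate the naming convention (folder and file names taken from the head message; adjust for your own model folder):

📂models:
┣━━ 📂pygmalion-6b_dev (main model folder: config.json, tokenizer files, ...)
┗━━ 📄pygmalion-6b_dev-4bit.pt (quantized checkpoint, named exactly like the folder + -4bit.pt)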

@Brawlence (Contributor) commented Mar 23, 2023

I'm pretty sure it works. Let me re-bench.

TEST

| State | VRAM |
| --- | --- |
| idle VRAM load | 1.2 GB |
| model is loaded | 4.9 GB |
| generation is triggered | 5.7 GB |

Yep, totally works. And you're correct, it's ~4.5 GB as of now; I don't know why total VRAM was peaking at 7 GB the last time.
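
For reference, one way to watch these numbers live (just polling nvidia-smi once a second while loading the model and generating; nothing specific to the webui):

# poll GPU memory usage every second
nvidia-smi --query-gpu=memory.used --format=csv -l 1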

@Ph0rk0z (Contributor) commented Mar 23, 2023

There are now GPT-NeoX, GPT-J, 4-bit LoRAs and GPT-Neo, all with different kernels :(

also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main)

@8WSR0hX commented Mar 27, 2023

Can anybody confirm if the Pygmalion-6b-4bit model works with the latest GPTQ repo and this one?

@Ph0rk0z (Contributor) commented Mar 27, 2023

It's for the old GPTQ... but it does work (in the old GPTQ).

@treshphilip

Is it possible to use Pygmalion-6b-4bit with the --gptq-pre-layer option?

@mayaeary (Contributor, Author)

#615 is the new version; this PR is outdated for now.

@mayaeary closed this on Mar 28, 2023