GPT-J and Pygmalion-6b 4bit #521
Conversation
Do you have the compiled wheel? I'd love to test Pyg-6B-q4's capability, but I absolutely despise installing the MSVC build environment, and it seems to be the only way on Windows.
You can install Visual Studio Build Tools; it's only the compiler and libraries, without the IDE. I attached the file, but I don't know if it'll work.
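For reference, the part that needs MSVC is just the CUDA kernel extension. With Build Tools installed, the build typically looks like the following sketch (repo layout assumed from upstream GPTQ-for-LLaMa; run it from a developer prompt so cl.exe is on PATH):

```shell
# Open an "x64 Native Tools Command Prompt" from Build Tools, then:
cd GPTQ-for-LLaMa

# Builds the quant_cuda extension with MSVC + nvcc; exact script name may
# differ between forks, so check the repo you cloned.
python setup_cuda.py install
```

Once the extension builds, no full Visual Studio IDE install is needed.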
Whoa. Something wild is going on: Pyg-6b-q4 takes a little shy of … The file structure I used is the classic one, mimicking the one for LLaMA. Side question: does it matter if I use …
Somehow the .bin file is smaller in size than my … (I've uploaded it on huggingface too, see the head message).
But it shouldn't be; the webui expects the 4bit model to be named exactly as your main model folder plus …
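The naming rule above can be sketched as a small lookup. This is a minimal illustration, not the webui's actual code; the `-4bit.pt` suffix is an assumption inferred from the checkpoint filename posted later in this thread:

```python
from pathlib import Path
from typing import Optional

def find_4bit_checkpoint(models_dir: str, model_name: str) -> Optional[Path]:
    """Locate a quantized checkpoint following the convention described above:
    the .pt file must be named exactly <model folder name> + "-4bit.pt"
    (suffix assumed; check your webui version for the exact rule)."""
    candidate = Path(models_dir) / model_name / f"{model_name}-4bit.pt"
    return candidate if candidate.is_file() else None
```

So for a model folder named `pygmalion-6b_dev`, the checkpoint would need to be `pygmalion-6b_dev-4bit.pt`, which matches the file linked at the bottom of this thread.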
There are now GPT-NeoX, GPT-J, 4-bit LoRAs, and GPT-Neo, all with different kernels :( There's also GPT-J with offload (https://github.com/AlpinDale/gptq-gptj/commits/main).
Can anybody confirm if the Pygmalion-6b-4bit model works with the latest GPTQ repo and this one? |
It's for the old GPTQ, but it does work (in the old GPTQ).
Is it possible to use Pygmalion-6b-4bit with the --gptq-pre-layer option?
#615 is the new version; this PR is outdated for now.
Support 4-bit GPTQ for GPT-J-6b and Pygmalion-6b.
You need my fork of GPTQ-for-LLaMA for it to work. It was forked from commit 468c47c01b4fe370616747b6d69a2d3f48bab5e4, so it should be compatible with the current version.
To quantize the model:
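The original command was lost from this comment. As a sketch only, GPTQ-for-LLaMA-style forks are generally invoked like this; the fork URL, script name `gptj.py`, calibration dataset, and output filename are all assumptions, so check the fork's README for the real invocation:

```shell
# Clone the GPT-J fork of GPTQ-for-LLaMA (URL assumed from the huggingface
# account in this thread; substitute the actual fork location).
git clone https://github.com/mayaeary/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa

# 4-bit quantization in the style of upstream llama.py: model id, calibration
# set, bit width, and where to save the quantized checkpoint.
CUDA_VISIBLE_DEVICES=0 python gptj.py PygmalionAI/pygmalion-6b c4 \
    --wbits 4 --save pygmalion-6b-4bit.pt
```

Quantization needs the full fp16 model and a CUDA GPU, so this is not something to run on the webui machine if it can't hold the unquantized weights.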
It seems to work, but can someone else test it?
UPD: https://huggingface.co/mayaeary/pygmalion-6b-4bit/resolve/main/pygmalion-6b_dev-4bit.pt — a quantized checkpoint for pygmalion-6b_dev.