Generalize GPTQ_loader, support any model #615
Conversation
It needs a PR to GPTQ. Technically, GPT-NeoX and GPT-Neo can also be done this way. This repo doesn't want to keep swapping around the GPTQ repo, so the steps are: get this merged upstream, then get LoRA support into upstream and into PEFT as well. Otherwise these never get merged.
This is very impressive.
I have done some basic sanity tests to check whether everything is equivalent to the current code, and the answer is yes. @Ph0rk0z I agree that it would be nice to have this functionality merged upstream in GPTQ-for-LLaMa, but I see no reason not to use Maya's code until that happens. The only caveat is that we will have to watch for eventual changes in the upstream. For reference, these are the VRAM usages that I have seen for pygmalion:
Test results

| Model | Branch | Time (s) | Speed (tokens/s) | Tokens | Context |
|---|---|---|---|---|---|
| Alpaca-30B-Int4 | Maya | 4.48 | 13.39 | 60 | 47 |
| Alpaca-30B-Int4 | Main | 4.50 | 13.33 | 60 | 47 |
| alpaca-native-4bit | Maya | 2.36 | 34.81 | 82 | 47 |
| alpaca-native-4bit | Main | 2.33 | 35.18 | 82 | 47 |
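(As a quick sanity check on the reported numbers: speed is just tokens divided by generation time, e.g. 60 tokens / 4.48 s ≈ 13.39 tokens/s; small discrepancies in the other rows come from rounding the displayed time.)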
Can someone convert the model https://huggingface.co/TehVenom/PPO_Pygway-V8p4_Dev-6b to 4-bit, please? I don't have enough GPU memory to do it myself.
Done, added to the first post.

Thank you very much! I will go test it.
…y/feature/gpt-j-4bit-v2) This includes Pygmalion 4-bit.
Improved version of #521
Generalized version of the quantized loader: it now auto-detects the model type from the model file, which allows loading GPT-J and Pygmalion-6B without juggling repositories.
I'll try to make a generalized offload version as well, but for now only LLaMA supports offloading.
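For readers curious what "auto-detecting the model type" can look like in practice, here is a minimal sketch, not the PR's actual code: it infers the architecture from the `model_type` field of a Hugging Face-style `config.json` next to the checkpoint. The helper name `detect_model_type` and the loader mapping in the comment are hypothetical.

```python
import json
from pathlib import Path

def detect_model_type(model_dir: str) -> str:
    """Hypothetical helper: infer the architecture from the model
    directory's config.json rather than hard-coding one model class."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    model_type = config.get("model_type", "")
    if model_type in ("llama", "gptj", "gpt_neox", "opt"):
        return model_type
    raise ValueError(f"Unsupported or unknown architecture: {model_type!r}")

# Illustrative use: dispatch to the right quantized loader (names made up).
# loader = {"llama": load_quant_llama, "gptj": load_quant_gptj}[
#     detect_model_type("models/pygmalion-6b-4bit")]
```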
You can quantize models using my fork: https://github.com/mayaeary/GPTQ-for-LLaMa/tree/gptj-v2
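As an illustration only, GPTQ-for-LLaMa's usual CLI quantizes a model with a command along these lines. The `gptj.py` script name is a guess based on the fork's branch name, and the `c4` calibration set and output filename are assumptions, so check the fork's README for the exact invocation.

```
# Assumed invocation, mirroring GPTQ-for-LLaMa's llama.py interface:
python gptj.py models/pygmalion-6b c4 --wbits 4 --groupsize 128 --save pygmalion-6b-4bit-128g.pt
```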
Pre-quantized models: load with `--wbits 4 --groupsize 128`.
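For anyone unfamiliar with those flags, a typical text-generation-webui launch with a pre-quantized 4-bit, group size 128 model looks roughly like this; the model folder name here is made up.

```
python server.py --model pygmalion-6b-4bit-128g --wbits 4 --groupsize 128
```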