
The results are very different in 4-bit and 16-bit/8-bit modes #81

Closed
oobabooga opened this issue Apr 16, 2023 · 3 comments

oobabooga commented Apr 16, 2023

Using this prompt:

Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
Write one paragraph
### Response:

I get these results for the tloen/alpaca-lora-7b LoRA applied on top of llama-7b:

| Mode | Result |
| --- | --- |
| 4-bit | The world is a vast and mysterious place, filled with wonders and surprises. We are all connected to each other and to the universe, and we have the power to shape our own destinies. We can choose to follow our own paths, or to follow the paths of others. We can choose to be brave and take risks, or to be cautious and play it safe. We can choose to be kind and compassionate, or to be selfish and cruel. We can choose to be creative and innovative, or to be complacent and unambitious. We can choose to be optimistic and hopeful, or to be pessimistic and hopeless. We can choose to be open-minded and accepting, or to be closed-minded and judgmental. We can choose to be generous and kind, or to be selfish and greedy. We can choose to be humble and respectful, or |
| 16-bit | The world is a beautiful place, full of wonders and surprises. Every day, we are presented with new opportunities to explore and discover. We can learn new things, make new friends, and experience new cultures. Life is a journey, and it is up to us to make the most of it. |
| 8-bit | The world is a beautiful place, full of wonders and surprises. From the majestic mountains to the deep blue oceans, there is so much to explore and discover. Nature is full of surprises, from the majestic beauty of a sunrise to the majestic beauty of a sunset. The world is full of surprises, and it is up to us to take advantage of them and make the most of our lives. |

In all cases, the generation uses do_sample=False for greedy sampling. The 4-bit model used is llama-7b-4bit-128g.
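For context, the 16-bit/8-bit reference path I am comparing against looks roughly like the sketch below. This is not the exact web UI code (that lives in the PR linked below); it assumes transformers, peft, and accelerate are installed, and the base-model path and generation length are only examples.

```python
# Sketch of the fp16 reference path: base llama-7b + tloen/alpaca-lora-7b,
# greedy decoding (do_sample=False). Not the exact web UI code.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"  # example path to the base weights
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,   # for the 8-bit run, use load_in_8bit=True instead (needs bitsandbytes)
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")  # apply the LoRA on top

prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\nWrite one paragraph\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, do_sample=False, max_new_tokens=200)  # greedy decoding
print(tokenizer.decode(output[0], skip_special_tokens=True))
```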

The code I am using is from this PR: oobabooga/text-generation-webui#1200

Is this difference something to worry about? In all my tests, the 4-bit results diverge a lot from the 16-bit/8-bit results.

johnsmith0031 (Owner) commented Apr 17, 2023

I think it's because of 4-bit quantization loss, so they are effectively different models, especially the 4-bit one, given that it is not RTN-based (GPTQ adjusts the weights to compensate for rounding error rather than simply rounding them). The current SOTA 4-bit model reaches 5.85 perplexity on wikitext2 while the fp16 version gets 5.68, so fp16 is still better than the quantized model; see the numbers below, and the RTN sketch after the table for what plain rounding does:

| LLaMA-7B | Bits | group-size | memory (MiB) | Wikitext2 | checkpoint size (GB) |
| --- | --- | --- | --- | --- | --- |
| FP16 | 16 | - | 13940 | 5.68 | 12.5 |
| RTN | 4 | - | - | 6.29 | - |
| GPTQ | 4 | - | 4740 | 6.09 | 3.5 |
| GPTQ | 4 | 128 | 4891 | 5.85 | 3.6 |
| RTN | 3 | - | - | 25.54 | - |
| GPTQ | 3 | - | 3852 | 8.07 | 2.7 |
| GPTQ | 3 | 128 | 4116 | 6.61 | 3.0 |
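To make the RTN point concrete, here is a rough sketch of what plain round-to-nearest 4-bit quantization with group size 128 does to a weight vector. It is illustrative only; GPTQ instead solves for the quantized weights so that the layer output error is minimized, which is why it scores better in the table above.

```python
# Minimal sketch of round-to-nearest (RTN) 4-bit quantization with group size 128.
# Illustrative only: GPTQ compensates for rounding error instead of rounding
# each group independently like this.
import numpy as np

def rtn_quantize(weights: np.ndarray, bits: int = 4, group_size: int = 128) -> np.ndarray:
    """Quantize and de-quantize a 1-D weight vector group by group with asymmetric RTN."""
    levels = 2 ** bits - 1                       # 15 steps -> 16 representable values
    out = np.empty_like(weights)
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        w_min, w_max = group.min(), group.max()
        spread = float(w_max - w_min)
        scale = spread / levels if spread > 0 else 1.0   # avoid division by zero on flat groups
        codes = np.round((group - w_min) / scale)        # integer codes in [0, 15]
        out[start:start + group_size] = codes * scale + w_min
    return out

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.02, size=4096).astype(np.float32)
w_hat = rtn_quantize(w)
print("mean absolute reconstruction error:", float(np.abs(w - w_hat).mean()))
```

Even a small per-weight error like this shifts the logits a little at every layer, and with greedy decoding a single changed token early in the output is enough to send the rest of the paragraph down a completely different path, which matches what you are seeing.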

oobabooga (Author) commented:

> they are effectively different models, especially the 4-bit one, given that it is not RTN-based

That makes perfect sense, thanks for the reply. I was worried that I might have done something wrong while adapting the web UI to work with the code here. I'll do a final review and then approve and document my own PR.

4-bit + LoRA seems to be the holy grail of consumer-grade LLM inference at the moment, and it's nice to see it working. Thanks for your work.

leexinyu1204 commented:


May I ask how to use 8-bit?
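A minimal sketch of one way to do it with transformers plus bitsandbytes is below (the base-model path is a placeholder; the web UI exposes the equivalent option as its --load-in-8bit flag):

```python
# Sketch: load the base model in 8-bit (requires bitsandbytes), then apply the LoRA as before.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "decapoda-research/llama-7b-hf"  # placeholder path to the base weights
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(base, load_in_8bit=True, device_map="auto")
model = PeftModel.from_pretrained(model, "tloen/alpaca-lora-7b")
```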
