
I can not reproduce 7b 6.09 Wiki2 PPL. #78

Closed
USBhost opened this issue Mar 24, 2023 · 14 comments

Comments

USBhost (Contributor) commented Mar 24, 2023

I cannot seem to get that number. It's either smaller or a little bigger. Can you guys provide the command you used to get it?

In my tests I ended up getting Wiki2 6.29, 6.25, and 5.9 with different settings. Also, what is the correct way to run these tests: the benchmark check, or the normal perplexity printed at the Evaluating stage when saving?

```
python -u ../repositories/GPTQ-for-LLaMa/llama.py llama-7b wikitext2 --new-eval --wbits 4 --act-order --true-sequential --save_safetensors llama-7b-4bit.safetensors

c4-new: 7.843033313751221
ptb-new: 10.846735000610352
wikitext2: 5.92544412612915
```

```
python -u ../repositories/GPTQ-for-LLaMa/llama.py llama-7b wikitext2 --new-eval --wbits 4 --act-order --true-sequential --load llama-7b-4bit.safetensors --benchmark 2048 --check

Median: 0.0950855016708374
PPL: 6.688839912414551
max memory(MiB): 1712.3349609375
```

USBhost changed the title from "I can not preproduce 7b 6.09 Wiki2 PPL." to "I can not reproduce 7b 6.09 Wiki2 PPL." on Mar 24, 2023
@qwopqwop200 (Owner)

Depending on the GPUs/drivers, there may be a difference in performance, which decreases as the model size increases (IST-DASLab/gptq#1).

USBhost (Contributor, Author) commented Mar 25, 2023

> Depending on the GPUs/drivers, there may be a difference in performance, which decreases as the model size increases (IST-DASLab/gptq#1).

Yes, I get that, but which command do you use to properly get that number: --benchmark, or the normal perplexity printed at the Evaluating stage? I am confused about which one I should use to gauge the models I convert.

Qubitium (Contributor) commented Mar 25, 2023

@USBhost Which GPU and Nvidia driver version are you using? Maybe we can track this GPU/driver difference as other users report more scores.

Also, your score is better: for perplexity, lower is better, not higher. That's a good thing, right?

USBhost (Contributor, Author) commented Mar 25, 2023

> @USBhost Which GPU and Nvidia driver version are you using? Maybe we can track this GPU/driver difference as other users report more scores.
>
> Also, your score is better: for perplexity, lower is better, not higher. That's a good thing, right?

GPU: RTX A6000
Driver: 530.30.02

If you notice, in the OP I ran with wikitext2 as the calibration set instead of the default, C4. When I ran with C4 I got Wiki2 6.29, and when I ran with groupsize 128 + --true-sequential I got Wiki2 6.25.

@Qubitium (Contributor)

@USBhost Looking at the code and reading the arXiv paper, here are my thoughts on the variance.

  1. A default of 128 calibration samples is used. You can increase this, but VRAM usage during quantization will explode.
  2. This means that if you selected c4, it will randomly select 128 samples from that dataset to establish a calibration baseline for comparison and also to assist quantization.

By design, the code will produce a different score every single time, since the calibration samples are randomized.
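
For context, here is a minimal sketch of how that calibration sampling typically looks in GPTQ-style pipelines. The function name and the fixed seed are assumptions for illustration, not the exact GPTQ-for-LLaMa code; the point is that if the random generator is seeded the same way every run, the "random" selection repeats exactly.

```python
# Hypothetical sketch of GPTQ-style calibration sampling (not the exact repo code).
# nsamples random windows of seqlen tokens are drawn from the calibration corpus.
import random
import torch

def get_calibration_samples(token_ids: torch.Tensor, nsamples: int = 128,
                            seqlen: int = 2048, seed: int = 0):
    # token_ids: [1, N] tensor of token ids for the whole calibration corpus.
    # Seeding here makes the "random" selection identical on every run,
    # which would explain bit-identical PPL across repeated quantizations.
    rng = random.Random(seed)
    samples = []
    for _ in range(nsamples):
        start = rng.randint(0, token_ids.shape[1] - seqlen - 1)
        samples.append(token_ids[:, start:start + seqlen])
    return samples
```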

USBhost (Contributor, Author) commented Mar 30, 2023

> @USBhost Looking at the code and reading the arXiv paper, here are my thoughts on the variance.
>
> 1. A default of 128 calibration samples is used. You can increase this, but VRAM usage during quantization will explode.
> 2. This means that if you selected c4, it will randomly select 128 samples from that dataset to establish a calibration baseline for comparison and also to assist quantization.
>
> By design, the code will produce a different score every single time, since the calibration samples are randomized.

But I am able to reproduce my results, though. That randomness doesn't seem to be doing what we think it does.

@Qubitium (Contributor)

> But I am able to reproduce my results, though. That randomness doesn't seem to be doing what we think it does.

Interesting. I never tried to run the same config more than once myself, since it takes forever. Are you getting the exact values, down to the last significant digit, when re-quantizing with the same config?

USBhost (Contributor, Author) commented Mar 30, 2023

> But I am able to reproduce my results, though. That randomness doesn't seem to be doing what we think it does.
>
> Interesting. I never tried to run the same config more than once myself, since it takes forever. Are you getting the exact values, down to the last significant digit, when re-quantizing with the same config?

To the exact last digit.

Xiuyu-Li commented Apr 1, 2023

I have the same issue using an A6000 GPU.

@qwopqwop200 (Owner)

This value comes from GPTQ. Please ask the GPTQ authors for details.

Xiuyu-Li commented Apr 2, 2023

Nvm. I found that the reported results can be reproduced by using --eval --new-eval instead of --benchmark 2048 --check.

USBhost (Contributor, Author) commented Apr 2, 2023

For 7b, really, @Xiuyu-Li?

--new-eval does not affect Wiki2.

Xiuyu-Li commented Apr 2, 2023

> For 7b, really, @Xiuyu-Li?

Yes. I got 5.78 when calibrated with wiki2 and 5.83 when calibrated with c4, both using a group size of 128. You need to call llama_eval with --eval to be consistent with what GPTQ did; --benchmark evaluates the model on a different set of data.
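
For anyone else comparing the two numbers, here is a rough sketch of what an --eval-style perplexity pass computes. It assumes a HuggingFace-style causal LM interface and pre-tokenized test text, and it is not the exact llama_eval implementation: the whole test split is cut into fixed-length windows and the average negative log-likelihood over all of them is exponentiated, whereas --benchmark reports PPL on a single chunk of different data.

```python
# Rough sketch of full-dataset perplexity (what an --eval pass computes);
# not the exact llama_eval code from GPTQ-for-LLaMa.
import math
import torch

@torch.no_grad()
def perplexity(model, input_ids: torch.Tensor, seqlen: int = 2048) -> float:
    # input_ids: [1, N] token ids for the whole test set (e.g. the wikitext-2 test split)
    nlls = []
    nchunks = input_ids.shape[1] // seqlen
    for i in range(nchunks):
        chunk = input_ids[:, i * seqlen:(i + 1) * seqlen]
        out = model(chunk, labels=chunk)        # HF-style call returns the mean token loss
        nlls.append(out.loss.float() * seqlen)  # undo the mean to get the window's total NLL
    return math.exp(torch.stack(nlls).sum().item() / (nchunks * seqlen))
```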

USBhost (Contributor, Author) commented Apr 2, 2023

> For 7b, really, @Xiuyu-Li?
>
> Yes. I got 5.78 when calibrated with wiki2 and 5.83 when calibrated with c4, both using a group size of 128. You need to call llama_eval with --eval to be consistent with what GPTQ did; --benchmark evaluates the model on a different set of data.

Oh, a group size.
