Thanks for the great work!
I'm having a little trouble reproducing the PPL results in the paper. I used the code snippet from the GPTQ repo for measuring PPL and was able to reproduce the fp16 baselines for the LLaMA family, but I could not reproduce the fp16 baseline for Mistral-7B with the same test code.
Specifically, I used mistral-7b-v0.1 and tried both seqlen=8000 and seqlen=8192; both results came out slightly lower than the numbers reported in the paper, which has been giving us some trouble.
Would you be willing to release the code you use to measure PPL?
I was able to reproduce this. The gap in scores comes from which attention implementation Transformers uses. I measured our PPL numbers with seqlen=8192 and `"_attn_implementation": "eager"` in the config.json file. Newer versions of transformers default to `"_attn_implementation": "sdpa"` instead; with "sdpa" I get 4.73 instead of 4.76. Let me know if this doesn't fix the issue.
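In case it helps, here is a minimal sketch of a GPTQ-style wikitext-2 PPL loop with eager attention forced at load time (on recent transformers the `attn_implementation="eager"` kwarg is equivalent to setting `"_attn_implementation": "eager"` in config.json). The dataset split and loss bookkeeping follow the GPTQ repo's eval code and are assumptions, not the authors' exact script, so small differences are possible:

```python
import torch
import torch.nn.functional as F
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
seqlen = 8192

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Force eager attention so the fp16 baseline matches the paper's setup;
# recent transformers releases default to "sdpa", which shifts PPL slightly.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="eager",
    device_map="auto",
)
model.eval()

# wikitext-2 test split tokenized as one long stream (GPTQ-style evaluation)
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

nsamples = enc.input_ids.shape[1] // seqlen
nlls = []
with torch.no_grad():
    for i in range(nsamples):
        batch = enc.input_ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
        logits = model(batch).logits
        # Next-token cross-entropy over the window, rescaled to a per-window sum
        shift_logits = logits[:, :-1, :].contiguous().float()
        shift_labels = batch[:, 1:].contiguous()
        loss = F.cross_entropy(
            shift_logits.view(-1, shift_logits.size(-1)), shift_labels.view(-1)
        )
        nlls.append(loss * seqlen)

ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen))
print(f"wikitext-2 PPL @ seqlen={seqlen}: {ppl.item():.2f}")
```

Swapping `attn_implementation="eager"` for `"sdpa"` in the snippet above should reproduce the small gap described here.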