## Evaluate the model

Install lm-eval-harness from source; lm-eval 0.4.2 is used. AutoGPTQ is also required to load the quantized checkpoint:

```bash
pip install auto-gptq
```

Please note that there is a discrepancy between our baseline results and the officially reported numbers; this is a known issue discussed in the official model card's community section. Because the Gemma model family gives inconsistent results between FP16 and BF16 on lm-eval, we recommend converting the model to FP16 for both tuning and evaluation.
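As a minimal sketch of the FP16 recommendation, the base model can be loaded in FP16 with `transformers` and saved before any tuning. The checkpoint name `google/gemma-2b` and the output directory below are illustrative assumptions, not taken from this document:

```python
# Sketch: load the base model in FP16 before tuning/evaluation.
# "google/gemma-2b" and "./gemma-2b-fp16" are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "google/gemma-2b"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Save an FP16 copy so downstream tuning and evaluation start from float16 weights.
model.save_pretrained("./gemma-2b-fp16")
tokenizer.save_pretrained("./gemma-2b-fp16")
```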

```bash
lm_eval --model hf --model_args pretrained="Intel/gemma-2b-int4-inc",autogptq=True,gptq_use_triton=True,dtype=float16 --device cuda:0 --tasks lambada_openai,hellaswag,piqa,winogrande,truthfulqa_mc1,openbookqa,boolq,rte,arc_easy,arc_challenge,mmlu --batch_size 16
```
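The evaluation can also be driven from Python through lm-eval's `simple_evaluate` API. The sketch below simply mirrors the CLI invocation above and assumes lm-eval 0.4.2 on a CUDA-capable machine; it is not a snippet from this repository:

```python
# Sketch: the same evaluation run from Python instead of the CLI.
# Mirrors the lm_eval command above; assumes lm-eval 0.4.2 and a CUDA device.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args=(
        "pretrained=Intel/gemma-2b-int4-inc,"
        "autogptq=True,gptq_use_triton=True,dtype=float16"
    ),
    tasks=[
        "lambada_openai", "hellaswag", "piqa", "winogrande",
        "truthfulqa_mc1", "openbookqa", "boolq", "rte",
        "arc_easy", "arc_challenge", "mmlu",
    ],
    device="cuda:0",
    batch_size=16,
)

# Per-task metrics are collected under results["results"].
for task, metrics in results["results"].items():
    print(task, metrics)
```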
| Metric | BF16 | FP16 | AutoRound v0.1 | AutoRound v0.2 |
| --- | --- | --- | --- | --- |
| Avg. | 0.5263 | 0.5277 | 0.5235 | 0.5248 |
| mmlu | 0.3287 | 0.3287 | 0.3297 | 0.3309 |
| lambada_openai | 0.6344 | 0.6375 | 0.6307 | 0.6379 |
| hellaswag | 0.5273 | 0.5281 | 0.5159 | 0.5184 |
| winogrande | 0.6504 | 0.6488 | 0.6543 | 0.6575 |
| piqa | 0.7671 | 0.7720 | 0.7612 | 0.7606 |
| truthfulqa_mc1 | 0.2203 | 0.2203 | 0.2203 | 0.2191 |
| openbookqa | 0.2980 | 0.3020 | 0.3000 | 0.3060 |
| boolq | 0.6927 | 0.6936 | 0.6939 | 0.6966 |
| arc_easy | 0.7420 | 0.7403 | 0.7353 | 0.7357 |
| arc_challenge | 0.4019 | 0.4061 | 0.3933 | 0.3857 |