Describe the bug
GSMPlus is broken because the prompt function contains this logic, which returns None for every row whose perturbation type is "critical thinking":
if line["perturbation_type"] == "critical thinking":
    return None
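To make the failure mode concrete, here is a minimal sketch of how a None return could be tolerated by whatever consumes the prompt function. It is not lighteval's actual code: `gsm_plus_prompt`, `build_docs`, the dict-based document, and the sample rows are all hypothetical, and it assumes the breakage comes from a caller that does not expect None.

```python
from typing import Optional


def gsm_plus_prompt(line: dict) -> Optional[dict]:
    """Hypothetical stand-in for the prompt function quoted above."""
    # Same check as in the issue: "critical thinking" rows produce no prompt.
    if line["perturbation_type"] == "critical thinking":
        return None
    # Assumed document shape, for illustration only.
    return {
        "query": f"Question: {line['question']}\n\nAnswer:",
        "choices": [line["answer"]],
    }


def build_docs(rows: list) -> list:
    """Hypothetical caller that drops rows for which no prompt was built."""
    docs = (gsm_plus_prompt(row) for row in rows)
    return [doc for doc in docs if doc is not None]


# Made-up rows; the field names are assumptions about the dataset schema.
rows = [
    {"question": "What is 2 + 3?", "answer": "5",
     "perturbation_type": "numerical substitution"},
    {"question": "What is 2 + ?", "answer": "None",
     "perturbation_type": "critical thinking"},
]

docs = build_docs(rows)
print(len(docs))
```

Filtering at the call site is only one option. Handling the "critical thinking" rows inside the prompt function instead would keep those samples in the evaluation.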
To Reproduce
lighteval vllm "model_name=Qwen/Qwen3-4B-Instruct-2507,max_model_length=32768,gpu_memory_utilization=0.8,generation_parameters={max_new_tokens:32768,temperature:0.6,top_p:0.95}" "lighteval|gsm_plus|0" --max-samples 4
Expected behavior
GSMPlus runs without failing on rows whose perturbation type is "critical thinking" :)
Version info
Please provide your operating system, lighteval version or commit if you installed from main, and pip/conda environment if your problem concerns dependencies.