Release v0.3.5 · EvolvingLMMs-Lab/lmms-eval

What's Changed

pip 0.3.4 by @pufanyi in #697
[Fix] Minor fix on some warning messages by @kcz358 in #704
[FIX] Add macro metric to task xlrs-lite by @nanocm in #700
[Fix] Fix evaluator crash with accelerate backend when num_processes=1 by @miikatoi in #699
[Fix] Enable the ignored API_URL in the MathVista evaluation. by @MoyusiteruIori in #705
Adds VideoMathQA - Task Designed to Evaluate Mathematical Reasoning in Real-World Educational Videos by @hanoonaR in #702
Update sentencepiece dependency and add new parameters to mathvista_t… by @Luodian in #716
[fix] Refactor Accelerator initialization by @Luodian in #717
[Minor] typo fixed in task_guide.md by @JulyanZhu in #720
add mmsi-bench (https://arxiv.org/abs/2505.23764) by @sihany077 in #715
add mmvu task by @pbcong in #713
Dev/tomato by @Devininthelab in #709
[fix] update korean benchmark's post_prompt by @jujeongho0 in #719
[fix] ensure synchronization is not used without distributed execution by @debugdoctor in #714
[FIX] Resolve MMMU-test submission file generation issue by @xyyandxyy in #724
Add CameraBench_VQA by @chancharikmitra in #725
[vLLM] centralize VLLM_WORKER_MULTIPROC_METHOD by @kylesayrs in #728
[fix] cli_evaluate to properly handle Namespace arguments by @Luodian in #733
Fix three bugs in the codebase by @Luodian in #734
[Bug] fix a bug in the post-processing stage of ScienceQA by @ashun989 in #723
fix: add max_frames_num to OpenAICompatible by @loongfeili in #740
[Bugfix] Add min image resolution requirement for vLLM Qwen-VL models by @zch42 in #737
Revert "Pass in the 'cache_dir' to use local cache" by @kcz358 in #741
[New Benchmark] Add Video-TT Benchmark by @dongyh20 in #742
Add claude GitHub actions 1752118403023 by @Luodian in #749
[Bugfix] Fix handling of encode_video output in vllm.py so each frame’s Base64 by @LiamLian0727 in #754
[New Benchmark] Request for supporting TimeScope by @ruili33 in #756
Remove Claude GitHub workflows for code review by @Luodian in #757
[fix] Fixed applying process_* twice on resAns for VQAv2 by @Avelina9X in #760
[fix] update korean benchmark's post_prompt by @jujeongho0 in #759
Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025) by @dunghuynhandy in #744
[fix] vqav2 evaluation yaml by @mletrasdl in #764
[New Task] Add support for benchmark PhyX by @wutaiqiang in #766

New Contributors

@miikatoi made their first contribution in #699
@MoyusiteruIori made their first contribution in #705
@hanoonaR made their first contribution in #702
@sihany077 made their first contribution in #715
@debugdoctor made their first contribution in #714
@xyyandxyy made their first contribution in #724
@chancharikmitra made their first contribution in #725
@loongfeili made their first contribution in #740
@zch42 made their first contribution in #737
@LiamLian0727 made their first contribution in #754
@ruili33 made their first contribution in #756
@Avelina9X made their first contribution in #760
@dunghuynhandy made their first contribution in #744
@mletrasdl made their first contribution in #764
@wutaiqiang made their first contribution in #766

Full Changelog: v0.3.4...v0.3.5