v0.3.5
What's Changed
- pip 0.3.4 by @pufanyi in #697
- [Fix] Minor fix on some warning messages by @kcz358 in #704
- [FIX] Add macro metric to task xlrs-lite by @nanocm in #700
- [Fix] Fix evaluator crash with accelerate backend when num_processes=1 by @miikatoi in #699
- [Fix] Enable the ignored API_URL in the MathVista evaluation. by @MoyusiteruIori in #705
- Adds VideoMathQA - Task Designed to Evaluate Mathematical Reasoning in Real-World Educational Videos by @hanoonaR in #702
- Update sentencepiece dependency and add new parameters to mathvista_t… by @Luodian in #716
- [fix ] Refactor Accelerator initialization by @Luodian in #717
- [Minor] typo fixed in task_guide.md by @JulyanZhu in #720
- add mmsi-bench (https://arxiv.org/abs/2505.23764) by @sihany077 in #715
- add mmvu task by @pbcong in #713
- Dev/tomato by @Devininthelab in #709
- [fix] update korean benchmark's post_prompt by @jujeongho0 in #719
- [fix] ensure synchronization not be used without distributed execution by @debugdoctor in #714
- [FIX] Resolve MMMU-test submission file generation issue by @xyyandxyy in #724
- Add CameraBench_VQA by @chancharikmitra in #725
- [vLLM] centralize VLLM_WORKER_MULTIPROC_METHOD by @kylesayrs in #728
- [fix] cli_evaluate to properly handle Namespace arguments by @Luodian in #733
- Fix three bugs in the codebase by @Luodian in #734
- [Bug] fix a bug in post processing stage of ScienceQA. by @ashun989 in #723
- fix: add `max_frames_num` to `OpenAICompatible` by @loongfeili in #740
- [Bugfix] Add min image resolution requirement for vLLM Qwen-VL models by @zch42 in #737
- Revert "Pass in the 'cache_dir' to use local cache" by @kcz358 in #741
- [New Benchmark] Add Video-TT Benchmark by @dongyh20 in #742
- Add claude GitHub actions 1752118403023 by @Luodian in #749
- [Bugfix] Fix handling of encode_video output in vllm.py so each frame’s Base64 by @LiamLian0727 in #754
- [New Benchmark] Request for supporting TimeScope by @ruili33 in #756
- Remove Claude GitHub workflows for code review by @Luodian in #757
- [fix] Fixed applying process_* twice on resAns for VQAv2 by @Avelina9X in #760
- [fix] update korean benchmark's post_prompt by @jujeongho0 in #759
- Title: Add Benchmark from "Vision-Language Models Can’t See the Obvious" (ICCV 2025) by @dunghuynhandy in #744
- [fix] vqav2 evaluation yaml by @mletrasdl in #764
- [New Task] Add support for benchmark PhyX by @wutaiqiang in #766
New Contributors
- @miikatoi made their first contribution in #699
- @MoyusiteruIori made their first contribution in #705
- @hanoonaR made their first contribution in #702
- @sihany077 made their first contribution in #715
- @debugdoctor made their first contribution in #714
- @xyyandxyy made their first contribution in #724
- @chancharikmitra made their first contribution in #725
- @loongfeili made their first contribution in #740
- @zch42 made their first contribution in #737
- @LiamLian0727 made their first contribution in #754
- @ruili33 made their first contribution in #756
- @Avelina9X made their first contribution in #760
- @dunghuynhandy made their first contribution in #744
- @mletrasdl made their first contribution in #764
- @wutaiqiang made their first contribution in #766
Full Changelog: v0.3.4...v0.3.5