
Conversation

@Jintao-Huang (Collaborator)

No description provided.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the compatibility and robustness of the system, particularly concerning the integration of Qwen-VL models with varying versions of the qwen_vl_utils library. It introduces a mechanism to dynamically adjust image and video token processing based on the utility library's version, ensuring smoother operation across different setups. Additionally, it refines error messaging for model type detection and improves the handling of metric reduction in distributed training, contributing to overall system stability and user experience.

Highlights

  • Qwen-VL Utility Compatibility: Implemented version-aware compatibility for the qwen_vl_utils library, specifically addressing how image and video token numbers are calculated for Qwen2VL, Qwen3VL, and Qwen3VLMoe models based on the installed utility version.
  • Dynamic Environment Variable Configuration: Introduced a new function, compat_qwen_vl_utils, which dynamically sets environment variables like IMAGE_MAX_TOKEN_NUM and VIDEO_MAX_TOKEN_NUM to ensure correct token processing based on the model's image_patch_size and SPATIAL_MERGE_SIZE.
  • Improved Error Messaging: Enhanced the error message displayed when the model_type cannot be automatically determined, providing clearer guidance and recommended model_types to the user.
  • Robust Metric Reduction: Refactored the _all_reduce_metric function in the Megatron trainer to build the reporting_metric tensor with torch.stack, making distributed metric aggregation more robust and potentially more efficient (see the sketch after this list).
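
The metric-reduction change is not shown anywhere else in this thread, so here is a minimal sketch of the torch.stack pattern the last bullet describes. The function name matches the PR, but the argument names, reduce op, and device handling are assumptions rather than the actual swift/megatron/trainers/base.py implementation.

import torch
import torch.distributed as dist

def _all_reduce_metric(metrics: dict) -> dict:
    # Stack every scalar metric into one 1-D tensor so a single all_reduce
    # call covers all of them, instead of reducing metric by metric.
    keys = list(metrics.keys())
    reporting_metric = torch.stack(
        [torch.as_tensor(metrics[k], dtype=torch.float32) for k in keys])
    if dist.is_available() and dist.is_initialized():
        # Average across ranks; the real trainer may use a different op or group.
        dist.all_reduce(reporting_metric, op=dist.ReduceOp.AVG)
    return {key: reporting_metric[i].item() for i, key in enumerate(keys)}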
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature                Command                Description
Code Review            /gemini review         Performs a code review for the current pull request in its current state.
Pull Request Summary   /gemini summary        Provides a summary of the current pull request in its current state.
Comment                @gemini-code-assist    Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help                   /gemini help           Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Jintao-Huang linked an issue on Nov 13, 2025 that may be closed by this pull request.
@gemini-code-assist (Contributor)

Summary of Changes


This pull request focuses on improving the robustness and compatibility of Qwen-VL models within the system. It implements a version-aware mechanism to handle different qwen_vl_utils library versions, ensuring that vision-language models correctly process image and video inputs by dynamically adjusting token limits. Additionally, it refines a user-facing error message to provide more helpful information when model types are ambiguous.

Highlights

  • Qwen-VL Compatibility: Introduced dynamic version checking for the qwen_vl_utils library to ensure compatibility with different versions, specifically for Qwen2.5-VL, Qwen3-VL, and Qwen3-MoE-VL models.
  • Vision Token Limit Adjustment: Added a new utility function, compat_qwen_vl_utils, which calculates and sets environment variables for image and video token limits based on image patch size and spatial merge size, crucial for proper vision model operation.
  • Improved Error Messaging: Enhanced the error message for cases where model_type cannot be automatically determined during model registration, providing clearer guidance to the user.

gemini-code-assist bot left a comment


Code Review

This pull request introduces compatibility changes for qwen_vl_utils, particularly for qwen2.5-vl. It adds a compatibility function compat_qwen_vl_utils to handle different versions of qwen_vl_utils by setting environment variables based on pixel limits. The changes also include version checking logic and updates to model initialization functions to use this new compatibility layer. Additionally, an error message in swift/llm/model/register.py is improved for clarity, and a minor optimization is made in swift/megatron/trainers/base.py. The changes look good overall. I have one suggestion to refactor some repetitive code for better maintainability.

Comment on lines 743 to 752
    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
    image_factor = image_patch_size * spatial_merge_size
    if os.getenv('MAX_PIXELS'):
        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
    if os.getenv('MIN_PIXELS'):
        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MAX_PIXELS'):
        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MIN_PIXELS'):
        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)

Severity: medium

This function contains repetitive logic for handling different environment variables. To improve readability and maintainability, you can refactor this into a loop using a mapping dictionary. This will make the code more concise and easier to extend in the future.

Suggested change
-    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
-    image_factor = image_patch_size * spatial_merge_size
-    if os.getenv('MAX_PIXELS'):
-        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
-    if os.getenv('MIN_PIXELS'):
-        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MAX_PIXELS'):
-        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MIN_PIXELS'):
-        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)
+    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
+    image_factor = image_patch_size * spatial_merge_size
+    image_factor_sq = image_factor**2
+    env_map = {
+        'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM',
+        'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM',
+        'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM',
+        'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM',
+    }
+    for src_key, dest_key in env_map.items():
+        value = os.getenv(src_key)
+        if value:
+            os.environ[dest_key] = str(int(value) // image_factor_sq)
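
To make the conversion concrete (illustrative numbers, not taken from the PR): with image_patch_size=14 and the default SPATIAL_MERGE_SIZE=2, image_factor is 28, so MAX_PIXELS=602112 maps to IMAGE_MAX_TOKEN_NUM = 602112 // 28**2 = 768.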

@gemini-code-assist (Contributor)

Summary of Changes


This pull request introduces crucial compatibility enhancements for Qwen-VL models by intelligently adapting to different versions of the qwen_vl_utils library. It ensures that visual token limits are correctly configured based on the specific qwen_vl_utils version and model type, preventing potential issues. Additionally, it refines an error message in the model registration module to offer clearer instructions to users.

Highlights

  • qwen_vl_utils Version Compatibility: Implemented dynamic version checking for qwen_vl_utils to ensure compatibility with Qwen-VL models, specifically handling versions >=0.0.14 and <0.0.12 (a hedged sketch follows this list).
  • Dynamic Token Limit Adjustment: Introduced a new utility function, compat_qwen_vl_utils, to adjust image and video token limits based on model patch size and environment variables, improving flexibility for different Qwen-VL model configurations.
  • Enhanced Model Registration Error Message: Updated the error message in the model registration process to provide more explicit guidance when automatic model_type detection fails.
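
The exact gating logic is not quoted in this thread, so the following is only a sketch of a version gate over qwen_vl_utils using the 0.0.14 and 0.0.12 boundaries mentioned in the first bullet above. The branch bodies, the patch-size value, and the error wording are assumptions; compat_qwen_vl_utils is the helper defined in this PR (see the review comments below).

from importlib.metadata import version as get_version
from packaging.version import parse

# compat_qwen_vl_utils is assumed to be importable from the PR's qwen module.
qwen_vl_utils_version = parse(get_version('qwen_vl_utils'))
if qwen_vl_utils_version >= parse('0.0.14'):
    # Newer qwen_vl_utils works with token-number limits, so translate the
    # pixel-based environment variables (patch size 16 is an assumed example value).
    compat_qwen_vl_utils(image_patch_size=16)
elif qwen_vl_utils_version < parse('0.0.12'):
    # Placeholder branch for very old versions; the PR's actual handling may differ.
    raise ImportError('qwen_vl_utils is too old for Qwen3-VL; try `pip install -U qwen_vl_utils`.')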


gemini-code-assist bot left a comment


Code Review

This pull request adds compatibility for different versions of qwen_vl_utils by introducing version checks and a new compatibility function. It also includes a minor improvement to an error message and a beneficial refactoring in _all_reduce_metric for better clarity and robustness. The changes are generally good, but I have one suggestion to refactor the new compat_qwen_vl_utils function to reduce code duplication and improve maintainability.

Comment on lines 742 to 752
def compat_qwen_vl_utils(image_patch_size: int):
    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
    image_factor = image_patch_size * spatial_merge_size
    if os.getenv('MAX_PIXELS'):
        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
    if os.getenv('MIN_PIXELS'):
        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MAX_PIXELS'):
        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MIN_PIXELS'):
        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)


Severity: medium

The function compat_qwen_vl_utils contains several repetitive blocks for setting environment variables. This can be refactored into a loop over a configuration dictionary to reduce code duplication, which improves readability and makes it easier to add or modify environment variable mappings in the future.

Suggested change
-def compat_qwen_vl_utils(image_patch_size: int):
-    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
-    image_factor = image_patch_size * spatial_merge_size
-    if os.getenv('MAX_PIXELS'):
-        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
-    if os.getenv('MIN_PIXELS'):
-        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MAX_PIXELS'):
-        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MIN_PIXELS'):
-        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)
+def compat_qwen_vl_utils(image_patch_size: int):
+    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
+    image_factor_sq = (image_patch_size * spatial_merge_size)**2
+    env_map = {
+        'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM',
+        'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM',
+        'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM',
+        'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM',
+    }
+    for src_env, dest_env in env_map.items():
+        pixel_val_str = os.getenv(src_env)
+        if pixel_val_str:
+            os.environ[dest_env] = str(int(pixel_val_str) // image_factor_sq)


gemini-code-assist bot left a comment


Code Review

This pull request introduces compatibility for newer versions of qwen_vl_utils by adding a compatibility function and updating version checks. The changes also include a more informative error message and a minor code improvement in Megatron. My review includes a suggestion to refactor the new compatibility function for better readability and to avoid redundant environment variable lookups.

Comment on lines 742 to 752
def compat_qwen_vl_utils(image_patch_size: int):
    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
    image_factor = image_patch_size * spatial_merge_size
    if os.getenv('MAX_PIXELS'):
        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
    if os.getenv('MIN_PIXELS'):
        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MAX_PIXELS'):
        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
    if os.getenv('VIDEO_MIN_PIXELS'):
        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)


Severity: medium

The function compat_qwen_vl_utils repeatedly calls os.getenv for the same environment variables. This can be refactored to be more concise and efficient by retrieving each environment variable's value only once and using a loop to process them.

Suggested change
-def compat_qwen_vl_utils(image_patch_size: int):
-    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
-    image_factor = image_patch_size * spatial_merge_size
-    if os.getenv('MAX_PIXELS'):
-        os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2)
-    if os.getenv('MIN_PIXELS'):
-        os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MAX_PIXELS'):
-        os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2)
-    if os.getenv('VIDEO_MIN_PIXELS'):
-        os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2)
+def compat_qwen_vl_utils(image_patch_size: int):
+    spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2'))
+    image_factor = image_patch_size * spatial_merge_size
+    env_vars_to_process = {
+        'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM',
+        'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM',
+        'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM',
+        'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM',
+    }
+    for source_var, target_var in env_vars_to_process.items():
+        value = os.getenv(source_var)
+        if value:
+            os.environ[target_var] = str(int(value) // image_factor**2)

Jintao-Huang merged commit f031e4e into modelscope:main on Nov 13, 2025
1 of 2 checks passed
vx120 pushed a commit to vx120/ms-swift that referenced this pull request Nov 19, 2025


Development

Successfully merging this pull request may close these issues.

Support Qwen3-VL and Qwen2.5-VL in the same environment
