-
Notifications
You must be signed in to change notification settings - Fork 998
qwen2.5-vl compat qwen_vl_utils version #6584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
qwen2.5-vl compat qwen_vl_utils version #6584
Conversation
Summary of ChangesHello @Jintao-Huang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request primarily focuses on enhancing the compatibility and robustness of the system, particularly concerning the integration of Qwen-VL models with varying versions of the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
Summary of ChangesHello @Jintao-Huang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on improving the robustness and compatibility of Qwen-VL models within the system. It implements a version-aware mechanism to handle different Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces compatibility changes for qwen_vl_utils, particularly for qwen2.5-vl. It adds a compatibility function compat_qwen_vl_utils to handle different versions of qwen_vl_utils by setting environment variables based on pixel limits. The changes also include version checking logic and updates to model initialization functions to use this new compatibility layer. Additionally, an error message in swift/llm/model/register.py is improved for clarity, and a minor optimization is made in swift/megatron/trainers/base.py. The changes look good overall. I have one suggestion to refactor some repetitive code for better maintainability.
swift/llm/model/model/qwen.py
Outdated
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | ||
| image_factor = image_patch_size * spatial_merge_size | ||
| if os.getenv('MAX_PIXELS'): | ||
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('MIN_PIXELS'): | ||
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MAX_PIXELS'): | ||
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MIN_PIXELS'): | ||
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function contains repetitive logic for handling different environment variables. To improve readability and maintainability, you can refactor this into a loop using a mapping dictionary. This will make the code more concise and easier to extend in the future.
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor = image_patch_size * spatial_merge_size | |
| if os.getenv('MAX_PIXELS'): | |
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('MIN_PIXELS'): | |
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MAX_PIXELS'): | |
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MIN_PIXELS'): | |
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) | |
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor = image_patch_size * spatial_merge_size | |
| image_factor_sq = image_factor**2 | |
| env_map = { | |
| 'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM', | |
| 'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM', | |
| 'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM', | |
| 'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM', | |
| } | |
| for src_key, dest_key in env_map.items(): | |
| value = os.getenv(src_key) | |
| if value: | |
| os.environ[dest_key] = str(int(value) // image_factor_sq) |
Summary of ChangesHello @Jintao-Huang, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces crucial compatibility enhancements for Qwen-VL models by intelligently adapting to different versions of the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request adds compatibility for different versions of qwen_vl_utils by introducing version checks and a new compatibility function. It also includes a minor improvement to an error message and a beneficial refactoring in _all_reduce_metric for better clarity and robustness. The changes are generally good, but I have one suggestion to refactor the new compat_qwen_vl_utils function to reduce code duplication and improve maintainability.
swift/llm/model/model/qwen.py
Outdated
| def compat_qwen_vl_utils(image_patch_size: int): | ||
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | ||
| image_factor = image_patch_size * spatial_merge_size | ||
| if os.getenv('MAX_PIXELS'): | ||
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('MIN_PIXELS'): | ||
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MAX_PIXELS'): | ||
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MIN_PIXELS'): | ||
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function compat_qwen_vl_utils contains several repetitive blocks for setting environment variables. This can be refactored into a loop over a configuration dictionary to reduce code duplication, which improves readability and makes it easier to add or modify environment variable mappings in the future.
| def compat_qwen_vl_utils(image_patch_size: int): | |
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor = image_patch_size * spatial_merge_size | |
| if os.getenv('MAX_PIXELS'): | |
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('MIN_PIXELS'): | |
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MAX_PIXELS'): | |
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MIN_PIXELS'): | |
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) | |
| def compat_qwen_vl_utils(image_patch_size: int): | |
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor_sq = (image_patch_size * spatial_merge_size)**2 | |
| env_map = { | |
| 'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM', | |
| 'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM', | |
| 'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM', | |
| 'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM', | |
| } | |
| for src_env, dest_env in env_map.items(): | |
| pixel_val_str = os.getenv(src_env) | |
| if pixel_val_str: | |
| os.environ[dest_env] = str(int(pixel_val_str) // image_factor_sq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces compatibility for newer versions of qwen_vl_utils by adding a compatibility function and updating version checks. The changes also include a more informative error message and a minor code improvement in Megatron. My review includes a suggestion to refactor the new compatibility function for better readability and to avoid redundant environment variable lookups.
swift/llm/model/model/qwen.py
Outdated
| def compat_qwen_vl_utils(image_patch_size: int): | ||
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | ||
| image_factor = image_patch_size * spatial_merge_size | ||
| if os.getenv('MAX_PIXELS'): | ||
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('MIN_PIXELS'): | ||
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MAX_PIXELS'): | ||
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | ||
| if os.getenv('VIDEO_MIN_PIXELS'): | ||
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function compat_qwen_vl_utils repeatedly calls os.getenv for the same environment variables. This can be refactored to be more concise and efficient by retrieving each environment variable's value only once and using a loop to process them.
| def compat_qwen_vl_utils(image_patch_size: int): | |
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor = image_patch_size * spatial_merge_size | |
| if os.getenv('MAX_PIXELS'): | |
| os.environ['IMAGE_MAX_TOKEN_NUM'] = str(int(os.getenv('MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('MIN_PIXELS'): | |
| os.environ['IMAGE_MIN_TOKEN_NUM'] = str(int(os.getenv('MIN_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MAX_PIXELS'): | |
| os.environ['VIDEO_MAX_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MAX_PIXELS')) // image_factor**2) | |
| if os.getenv('VIDEO_MIN_PIXELS'): | |
| os.environ['VIDEO_MIN_TOKEN_NUM'] = str(int(os.getenv('VIDEO_MIN_PIXELS')) // image_factor**2) | |
| def compat_qwen_vl_utils(image_patch_size: int): | |
| spatial_merge_size = int(os.getenv('SPATIAL_MERGE_SIZE', '2')) | |
| image_factor = image_patch_size * spatial_merge_size | |
| env_vars_to_process = { | |
| 'MAX_PIXELS': 'IMAGE_MAX_TOKEN_NUM', | |
| 'MIN_PIXELS': 'IMAGE_MIN_TOKEN_NUM', | |
| 'VIDEO_MAX_PIXELS': 'VIDEO_MAX_TOKEN_NUM', | |
| 'VIDEO_MIN_PIXELS': 'VIDEO_MIN_TOKEN_NUM', | |
| } | |
| for source_var, target_var in env_vars_to_process.items(): | |
| value = os.getenv(source_var) | |
| if value: | |
| os.environ[target_var] = str(int(value) // image_factor**2) |
No description provided.