
Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a targeted bugfix for the Qwen3-VL video processing pipeline. The core change refines the data flow in the _encode method of the Qwen template: input_ids and attention_mask are removed from the media_inputs dictionary once they have served their purpose, so these keys no longer persist in the media input structure and cannot trigger errors during video handling.

Highlights

  • Bugfix for Qwen3-VL Videos: This pull request resolves a bug specifically affecting the processing of videos within the Qwen3-VL model implementation.
  • Media Input Cleanup: 'input_ids' and 'attention_mask' are explicitly removed from the 'media_inputs' dictionary after they have been used for token splitting, preventing downstream conflicts or incorrect data handling; a sketch of this pattern follows below.
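
A minimal, self-contained sketch of the cleanup, using placeholder values and assumed key names; the real logic operates on the processor output inside Qwen3VLTemplate._encode:

```python
# Minimal sketch of the cleanup (placeholder values; the real code works on the
# processor output inside Qwen3VLTemplate._encode).
media_inputs = {
    'input_ids': [[151652, 151653]],        # assumed: video-expanded token ids, only needed for splitting
    'attention_mask': [[1, 1]],
    'pixel_values_videos': 'video tensor',  # stand-in for the real tensor
    'video_grid_thw': [[2, 16, 16]],        # assumed grid metadata key
}

# input_ids/attention_mask were only needed to locate the vision tokens; drop
# them so they cannot overwrite the main token sequence when media_inputs is
# merged into the encoded inputs.
media_inputs.pop('input_ids')
media_inputs.pop('attention_mask')

encoded = {'input_ids': [[1, 2, 3]], 'attention_mask': [[1, 1, 1]]}
encoded.update(media_inputs)  # only the vision tensors are merged in
print(sorted(encoded))  # ['attention_mask', 'input_ids', 'pixel_values_videos', 'video_grid_thw']
```

Without the two pop calls, the update would replace the full-sequence input_ids and attention_mask with the video-only versions, which is the bug this pull request fixes.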

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request fixes a bug in the _encode method for Qwen3VLTemplate when processing videos. The change correctly removes input_ids and attention_mask from the media_inputs dictionary, preventing them from overwriting the main token sequences. This is a necessary and correct fix. I have one suggestion to improve the code's robustness.

Comment on lines 475 to 476
media_inputs.pop('input_ids')
media_inputs.pop('attention_mask')

Severity: medium

To improve robustness, prefer pop(key, None) over pop(key). This prevents a KeyError if input_ids or attention_mask are absent from media_inputs, for example due to future changes in the processor's behavior. This safer pattern is already used elsewhere in the codebase, such as in swift/llm/template/template/glm.py.

Suggested change
- media_inputs.pop('input_ids')
- media_inputs.pop('attention_mask')
+ media_inputs.pop('input_ids', None)
+ media_inputs.pop('attention_mask', None)
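
As a toy illustration of why the defaulted form is safer (not repository code, just the standard dict.pop behavior):

```python
# pop with a default never raises; plain pop raises KeyError for a missing key.
media_inputs = {'pixel_values_videos': '...'}  # suppose the processor stops returning input_ids

media_inputs.pop('input_ids', None)   # no-op, returns None

try:
    media_inputs.pop('input_ids')     # raises KeyError: 'input_ids'
except KeyError as err:
    print(f'KeyError: {err}')
```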

@Jintao-Huang Jintao-Huang merged commit 2223b92 into modelscope:main Oct 16, 2025
1 of 2 checks passed
Jintao-Huang added a commit that referenced this pull request Oct 16, 2025