
Conversation

@Jintao-Huang
Collaborator

No description provided.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Jintao-Huang, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a targeted bugfix for the Qwen3-VL video processing pipeline. The core change refines the data flow in the _encode method of the Qwen template: input_ids and attention_mask are removed from the media_inputs dictionary once they have served their purpose, so these keys no longer persist in the media input structure and cannot trigger errors during video handling.

Highlights

  • Bugfix for Qwen3-VL Videos: This pull request resolves a bug specifically affecting the processing of videos within the Qwen3-VL model implementation.
  • Media Input Cleanup: 'input_ids' and 'attention_mask' are explicitly removed from the 'media_inputs' dictionary after they have been used for token splitting, preventing downstream conflicts or incorrect data handling; a sketch of this pattern follows below.
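
A minimal, self-contained sketch of the cleanup, using placeholder values and assumed key names; the real logic operates on the processor output inside Qwen3VLTemplate._encode:

```python
# Minimal sketch of the cleanup (placeholder values; the real code works on the
# processor output inside Qwen3VLTemplate._encode).
media_inputs = {
    'input_ids': [[151652, 151653]],        # assumed: video-expanded token ids, only needed for splitting
    'attention_mask': [[1, 1]],
    'pixel_values_videos': 'video tensor',  # stand-in for the real tensor
    'video_grid_thw': [[2, 16, 16]],        # assumed grid metadata key
}

# input_ids/attention_mask were only needed to locate the vision tokens; drop
# them so they cannot overwrite the main token sequence when media_inputs is
# merged into the encoded inputs.
media_inputs.pop('input_ids')
media_inputs.pop('attention_mask')

encoded = {'input_ids': [[1, 2, 3]], 'attention_mask': [[1, 1, 1]]}
encoded.update(media_inputs)  # only the vision tensors are merged in
print(sorted(encoded))  # ['attention_mask', 'input_ids', 'pixel_values_videos', 'video_grid_thw']
```

Without the two pop calls, the update would replace the full-sequence input_ids and attention_mask with the video-only versions, which is the bug this pull request fixes.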

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request fixes a bug in the _encode method for Qwen3VLTemplate when processing videos. The change correctly removes input_ids and attention_mask from the media_inputs dictionary, preventing them from overwriting the main token sequences. This is a necessary and correct fix. I have one suggestion to improve the code's robustness.

Comment on lines 475 to 476
media_inputs.pop('input_ids')
media_inputs.pop('attention_mask')

Severity: medium

To improve robustness, prefer pop(key, None) over pop(key). This prevents a KeyError if input_ids or attention_mask are absent from media_inputs, for example due to future changes in the processor's behavior. This safer pattern is already used elsewhere in the codebase, such as in swift/llm/template/template/glm.py.

Suggested change
- media_inputs.pop('input_ids')
- media_inputs.pop('attention_mask')
+ media_inputs.pop('input_ids', None)
+ media_inputs.pop('attention_mask', None)
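
As a toy illustration of why the defaulted form is safer (not repository code, just the standard dict.pop behavior):

```python
# pop with a default never raises; plain pop raises KeyError for a missing key.
media_inputs = {'pixel_values_videos': '...'}  # suppose the processor stops returning input_ids

media_inputs.pop('input_ids', None)   # no-op, returns None

try:
    media_inputs.pop('input_ids')     # raises KeyError: 'input_ids'
except KeyError as err:
    print(f'KeyError: {err}')
```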

@Jintao-Huang Jintao-Huang merged commit 2223b92 into modelscope:main Oct 16, 2025
1 of 2 checks passed
Jintao-Huang added a commit that referenced this pull request Oct 16, 2025