
support gemma4 vllm multi-modal inference#9105

Merged
hjh0119 merged 5 commits into modelscope:main from hjh0119:gemma4-vllm
Apr 15, 2026

Conversation

Collaborator

@hjh0119 commented Apr 15, 2026

No description provided.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for vllm mode within the Gemma4Template, including environment-based configuration for the video and image processors and specialized tag-replacement logic for multimodal inputs. It also updates the load_audio utility to handle string paths more robustly. A critical issue was identified in the replace_tag method: load_audio is called unconditionally, which will crash in vllm mode because the audio data has already been pre-processed into a format incompatible with librosa.load.
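The fix the review implies can be sketched as a guard that skips loading when the audio is already a decoded waveform. This is a minimal illustration, not the actual ms-swift code: the `mode` parameter, the `prepare_audio` helper, and the `(waveform, sampling_rate)` tuple format are all assumptions made for the example, and the real `load_audio` would call `librosa.load` rather than return a dummy array.

```python
import numpy as np

def load_audio(path, sampling_rate=16000):
    # Stand-in for the real loader (librosa.load under the hood),
    # which only accepts file paths or file-like objects.
    if not isinstance(path, str):
        raise TypeError("load_audio expects a file path string")
    return np.zeros(sampling_rate, dtype=np.float32)  # dummy waveform

def prepare_audio(audio, mode):
    # Hypothetical guard: in vllm mode the audio may already be a
    # pre-processed (waveform, sampling_rate) tuple produced upstream;
    # passing that tuple to load_audio would crash, so return it as-is.
    if mode == 'vllm' and isinstance(audio, tuple):
        return audio
    return load_audio(audio), 16000

wav, sr = prepare_audio('sample.wav', mode='pt')          # loads from path
wav2, sr2 = prepare_audio((np.ones(8), 16000), 'vllm')    # passes through
```

The point of the guard is that `load_audio` is only reached for path-like inputs; any backend that hands the template already-decoded audio bypasses it entirely.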

Comment thread on swift/template/templates/gemma.py (Outdated)
@hjh0119 hjh0119 merged commit 89cfba4 into modelscope:main Apr 15, 2026
2 of 3 checks passed
@hjh0119 hjh0119 deleted the gemma4-vllm branch April 15, 2026 07:09