
support gemma4 vllm multi-modal inference#9105

Merged
hjh0119 merged 5 commits into modelscope:main from hjh0119:gemma4-vllm
Apr 15, 2026

Conversation

Collaborator

@hjh0119 commented Apr 15, 2026

No description provided.

Contributor

@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for vllm mode within the Gemma4Template, including environment-based configuration for the video and image processors and specialized tag-replacement logic for multimodal inputs. It also updates the load_audio utility to handle string paths more robustly. A critical issue was identified in the replace_tag method: load_audio is called unconditionally, which will crash in vllm mode because the audio data has already been pre-processed into a format incompatible with librosa.load.
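The fix the review implies can be sketched as a guard that skips loading when the audio is already a decoded waveform. This is a minimal illustration, not the actual ms-swift code: the `mode` parameter, the `prepare_audio` helper, and the `(waveform, sampling_rate)` tuple format are all assumptions made for the example, and the real `load_audio` would call `librosa.load` rather than return a dummy array.

```python
import numpy as np

def load_audio(path, sampling_rate=16000):
    # Stand-in for the real loader (librosa.load under the hood),
    # which only accepts file paths or file-like objects.
    if not isinstance(path, str):
        raise TypeError("load_audio expects a file path string")
    return np.zeros(sampling_rate, dtype=np.float32)  # dummy waveform

def prepare_audio(audio, mode):
    # Hypothetical guard: in vllm mode the audio may already be a
    # pre-processed (waveform, sampling_rate) tuple produced upstream;
    # passing that tuple to load_audio would crash, so return it as-is.
    if mode == 'vllm' and isinstance(audio, tuple):
        return audio
    return load_audio(audio), 16000

wav, sr = prepare_audio('sample.wav', mode='pt')          # loads from path
wav2, sr2 = prepare_audio((np.ones(8), 16000), 'vllm')    # passes through
```

The point of the guard is that `load_audio` is only reached for path-like inputs; any backend that hands the template already-decoded audio bypasses it entirely.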

Comment thread on swift/template/templates/gemma.py (Outdated)
@hjh0119 hjh0119 merged commit 89cfba4 into modelscope:main Apr 15, 2026
2 of 3 checks passed
@hjh0119 hjh0119 deleted the gemma4-vllm branch April 15, 2026 07:09