support multi-modal training for gkd teacher api #9197
Conversation
Code Review
This pull request enhances the GKD (Generalized Knowledge Distillation) trainers to support multimodal inputs when fetching teacher logprobs from an external API. It refactors the logprob fetching logic to handle chat completions and introduces alignment mechanisms to map teacher outputs back to the local sequence grid. The review feedback identifies a significant issue where the alignment logic assumes response tokens are at the end of the sequence, which fails when right padding is used. Additionally, the feedback points out a regression in off-policy OPSD support and several instances of redundant deepcopy calls that should be optimized.
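A minimal sketch of the padding-aware alignment the feedback is asking for, assuming response positions can be recovered from the local label mask instead of being taken from the tail of the sequence. Function and variable names here are illustrative, not the PR's actual code:

```python
import torch

IGNORE_INDEX = -100

def align_teacher_logprobs(teacher_logprobs: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Scatter per-response teacher logprobs onto the local sequence grid.

    teacher_logprobs: (num_response_tokens,) values returned by the teacher API.
    labels: (seq_len,) local labels, with non-response positions set to IGNORE_INDEX.
    """
    aligned = torch.zeros_like(labels, dtype=teacher_logprobs.dtype)
    # Locate response tokens via the label mask, so right padding does not
    # shift the slice the way a tail-of-sequence assumption would.
    response_positions = (labels != IGNORE_INDEX).nonzero(as_tuple=True)[0]
    # Truncate to the shorter side in case teacher and student tokenizations differ.
    n = min(response_positions.numel(), teacher_logprobs.numel())
    aligned[response_positions[:n]] = teacher_logprobs[:n]
    return aligned
```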
/gemini review
Code Review
This pull request introduces multimodal support for Generalized Knowledge Distillation (GKD) by enabling the trainer to handle images, audios, and videos when fetching teacher logprobs. Key changes include updating the Megatron and RLHF trainers to pass raw message data to the teacher API, implementing alignment logic for teacher logprobs when tokenization lengths differ, and refactoring the fetch_teacher_logprobs utility to support chat completion endpoints. The review feedback suggests updating the documentation for the modified utility function and adding a safety check to handle potential null results from failed API calls during tensor batching.
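A hedged sketch of the suggested safety check, assuming fetch_teacher_logprobs may return None for a sample whose API call failed. The hypothetical helper below zero-fills failed entries and records a validity mask so batching does not crash and the loss can ignore those samples:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def batch_teacher_logprobs(per_sample_logprobs, pad_value: float = 0.0):
    tensors, valid = [], []
    for lp in per_sample_logprobs:
        if lp is None:
            # Failed API call: substitute a dummy tensor and mark the sample invalid.
            tensors.append(torch.zeros(1))
            valid.append(False)
        else:
            tensors.append(lp)
            valid.append(True)
    batched = pad_sequence(tensors, batch_first=True, padding_value=pad_value)
    return batched, torch.tensor(valid)
```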
/gemini review
Code Review
This pull request introduces support for multimodal inputs (images, audio, and video) when fetching teacher logprobs from an external API in Generalized Knowledge Distillation (GKD). It updates both Megatron and RLHF trainers to process raw message data and align teacher logprobs with student sequences. However, critical issues were identified in the logprob alignment logic, including a potential IndexError when no response tokens are found and incorrect alignment when padding is present. Furthermore, the changes introduce a regression by removing On-Policy Self-Distillation (OPSD) support in off-policy mode.
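The IndexError concern arises when a sample contains no response tokens at all (for example, a fully truncated sequence) and the alignment code indexes its first or last response position anyway. A small guard along these lines would avoid it; the name and return convention are assumptions for illustration:

```python
import torch

IGNORE_INDEX = -100

def safe_response_slice(labels: torch.Tensor):
    positions = (labels != IGNORE_INDEX).nonzero(as_tuple=True)[0]
    if positions.numel() == 0:
        # No response tokens in this sample: skip alignment instead of indexing.
        return None
    return positions[0].item(), positions[-1].item() + 1
```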
/gemini review
Code Review
This pull request introduces support for multimodal inputs in the GKD (Generalized Knowledge Distillation) trainer. Key changes include updating _compute_teacher_logits_from_api and fetch_teacher_logprobs to handle multimodal data via chat completion APIs, implementing logic to align teacher logprobs with local sequence grids when lengths differ, and ensuring multimodal messages are correctly tracked and passed through the training pipeline. Feedback highlights a logic error in a conditional check, opportunities to reduce code duplication by refactoring alignment logic into a shared helper, and a performance improvement by moving imports to the top level.
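For context, the following is an illustrative sketch, not the PR's actual implementation, of the kind of chat-completion payload fetch_teacher_logprobs might assemble for a multimodal sample: images are encoded as base64 data URLs attached to the final user turn, and per-token logprobs are requested from the teacher endpoint. All names and parameters below are assumptions:

```python
import base64

def build_teacher_request(messages, image_paths=(), model='teacher'):
    parts = []
    for path in image_paths:
        with open(path, 'rb') as f:
            encoded = base64.b64encode(f.read()).decode()
        parts.append({'type': 'image_url',
                      'image_url': {'url': f'data:image/png;base64,{encoded}'}})
    # Attach image parts to the final user turn; earlier turns stay text-only.
    request_messages = [dict(m) for m in messages]
    last = request_messages[-1]
    last['content'] = parts + [{'type': 'text', 'text': last['content']}]
    return {'model': model, 'messages': request_messages,
            'logprobs': True, 'top_logprobs': 1}
```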