
support multi-modal training for gkd teacher api #9197

Merged
hjh0119 merged 5 commits into modelscope:main from hjh0119:mm-tea-api on Apr 25, 2026
Conversation

@hjh0119 (Collaborator) commented Apr 23, 2026

No description provided.

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request enhances the GKD (Generalized Knowledge Distillation) trainers to support multimodal inputs when fetching teacher logprobs from an external API. It refactors the logprob fetching logic to handle chat completions and introduces alignment mechanisms to map teacher outputs back to the local sequence grid. The review feedback identifies a significant issue where the alignment logic assumes response tokens are at the end of the sequence, which fails when right padding is used. Additionally, the feedback points out a regression in off-policy OPSD support and several instances of redundant deepcopy calls that should be optimized.
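The padding issue the review flags can be avoided by locating response tokens through the label mask rather than assuming they form a suffix of the sequence. Below is a minimal, hypothetical sketch (the helper name `align_teacher_logprobs` and the `ignore_index` convention are assumptions, not the PR's actual code) of padding-agnostic alignment onto the local sequence grid:

```python
import torch

def align_teacher_logprobs(teacher_logprobs, labels, ignore_index=-100):
    """Hypothetical sketch: scatter per-sample teacher logprobs onto the
    local sequence grid using the label mask, so alignment works whether
    padding is on the left or the right of the response tokens."""
    aligned = torch.zeros_like(labels, dtype=torch.float)
    for row in range(labels.size(0)):
        # Response positions are wherever labels are not ignored,
        # which is not necessarily a suffix of the sequence.
        positions = (labels[row] != ignore_index).nonzero(as_tuple=True)[0]
        n = min(len(positions), len(teacher_logprobs[row]))
        aligned[row, positions[:n]] = teacher_logprobs[row][:n]
    return aligned
```

Indexing by the mask positions makes the mapping indifferent to where the padding sits, which is exactly the failure mode the review describes for right-padded batches.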

Comment thread swift/megatron/trainers/gkd_trainer.py Outdated
Comment thread swift/rlhf_trainers/gkd_trainer.py
Comment thread swift/rlhf_trainers/gkd_trainer.py Outdated
Comment thread swift/megatron/trainers/gkd_trainer.py
Comment thread swift/rlhf_trainers/gkd_trainer.py
Comment thread swift/rlhf_trainers/gkd_trainer.py
@hjh0119 (Collaborator, Author) commented Apr 24, 2026

/gemini review

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces multimodal support for Generalized Knowledge Distillation (GKD) by enabling the trainer to handle images, audios, and videos when fetching teacher logprobs. Key changes include updating the Megatron and RLHF trainers to pass raw message data to the teacher API, implementing alignment logic for teacher logprobs when tokenization lengths differ, and refactoring the fetch_teacher_logprobs utility to support chat completion endpoints. The review feedback suggests updating the documentation for the modified utility function and adding a safety check to handle potential null results from failed API calls during tensor batching.
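The safety check suggested for failed API calls amounts to substituting a filler row before stacking the per-sample results into one tensor. A minimal sketch, assuming failed calls surface as `None` (the helper name `batch_teacher_logprobs` and the neutral `fill_value` are illustrative, not the PR's actual code):

```python
import torch

def batch_teacher_logprobs(results, seq_len, fill_value=0.0):
    """Hypothetical sketch: pad per-sample teacher logprobs into one
    (batch, seq_len) tensor, substituting a filler row when an API call
    failed and returned None, instead of crashing in torch.stack."""
    rows = []
    for r in results:
        if r is None:
            # Failed API call: neutral filler row rather than an exception.
            rows.append(torch.full((seq_len,), fill_value))
        else:
            padded = torch.full((seq_len,), fill_value)
            padded[: min(len(r), seq_len)] = r[:seq_len]
            rows.append(padded)
    return torch.stack(rows)
```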

Comment thread swift/rlhf_trainers/gkd_trainer.py Outdated
Comment thread swift/rlhf_trainers/gkd_trainer.py
@hjh0119 (Collaborator, Author) commented Apr 24, 2026

/gemini review

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces support for multimodal inputs (images, audio, and video) when fetching teacher logprobs from an external API in Generalized Knowledge Distillation (GKD). It updates both Megatron and RLHF trainers to process raw message data and align teacher logprobs with student sequences. However, critical issues were identified in the logprob alignment logic, including a potential IndexError when no response tokens are found and incorrect alignment when padding is present. Furthermore, the changes introduce a regression by removing On-Policy Self-Distillation (OPSD) support in off-policy mode.
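The potential IndexError arises when a sample contains no response tokens at all and the alignment code indexes into an empty position tensor. A hedged sketch of the kind of guard the review asks for (the helper name `response_span` and the `-100` ignore index are assumptions):

```python
import torch

def response_span(labels_row, ignore_index=-100):
    """Hypothetical guard: return the (start, end) span of response tokens
    in one row, or None when the row has no response tokens, instead of
    indexing an empty tensor and raising IndexError."""
    positions = (labels_row != ignore_index).nonzero(as_tuple=True)[0]
    if positions.numel() == 0:
        return None  # caller can skip this sample entirely
    return positions[0].item(), positions[-1].item() + 1
```

Returning `None` pushes the decision to the caller, which can drop the sample or substitute a filler row rather than fail mid-batch.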

Comment thread swift/megatron/trainers/gkd_trainer.py
Comment thread swift/rlhf_trainers/gkd_trainer.py Outdated
Comment thread swift/rlhf_trainers/gkd_trainer.py
@hjh0119 (Collaborator, Author) commented Apr 24, 2026

/gemini review

@gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request introduces support for multimodal inputs in the GKD (Generalized Knowledge Distillation) trainer. Key changes include updating _compute_teacher_logits_from_api and fetch_teacher_logprobs to handle multimodal data via chat completion APIs, implementing logic to align teacher logprobs with local sequence grids when lengths differ, and ensuring multimodal messages are correctly tracked and passed through the training pipeline. Feedback highlights a logic error in a conditional check, opportunities to reduce code duplication by refactoring alignment logic into a shared helper, and a performance improvement by moving imports to the top level.

Comment thread swift/megatron/trainers/gkd_trainer.py Outdated
Comment thread swift/rlhf_trainers/gkd_trainer.py
Comment thread swift/rlhf_trainers/gkd_trainer.py Outdated
@hjh0119 hjh0119 merged commit 514ce7c into modelscope:main Apr 25, 2026
1 of 3 checks passed
@hjh0119 hjh0119 deleted the mm-tea-api branch April 25, 2026 06:41


Development

Successfully merging this pull request may close these issues.

GKD seems to mishandle multimodal inputs when the teacher model is deployed via the vLLM api-server

2 participants