support multi-modal training for gkd teacher api #9197
Conversation
Code Review
This pull request enhances the GKD (Generalized Knowledge Distillation) trainers to support multimodal inputs when fetching teacher logprobs from an external API. It refactors the logprob fetching logic to handle chat completions and introduces alignment mechanisms to map teacher outputs back to the local sequence grid. The review feedback identifies a significant issue where the alignment logic assumes response tokens are at the end of the sequence, which fails when right padding is used. Additionally, the feedback points out a regression in off-policy OPSD support and several instances of redundant deepcopy calls that should be optimized.
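A minimal sketch of the padding-aware alignment the feedback is asking for, assuming response positions can be recovered from the local label mask instead of being taken from the tail of the sequence. Function and variable names here are illustrative, not the PR's actual code:

```python
import torch

IGNORE_INDEX = -100

def align_teacher_logprobs(teacher_logprobs: torch.Tensor,
                           labels: torch.Tensor) -> torch.Tensor:
    """Scatter per-response teacher logprobs onto the local sequence grid.

    teacher_logprobs: (num_response_tokens,) values returned by the teacher API.
    labels: (seq_len,) local labels, with non-response positions set to IGNORE_INDEX.
    """
    aligned = torch.zeros_like(labels, dtype=teacher_logprobs.dtype)
    # Locate response tokens via the label mask, so right padding does not
    # shift the slice the way a tail-of-sequence assumption would.
    response_positions = (labels != IGNORE_INDEX).nonzero(as_tuple=True)[0]
    # Truncate to the shorter side in case teacher and student tokenizations differ.
    n = min(response_positions.numel(), teacher_logprobs.numel())
    aligned[response_positions[:n]] = teacher_logprobs[:n]
    return aligned
```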
/gemini review
Code Review
This pull request introduces multimodal support for Generalized Knowledge Distillation (GKD) by enabling the trainer to handle images, audios, and videos when fetching teacher logprobs. Key changes include updating the Megatron and RLHF trainers to pass raw message data to the teacher API, implementing alignment logic for teacher logprobs when tokenization lengths differ, and refactoring the fetch_teacher_logprobs utility to support chat completion endpoints. The review feedback suggests updating the documentation for the modified utility function and adding a safety check to handle potential null results from failed API calls during tensor batching.
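A hedged sketch of the suggested safety check, assuming fetch_teacher_logprobs may return None for a sample whose API call failed. The hypothetical helper below zero-fills failed entries and records a validity mask so batching does not crash and the loss can ignore those samples:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def batch_teacher_logprobs(per_sample_logprobs, pad_value: float = 0.0):
    tensors, valid = [], []
    for lp in per_sample_logprobs:
        if lp is None:
            # Failed API call: substitute a dummy tensor and mark the sample invalid.
            tensors.append(torch.zeros(1))
            valid.append(False)
        else:
            tensors.append(lp)
            valid.append(True)
    batched = pad_sequence(tensors, batch_first=True, padding_value=pad_value)
    return batched, torch.tensor(valid)
```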
/gemini review
Code Review
This pull request introduces support for multimodal inputs (images, audio, and video) when fetching teacher logprobs from an external API in Generalized Knowledge Distillation (GKD). It updates both Megatron and RLHF trainers to process raw message data and align teacher logprobs with student sequences. However, critical issues were identified in the logprob alignment logic, including a potential IndexError when no response tokens are found and incorrect alignment when padding is present. Furthermore, the changes introduce a regression by removing On-Policy Self-Distillation (OPSD) support in off-policy mode.
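The IndexError concern arises when a sample contains no response tokens at all (for example, a fully truncated sequence) and the alignment code indexes its first or last response position anyway. A small guard along these lines would avoid it; the name and return convention are assumptions for illustration:

```python
import torch

IGNORE_INDEX = -100

def safe_response_slice(labels: torch.Tensor):
    positions = (labels != IGNORE_INDEX).nonzero(as_tuple=True)[0]
    if positions.numel() == 0:
        # No response tokens in this sample: skip alignment instead of indexing.
        return None
    return positions[0].item(), positions[-1].item() + 1
```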
/gemini review
Code Review
This pull request introduces support for multimodal inputs in the GKD (Generalized Knowledge Distillation) trainer. Key changes include updating _compute_teacher_logits_from_api and fetch_teacher_logprobs to handle multimodal data via chat completion APIs, implementing logic to align teacher logprobs with local sequence grids when lengths differ, and ensuring multimodal messages are correctly tracked and passed through the training pipeline. Feedback highlights a logic error in a conditional check, opportunities to reduce code duplication by refactoring alignment logic into a shared helper, and a performance improvement by moving imports to the top level.
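For context, the following is an illustrative sketch, not the PR's actual implementation, of the kind of chat-completion payload fetch_teacher_logprobs might assemble for a multimodal sample: images are encoded as base64 data URLs attached to the final user turn, and per-token logprobs are requested from the teacher endpoint. All names and parameters below are assumptions:

```python
import base64

def build_teacher_request(messages, image_paths=(), model='teacher'):
    parts = []
    for path in image_paths:
        with open(path, 'rb') as f:
            encoded = base64.b64encode(f.read()).decode()
        parts.append({'type': 'image_url',
                      'image_url': {'url': f'data:image/png;base64,{encoded}'}})
    # Attach image parts to the final user turn; earlier turns stay text-only.
    request_messages = [dict(m) for m in messages]
    last = request_messages[-1]
    last['content'] = parts + [{'type': 'text', 'text': last['content']}]
    return {'model': model, 'messages': request_messages,
            'logprobs': True, 'top_logprobs': 1}
```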