[trainer] optimize use_logits_to_keep #9194
Conversation
Code Review
This pull request introduces a global version check for transformers >= 5.0.0 and refactors the logic in get_use_logits_to_keep to improve readability and handle multimodal model compatibility. The review feedback suggests caching the result of this logic in self.args.use_logits_to_keep to avoid redundant and expensive inspect.signature calls during training.
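Below is a minimal sketch of the caching the review suggests, written as a method on the trainer mixin. It is illustrative only: the cache slot `args.use_logits_to_keep` comes from the review comment, while the multimodal flag `model.model_meta.is_multimodal` is an assumed attribute, not necessarily the exact ms-swift implementation.

```python
import inspect


def get_use_logits_to_keep(self, default_value: bool = True) -> bool:
    # Sketch only: reuse a cached answer so the relatively expensive
    # inspect.signature call runs once instead of on every training step.
    use_logits_to_keep = self.args.use_logits_to_keep  # cache slot per the review
    if use_logits_to_keep is None:
        use_logits_to_keep = (
            not self.model.model_meta.is_multimodal  # assumed multimodal flag
            and 'logits_to_keep' in inspect.signature(self.model.forward).parameters
            and default_value
        )
        self.args.use_logits_to_keep = use_logits_to_keep  # cache for later calls
    return use_logits_to_keep
```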
/gemini review
Code Review
This pull request introduces a global version check for transformers >= 5.0.0 and refactors the get_use_logits_to_keep logic within SwiftMixin to include more explicit conditional checks for multimodal models and the presence of the logits_to_keep parameter. Feedback suggests that the model should be unwrapped using unwrap_model before inspecting its signature to ensure compatibility with distributed training wrappers like DDP or FSDP. Additionally, it is recommended to use logger.info_once instead of logger.info to avoid duplicate log entries in distributed environments.
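A hedged sketch combining both suggestions, unwrapping the model before signature inspection and logging only once. `unwrap_model` is a real transformers helper; `get_logger`/`logger.info_once` are assumed project-logger APIs based on what the feedback implies, and `supports_logits_to_keep` is a hypothetical name for illustration.

```python
import inspect

from transformers.modeling_utils import unwrap_model

from swift.utils import get_logger  # project logger; assumed to expose info_once

logger = get_logger()


def supports_logits_to_keep(model) -> bool:
    # DDP/FSDP wrap the model and hide the underlying forward signature,
    # so unwrap before inspecting it.
    base_model = unwrap_model(model)
    supported = 'logits_to_keep' in inspect.signature(base_model.forward).parameters
    # info_once emits a single line rather than one per rank/step
    # in distributed runs.
    logger.info_once(f'use_logits_to_keep: {supported}')
    return supported
```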