Suspected logic flaw in the max length check in the GRPO trainer #6308

@zhenhaoyong


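In the current grpo_trainer.py, when _dynamic_sampling resamples a batch, max_length is not checked again. If self.template.truncation_strategy == 'raise', the resampled batch can contain over-length inputs, which then raise an error in _prepare_inputs. self.resample_encode_failed_inputs(inputs) should be called right after inputs = next(self.dynamic_resample_iterator). The relevant code currently reads: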

inputs = next(self.dynamic_resample_iterator)
inputs = Trainer._prepare_inputs(self, inputs)
inputs = self._generate_completions(inputs)
rewards_per_func = self._score_completions(inputs)
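
The failure mode and the proposed guard can be illustrated with a small standalone sketch. It is not the trainer's code: MAX_LENGTH, MaxLengthError, encode, prepare_inputs and the replacement pool are hypothetical stand-ins, and this toy resample_encode_failed_inputs only borrows the name from the issue and is assumed here to return the checked batch (the real method's signature and return behaviour may differ).

```python
import random
from typing import List

MAX_LENGTH = 8  # hypothetical token budget, stands in for max_length


class MaxLengthError(Exception):
    """Hypothetical error raised when truncation_strategy == 'raise'."""


def encode(sample: str, truncation_strategy: str = "raise") -> List[str]:
    # Toy tokenizer: whitespace split. Over-length inputs either raise
    # or are cut to MAX_LENGTH, depending on the strategy.
    tokens = sample.split()
    if len(tokens) > MAX_LENGTH:
        if truncation_strategy == "raise":
            raise MaxLengthError(f"{len(tokens)} tokens > {MAX_LENGTH}")
        tokens = tokens[:MAX_LENGTH]
    return tokens


def prepare_inputs(inputs: List[str]) -> List[List[str]]:
    # Stand-in for _prepare_inputs: encodes every sample and propagates
    # the error if any sample is too long.
    return [encode(s) for s in inputs]


def resample_encode_failed_inputs(
    inputs: List[str], pool: List[str], rng: random.Random, max_tries: int = 100
) -> List[str]:
    # Toy version of the guard: any sample that fails to encode is
    # replaced by a random draw from `pool` until one encodes cleanly.
    checked = []
    for sample in inputs:
        for _ in range(max_tries):
            try:
                encode(sample)
                break
            except MaxLengthError:
                sample = rng.choice(pool)
        else:
            raise RuntimeError("no encodable replacement found")
        checked.append(sample)
    return checked


if __name__ == "__main__":
    rng = random.Random(0)
    pool = ["short prompt", "another short one"]
    resampled = ["ok", " ".join(["tok"] * 20)]  # second sample is too long

    try:
        prepare_inputs(resampled)  # current order: no check after resampling
    except MaxLengthError as exc:
        print("without the check:", exc)

    safe = resample_encode_failed_inputs(resampled, pool, rng)
    print("with the check:", prepare_inputs(safe))
```

In this sketch the check runs between drawing the resampled batch and preparing it, which mirrors where the issue suggests placing the call.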
