Suspected logic bug in the max length check in the GRPO trainer #6308

In the current `grpo_trainer.py`, when resampling via `_dynamic_sampling`, `max_length` is not checked again. If `self.template.truncation_strategy == 'raise'`, a resampled batch may contain over-length `inputs`, which then raises an error in `_prepare_inputs`. `self.resample_encode_failed_inputs(inputs)` should be added right after `inputs = next(self.dynamic_resample_iterator)`:
```python
inputs = next(self.dynamic_resample_iterator)
inputs = Trainer._prepare_inputs(self, inputs)
inputs = self._generate_completions(inputs)
rewards_per_func = self._score_completions(inputs)
```
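The following is a minimal, self-contained sketch of the failure mode and the proposed fix. It does not import ms-swift: `MAX_LENGTH`, `EncodeError`, `encode`, `resample_failed`, and `dynamic_resample_step` are hypothetical stand-ins for the template's `max_length`, its encoding step, `resample_encode_failed_inputs`, and the dynamic resampling code above. The real helper's signature and behavior may differ. Without the extra check, an over-length sample drawn during resampling reaches the encoding step and raises; with the check, it is swapped for a valid sample from the same iterator first.

```python
from typing import Dict, Iterator, List

MAX_LENGTH = 8  # hypothetical stand-in for the template's max_length

Sample = Dict[str, str]


class EncodeError(ValueError):
    """Raised when a sample exceeds MAX_LENGTH under truncation_strategy='raise'."""


def encode(sample: Sample, truncation_strategy: str = "raise") -> List[str]:
    # Toy tokenizer: one token per whitespace-separated word.
    tokens = sample["text"].split()
    if len(tokens) > MAX_LENGTH:
        if truncation_strategy == "raise":
            raise EncodeError(f"length {len(tokens)} exceeds max_length {MAX_LENGTH}")
        tokens = tokens[:MAX_LENGTH]  # any other strategy: keep the first MAX_LENGTH tokens
    return tokens


def encodes_ok(sample: Sample) -> bool:
    try:
        encode(sample)
        return True
    except EncodeError:
        return False


def resample_failed(batch: List[Sample], it: Iterator[List[Sample]],
                    n_try_fetch: int = 10) -> List[Sample]:
    """Hypothetical analogue of resample_encode_failed_inputs: replace samples that fail encoding."""
    valid = [s for s in batch if encodes_ok(s)]
    for _ in range(n_try_fetch):
        if len(valid) == len(batch):
            return valid
        # Pull replacements from the same iterator and keep only those that encode.
        for s in next(it):
            if len(valid) < len(batch) and encodes_ok(s):
                valid.append(s)
    if len(valid) < len(batch):
        raise RuntimeError(f"no valid replacement after {n_try_fetch} fetches")
    return valid


def dynamic_resample_step(it: Iterator[List[Sample]], validate: bool) -> List[List[str]]:
    inputs = next(it)  # mirrors `inputs = next(self.dynamic_resample_iterator)`
    if validate:
        inputs = resample_failed(inputs, it)  # the proposed extra length check
    # Stand-in for Trainer._prepare_inputs: encodes every sample, failing on over-length ones.
    return [encode(s) for s in inputs]


SHORT = {"text": "a b c"}
LONG = {"text": " ".join(["w"] * 20)}
batches = [[SHORT, LONG], [SHORT, SHORT]]

print(dynamic_resample_step(iter(batches), validate=True))
# [['a', 'b', 'c'], ['a', 'b', 'c']]
try:
    dynamic_resample_step(iter(batches), validate=False)
except EncodeError as e:
    print("EncodeError:", e)
# EncodeError: length 20 exceeds max_length 8
```

The check reuses the same resample iterator for replacements, so a batch keeps its size as long as a valid sample turns up within `n_try_fetch` fetches.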
