The RL loop should print or log a few example prompt/response/targets #187

@allenwang28

We're using the reward/loss as our proxy for RL correctness, which is understandable. But to check whether the RL loop is reward hacking, it would be helpful to additionally log a few prompt/response/target examples per step (?) to wandb and/or the logger.
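
A rough sketch of what this could look like, using only the standard `logging` module. Everything here is hypothetical and not taken from the existing codebase: the `Episode` dataclass, the `log_samples` helper, and the `k`, `max_chars`, and `seed` parameters are assumptions for illustration. It picks a few episodes at random each step, truncates long text, and writes them to the logger.

```python
import logging
import random
from dataclasses import dataclass

logger = logging.getLogger("rl_samples")  # hypothetical logger name


@dataclass
class Episode:
    """Hypothetical container for one rollout; not an existing type in the repo."""
    prompt: str
    response: str
    target: str
    reward: float


def log_samples(episodes, step, k=2, max_chars=200, seed=None):
    """Log up to k randomly chosen episodes for one step and return the logged rows."""
    rng = random.Random(seed)
    # Sample without replacement; never ask for more episodes than exist.
    chosen = rng.sample(episodes, min(k, len(episodes)))
    rows = []
    for ep in chosen:
        # Truncate long strings so a single sample cannot flood the log.
        row = [
            step,
            ep.prompt[:max_chars],
            ep.response[:max_chars],
            ep.target[:max_chars],
            ep.reward,
        ]
        rows.append(row)
        logger.info(
            "step=%d reward=%.3f\n  prompt: %s\n  response: %s\n  target: %s",
            step, ep.reward, row[1], row[2], row[3],
        )
    return rows
```

Called once per training step, e.g. `log_samples(batch_episodes, step=step, k=2)`. The returned rows are plain lists, so they could presumably also be passed as `data` to a `wandb.Table` with matching `columns` if we want them in wandb; that part is left out here to keep the sketch dependency-free.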

Labels: enhancement (New feature or request)
