Causally Robust Reward Learning from Reason-Augmented Preference Feedback

This repository provides the official implementation of the paper:

Causally Robust Reward Learning from Reason-Augmented Preference Feedback
Minjune Hwang, Yigit Korkmaz, Daniel Seita†, Erdem Bıyık†
ICLR 2026

Preference-based reinforcement learning (PbRL) is widely used to shape agent behavior to match a user's preferences, yet its sparse binary feedback makes it vulnerable to causal confusion. We introduce ReCouPLe, a lightweight framework that employs orthogonal decomposition and uses natural language rationales to clarify the true causal signals behind a preference and to improve generalization.
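For readers unfamiliar with the term, the snippet below illustrates orthogonal decomposition in its generic linear-algebra sense: splitting a vector into a component along a given direction and a remainder orthogonal to it. It is only a sketch and not the ReCouPLe implementation, which is not yet released in this repository. The function name `decompose` and the vectors `feature` and `reason` are hypothetical and chosen purely for illustration.

```python
def decompose(feature, reason):
    """Split `feature` into a part parallel to `reason` and a part orthogonal to it.

    Both arguments are plain lists of floats of equal length.
    The names are illustrative assumptions, not part of the paper's code.
    """
    norm_sq = sum(r * r for r in reason)
    if norm_sq == 0.0:
        raise ValueError("reason vector must be non-zero")
    # Scalar projection coefficient: <feature, reason> / <reason, reason>
    scale = sum(f * r for f, r in zip(feature, reason)) / norm_sq
    parallel = [scale * r for r in reason]
    orthogonal = [f - p for f, p in zip(feature, parallel)]
    return parallel, orthogonal


# Toy example: a 3-D feature vector and a direction standing in for a rationale.
feature = [3.0, 4.0, 0.0]
reason = [1.0, 0.0, 0.0]
parallel, orthogonal = decompose(feature, reason)
```

In this toy case the two parts sum back to `feature`, and the orthogonal part has zero dot product with `reason`. How ReCouPLe builds and uses such components is described in the paper.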

Installation

Under Development
