This repository provides the official implementation of the paper:
Causally Robust Reward Learning from Reason-Augmented Preference Feedback
Minjune Hwang, Yigit Korkmaz, Daniel Seita†, Erdem Bıyık†
ICLR 2026
Preference-based reinforcement learning (PbRL) is widely used to shape agent behavior to match a user's preferences, yet its sparse binary feedback makes it vulnerable to causal confusion. We introduce ReCouPLe, a lightweight framework that uses natural-language rationales and orthogonal decomposition to clarify the true causal signals behind a preference and to improve generalization.
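To give a rough intuition for the term "orthogonal decomposition", the sketch below splits a feature vector into a component along a rationale direction and a component orthogonal to it. This is only the generic linear-algebra operation, not the ReCouPLe implementation: the function name `orthogonal_decompose`, the toy vectors, and the idea that a rationale is represented as a single embedding vector are all assumptions made for illustration.

```python
import numpy as np


def orthogonal_decompose(feature, rationale):
    """Split `feature` into a part along `rationale` and a part orthogonal to it.

    Hypothetical helper for illustration; not taken from the ReCouPLe codebase.
    """
    feature = np.asarray(feature, dtype=float)
    rationale = np.asarray(rationale, dtype=float)

    norm_sq = rationale @ rationale
    if norm_sq == 0.0:
        raise ValueError("rationale embedding must be non-zero")

    # Projection of `feature` onto the rationale direction.
    parallel = (feature @ rationale) / norm_sq * rationale
    # Remainder, which has zero inner product with `rationale`.
    orthogonal = feature - parallel
    return parallel, orthogonal


# Toy vectors (assumed values, chosen so the result is easy to check by hand).
feature = np.array([3.0, 4.0, 0.0])
rationale = np.array([1.0, 0.0, 0.0])

parallel, orthogonal = orthogonal_decompose(feature, rationale)
print(parallel)    # component aligned with the rationale direction
print(orthogonal)  # component orthogonal to the rationale direction
```

The two components always sum back to the original vector, so nothing is discarded; the decomposition only separates what is aligned with the rationale from what is not.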
This repository is under development.