This repository contains code for the ICAART 2026 paper: "Safe Reward Learning from Human Preferences and Justifications".
Step 1: Learn a world model from real-world trajectories
Step 2: Create dream user trajectories in one shot
Step 3: Provide human preferences and justifications to build a reward model
Step 4: Deploy with Model Predictive Control
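The four steps can be sketched end to end as follows. Every function and name below is illustrative only, not the repo's actual API; the placeholders just show how data flows from real rollouts to an MPC controller:

```python
import numpy as np

# Hypothetical stand-ins for the repo's components, showing only the data flow.
def learn_world_model(real_trajs):                    # Step 1
    states = np.concatenate(real_trajs)
    return {"mean": states.mean(axis=0), "std": states.std(axis=0) + 1e-8}

def dream_trajectories(wm, n=5, horizon=10, seed=0):  # Step 2
    rng = np.random.default_rng(seed)
    return [wm["mean"] + wm["std"] * rng.standard_normal((horizon, wm["mean"].size))
            for _ in range(n)]

def fit_reward_from_prefs(preferences):               # Step 3
    # Placeholder reward: in the real pipeline this is learned from human
    # preferences and justifications gathered over dream trajectories.
    return lambda s: -float(np.linalg.norm(s))

def mpc_select(reward_fn, candidate_trajs):           # Step 4
    # MPC picks the candidate trajectory scoring highest under the reward.
    return max(candidate_trajs, key=lambda tr: sum(reward_fn(s) for s in tr))
```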
The code was tested with the following versions:
- Python 3.8.19
- PyTorch 1.9.0
- Gym 0.20.0
- box2d-py 2.3.5
- NumPy 1.19.2
- CUDA Toolkit 10.2.89
Note:
`box2d-py` may require manual installation on some systems (e.g. via pip or conda-forge). The provided `dropj.yml` contains the full environment for reproducibility.
You may try using gymnasium instead of gym. Minor adjustments might be needed (e.g. file paths and environment naming), but functionality is expected to remain equivalent.
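One such adjustment is environment naming. The mapping below is a sketch; the Gymnasium version suffix (e.g. `CarRacing-v2`) is an assumption and should be checked against `gymnasium.envs.registry` on your installed release:

```python
def adapt_env_id(env_id: str, use_gymnasium: bool) -> str:
    """Map a classic-Gym environment id to its assumed Gymnasium name.
    The version suffixes are assumptions: check gymnasium.envs.registry
    on your installation before relying on them."""
    mapping = {"CarRacing-v0": "CarRacing-v2"}
    return mapping.get(env_id, env_id) if use_gymnasium else env_id
```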
After setting up the environment:

- For Car Racing: Replace the installed `car_racing.py` file in your Gym (or Gymnasium) installation with `install/car_racing.py`.
- For Obstacle Car Racing:
  - Copy `install/obst_car_racing.py` and `install/obst_car_dynamics.py` into the `gym/envs/box2d/` (or equivalent) directory of your Gym (or Gymnasium) installation.
  - Register the environment by adding the following line to `gym/envs/box2d/__init__.py`:
    `from gym.envs.box2d.obst_car_racing import ObstCarRacing`
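A small helper can automate the copy step. The file names follow the instructions above, but the helper itself is a sketch, not part of the repo; the registration line in `__init__.py` still has to be added by hand:

```python
import os
import shutil

def install_env_files(gym_root: str, src_dir: str = "install") -> str:
    """Copy the custom environment files into <gym_root>/envs/box2d/.
    gym_root is the installed package directory, e.g. the output of:
        python -c "import gym, os; print(os.path.dirname(gym.__file__))"
    """
    dest = os.path.join(gym_root, "envs", "box2d")
    for name in ("obst_car_racing.py", "obst_car_dynamics.py"):
        shutil.copy(os.path.join(src_dir, name), dest)
    return dest
```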
Implements DROP, DROPe, and DROPJ:

- Set any of the flags at the beginning to run a step of DROPJ from scratch: `DO_STEP1_FROM_SCRATCH`, `DO_STEP2_FROM_SCRATCH`, `DO_STEP3_FROM_SCRATCH`
- If `DO_STEP1_FROM_SCRATCH = True`, set `EXTRACT_RAW_USER_ROLLOUT` to extract real-world user trajectories from scratch. Otherwise, unzip the compressed `raw_user_rollouts*.pkl` archive and use the pre-collected `raw_user_rollouts*.pkl` (beware the large uncompressed sizes -- see "Attention - large uncompressed size" below).
- If `DO_STEP3_FROM_SCRATCH = True`, set `CREATE_NEW_QUERIES` to create new preference queries.
- `extract_user_trajs()` creates real-world trajectories (trajectories $\mathcal{R}$), used to train the VAE (encoder) in `vae_model.py` and the MDN-RNN (dynamics model) in `dynamics_model.py`. Generating them in batches is recommended to avoid memory issues.
- `extract_dream_user_trajs()` creates dream trajectories (trajectories $\mathcal{D}$).
- Preference queries can be created with `create_pref_queries()`.
- Use `pref_gui.py` to provide preferences and justifications.
- Train the reward model using `reward_model.learn_with_GUI()`.
- The trained world model and reward models are provided for direct evaluation.
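The batch recommendation above can be sketched like this. `collect_one` is a hypothetical callable standing in for a single rollout collection; the real script would pickle each batch to disk rather than accumulate it in memory:

```python
import numpy as np

def extract_in_batches(n_rollouts, batch_size, collect_one):
    """Collect rollouts in fixed-size batches; collect_one(i) is a
    hypothetical callable returning rollout i as an array. The real
    script would save each batch to disk instead of accumulating it."""
    batches = []
    for start in range(0, n_rollouts, batch_size):
        count = min(batch_size, n_rollouts - start)
        batches.append(np.stack([collect_one(start + i) for i in range(count)]))
    return batches
```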
To select variants:
- DROP: `use_justifications=False`, `use_equal_in_plain_prefs=False`
- DROPe: `use_justifications=False`, `use_equal_in_plain_prefs=True`
- DROPJ: `use_justifications=True` (second flag arbitrary)
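Pairwise preferences of this kind are commonly turned into a reward-learning objective via the Bradley-Terry cross-entropy loss. The sketch below is a generic version, not necessarily the paper's exact formulation, where a label of 0.5 encodes the "equally good" answer that `use_equal_in_plain_prefs=True` admits:

```python
import numpy as np

def preference_loss(ret_a: float, ret_b: float, label: float) -> float:
    """Bradley-Terry loss on two segment returns.
    label: 1.0 if A is preferred, 0.0 if B is preferred, 0.5 for 'equal'."""
    p_a = 1.0 / (1.0 + np.exp(ret_b - ret_a))   # P(A preferred over B)
    eps = 1e-12                                  # numerical safety for log
    return -(label * np.log(p_a + eps) + (1.0 - label) * np.log(1.0 - p_a + eps))
```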
GUI for collecting preferences and justifications used in DROP/DROPe/DROPJ.
Implements and evaluates ReQueST:
- Generation of trajectory segments and user feedback in iterative feedback sessions
- Requires
default_init_obses_ep600enc_15.pklas input - Builds reward model using sparse user feedback in
reward_model_sparse.py
Implements and evaluates DROS:
- Segments are generated from `dream_user_rollouts.pkl`
- One-shot collection of responses and reward model training, again from sparse labels via `reward_model_sparse.py`
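Both ReQueST and DROS fit the reward model from sparse per-segment labels rather than pairwise preferences. As an illustration only (not the contents of `reward_model_sparse.py`), such an objective might look like:

```python
import numpy as np

def sparse_label_loss(pred_rewards, labels):
    """Mean squared error between predicted per-segment rewards and the
    sparse labels the user provided (e.g. +1 good / -1 bad); unlabeled
    segments (label None) contribute nothing to the loss."""
    errs = [(p - l) ** 2 for p, l in zip(pred_rewards, labels) if l is not None]
    return float(np.mean(errs)) if errs else 0.0
```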
Implements DROPJ with multiple justifications:
- Flags to run the Steps from scratch, `extract_user_trajs()`, `extract_dream_user_trajs()`, and `create_pref_queries()` are similar to Car Racing
- Use `obst_pref_gui.py` to provide preferences with multiple justifications
- Train the reward model using `reward_model.learn_with_GUI_multi_justs()`
Selecting obstacle variants:
Set the obstacle mode in `obst_dropj.py`:

```python
OBST_MODE = 'chuckc'     # only chuckholes
# OR
OBST_MODE = 'chuckccar'  # chuckholes and cars
```

Configure the appropriate flags in `obst_car_racing.py`:

```python
# for 'chuckc' mode
ONLY_CHUCKHOLES = True
CHUCKHOLES_CARS = False

# for 'chuckccar' mode
ONLY_CHUCKHOLES = False
CHUCKHOLES_CARS = True
```

Using justification weights:
Use different justification weights prioritising specific safety aspects.
```python
if OBST_MODE == 'chuckc':
    # Prioritising chuckhole avoidance
    w_def = 0.75
    w_grass = 0.95
    w_chuck = 1.0
    just_weights = {
        'Default': w_def,
        'Grass': w_grass,
        'Chuckhole': w_chuck
    }
elif OBST_MODE == 'chuckccar':
    # Prioritising grass avoidance
    w_def = 1.0
    w_grass = 1.0
    w_chuck = 0.98
    w_car = 0.98
    just_weights = {
        'Default': w_def,
        'Grass': w_grass,
        'Chuckhole': w_chuck,
        'Car': w_car
    }
```

GUI for collecting preferences and justifications used in DROPJ with multiple justifications.
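As an illustration of how such priority weights could aggregate per-justification terms (a hypothetical sketch; the actual use of `just_weights` lives in the repo's training code):

```python
def weighted_justification_score(just_scores: dict, just_weights: dict) -> float:
    """Combine per-justification scores with the priority weights
    (hypothetical aggregation; the repo's actual usage may differ)."""
    return sum(just_weights[name] * score for name, score in just_scores.items())
```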
Example videos corresponding to the paper figures are available in the assets/ folder.
▶ Download and watch the demo with examples during deployment
All data and pretrained models are available in the project dataset (DOI: 10.5258/SOTON/D3749).
You can either use the provided data and pretrained models or extract everything from scratch based on the flags described above.
For more details please refer to README_DROPJ-DATA.txt included in the dataset repository.
The following archives from the dataset are small to download, but expand to tens of gigabytes when extracted:
- raw_user_rollouts_res84_ep600.zip -> ~25 GB after extraction
- raw_user_rollouts_res128_ep600_chuckcobst.zip -> ~59 GB after extraction
- raw_user_rollouts_res128_ep800_chuckccarobst.zip -> ~79 GB after extraction
If you have limited disk space, you can either use the pretrained VAE and MDN-RNN models, or extract fewer trajectories following STEP 1 in `dropj.py` or `obst_dropj.py`.
Docs note: Docstrings were AI-drafted and then reviewed and tested by the author.
