
Code accompanying the paper "Interactively Learning Preference Constraints in Linear Bandits" (ICML 2022).


Interactively Learning Preference Constraints in Linear Bandits

This repository contains the source code to reproduce the experiments in the paper "Interactively Learning Preference Constraints in Linear Bandits". The code is provided as-is and will not be maintained. Below, we explain how to set up and run the code to reproduce the experiments reported in the paper.


David Lindner, Sebastian Tschiatschek, Katja Hofmann, and Andreas Krause. Interactively Learning Preference Constraints in Linear Bandits. In International Conference on Machine Learning (ICML), 2022.

@inproceedings{lindner2022interactively,
    title={Interactively Learning Preference Constraints in Linear Bandits},
    author={Lindner, David and Tschiatschek, Sebastian and Hofmann, Katja and Krause, Andreas},
    booktitle={International Conference on Machine Learning (ICML)},
    year={2022}
}

We recommend using Anaconda to set up an environment with the dependencies of this repository. After installing Anaconda, the following commands set up the environment:

conda create -n cbai python=3.9
conda activate cbai
pip install -e .

Reproducing the Experiments

The main code for running experiments is contained in this repository. We use sacred for tracking parameters and results, and we provide all config files necessary to reproduce the experiments presented in the paper. A utility script runs the experiments for a specific config file (optionally parallelized via the `--n_jobs` flag), and the results can be plotted with the provided plotting script. Below, we list the exact commands that reproduce the results of each section, including any necessary preparation.
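For illustration, each experiment config is a JSON file passed to the runner via `--config`. The following minimal sketch shows how such a file could be written and read back; the keys used here are hypothetical and not taken from the repository's actual config files:

```python
import json
import tempfile

# Hypothetical config; the real files in experiment_configs/
# define their own keys and values.
config = {
    "n_runs": 10,
    "dimension": 4,
    "seed": 1,
}

# Round-trip through a temporary file, mirroring how a runner
# script would read a file passed via --config.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(config, f)
    path = f.name

with open(path) as f:
    loaded = json.load(f)

print(loaded["dimension"])  # -> 4
```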

Synthetic Experiments (Section 4.1)

The following four commands reproduce the synthetic experiments:

python --config experiment_configs/synthetic/irrelevant_dimensions_eps.json --n_jobs 1
python --config experiment_configs/synthetic/irrelevant_dimensions_dim.json --n_jobs 1
python --config experiment_configs/synthetic/unit_sphere_arms.json --n_jobs 1
python --config experiment_configs/synthetic/unit_sphere_dim.json --n_jobs 1

Comparing ACOL to Regret Minimization (Section 4.2)

The following command compares G-ACOL to MaxRew-F and MaxRew-U in the 1-dimensional test case:

python --config experiment_configs/synthetic/regret_comparison.json --n_jobs 1

Preference Learning Experiments (Section 4.3)

For the driving experiments, it is first necessary to compute a set of candidate driving policies that correspond to the arms of a CBAI problem. This can be done using the policy-generation script in scripts/. However, for convenience, we also provide a set of precomputed policies in driver_candidate_policies_1000.pkl, so this step can be skipped.
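The precomputed policies are a standard pickle file and can be inspected with Python's `pickle` module. The sketch below is self-contained and uses a dummy policy list, since the exact structure of driver_candidate_policies_1000.pkl is an assumption here:

```python
import pickle
import tempfile

# Dummy stand-in for the candidate policies; the actual contents of
# driver_candidate_policies_1000.pkl may differ (this structure is assumed).
policies = [{"id": i, "features": [0.0, 1.0]} for i in range(3)]

# Write and re-read, mirroring how the experiment code would load
# the provided .pkl file.
with tempfile.NamedTemporaryFile("wb", suffix=".pkl", delete=False) as f:
    pickle.dump(policies, f)
    path = f.name

with open(path, "rb") as f:
    loaded = pickle.load(f)

print(len(loaded))  # -> 3
```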

Once the candidate policies are available, the constraint learning experiments can be run with the following commands:

python --config experiment_configs/driver/driver_target_velocity_theory.json --n_jobs 1
python --config experiment_configs/driver/driver_target_velocity_tuned.json --n_jobs 1
python --config experiment_configs/driver/driver_target_location_theory.json --n_jobs 1
python --config experiment_configs/driver/driver_target_location_tuned.json --n_jobs 1
python --config experiment_configs/driver/driver_blocked_theory.json --n_jobs 1
python --config experiment_configs/driver/driver_blocked_tuned.json --n_jobs 1

