This repository contains the implementation for the AoS paper "Testing Stationarity and Change Point Detection in Reinforcement Learning" in Python (and R for plotting).
We consider reinforcement learning (RL) methods in offline nonstationary environments. Many existing RL algorithms in the literature rely on the stationarity assumption that requires the system transition and the reward function to be constant over time. However, the stationarity assumption is restrictive in practice and is likely to be violated in a number of applications, including traffic signal control, robotics and mobile health. In this paper, we develop a consistent procedure to test the nonstationarity of the optimal policy based on pre-collected historical data, without additional online data collection. Based on the proposed test, we further develop a sequential change point detection method that can be naturally coupled with existing state-of-the-art RL methods for policy optimisation in nonstationary environments. The usefulness of our method is illustrated by theoretical results, simulation studies, and a real data example from the 2018 Intern Health Study (IHS).
Figure: P-values over different values of kappa (the number of time points counted back from the last time point). The discount factor equals 0.9 in the top plots and 0.95 in the bottom plots. From left to right, the plots correspond to the specialties emergency, pediatrics and family practice.
- Folder functions/: This folder contains all utility Python functions used in the numerical experiments, including simulation and data analysis.
  - compute_test_statistics.py implements the proposed CUSUM-RL test of nonstationarity.
  - evaluation.py implements the evaluation procedure. Specifically, it contains functions for estimating the optimal policy and for estimating the value of a policy using fitted-Q evaluation.
  - simulate_data_1d.py generates data for the 1-dimensional simulation. It contains functions to simulate data in 4 scenarios with different transition and reward functions.
  - simulate_data_real.py generates data for the IHS study simulation.
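To give a feel for what a CUSUM-type scan does, the sketch below splits a scalar series at every candidate time point and reports the split with the largest scaled difference in means. It is a generic mean-change CUSUM written for illustration, not the code in compute_test_statistics.py, and it makes no claim about how the CUSUM-RL statistic is built. The function name `cusum_mean_change` and the toy series are hypothetical.

```python
import math


def cusum_mean_change(x):
    """Scan all split points of x and return (max_stat, split).

    Hypothetical helper for illustration only; it is not part of this
    repository. The statistic at split u compares mean(x[:u]) with
    mean(x[u:]), scaled by sqrt(u * (n - u) / n).
    """
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")
    total = sum(x)
    best_stat, best_split = -1.0, None
    left_sum = 0.0
    for u in range(1, n):  # split into x[:u] and x[u:]
        left_sum += x[u - 1]
        left_mean = left_sum / u
        right_mean = (total - left_sum) / (n - u)
        stat = math.sqrt(u * (n - u) / n) * abs(left_mean - right_mean)
        if stat > best_stat:
            best_stat, best_split = stat, u
    return best_stat, best_split


# Toy series (made up): the mean shifts from 0 to 1 after the fifth point.
series = [0.0] * 5 + [1.0] * 5
stat, split = cusum_mean_change(series)
print(split, round(stat, 3))
```

A large maximum relative to its null distribution suggests a change; how the null distribution is calibrated is specific to each test and is not covered by this sketch.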
- Folder simulation_1d/: This folder contains the code that runs the 1-dimensional simulation in the paper. The numeric prefixes in the names of the .py files indicate the order in which to run them. Files whose names start with p contain code to generate the plots in the paper.
  - 01_sim_1d_run.py simulates 1-dimensional data and tests for nonstationarity on a specified time interval. Usage:
    python 01_sim_1d_run.py {seed} {kappa} {num_thread} {gamma} {trans_setting} {reward_setting} {N} {RBFSampler_random_seed}
    See the annotation in the script for the meanings of the arguments. Example:
    python 01_sim_1d_run.py 2 30 5 0.9 homo smooth 25 1
  - 02_combine_p_values.py aggregates the p-values obtained from multiple RBFSampler random seeds, using a specified quantile.
  - 03_sim_1d_changept_detection_isoreg.py estimates change points using isotonic regression.
  - 04_sim_1d_changept_optvalue_run.py estimates the optimal policies and their values using different methods.
  - p01_plot_combine_p_values.py creates Figure 3 in the paper (rejection probabilities).
  - p02_plot_changept_dist.py creates Figure 4 in the paper (distribution of the estimated change points).
  - p03_plot_changept_value.py creates Figure 5 in the paper (optimal values of the different estimated policies).
  - To run the 1-dimensional simulation in sequence,
    bash run_sim_1d.sh
  - To run the 1-dimensional testing on a cluster, submit the simulation jobs with slurm, split by sample size, kappa, gamma, and data settings:
    python submission_scripts/01_sim_submit.py
    Next, run 02_combine_p_values.py and 03_sim_1d_changept_detection_isoreg.py. Finally, split the jobs again for evaluation by submitting
    python submission_scripts/04_sim_submit.py
  - Folder output contains the raw results and corresponding figures of the simulation in the paper.
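The aggregation step performed by 02_combine_p_values.py can be pictured as follows: several p-values, one per RBFSampler random seed, are reduced to a single number by taking a quantile. The sketch below is a minimal standalone version under stated assumptions; the `quantile` helper, the linear-interpolation rule, the choice of the median, and the p-values themselves are all made up for illustration, and the script may apply further adjustments that are not shown here.

```python
def quantile(values, q):
    """Linear-interpolation quantile of a non-empty list, with 0 <= q <= 1.

    Hypothetical helper for illustration; not taken from this repository.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    s = sorted(values)
    pos = q * (len(s) - 1)          # fractional index into the sorted list
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac


# Made-up p-values keyed by RBFSampler random seed.
p_values_by_seed = {1: 0.04, 2: 0.10, 3: 0.02, 4: 0.07, 5: 0.30}

# Combine across seeds with the median (q = 0.5 is an arbitrary choice here).
combined = quantile(list(p_values_by_seed.values()), 0.5)
print(combined)
```

Aggregating over seeds reduces the dependence of the test decision on any single random feature draw.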
- Folder simulation_ihs/: This folder contains the code that runs the IHS simulation in the paper. The numeric prefixes in the names of the .py files indicate the order in which to run them. Files whose names start with p contain code to generate the plots in the paper.
  - 01_sim_ihs_run.py simulates IHS data and tests for nonstationarity on a specified time interval. Usage:
    python 01_sim_ihs_run.py {seed} {kappa} {gamma} {N} {RBFSampler_random_seed}
    See the annotation in the script for the meanings of the arguments. Example:
    python 01_sim_ihs_run.py 2 25 0.9 100 1
  - 02_combine_p_values.py aggregates the p-values obtained from multiple RBFSampler random seeds, using a specified quantile.
  - 03_sim_ihs_changept_detection_isoreg.py estimates change points using isotonic regression.
  - 04_sim_ihs_changept_optvalue_run.py estimates the optimal policies and their values using different methods.
  - p01_plot_combine_p_values.py creates Figure 6(a) in the paper (rejection probabilities).
  - p02_plot_changept_dist.py creates Figure 6(b) in the paper (distribution of the estimated change points).
  - p03_plot_changept_value.py creates Table 2 in the paper (optimal values of the different estimated policies).
  - To run the IHS simulation in sequence,
    bash run_sim_ihs.sh
  - To run the IHS testing on a cluster, submit the simulation jobs with slurm, split by sample size, kappa, gamma, and data settings:
    python submission_scripts/01_sim_submit.py
    Next, run 02_combine_p_values.py and 03_sim_ihs_changept_detection_isoreg.py. Finally, split the jobs again for evaluation by submitting
    python submission_scripts/04_sim_submit.py
  - Folder output contains the raw results and corresponding figures of the simulation in the paper.
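When the testing step is split into many jobs, each job is one call to 01_sim_ihs_run.py with a different combination of arguments. The sketch below only builds the command strings for such a grid, in the argument order given in the usage line; it does not submit anything to slurm and is not the code in submission_scripts/. The function name `ihs_commands` and the grid values are hypothetical, chosen so that the first command matches the documented example.

```python
import itertools
import shlex


def ihs_commands(seeds, kappas, gammas, sample_sizes, rbf_seeds):
    """Build one command string per combination of arguments.

    Hypothetical helper for illustration. Argument order follows the usage
    line: {seed} {kappa} {gamma} {N} {RBFSampler_random_seed}.
    """
    commands = []
    grid = itertools.product(seeds, kappas, gammas, sample_sizes, rbf_seeds)
    for seed, kappa, gamma, n, rbf_seed in grid:
        args = ["python", "01_sim_ihs_run.py",
                str(seed), str(kappa), str(gamma), str(n), str(rbf_seed)]
        commands.append(shlex.join(args))  # quote-safe joining (Python 3.8+)
    return commands


# Illustrative grid; these values are assumptions, not the paper's settings.
commands = ihs_commands(seeds=[2], kappas=[25, 30], gammas=[0.9, 0.95],
                        sample_sizes=[100], rbf_seeds=[1])
for cmd in commands:
    print(cmd)
```

Each printed line could then be wrapped in whatever job script the cluster expects.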
