Implementation of the VAE-MDP framework introduced in the paper "Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes" (AAAI-22). This framework allows (i) the variational abstraction of environments in which RL agents operate, and (ii) the distillation of their policies over the newly learned abstract spaces, both with verifiable bisimulation guarantees. These guarantees enable the application of formal methods and tools developed for discrete MDPs, such as probabilistic model checkers.
We provide two conda environment files that can be used to re-create our Python environment and reproduce our results:

- `environment.yml` (using TensorFlow CPU)
- `environment_gpu.yml` (using TensorFlow GPU)

These files can be found in the `conda_environments` directory and explicitly list all the dependencies required by our tool.
Note that these conda environments have been tested with conda 4.10.1, under Ubuntu 20.04.2 LTS.
- Note 1: We additionally provide versions of these environment files with the build specifications removed from the dependencies.
- Note 2: `reverb` currently only supports Linux-based OSes. Our tool can be used without `reverb` if you do not use prioritized replay buffers.
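To check up front whether prioritized replay buffers are an option on a given machine, the following minimal sketch tests the two conditions stated in Note 2: a Linux OS and a findable `reverb` module. The function name `can_use_prioritized_replay` is hypothetical and not part of this repository; it relies on `importlib.util.find_spec`, so `reverb` is located but never actually imported.

```python
import importlib.util
import platform


def can_use_prioritized_replay(system=None):
    """Return True if reverb-backed (prioritized) replay buffers look usable.

    Hypothetical helper, not part of the VAE-MDP code base. It only checks
    that the OS is Linux and that a top-level `reverb` module can be found.
    """
    system = system if system is not None else platform.system()
    if system != "Linux":
        # reverb currently only supports Linux-based OSes
        return False
    # find_spec returns None (without raising) when the module is missing
    return importlib.util.find_spec("reverb") is not None


if __name__ == "__main__":
    if can_use_prioritized_replay():
        print("reverb found: prioritized replay buffers can be used")
    else:
        print("reverb unavailable: run the tool without prioritized replay buffers")
```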
In the following, we detail how to automatically create the conda environment from the CPU environment file; to create a GPU environment instead, simply replace `environment.yml` with `environment_gpu.yml`.
- Create the environment from `environment.yml`:

      cd conda_environments
      conda env create -f environment.yml

- The environment `vae_mdp` (or `vae_mdp_gpu`) is now created. To use `reverb` replay buffers, the `LD_LIBRARY_PATH` variable must be made available to conda. We provide the installation script `set_environment_variables.sh`, which sets this variable whenever the environment is activated:

      conda activate vae_mdp  # or vae_mdp_gpu
      ./set_environment_variables.sh
      # reactivate the environment to apply the changes
      conda deactivate
      conda activate vae_mdp  # or vae_mdp_gpu

  The `vae_mdp` environment should now work properly on your machine.
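Once the environment has been reactivated, one way to confirm that the variable took effect is to look for the environment's `lib` directory in `LD_LIBRARY_PATH`. This is a hedged sketch: `conda_lib_on_library_path` is a hypothetical helper, and it assumes the relevant directory is `$CONDA_PREFIX/lib` (conda sets `CONDA_PREFIX` to the root of the active environment).

```python
import os


def conda_lib_on_library_path(environ=None):
    """Check whether $CONDA_PREFIX/lib is listed in LD_LIBRARY_PATH.

    Hypothetical helper (not part of the repository). Paths are normalized,
    so an entry written with a trailing slash such as '.../lib/' still matches.
    """
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return False  # no conda environment is active
    lib_dir = os.path.normpath(os.path.join(prefix, "lib"))
    # LD_LIBRARY_PATH is a colon-separated list on Linux
    entries = environ.get("LD_LIBRARY_PATH", "").split(":")
    return any(e and os.path.normpath(e) == lib_dir for e in entries)


if __name__ == "__main__":
    print("conda lib directory on LD_LIBRARY_PATH:", conda_lib_on_library_path())
```

Calling `conda_lib_on_library_path()` with no argument inspects the current process environment, so it should be run from within the activated `vae_mdp` (or `vae_mdp_gpu`) environment; passing a plain dict instead makes the check easy to test.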
We provide the exact set of hyper-parameters used during our experiments in the `inputs` directory.

- Each individual experiment can be run via:

      python train.py --flagfile inputs/[name of the environment]

- Add `--display_progressbar` to display a TF progress bar.
- Display the possible options with `--help`.
- By default, a `log` directory is created, where training logs are stored. These logs can optionally be visualized with TensorBoard via `tensorboard --logdir=log`.
- The `N` best models can be saved during training with the option `--evaluation_window_size N` (set to 0 by default; use 1 to save the best model encountered during training).
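When launching runs from Python (for instance, to loop over several environments), the command above can be assembled programmatically. The sketch below is a convenience under stated assumptions, not part of the tool: `build_train_command` is a hypothetical name, `my_environment` is a placeholder for a flagfile in `inputs`, and only the three flags documented here (`--flagfile`, `--display_progressbar`, `--evaluation_window_size`) are used.

```python
import shlex


def build_train_command(environment, display_progressbar=False,
                        evaluation_window_size=0, inputs_dir="inputs"):
    """Assemble the train.py command line as a list of arguments.

    Hypothetical helper: it only emits flags documented in this README;
    other options can be listed with `python train.py --help`.
    """
    cmd = ["python", "train.py", "--flagfile", f"{inputs_dir}/{environment}"]
    if display_progressbar:
        cmd.append("--display_progressbar")
    if evaluation_window_size > 0:
        # keep the N best models encountered during training
        cmd += ["--evaluation_window_size", str(evaluation_window_size)]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("my_environment", display_progressbar=True,
                              evaluation_window_size=1)
    print(shlex.join(cmd))
    # prints: python train.py --flagfile inputs/my_environment --display_progressbar --evaluation_window_size 1
```

Returning a list rather than a single string avoids shell-quoting issues when the command is passed to `subprocess.run(cmd, check=True)`, and omitting `--evaluation_window_size` when it is 0 leaves the tool's default untouched.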
We provide a script for each environment in `inputs/[environment].sh`, containing the exact commands to run as well as the seeds we used.
You can run all the experiments as follows:

    ./run_all_experiments.sh

Then, you can visualize the experiments via TensorBoard or reproduce the paper plots via:

    # plot distortion/rate/ELBO, the PAC bounds, and the policy evaluation
    python util/io/plot.py --flagfile inputs/plots
    # plot the latent space visualization
    python util/io/plot.py --flagfile inputs/plot_histograms

The plots are stored in `evaluation/plots`.
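The per-environment scripts in `inputs` can also be discovered and launched from Python. This is a hypothetical counterpart to `run_all_experiments.sh`, not a description of what that script does: it assumes every `inputs/*.sh` file is a standalone experiment script that can be run with `bash` from the repository root, and its `dry_run` mode only lists the scripts. Flagfiles such as `inputs/plots` have no `.sh` extension, so they are skipped.

```python
import subprocess
from pathlib import Path


def find_experiment_scripts(inputs_dir="inputs"):
    """Return the per-environment `[environment].sh` scripts, sorted by name."""
    return sorted(p for p in Path(inputs_dir).glob("*.sh") if p.is_file())


def run_experiment_scripts(inputs_dir="inputs", dry_run=True):
    """Run every experiment script in turn, or only list them if dry_run is True.

    Hypothetical helper: assumes each script is self-contained and meant to be
    run with bash from the repository root.
    """
    scripts = find_experiment_scripts(inputs_dir)
    for script in scripts:
        if dry_run:
            print(f"would run: bash {script}")
        else:
            # check=True raises CalledProcessError at the first failing script
            subprocess.run(["bash", str(script)], check=True)
    return scripts


if __name__ == "__main__":
    run_experiment_scripts(dry_run=True)
```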
- (Optional) As an alternative to the `set_environment_variables.sh` script, you can manually make the `LD_LIBRARY_PATH` environment variable available to conda as follows:

      conda activate vae_mdp  # or vae_mdp_gpu
      cd $CONDA_PREFIX
      mkdir -p ./etc/conda/activate.d
      mkdir -p ./etc/conda/deactivate.d
      touch ./etc/conda/activate.d/env_vars.sh
      touch ./etc/conda/deactivate.d/env_vars.sh

  Edit `./etc/conda/activate.d/env_vars.sh` as follows:

      #!/bin/sh
      ENV_NAME='vae_mdp'  # or 'vae_mdp_gpu'
      export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
      export LD_LIBRARY_PATH=${HOME}/anaconda3/envs/${ENV_NAME}/lib/:${LD_LIBRARY_PATH}

  Edit `./etc/conda/deactivate.d/env_vars.sh` as follows:

      #!/bin/sh
      export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
      unset OLD_LD_LIBRARY_PATH
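The two scripts above implement a save/restore pattern: activation stashes the current `LD_LIBRARY_PATH` in `OLD_LD_LIBRARY_PATH` before prepending the environment's `lib` directory, and deactivation puts the stashed value back. The following sketch models that pattern on a plain dict so it can be inspected without touching conda; `activate` and `deactivate` are hypothetical names. It takes the environment prefix as an argument (e.g. the value of `$CONDA_PREFIX`) instead of the hardcoded `${HOME}/anaconda3/envs/${ENV_NAME}` path. Unlike the shell version, it does not leave an empty trailing entry when `LD_LIBRARY_PATH` was initially unset (the dynamic loader may interpret an empty entry as the current directory), and in that case it unsets the variable again on deactivation.

```python
def activate(environ, prefix):
    """Mimic activate.d/env_vars.sh on a dict of environment variables."""
    old = environ.get("LD_LIBRARY_PATH", "")
    # save the current value so that deactivation can restore it
    environ["OLD_LD_LIBRARY_PATH"] = old
    lib_dir = f"{prefix.rstrip('/')}/lib"
    # prepend the environment's lib directory, without an empty trailing entry
    environ["LD_LIBRARY_PATH"] = f"{lib_dir}:{old}" if old else lib_dir
    return environ


def deactivate(environ):
    """Mimic deactivate.d/env_vars.sh: restore the saved value, drop the backup."""
    old = environ.pop("OLD_LD_LIBRARY_PATH", "")
    if old:
        environ["LD_LIBRARY_PATH"] = old
    else:
        # LD_LIBRARY_PATH was unset (or empty) before activation
        environ.pop("LD_LIBRARY_PATH", None)
    return environ


if __name__ == "__main__":
    env = {"LD_LIBRARY_PATH": "/usr/local/lib"}
    print(activate(env, "/home/user/anaconda3/envs/vae_mdp"))
    print(deactivate(env))
```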
If you use this code, please cite it as:

    @article{Delgrange22,
      title={Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes},
      volume={36},
      url={https://ojs.aaai.org/index.php/AAAI/article/view/20602},
      DOI={10.1609/aaai.v36i6.20602},
      number={6},
      journal={Proceedings of the AAAI Conference on Artificial Intelligence},
      author={Delgrange, Florent and Nowé, Ann and Pérez, Guillermo A.},
      year={2022},
      month={Jun.},
      pages={6497-6505}
    }