Skip to content

Implementation of Variational Markov Decision Processes, a framework allowing to (i) distill policies learned through (deep) reinforcement learning and (ii) learn discrete abstractions of continuous environments, the two with bisimulation guarantees.

License

Notifications You must be signed in to change notification settings

florentdelgrange/vae_mdp

Repository files navigation

VAE-MDPs

Implementation of the VAE-MDP framework introduced in the paper "Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes" (AAAI-22). This framework allows (i) the variational abstraction of environments under which RL agents operate, as well as (ii) the distillation of their policy over the new abstract spaces learned, the two with verifiable bisimulation guarantees. These enable the application of formal methods and tools developped for discrete MDPs such as probabilistic model checkers.

Installation

We provide two conda environment files that can be used to re-create our python environment and reproduce our results:

  • environment.yml (using TensorFlow CPU)
  • environment_gpu.yml (using TensorFlow GPU)

These files can be found in the conda_environments directory and explicitly list all the dependencies required for our tool. Note that these conda environments have been tested with conda 4.10.1, under Ubuntu 20.04.2 LTS.

  • Note 1: We additionally provide these environments with build specifications removed from dependencies.
  • Note 2: reverb currently only supports Linux based OSes. Our tool can be used without reverb if you don't use prioritized replay buffers.

In the following, we detail how to create automatically the conda environment from the environment CPU file, but you can easily create an environment for GPU by replacing environment.yml by environment_gpu.yml.

  1. Create the environment from environment.yml:
    cd conda_environments
    conda env create -f environment.yml
  2. The environment vae_mdp (or vae_mdp_gpu) is now created. To use reverb replay buffers, we need to indicate the variable LD_LIBRARY_PATH to conda. We provide the installation script set_environment_variables.sh that makes the environment variable become activate when the environment is activated:
    conda activate vae_mdp  # or vae_mdp_gpu
    ./set_environment_variables
    # reactivate the environment to apply the changes
    conda deactivate
    conda activate vae_mdp
    The vae_mdp environment should now work properly on your machine.

Run the experiments

We provide the exact set of hyper-parameters used during our experiments in the inputs directory.

Quick start

  • Each individual experiments can be run via:
    python train.py --flagfile inputs/[name of the environment] 
  • Add --display_progressbar to display a TF progressbar
  • Display the possible options with --help
  • By default, the log directory is created, where training logs are stored. Moreover, logs can be optionally vizualized via TensorBoard using
    tensorboard --logdir=log
  • The N best models can be saved during training with the option --evaluation_window_size N (by default set to 0, use 1 to save the best model encountered during training).

Reproduce the paper results

We provide a script for each environment in inputs/[environment].sh, containing the exact commands to run, as well as the seeds we used. You can run all the experiments as follows:

./run_all_experiments.sh

Then, you can vizualize the experiments via TensorBoard or reproduce the paper plots via:

# plot distortion/rate/elbo, the PAC bounds, and the policy evaluation
python util/io/plot.py --flagfile inputs/plots
# plot the latent space vizualization
python util/io/plot.py --flagfile inputs/plot_histograms

The plots are stored in evaluation/plots.

Additional installation instructions

  • (Optional) Alternatively, you can indicate manually the environment variable LD_LIBRARY_PATH to conda as follows:

    conda activate vae_mdp  # or vae_mdp_gpu
    cd $CONDA_PREFIX
    mkdir -p ./etc/conda/activate.d
    mkdir -p ./etc/conda/deactivate.d
    touch ./etc/conda/activate.d/env_vars.sh
    touch ./etc/conda/deactivate.d/env_vars.sh

    Edit ./etc/conda/activate.d/env_vars.sh as follows:

    #!/bin/sh
    
    ENV_NAME='vae_mdp'  # or 'vae_mdp_gpu'
    export OLD_LD_LIBRARY_PATH=${LD_LIBRARY_PATH}
    export LD_LIBRARY_PATH=${HOME}/anaconda3/envs/${ENV_NAME}/lib/:${LD_LIBRARY_PATH}

    Edit ./etc/conda/deactivate.d/env_vars.sh as follows:

    #!/bin/sh
    
    export LD_LIBRARY_PATH=${OLD_LD_LIBRARY_PATH}
    unset OLD_LD_LIBRARY_PATH

Cite

If you use this code, please cite it as:

@article{Delgrange22,
   title={Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes},
   volume={36},
   url={https://ojs.aaai.org/index.php/AAAI/article/view/20602},
   DOI={10.1609/aaai.v36i6.20602},
   number={6},
   journal={Proceedings of the AAAI Conference on Artificial Intelligence},
   author={Delgrange, Florent and Nowé, Ann and Pérez, Guillermo A.},
   year={2022},
   month={Jun.},
   pages={6497-6505}
}

About

Implementation of Variational Markov Decision Processes, a framework allowing to (i) distill policies learned through (deep) reinforcement learning and (ii) learn discrete abstractions of continuous environments, the two with bisimulation guarantees.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published