Wasserstein Auto-encoded MDPs

Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees

This repository is the official implementation of the Wasserstein Auto-encoded MDP (WAE-MDP) framework, introduced in the paper Wasserstein Auto-encoded MDPs: Formal Verification of Efficiently Distilled RL Policies with Many-sided Guarantees (ICLR 2023). The implementation is a fork of the VAE-MDP framework.

Installation

We provide a conda environment file, environment.yml, that can be used to recreate our Python environment and reproduce our results. The file explicitly lists all the dependencies required to run our tool.

To create the environment, run:

conda env create -f environment.yml
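
Then activate the environment before running any experiments (the environment name below is an assumption; check the name field at the top of environment.yml):

conda activate wae_mdp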

Experiments

Quick start

  • Each individual experiment can be run via (a full example command is given after this list):
    python train.py --flagfile inputs/[name of the environment]
  • Add --display_progressbar to display a progress bar with useful learning metrics
  • Display the possible options with --help
  • By default,
    • the saves directory is created, where checkpoints and learned models are stored.
    • the log directory is created, where training and evaluation logs are stored.
  • The logs can be visualized via TensorBoard using
    tensorboard --logdir=log
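
For instance, a single run combining the options above looks as follows (the environment name here is purely illustrative; use any flag file available under inputs/):

python train.py --flagfile inputs/CartPole-v0 --display_progressbar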

Evaluation

The file evaluation.html summarizes the results presented in our paper. It embeds videos comparing the performance of the input RL policies with that of their distilled counterparts, as well as the metrics related to the formal verification of those distilled policies. The code for using our formal verification tool is also presented in this file.

Reproducing the paper results

  • We provide the exact hyperparameters used for each individual environment in inputs/[name of the environment].
  • For each environment, we provide a script (inputs/[environment].sh) that trains 5 instances of the WAE-MDP with different seeds. You can also run all the experiments at once as follows (a sketch of this pattern is given below):
./run_all_experiments.sh
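
As an illustration of what such a sweep amounts to (a sketch only, not the actual content of run_all_experiments.sh, and assuming each inputs/[environment].sh script is self-contained), the per-environment scripts can be launched sequentially as follows:

#!/usr/bin/env bash
# Illustration only: launch every per-environment training script in turn.
for script in inputs/*.sh; do
    bash "$script"
done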

Pre-trained Models

  • Input RL policies are available at reinforcement_learning/saves.
  • Pre-trained models are available at evaluation/saved_models.

Results

WAE-MDP Losses

[figure: pac_bounds]

Local Losses: PAC Bounds

[figure: pac_bounds]

Distillation

[figure: distillation]

  • The code to generate the plots of the paper is available in the notebook evaluation/plots.ipynb.
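
To reproduce the figures, the notebook can be opened with Jupyter (assuming Jupyter is installed, e.g., in the conda environment):

jupyter notebook evaluation/plots.ipynb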

Cite

If you use this code, please cite it as:

@inproceedings{delgrange2023wasserstein,
  title={Wasserstein Auto-encoded {MDP}s: Formal Verification of Efficiently Distilled {RL} Policies with Many-sided Guarantees},
  author={Florent Delgrange and Ann Nowe and Guillermo Perez},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=JLLTtEdh1ZY}
}
