simmodel

Simulation environment for Reinforcement Learning experiments in Search & Pursuit-Evasion on Graphs. Thesis work 'Predicting plausible escape routes using reinforcement learning and graph representations' for the UvA/AI MSc program (Thesis Proposal, Final Thesis).

Installation

Clone this repo

git clone git@github.com:rvdweerd/simmodel.git
cd simmodel

Create the conda environment from the yml file (check and, if needed, adjust the cudatoolkit version first)
```
conda env create -f environment.yml
conda activate rl
```

Use pip to install PyTorch Geometric (check/adjust the cudatoolkit version, cu102 in the URLs below)

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu102.html
pip install torch-geometric
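
The two wheel index URLs above share one pattern: a torch version and a CUDA tag (here 1.11.0 and cu102). As a minimal sketch, assuming the same pattern holds for other version combinations (check https://data.pyg.org/whl/ before relying on it), the pip commands can be generated for a different setup. The helper names `pyg_wheel_index` and `pyg_install_commands` are hypothetical and not part of this repo.

```python
def pyg_wheel_index(torch_version: str, cuda_tag: str) -> str:
    """Build the wheel index URL, following the pattern of the URLs above."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def pyg_install_commands(torch_version: str, cuda_tag: str) -> list:
    """Return the three pip commands from this README for the given versions."""
    index = pyg_wheel_index(torch_version, cuda_tag)
    return [
        f"pip install torch-scatter -f {index}",
        f"pip install torch-sparse -f {index}",
        "pip install torch-geometric",
    ]


if __name__ == "__main__":
    # These values reproduce the commands listed above; substitute the
    # torch version and CUDA tag of your own environment.
    for cmd in pyg_install_commands("1.11.0", "cu102"):
        print(cmd)
```

Run the printed commands inside the activated conda environment.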

Install stable-baselines3 contrib

cd ..
git clone https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/
cd stable-baselines3-contrib
pip install -e .
cd ../simmodel

Training and testing models

Replicate the LSTM experiment with the PPO-GNN-LSTM model (section 'Effect of LSTM positioning', Appendix E2)
```
python Phase3_lstm-gnn-ppo_simp.py --train_on MemTask-U1 --train True --demoruns True --obs_mask freq --obs_rate 0.2 --lstm_type EMB
```
Before training, the script offers interactive demo runs (demoruns) on the training graph set; rendered graph states are written to /results/test.png. Training TensorBoard output and results are stored in /results/results_Phase3simp/

Replicate the scale-up experiment with the DQN-GNN model (section '6.2 Scale-up to real-world road networks')
```
python Phase2b_gnn-dqn.py --train_on NWB_AMS --max_nodes 975 --qnet s2v --train True --demoruns True
```
Note: training on the AMS graphs requires 20 GB of GPU VRAM. If that is not available, train on smaller graphs instead, e.g. 'M3M5Mix'. Training TensorBoard output and results are stored in /results/results_Phase2/
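
The choice between the AMS graphs and a smaller graph set can be checked before launching a run. The following is a rough sketch, not part of this repo: the function names are hypothetical, 1 GB is assumed to mean 1024**3 bytes, and only GPU 0 is inspected. Other flags (such as --max_nodes) may also need to change for a smaller graph set; that is not covered here.

```python
AMS_VRAM_GB = 20  # requirement stated in the note above


def choose_graphset(total_vram_bytes: int, required_gb: int = AMS_VRAM_GB) -> str:
    """Return 'NWB_AMS' if enough GPU memory is present, else 'M3M5Mix'."""
    # Assumption: 1 GB is taken as 1024**3 bytes.
    enough = total_vram_bytes >= required_gb * 1024**3
    return "NWB_AMS" if enough else "M3M5Mix"


def detect_vram_bytes() -> int:
    """Total memory of GPU 0 in bytes, or 0 if torch or CUDA is unavailable."""
    try:
        import torch  # imported lazily so the sketch also runs without torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_properties(0).total_memory


if __name__ == "__main__":
    print("--train_on", choose_graphset(detect_vram_bytes()))
```

A card marketed as 20 GB may report slightly less than 20 * 1024**3 bytes, so treat the result as a hint rather than a guarantee.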
Replicate the baseline PPO-GNN-LSTM experiment (section '6.3 Extend to partial observability')
```
python Phase3_lstm-gnn-ppo_simp.py --train_on NWB_AMS_mixed_obs --obs_mask mix --train True --demoruns True --lstm_type EMB --lstm_hdim 64
```
Note: training on the AMS graphs requires 20 GB of GPU VRAM. Training TensorBoard output and results are stored in /results/results_Phase3simp/
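
The three replication commands above can also be kept in one place and launched from Python. This is a minimal sketch under stated assumptions: the script names and flags are copied from the commands in this section, while the experiment keys (`lstm_positioning`, `scale_up`, `partial_obs`) and the helper `build_command` are made up for illustration.

```python
import shlex
import sys

# Script name and flags per experiment, copied from the commands above.
EXPERIMENTS = {
    "lstm_positioning": ("Phase3_lstm-gnn-ppo_simp.py", {
        "train_on": "MemTask-U1", "train": "True", "demoruns": "True",
        "obs_mask": "freq", "obs_rate": "0.2", "lstm_type": "EMB",
    }),
    "scale_up": ("Phase2b_gnn-dqn.py", {
        "train_on": "NWB_AMS", "max_nodes": "975", "qnet": "s2v",
        "train": "True", "demoruns": "True",
    }),
    "partial_obs": ("Phase3_lstm-gnn-ppo_simp.py", {
        "train_on": "NWB_AMS_mixed_obs", "obs_mask": "mix", "train": "True",
        "demoruns": "True", "lstm_type": "EMB", "lstm_hdim": "64",
    }),
}


def build_command(name: str, python: str = sys.executable) -> list:
    """Return the argument list for one experiment, ready for subprocess.run."""
    script, flags = EXPERIMENTS[name]
    cmd = [python, script]
    for key, value in flags.items():
        cmd += [f"--{key}", value]
    return cmd


if __name__ == "__main__":
    # Print the commands; to launch one, pass the list to
    # subprocess.run(cmd, check=True) from the repository root.
    for name in EXPERIMENTS:
        print(shlex.join(build_command(name, python="python")))
```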

Background and examples

Goal: predicting escape routes in a passive search scenario with partial observability
Demo: a Graph Neural Net based policy model, trained using PPO with invalid action masking, can generalize and be applied to unseen graphs
Demo: escape agent traverses from Dam Square (Amsterdam) to a target node, while avoiding pursuers that move to observation positions. Escape behavior is based on graph representation learning on smaller graphs, combined with reinforcement learning using PPO
Demo: performance of the Collision Risk Avoidance heuristic benchmark
Demo: performance of GNN-LSTM model, trained using PPO under partial observability
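
Invalid action masking, mentioned in the first demo above, can be illustrated in a few lines. The sketch below is not the code used in this repo: it assumes that the valid actions of the escape agent are the neighbors of its current node, and it uses a hypothetical 4-node toy graph with made-up logits. Invalid actions get a logit of minus infinity, so the softmax assigns them probability zero.

```python
import math


def masked_policy(logits, valid_actions):
    """Softmax over logits with invalid actions forced to probability 0."""
    valid = set(valid_actions)
    if not valid:
        raise ValueError("at least one action must be valid")
    # Replace the logits of invalid actions by -inf before normalizing.
    masked = [x if i in valid else -math.inf for i, x in enumerate(logits)]
    top = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(x - top) for x in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy graph as an adjacency list: node -> neighbor nodes.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
logits = [0.5, 1.0, -0.3, 2.0]  # one made-up score per node

# Agent at node 0: only nodes 1 and 2 are reachable in one step.
probs = masked_policy(logits, graph[0])
print(probs)
```

Node 3 has the highest raw score here but is not adjacent to node 0, so it receives zero probability and the agent can only sample node 1 or node 2.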

Citation

@mastersthesis{weerd2022spe_rl,
    author = {R. van der Weerd},
    institution = {University of Amsterdam, Graduate School of Informatics},
    pages = 57,
    school = {University of Amsterdam, Graduate School of Informatics, Master's program in artificial intelligence},
    title = {Predicting plausible escape routes using reinforcement learning and graph representations},
    year = 2022
}
