
Acting in Delayed Environments with Non-Stationary Markov Policies

This repository contains the implementation of the Delayed, Augmented, and Oblivious agents from the paper "Acting in Delayed Environments with Non-Stationary Markov Policies" by Esther Derman*, Gal Dalal*, and Shie Mannor (*equal contribution), published at ICLR 2021.

The agents here support the CartPole and Acrobot environments from OpenAI Gym. The Atari-supported agent can be found here.
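
As a quick sanity check that these tasks are available in your environment, you can instantiate them directly; the env IDs below assume the standard Gym registrations:

    # Sanity check (assumed standard Gym env IDs for the two tasks).
    import gym

    for env_id in ("CartPole-v1", "Acrobot-v1"):
        env = gym.make(env_id)
        env.reset()
        print(env_id, "observation shape:", env.observation_space.shape)
        env.close()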

Installation

  1. Tested with Python 3.7; a Conda virtual environment is encouraged. Other Python versions and environment managers should also work.
  2. Clone the project and cd into the project directory.
  3. Create the virtual environment:
    Option 1 -- TensorFlow 2.2: Run pip install -r requirements.py (other versions of the packages in requirements.py should also be fine).
    Option 2 -- TensorFlow 1.14: Run conda env create -f environment.yml to directly create a virtual environment called tf_14.
  4. To enable support of the noisy Cartpole and Acrobot experiments, modify the original gym cartpole.py and acrobot.py:
    Option 1 -- via pip install:
    cd third_party
    git submodule sync && git submodule update --init --recursive
    cd gym
    git apply ../gym.patch
    pip install -e .
    Option 2 -- manually:
    4a. Find the files' location in site-packages, e.g. "/home/username/anaconda3/envs/rl_delay_env/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py" (the sketch after this list prints the exact paths).
    4b. Overwrite that file with "rl_delay_basic/gym_modifications/cartpole.py", and repeat the same process for "rl_delay_basic/gym_modifications/acrobot.py".
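
If you go the manual route, the following sketch prints the exact paths of the installed files to overwrite; it relies only on the modules' __file__ attribute:

    # Helper for step 4a: print where your Gym install keeps the
    # classic-control sources, i.e., the files to overwrite in step 4b.
    import gym.envs.classic_control.cartpole as cartpole
    import gym.envs.classic_control.acrobot as acrobot

    print(cartpole.__file__)  # replace with rl_delay_basic/gym_modifications/cartpole.py
    print(acrobot.__file__)   # replace with rl_delay_basic/gym_modifications/acrobot.py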

Hyperparameters

The parameters used for the experiments in the paper are the defaults appearing in init_main.py. They are identical for all agent types (delayed, augmented, oblivious), for both the noisy and non-noisy variants, and for all delay values. The only exception is epsilon_decay: 0.999 for Cartpole and 0.9999 for Acrobot (see the sketch below).
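
Read literally, that exception amounts to a per-environment multiplicative decay rate. The following is a minimal sketch of such a schedule; the function name and the epsilon_min floor are illustrative assumptions, not the exact code in init_main.py:

    # Illustrative only; the real defaults live in init_main.py.
    EPSILON_DECAY = {"cartpole": 0.999, "acrobot": 0.9999}

    def next_epsilon(env_name, epsilon, epsilon_min=0.01):
        """One multiplicative decay step; the epsilon_min floor is assumed."""
        return max(epsilon_min, epsilon * EPSILON_DECAY[env_name])

The slower Acrobot rate matters in practice: after 1000 decay steps, 0.999^1000 ≈ 0.37 while 0.9999^1000 ≈ 0.90, so the Acrobot agent keeps exploring far longer.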

Wandb sweep

Using wandb, you can easily run multiple experiments across different agents, delay values, hyperparameters, etc. An example sweep file is included in the project: example_sweep.yml. Create a sweep with "wandb sweep example_sweep.yml" and start one or more workers with "wandb agent your-sweep-id". For more details, see https://docs.wandb.ai/guides/sweeps/quickstart.
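
The same flow can also be driven from Python via wandb's API. The sketch below mirrors the two CLI commands; the swept parameter names and values are placeholders, not the contents of example_sweep.yml:

    # Programmatic equivalent of the two CLI commands above, using wandb's
    # Python API. Parameter names and delay values are assumed examples.
    import wandb

    sweep_config = {
        "method": "grid",
        "parameters": {
            "agent_type": {"values": ["delayed", "augmented", "oblivious"]},
            "delay_value": {"values": [5, 15, 25]},  # assumed example delays
        },
    }

    def train():
        run = wandb.init()
        # ... launch one experiment configured by run.config here ...
        run.finish()

    sweep_id = wandb.sweep(sweep_config, project="rl_delay_basic")  # == "wandb sweep"
    wandb.agent(sweep_id, function=train)  # == "wandb agent your-sweep-id"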

Citing the Project

To cite this repository in publications:

@inproceedings{derman2021acting,
  title={Acting in Delayed Environments with Non-Stationary {M}arkov Policies},
  author={Derman, Esther and Dalal, Gal and Mannor, Shie},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}

Happy delaying!
