
Acting in Delayed Environments with Non-Stationary Markov Policies

This repository contains the implementation of the Delayed, Augmented, and Oblivious agents from the paper "Acting in Delayed Environments with Non-Stationary Markov Policies" by Esther Derman*, Gal Dalal*, and Shie Mannor (*equal contribution), published at ICLR 2021.

The agents here support the CartPole and Acrobot environments from OpenAI Gym. The Atari-supported agent can be found here.
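
As a quick sanity check that these tasks are available in your environment, you can instantiate them directly; the env IDs below assume the standard Gym registrations:

    # Sanity check (assumed standard Gym env IDs for the two tasks).
    import gym

    for env_id in ("CartPole-v1", "Acrobot-v1"):
        env = gym.make(env_id)
        env.reset()
        print(env_id, "observation shape:", env.observation_space.shape)
        env.close()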

Installation

  1. Tested with Python 3.7; a Conda virtual environment is encouraged. Other Python versions and environment managers should also work.
  2. Clone the project and cd into the project directory.
  3. Create the virtual environment:
    Option 1 -- TensorFlow 2.2: Run pip install -r requirements.py (other versions of the packages in requirements.py should also be fine).
    Option 2 -- TensorFlow 1.14: Run conda env create -f environment.yml to directly create a virtual environment called tf_14.
  4. To enable support of the noisy Cartpole and Acrobot experiments, modify the original gym cartpole.py and acrobot.py:
    Option 1 -- via pip install:
    cd third_party
    git submodule sync && git submodule update --init --recursive
    cd gym
    git apply ../gym.patch
    pip install -e .
    Option 2 -- manually:
    4a. Find the files' location in site-packages, e.g. "/home/username/anaconda3/envs/rl_delay_env/lib/python3.7/site-packages/gym/envs/classic_control/cartpole.py" (the sketch after this list prints the exact paths).
    4b. Overwrite that file with "rl_delay_basic/gym_modifications/cartpole.py", and repeat the same process for "rl_delay_basic/gym_modifications/acrobot.py".
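
If you go the manual route, the following sketch prints the exact paths of the installed files to overwrite; it relies only on the modules' __file__ attribute:

    # Helper for step 4a: print where your Gym install keeps the
    # classic-control sources, i.e., the files to overwrite in step 4b.
    import gym.envs.classic_control.cartpole as cartpole
    import gym.envs.classic_control.acrobot as acrobot

    print(cartpole.__file__)  # replace with rl_delay_basic/gym_modifications/cartpole.py
    print(acrobot.__file__)   # replace with rl_delay_basic/gym_modifications/acrobot.py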

Hyperparameters

The parameters used for the experiments in the paper are the defaults appearing in init_main.py. They are identical for all agent types (delayed, augmented, oblivious), for both the noisy and non-noisy variants, and for all delay values. The only exception is epsilon_decay: 0.999 for Cartpole and 0.9999 for Acrobot (see the sketch below).
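
Read literally, that exception amounts to a per-environment multiplicative decay rate. The following is a minimal sketch of such a schedule; the function name and the epsilon_min floor are illustrative assumptions, not the exact code in init_main.py:

    # Illustrative only; the real defaults live in init_main.py.
    EPSILON_DECAY = {"cartpole": 0.999, "acrobot": 0.9999}

    def next_epsilon(env_name, epsilon, epsilon_min=0.01):
        """One multiplicative decay step; the epsilon_min floor is assumed."""
        return max(epsilon_min, epsilon * EPSILON_DECAY[env_name])

The slower Acrobot rate matters in practice: after 1000 decay steps, 0.999^1000 ≈ 0.37 while 0.9999^1000 ≈ 0.90, so the Acrobot agent keeps exploring far longer.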

Wandb sweep

Using wandb, you can easily run multiple experiments across different agents, delay values, hyperparameters, etc. An example sweep file is included in the project: example_sweep.yml. Create a sweep with "wandb sweep example_sweep.yml" and start one or more workers with "wandb agent your-sweep-id". For more details, see https://docs.wandb.ai/guides/sweeps/quickstart.
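
The same flow can also be driven from Python via wandb's API. The sketch below mirrors the two CLI commands; the swept parameter names and values are placeholders, not the contents of example_sweep.yml:

    # Programmatic equivalent of the two CLI commands above, using wandb's
    # Python API. Parameter names and delay values are assumed examples.
    import wandb

    sweep_config = {
        "method": "grid",
        "parameters": {
            "agent_type": {"values": ["delayed", "augmented", "oblivious"]},
            "delay_value": {"values": [5, 15, 25]},  # assumed example delays
        },
    }

    def train():
        run = wandb.init()
        # ... launch one experiment configured by run.config here ...
        run.finish()

    sweep_id = wandb.sweep(sweep_config, project="rl_delay_basic")  # == "wandb sweep"
    wandb.agent(sweep_id, function=train)  # == "wandb agent your-sweep-id"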

Citing the Project

To cite this repository in publications:

@inproceedings{derman2021acting,
  title={Acting in Delayed Environments with Non-Stationary {M}arkov Policies},
  author={Derman, Esther and Dalal, Gal and Mannor, Shie},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2021}
}

Happy delaying!
