🔥 ICLR 2024 (Spotlight)
Wei Wang¹, Dongqi Han*², Xufang Luo², Dongsheng Li*²
¹ Western University, Canada, ² Microsoft Research Asia
* Corresponding Author
Link to the paper on OpenReview: https://openreview.net/forum?id=Z8UfDs4J46
Algorithms to Handle Signal Delay in Deep Reinforcement Learning addresses the problem of signal delay in continuous robotic control. Signal delay occurs when there is a lag between an agent's perception of the environment and its corresponding actions. Our methods achieve strong performance in simulated continuous robotic control tasks with large delays, yielding results comparable to those in non-delayed cases.
Despite the notable advancements in deep reinforcement learning (DRL) in recent years, a prevalent issue that is often overlooked is the impact of signal delay. Signal delay occurs when there is a lag between an agent's perception of the environment and its corresponding actions. In this paper, we first formalize delayed-observation Markov decision processes (DOMDP) by extending the standard MDP framework to incorporate signal delays. Next, we elucidate the challenges posed by the presence of signal delay in DRL, showing that trivial DRL algorithms and generic methods for partially observable tasks suffer greatly from delays. Lastly, we propose effective strategies to overcome these challenges. Our methods achieve remarkable performance in continuous robotic control tasks with large delays, yielding results comparable to those in non-delayed cases. Overall, our work contributes to a deeper understanding of DRL in the presence of signal delays and introduces novel approaches to address the associated challenges.
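For intuition, observation delay can be simulated by buffering observations in a thin wrapper around a standard gymnasium environment. Below is a minimal sketch of this idea (our illustration; the class name is ours, and this is not the implementation used in this repository), where the delay argument plays the role of the env.delay parameter described later:

from collections import deque

import gymnasium as gym


class DelayedObservationWrapper(gym.Wrapper):
    """Illustrative DOMDP wrapper: the agent observes the state from
    `delay` steps in the past, while rewards and termination are current."""

    def __init__(self, env, delay: int):
        super().__init__(env)
        self.delay = delay
        self._buffer = deque(maxlen=delay + 1)

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        # Pre-fill so the first `delay` steps return the initial observation.
        self._buffer.clear()
        for _ in range(self.delay + 1):
            self._buffer.append(obs)
        return self._buffer[0], info

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._buffer.append(obs)  # newest observation in, oldest out
        # The leftmost element is the observation from `delay` steps ago.
        return self._buffer[0], reward, terminated, truncated, info


env = DelayedObservationWrapper(gym.make("Ant-v4"), delay=4)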
The algorithms were developed specifically for simulated continuous robotic control tasks. These include tasks with potential future applications such as robotic arm manipulation, humanoid robot walking, and robot dog running, where precise and timely responses are crucial despite the presence of signal delays.
These algorithms were developed in a simulated environment and have not been tested for use in real robots. Deploying the algorithms in the real world would require additional code development and testing. Key areas needing attention include handling real-world noise, sensor calibration, and ensuring reliable communication. Expertise in robotics, control systems, and integration of hardware with deep reinforcement learning algorithms would be necessary to address these practical challenges.
We tested the algorithms in the MuJoCo robotic control environments to demonstrate their effectiveness. See the paper for details.
As noted above, these algorithms were developed in a simulated environment and have not been tested on real robots. They may also work less effectively when there is an extremely long delay (more than 25 steps in MuJoCo, which corresponds to several seconds).
The user is responsible for applying algorithms safely and ethically. Considerations include physical safety risks, privacy and security concerns, and ethical implications of specific robotic applications. Comprehensive testing would be required to safely use these algorithms in the real world and such use is beyond the scope of our research.
Before you begin, please ensure that the following environment variables are correctly set: UDATADIR, UPRJDIR, and UOUTDIR. The operations you perform will only modify these three directories on your device.
Here's an example setup:
# Example directory paths
export UDATADIR=~/Desktop/delay/data # directory for dataset
export UPRJDIR=~/Desktop/delay/code # directory for code
export UOUTDIR=~/Desktop/delay/output # directory for outputs such as logs
# Example worker settings
export NUM_WORKERS=0 # number of workers to use
# Create directories if they do not exist
mkdir -p $UDATADIR $UPRJDIR $UOUTDIR

Note: Weights & Biases (wandb) is a fantastic tool for visualization. It serves a similar purpose to TensorBoard, but offers additional functionality. Please obtain your API key from your Weights & Biases account to make use of these features.
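Optionally, you can sanity-check the setup with a short Python snippet (a convenience check suggested here, not part of the project):

import os

# Verify the three directories exist, and warn if the wandb key is missing.
for var in ("UDATADIR", "UPRJDIR", "UOUTDIR"):
    path = os.environ.get(var)
    assert path and os.path.isdir(path), f"{var} is missing or not a directory"
if "WANDB_API_KEY" not in os.environ:
    print("WANDB_API_KEY is not set; wandb features will require a manual login.")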
We provide three methods to set up the Python environment for this project:
- Using the Development Environment in VS Code (Most Recommended)
- Using Docker (Recommended)
- Using Pip or Conda
If you're a Visual Studio Code (VS Code) user, we highly recommend this method for its simplicity. You can set up all the environment requirements for this project in just one step.
- Open this project in VS Code.
- Install the "Dev Containers" extension.
- Press Cmd/Ctrl+Shift+P to open the command palette, then select Dev Container: Rebuild and Reopen in Container.
Note: The configuration for Dev Containers is stored in the .devcontainer folder, which is included in our project. You can modify this if you need to add or remove certain libraries.
Further details and instructions about Dev Containers in VS Code can be found here.
If you prefer to use Docker, you can find the Dockerfile in the .devcontainer directory. Please refer to Docker's documentation if you need guidance on building a Docker image and running a container.
If you wish to use Pip or Conda for managing Python dependencies, please refer to their respective documentation for instructions. It's important to ensure that all required dependencies for this project are correctly installed in your Python environment.
- Install PyTorch from the official website (https://pytorch.org).
pip install \
hydra-core \
hydra-colorlog \
hydra-optuna-sweeper \
torchmetrics \
pyrootutils \
pre-commit \
pytest \
sh \
omegaconf \
rich \
pytorch-lightning==1.7.7 \
jupyter \
wandb \
tensorboardx \
ipdb
pip install \
hydra-joblib-launcher \
gymnasium \
mujoco==2.3.3 \
tianshou==0.4.11 \
ftfy \
regex \
imageio

P.S. We use the pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime Docker image, while other versions should also work.
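After installing, a quick import check (our suggestion, not a project script) confirms that the core dependencies are visible in your Python environment:

import importlib

# Import each core dependency and print its version if it exposes one.
for name in ("torch", "gymnasium", "tianshou", "mujoco"):
    module = importlib.import_module(name)
    print(name, getattr(module, "__version__", "unknown"))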
Also, you may need to install MuJoCo; here is a simple example:
curl https://mujoco.org/download/mujoco210-linux-x86_64.tar.gz --output mujoco210.tar.gz
mkdir ~/.mujoco
tar -xf mujoco210.tar.gz --directory ~/.mujoco
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/.mujoco/mujoco210/bin
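To check that the simulator works end to end, a short smoke test (ours, for illustration) rolls out a few random steps in one of the MuJoCo tasks used in the paper:

import gymnasium as gym

# Create a MuJoCo control task and step it with a random policy.
env = gym.make("Ant-v4")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
print("MuJoCo environment runs correctly.")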
Use the following command to start a simple run:
python src/entry.py \
experiment=sac \
env.name=Ant-v4 \
env.delay=4

The "experiment" parameter accepts one of the following values:
- dummy (referred to as "vanilla SAC" in the paper)
- oracle_critic (referred to as "Delay-Reconciled Training" in the paper)
- cat_mlp (referred to as "State Augmentation - MLP" in the paper)
- stack_rnn (referred to as "State Augmentation - RNN" in the paper)
- pred_detach (referred to as "Prediction$^\dagger$" in the paper)
- pred_nodetach (referred to as "Prediction" in the paper)
- encoding_detach (referred to as "Encoding" in the paper)
- encoding_nodetach (referred to as "Encoding$^\dagger$" in the paper)
- symmetric (referred to as "Symmetric - MLP" in the paper)
The env.name parameter defines the environment and can be any available environment within gymnasium, such as:
- Ant-v4
- Walker2d-v4
- HalfCheetah-v4
- and many others
The "env.delay" parameter can be set to any non-negative integer.
Here are some examples of customized runs:
- Using the "oracle_critic" experiment, the "Walker2d-v4" environment, and a delay of 5:

python src/entry.py \
experiment=oracle_critic \
env.name=Walker2d-v4 \
env.delay=5

- Using the "cat_mlp" experiment, the "HalfCheetah-v4" environment, and a delay of 3:

python src/entry.py \
experiment=cat_mlp \
env.name=HalfCheetah-v4 \
env.delay=3

- Using the "pred_detach" experiment, the "Ant-v4" environment, and a delay of 0:

python src/entry.py \
experiment=pred_detach \
env.name=Ant-v4 \
env.delay=0
To enable wandb logging, simply add wandb.mode=online to the command-line parameters, as follows:
python src/entry.py \
experiment=pred_detach \
env.name=Ant-v4 \
env.delay=0 \
wandb.mode=online
Create a file named .env in the project root and put the following in it; your wandb key will then be read automatically:
WANDB_API_KEY="36049{change_to_your_wandb_key}215a1d76"
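For illustration, such a file can be loaded with the python-dotenv package (an assumption on our part; the project may read the file through its own tooling):

import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads key=value pairs from .env into the process environment
assert "WANDB_API_KEY" in os.environ, "WANDB_API_KEY was not found"

To cite our paper: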
@inproceedings{wang2024addressing,
title={Addressing Signal Delay in Deep Reinforcement Learning},
author={Wei Wang and Dongqi Han and Xufang Luo and Dongsheng Li},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=Z8UfDs4J46}
}
We would like to express our gratitude for the valuable code snippets provided by the Tianshou project and the Lightning-Hydra template. These resources have significantly contributed to the development of our project.