
nesl/Time-in-State-RL


Summary

This repo contains the code to train deep reinforcement learning policies for HalfCheetah, Ant, and the 1/18th scale autonomous car (DeepRacer). Slides, Paper

Paper: Sim2Real Transfer for Deep Reinforcement Learning with Stochastic State Transition Delays, CoRL-2020.

Deployment heterogeneity and runtime compute stochasticity result in variable timing characteristics of sensor sampling rates and end-to-end delays from sensing to actuation. Prior works have used domain randomization to enable the successful transfer of policies across domains with different state transition delays. We show that variation in sampling rates and policy execution time leads to degradation in Deep RL policy performance, and that domain randomization is insufficient to overcome this limitation. We propose the Time-in-State RL (TSRL) approach, which includes delays and sampling rate as additional agent observations at training time to improve the robustness of Deep RL policies. We demonstrate the efficacy of TSRL on HalfCheetah, Ant, and a car robot in simulation, as well as on a real robot using a 1/18th scale car.
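
To make the idea concrete, here is a minimal sketch (not the repository's actual implementation) of a Gym wrapper that appends timing features to each observation. The wrapper name and the wall-clock timing proxies are hypothetical, and the classic 4-tuple Gym step API is assumed; in the repo, the timing characteristics are instead varied inside the simulators themselves.

```python
import time
import numpy as np
import gym


class TimeInStateWrapper(gym.Wrapper):
    """Hypothetical sketch: append timing features (an end-to-end delay proxy
    and the sampling interval) to the observation, in the spirit of TSRL."""

    def __init__(self, env):
        super().__init__(env)
        low = np.concatenate([env.observation_space.low, [0.0, 0.0]])
        high = np.concatenate([env.observation_space.high, [np.inf, np.inf]])
        self.observation_space = gym.spaces.Box(low=low, high=high, dtype=np.float32)
        self._last_obs_time = None

    def reset(self, **kwargs):
        obs = self.env.reset(**kwargs)
        self._last_obs_time = time.time()
        # No delay has been observed yet, so start the timing features at zero.
        return np.concatenate([obs, [0.0, 0.0]]).astype(np.float32)

    def step(self, action):
        t_act = time.time()
        obs, reward, done, info = self.env.step(action)
        now = time.time()
        delay = now - t_act                    # sensing-to-actuation delay proxy
        interval = now - self._last_obs_time   # sampling-interval proxy
        self._last_obs_time = now
        obs = np.concatenate([obs, [delay, interval]]).astype(np.float32)
        return obs, reward, done, info
```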

Domain randomization (DR) vs Time-in-State (TS)

Animated comparisons (DR and TS side by side):

Autonomous car at 100 ms latency

HalfCheetah at 20.6 ms latency

Ant at 20.6 ms latency

Demo video

Check out a quick demo of transferring policies from simulation to the real car robot.

TSRL Demo Video


Code credits

a) The fully connected policy training code using PPO is taken from OpenAI Baselines, with modifications to allow variable timing characteristics during training by fusing the delay observations with the neural network inputs for images/state (see the sketch after this list).

b) The code to train recurrent policies using PPO with variable timing characteristics is modified from the batch PPO code available from Google Research.

c) The HalfCheetah environment and robot are taken from the PyBullet code, modified to support variable timing characteristics in the simulation steps.

d) The Ant environment and robot are taken from the PyBullet code, modified to support variable timing characteristics in the simulation steps.

e) The DeepRacer simulator is a modified snapshot of the open-source DeepRacer code available here. The changes include the track color, the camera sampling rate, and the removal of the AWS dependencies. The DeepRacer environment is also modified to allow variable timing characteristics.
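
To illustrate the fusion mentioned in item a), below is a small sketch showing timing features concatenated with convolutional image features before the policy head. It is written in PyTorch purely for brevity and the layer sizes are hypothetical; the repository's actual policies build on the TensorFlow-based OpenAI Baselines and batch-PPO code.

```python
import torch
import torch.nn as nn


class FusedPolicyNet(nn.Module):
    """Hypothetical sketch: fuse timing features (delay, sampling interval)
    with convolutional image features before the action head."""

    def __init__(self, n_timing_features=2, n_actions=2):
        super().__init__()
        # Small conv trunk for a 1x84x84 camera image.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        conv_out = 32 * 9 * 9  # flattened output size for an 84x84 input
        # Timing features pass through their own small embedding...
        self.timing = nn.Sequential(nn.Linear(n_timing_features, 16), nn.ReLU())
        # ...and are concatenated with the image features before the head.
        self.head = nn.Sequential(
            nn.Linear(conv_out + 16, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, image, timing):
        fused = torch.cat([self.conv(image), self.timing(timing)], dim=1)
        return self.head(fused)


# Example: a batch of one image plus its (delay, sampling interval) features.
net = FusedPolicyNet()
out = net(torch.zeros(1, 1, 84, 84), torch.zeros(1, 2))
print(out.shape)  # torch.Size([1, 2])
```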

Requirements:

1. Install the following requirements to train all the policies mentioned in the paper:

a) OpenAI Gym, OpenAI Baselines, batch-PPO

b) Gazebo and ROS for the DeepRacer robotic car policies.

c) PyBullet for Ant and HalfCheetah

2. To train only the fully connected policies for Ant and HalfCheetah (see the import-check sketch after this list):

a) OpenAI Gym, OpenAI Baselines

b) PyBullet for Ant and HalfCheetah

3. To train only the recurrent policies for HalfCheetah:

a) OpenAI Gym, batch-PPO

b) PyBullet for HalfCheetah

4. To train only the policies for the DeepRacer robotic car:

a) OpenAI Gym, OpenAI Baselines

b) Gazebo and ROS for the DeepRacer robotic car policies.
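
As a quick sanity check of the PyBullet dependencies for items 2 and 3 (this is an illustrative snippet, not a script shipped with the repo), the PyBullet Gym environments can be imported and instantiated as follows:

```python
# Importing pybullet_envs registers the *BulletEnv-v0 environments with Gym.
import gym
import pybullet_envs  # noqa: F401

for env_id in ("HalfCheetahBulletEnv-v0", "AntBulletEnv-v0"):
    env = gym.make(env_id)
    print(env_id, "observation shape:", env.observation_space.shape)
    env.close()
```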

Usage

The training of policies, benchmarking, and visualization for each task are explained in the respective folders: 'deepracer', 'ant', 'halfcheetah', and 'halfcheetah-recurrent'.

Questions

For any help or issues in running the code, please reach out to sandha.iitr@gmail.com.
