#Deep Reinforcement Learning: TD3 in OpenAI Gym - Martin Baur

OpenAI Gym: https://gym.openai.com/

TD3 Algorithm from stable baselines: https://stable-baselines.readthedocs.io/en/master/modules/td3.html

3D environments from pybullet Gym: https://github.com/benelot/pybullet-gym 


##Installation


In [0]:
!pip install gym pyvirtualdisplay > /dev/null 2>&1
#Workaround: reinstall stable-baselines so that TD3 is available
!pip uninstall stable-baselines
!pip install stable-baselines[mpi]
!pip install pyglet > /dev/null 2>&1
!pip install pybullet > /dev/null 2>&1

!apt-get update > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!apt-get install cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1

!pip install tensorboard > /dev/null 2>&1

Uninstalling stable-baselines-2.9.0:
  Would remove:
    /usr/local/lib/python3.6/dist-packages/stable_baselines-2.9.0.dist-info/*
    /usr/local/lib/python3.6/dist-packages/stable_baselines/*
Proceed (y/n)? y
  Successfully uninstalled stable-baselines-2.9.0
Collecting stable-baselines[mpi]
  Using cached https://files.pythonhosted.org/packages/c0/05/f6651855083020c0363acf483450c23e38d96f5c18bec8bded113d528da5/stable_baselines-2.9.0-py3-none-any.whl
Installing collected packages: stable-baselines
Successfully installed stable-baselines-2.9.0
Requirement already up-to-date: setuptools in /usr/local/lib/python3.6/dist-packages (45.1.0)


##Imports


In [0]:
import gym
from gym import logger as gymlogger
from gym.wrappers import Monitor
gymlogger.set_level(40) #error only

import numpy as np
import pybullet_envs

from stable_baselines import TD3
from stable_baselines.td3.policies import MlpPolicy
from stable_baselines.common.vec_env import DummyVecEnv
from stable_baselines.ddpg.noise import NormalActionNoise, OrnsteinUhlenbeckActionNoise

from pyvirtualdisplay import Display

%load_ext tensorboard

The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
  * https://github.com/tensorflow/io (for I/O related ops)
If you depend on functionality not listed there, please file an issue.



##Help functions

First, a virtual display is started so that rendering (needed for video recording) does not raise an error on a headless machine.

The callback that saves the model during learning is also defined here.

In [0]:
#Start a virtual display to avoid rendering errors
display = Display(visible=0, size=(1400, 900))
display.start()

#Counts the callback invocations during learning
n_steps = 0

def learningCallback(_locals, _globals):
    global n_steps
    # Save model every 1000 calls
    if (n_steps + 1) % 1000 == 0:
        model.save(dir_model)
    n_steps += 1

    return True

xdpyinfo was not found, X start can not be checked! Please install xdpyinfo!
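
The callback above relies on a module-level counter and on `model` and `dir_model` being defined later in the notebook. As a minimal sketch of the same save-every-N-calls idea without global state, the counter can live in a closure. `make_checkpoint_callback`, `save_fn`, and `every` are hypothetical names introduced for illustration; the `(_locals, _globals)` signature is the one `learningCallback` already uses.

```python
def make_checkpoint_callback(save_fn, every=1000):
    """Return a callback that calls save_fn() once every `every` invocations.

    save_fn and every are hypothetical names for this sketch. In the notebook,
    save_fn could be something like `lambda: model.save(dir_model)`.
    """
    calls = 0  # closure state replaces the module-level n_steps counter

    def callback(_locals, _globals):
        nonlocal calls
        calls += 1
        if calls % every == 0:
            save_fn()
        return True  # keep training, as learningCallback does

    return callback


# Demonstration with a stand-in save function that only records each call
saved = []
cb = make_checkpoint_callback(lambda: saved.append("checkpoint"), every=1000)
for _ in range(2500):
    cb({}, {})
print(len(saved))  # prints 2 (saved on call 1000 and call 2000)
```

The closure saves on the same calls as the original counter logic (the 1000th, the 2000th, and so on) but can be reused for several models in one notebook session.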


Google Drive is mounted so that the model and the videos can be saved there.

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


##Defining strings and timesteps

This cell defines the environment IDs and the Google Drive directories used for the model and the logs.

The number of training timesteps is also defined here.

In [0]:
#envStrings
envPendulum = 'Pendulum-v0'
envCheetah = 'HalfCheetahBulletEnv-v0'
envAnt = 'AntBulletEnv-v0'
envHuman = 'HumanoidBulletEnv-v0'

model_string = envPendulum

dir = '/content/drive/My Drive/DLSeminar/models/' + model_string + "/"

dir_logs = dir + 'logs/'

dir_model = dir + model_string

model_steps = 100000
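
The string concatenation above depends on every piece carrying the right slashes. A small sketch, assuming the same directory layout, builds the paths with `posixpath.join` instead. `build_paths` and its dictionary keys are hypothetical names, not part of the notebook, and unlike `dir_logs` the joined paths carry no trailing slash.

```python
import posixpath


def build_paths(base_dir, env_id):
    """Return the log, model, and video paths for one environment ID.

    Hypothetical helper: mirrors the layout <base_dir>/<env_id>/... used above.
    """
    model_dir = posixpath.join(base_dir, env_id)
    return {
        "logs": posixpath.join(model_dir, "logs"),
        "model": posixpath.join(model_dir, env_id),
        "video": posixpath.join(model_dir, "video"),
    }


paths = build_paths('/content/drive/My Drive/DLSeminar/models', 'Pendulum-v0')
print(paths["logs"])
# prints /content/drive/My Drive/DLSeminar/models/Pendulum-v0/logs
```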

##Implementation of the learning


In [0]:
#Create the gym environment and record videos into the model directory
env = gym.make(model_string)
env = Monitor(env, dir + 'video', force=True)
env = DummyVecEnv([lambda: env])

#Gaussian exploration noise with one dimension per action
n_actions = env.action_space.shape[-1]
action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=0.1 * np.ones(n_actions))

#Create the model
model = TD3(MlpPolicy, env, action_noise=action_noise, verbose=1, tensorboard_log=dir_logs)

#Save the untrained model to the defined path
model.save(dir_model)

#Alternatively, load a previously saved model
#model = TD3.load(dir_model, env)

#Start learning; the callback saves the model every 1000 calls
model.learn(total_timesteps=model_steps, log_interval=10, callback=learningCallback)

#Reset the environment after learning
obs = env.reset()





---------------------------------------
| current_lr              | 0.0003    |
| episodes                | 10        |
| fps                     | 101       |
| mean 100 episode reward | -1.57e+03 |
| n_updates               | 1600      |
| qf1_loss                | 2.307422  |
| qf2_loss                | 2.231395  |
| time_elapsed            | 17        |
| total timesteps         | 1800      |
---------------------------------------
---------------------------------------
| current_lr              | 0.0003    |
| episodes                | 20        |
| fps                     | 137       |
| mean 100 episode reward | -1.47e+03 |
| n_updates               | 3600      |
| qf1_loss           
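
The learning cell ends with `obs = env.reset()`, which is the usual starting point for rolling out the trained policy. The following is a sketch of such a rollout, not the author's evaluation code. `run_policy` is a hypothetical helper, and it assumes that `model.predict(obs)` returns an `(action, state)` pair and that the vectorized `env.step(action)` returns per-environment sequences of rewards and done flags. The two stand-in classes exist only so the sketch runs without stable-baselines installed; in the notebook, `model` and `env` from the learning cell would be passed instead.

```python
def run_policy(model, env, n_steps=200):
    """Roll out `model` on a vectorized `env` and return the summed reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(n_steps):
        # predict returns the action and a state that is unused here
        action, _state = model.predict(obs)
        # a vectorized env returns one reward/done entry per sub-environment
        # and is assumed to reset a sub-environment when its episode ends
        obs, rewards, dones, _infos = env.step(action)
        total_reward += float(rewards[0])
    return total_reward


class StandInModel:
    """Hypothetical placeholder for a trained TD3 model."""

    def predict(self, obs):
        return [[0.0]], None


class StandInEnv:
    """Hypothetical placeholder for a DummyVecEnv with one sub-environment."""

    def reset(self):
        return [[0.0, 0.0, 0.0]]

    def step(self, action):
        return [[0.0, 0.0, 0.0]], [-1.5], [False], [{}]


total = run_policy(StandInModel(), StandInEnv(), n_steps=4)
print(total)  # prints -6.0
```

Because the environment is wrapped in `Monitor`, a rollout like this would presumably also be what produces the recorded videos in the model directory.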

##Launching TensorBoard to evaluate learning

In [0]:
#Launch TensorBoard after learning
#%tensorboard --logdir /content/drive/My\ Drive/DLSeminar/models/AntBulletEnv-v0/logs/
%tensorboard --logdir /content/drive/My\ Drive/DLSeminar/models/Pendulum-v0/logs/

Reusing TensorBoard on port 6006 (pid 5338), started 0:42:30 ago. (Use '!kill 5338' to kill it.)