Let’s code it up!

To set up our code, we first need to install a few things.

Step 1: Install keras-rl library

From terminal, run the following commands:

git clone https://github.com/matthiasplappert/keras-rl.git
cd keras-rl
python setup.py install
 

Step 2: Install dependencies for CartPole environment

Assuming you have pip installed, install the following libraries:

pip install h5py
pip install gym
 

Step 3: Let's get started!

First, we import the necessary modules.

In [1]:
%matplotlib inline
import numpy as np
import gym

from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten
from keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory

Using Theano backend.
[2017-07-18 00:37:29,829] g++ not available, if using conda: `conda install m2w64-toolchain`


Then, we set the relevant variables.

In [2]:
ENV_NAME = 'CartPole-v0'

# Create the environment and extract the number of actions available in the CartPole problem
env = gym.make(ENV_NAME)
# Seed NumPy and the environment so that runs are more repeatable
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n

[2017-07-18 00:37:40,720] Making new env: CartPole-v0


Next, we build a very simple neural network with a single hidden layer of 16 ReLU units and one linear output per action.


In [3]:
model = Sequential()
# keras-rl passes observations shaped (window_length,) + observation shape, i.e. (1, 4) here,
# so we flatten them into a 4-dimensional input vector
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(16))            # single hidden layer with 16 units
model.add(Activation('relu'))
model.add(Dense(nb_actions))    # one output per action: its estimated Q-value
model.add(Activation('linear'))
model.summary()                 # prints the table itself (it returns None, so no print() needed)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten_1 (Flatten)          (None, 4)                 0         
_________________________________________________________________
dense_1 (Dense)              (None, 16)                80        
_________________________________________________________________
activation_1 (Activation)    (None, 16)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 2)                 34        
_________________________________________________________________
activation_2 (Activation)    (None, 2)                 0         
Total params: 114
Trainable params: 114
Non-trainable params: 0
_________________________________________________________________
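The parameter counts in the summary follow directly from the layer sizes: a Dense layer with n inputs and m units has n*m weights plus m biases, while Flatten and Activation layers have no parameters. Here is a quick check of that arithmetic; the dimensions 4, 16 and 2 are read off the summary above, and the helper name dense_params is just for illustration.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


obs_dim = 4      # CartPole observation size after Flatten
hidden = 16      # units in the hidden Dense layer
n_actions = 2    # CartPole actions: push the cart left or right

hidden_params = dense_params(obs_dim, hidden)     # 4*16 + 16 = 80
output_params = dense_params(hidden, n_actions)   # 16*2 + 2 = 34
total = hidden_params + output_params             # Flatten/Activation add 0

print(hidden_params, output_params, total)  # 80 34 114
```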


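Next, we configure and compile our agent. We use an epsilon-greedy policy, which usually takes the action with the highest predicted Q-value but occasionally takes a random action so the agent keeps exploring. As memory we use SequentialMemory, a replay memory that stores the actions we performed and the rewards we received for each of them, so the agent can train on past experience.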
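keras-rl's EpsGreedyQPolicy and SequentialMemory handle both ideas internally. The sketch below is a plain-Python illustration of the two concepts, not keras-rl's code: the names eps_greedy_action and ReplayMemory are hypothetical, eps=0.1 is assumed to match keras-rl's default exploration rate, sampling is uniform as in standard DQN experience replay, and keras-rl's window_length (frame stacking, which is 1 here) is ignored.

```python
import random
from collections import deque


def eps_greedy_action(q_values, eps=0.1, rng=random):
    """With probability eps pick a uniformly random action (explore);
    otherwise pick the action with the highest estimated Q-value (exploit)."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


class ReplayMemory:
    """Bounded store of (state, action, reward, next_state, done) transitions.
    Once `limit` transitions are stored, each new one evicts the oldest."""

    def __init__(self, limit):
        self.buffer = deque(maxlen=limit)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        # Uniform random minibatch, so consecutive (highly correlated) steps
        # are not all trained on together
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Usage
print(eps_greedy_action([0.2, 0.7], eps=0.0))  # 1: with eps=0 the action is always greedy
memory = ReplayMemory(limit=3)
for t in range(5):
    memory.append(t, 0, 1.0, t + 1, False)
print(len(memory))  # 3: only the three newest transitions are kept
```

The limit=50000 passed to SequentialMemory above plays the same role as limit here: it caps how much past experience the agent keeps.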


In [4]:
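policy = EpsGreedyQPolicy()                              # explore with probability eps, otherwise act greedily
memory = SequentialMemory(limit=50000, window_length=1)  # replay memory holding up to 50,000 transitions
# nb_steps_warmup: number of steps collected before training starts
dqn = DQNAgent(model=model, nb_actions=nb_actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mae'])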

# Okay, now it's time to learn something! We visualize the training here for show, but this slows down training quite a lot. 
#dqn.fit(env, nb_steps=5000, visualize=True, verbose=2)
dqn.fit(env, nb_steps=500, visualize=True, verbose=2) #To make it quick

Training for 500 steps ...
  10/500: episode: 1, duration: 1.360s, episode steps: 10, steps per second: 7, episode reward: 10.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.000 [0.000, 0.000], mean observation: 0.132 [-1.967, 3.014], loss: --, mean_absolute_error: --, mean_q: --




  19/500: episode: 2, duration: 5.514s, episode steps: 9, steps per second: 2, episode reward: 9.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.111 [0.000, 1.000], mean observation: 0.161 [-1.326, 2.303], loss: 0.507259, mean_absolute_error: 0.616917, mean_q: 0.268948
  33/500: episode: 3, duration: 3.087s, episode steps: 14, steps per second: 5, episode reward: 14.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.214 [0.000, 1.000], mean observation: 0.125 [-1.517, 2.553], loss: 0.436648, mean_absolute_error: 0.580347, mean_q: 0.333230
  42/500: episode: 4, duration: 2.032s, episode steps: 9, steps per second: 4, episode reward: 9.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.000 [0.000, 0.000], mean observation: 0.137 [-1.783, 2.784], loss: 0.401373, mean_absolute_error: 0.567380, mean_q: 0.397797
  53/500: episode: 5, duration: 2.412s, episode steps: 11, steps per second: 5, episode reward: 11.000, mean reward: 1.000 [1.000, 1.000], mean action: 0.091 [0.000, 1

<keras.callbacks.History at 0xcbfd908>
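The target_model_update=1e-2 argument controls how the agent's separate target network follows the network being trained. In keras-rl, as far as we know, a value below 1 is treated as a soft update after every step with rate tau = 0.01, while a value of 1 or more means a hard copy every that many steps. The sketch below illustrates that soft update together with the one-step Q-learning target the network is trained toward. It works on plain lists of numbers, the function names are hypothetical, and gamma = 0.99 is an assumed discount factor (keras-rl's DQNAgent default, to our knowledge).

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target r + gamma * max_a' Q_target(s', a').
    At the end of an episode there is no next state to bootstrap from."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def soft_update(target_weights, online_weights, tau=1e-2):
    """Move every target weight a fraction tau toward the online weight:
    w_target <- tau * w_online + (1 - tau) * w_target."""
    return [tau * w + (1.0 - tau) * wt
            for w, wt in zip(online_weights, target_weights)]


# CartPole pays a reward of 1.0 for every step the pole stays up
print(td_target(1.0, [2.0, 3.0], done=False))  # about 3.97 (1.0 + 0.99 * 3.0)
print(td_target(1.0, [2.0, 3.0], done=True))   # 1.0
print(soft_update([0.0, 1.0], [1.0, 1.0]))     # about [0.01, 1.0]
```

With tau = 0.01 the target network changes slowly, which keeps the training targets from shifting every time the online network is updated.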

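Now we test our reinforcement learning model. This cell must run in the same session as the cells above. In the run shown here, the kernel had been restarted first (the execution counter is back at In [1]), so dqn no longer existed and the call raised a NameError; re-running the cells above before this one avoids the error.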

In [1]:
dqn.test(env, nb_episodes=5, visualize=True)

NameError: name 'dqn' is not defined