# DQN: CartPole-v0 | EPOCH Lab

A pole is attached by an un-actuated joint to a cart, which moves along a frictionless track. The system is controlled by applying a force of +1 or -1 to the cart. The pendulum starts upright, and the goal is to prevent it from falling over. A reward of +1 is provided for every timestep that the pole remains upright. The episode ends in any of three cases: the pole is more than 12 degrees from vertical, the cart moves more than 2.4 units from the center, or 200 timesteps have passed (the time limit of `CartPole-v0`). The highest possible episode reward is therefore 200.
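The reward and termination rules can be sketched in plain Python. This is a minimal sketch, not gym's own code. `step_outcome` and `episode_return` are hypothetical helpers. The 12-degree angle limit, the 2.4-unit position limit and the 200-step cap are the values used by the classic gym `CartPole-v0` source and registration.

```python
import math

# Thresholds assumed from the classic gym CartPole source and CartPole-v0 registration.
THETA_THRESHOLD_RAD = 12 * 2 * math.pi / 360  # 12 degrees in radians
X_THRESHOLD = 2.4
MAX_EPISODE_STEPS = 200


def step_outcome(x, theta):
    """Return (reward, done) for cart position x and pole angle theta (radians)."""
    done = abs(x) > X_THRESHOLD or abs(theta) > THETA_THRESHOLD_RAD
    # The step on which the episode ends still earns +1.
    return 1.0, done


def episode_return(trajectory):
    """Sum rewards over a list of (x, theta) states until termination or the time limit."""
    total = 0.0
    for x, theta in trajectory[:MAX_EPISODE_STEPS]:
        reward, done = step_outcome(x, theta)
        total += reward
        if done:
            break
    return total
```

Consider a pole that stays perfectly balanced for 500 states. Its return is capped at 200.0. A pole that tilts to 0.3 rad (about 17 degrees) on the third state ends the episode with a return of 3.0.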

- Python: 3.6.12
- Keras-GPU: 2.3.1
- KerasRL2: 1.0.4

In [1]:
```python
import gym

import numpy as np

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory
```

### Build OpenAI Gym Environment

Get the environment and extract the number of states and actions.

In [2]:
```python
ENV_NAME = 'CartPole-v0'
```

In [3]:
```python
env = gym.make(ENV_NAME)
# Seed NumPy and the environment so that runs are reproducible.
np.random.seed(123)
env.seed(123)
```

```
[123]
```

In [4]:
```python
states = env.observation_space.shape  # tuple with the observation shape, (4,)
actions = env.action_space.n          # number of discrete actions, 2

print('States:', states[0])
print('Actions:', actions)
```

```
States: 4
Actions: 2
```
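The four state values are the cart position, cart velocity, pole angle and pole angular velocity. The two actions push the cart left (0) or right (1). The sketch below attaches those labels to a raw observation array. `CartPoleObservation`, `ACTION_MEANINGS` and `describe` are hypothetical names used only for illustration.

```python
from typing import NamedTuple

import numpy as np


class CartPoleObservation(NamedTuple):
    """Hypothetical named view of the 4-dimensional CartPole observation."""
    cart_position: float
    cart_velocity: float
    pole_angle: float             # radians, 0 means upright
    pole_angular_velocity: float


# Discrete CartPole actions: 0 pushes the cart left, 1 pushes it right.
ACTION_MEANINGS = {0: 'push left', 1: 'push right'}


def describe(obs):
    """Label a raw observation array, such as the one returned by env.reset()."""
    return CartPoleObservation(*(float(v) for v in np.asarray(obs).ravel()))
```

Here is an example: `describe(np.array([0.01, -0.02, 0.03, 0.04])).pole_angle` returns `0.03`.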


### Create Deep Learning Model

Build a very simple model. The model does not need to implement the dueling architecture itself. If you enable the dueling network in DQN (`enable_dueling_network=True` in `DQNAgent`), DQN automatically builds a dueling network on top of your model. You can also build a dueling network yourself and leave the dueling option in DQN turned off.
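A dueling network splits the output into a state value V(s) and per-action advantages A(s, a), then recombines them into Q-values. In keras-rl, this happens when `DQNAgent` is created with `enable_dueling_network=True`. The agent replaces the model's output layer with one that predicts V and A. The NumPy sketch below shows only the recombination step. `dueling_q_values` is a hypothetical helper. Its three modes are meant to mirror the `dueling_type` options `'avg'`, `'max'` and `'naive'`.

```python
import numpy as np


def dueling_q_values(value, advantages, dueling_type='avg'):
    """Combine V(s) with shape (batch, 1) and A(s, a) with shape (batch, n_actions) into Q(s, a)."""
    if dueling_type == 'avg':
        # Subtracting the mean advantage makes the V/A split identifiable.
        return value + advantages - advantages.mean(axis=1, keepdims=True)
    if dueling_type == 'max':
        return value + advantages - advantages.max(axis=1, keepdims=True)
    if dueling_type == 'naive':
        return value + advantages
    raise ValueError(f'unknown dueling_type: {dueling_type}')
```

Take V = 10 and advantages [1, 3]. The `'avg'` mode subtracts the mean advantage of 2, giving Q-values [9, 11].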

In [5]:
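```python
def build_model(states, actions):
    model = Sequential()
    # keras-rl feeds observations shaped (window_length,) + observation shape,
    # i.e. (1, 4) here, so flatten them into a 4-vector first.
    model.add(Flatten(input_shape=(1,) + states))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(16, activation='relu'))
    model.add(Dense(16, activation='relu'))
    # One linear output per action: the estimated Q-value of that action.
    model.add(Dense(actions, activation='linear'))
    return model
```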

In [6]:
```python
model = build_model(states, actions)
model.summary()
```

```
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
flatten (Flatten)            (None, 4)                 0         
_________________________________________________________________
dense (Dense)                (None, 16)                80        
_________________________________________________________________
dense_1 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_2 (Dense)              (None, 16)                272       
_________________________________________________________________
dense_3 (Dense)              (None, 2)                 34        
Total params: 658
Trainable params: 658
Non-trainable params: 0
_________________________________________________________________
```
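Each parameter count in the summary is the layer's weight count plus its bias count. `Flatten` has no parameters. It only reshapes each (1, 4) observation into a 4-vector. The quick check below reproduces the arithmetic. `dense_params` is a hypothetical helper, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus biases (n_out) of a fully connected layer."""
    return n_in * n_out + n_out


# Widths of the model above: 4 inputs after Flatten, three hidden layers of 16, 2 outputs.
widths = [4, 16, 16, 16, 2]
per_layer = [dense_params(a, b) for a, b in zip(widths, widths[1:])]
total = sum(per_layer)
print(per_layer, total)  # [80, 272, 272, 34] 658
```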


Configure and compile the agent. Any built-in `tensorflow.keras` optimizer can be used, and the built-in metrics can be used as well.
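`SequentialMemory(limit=50000, window_length=1)` acts as the replay memory. It keeps the most recent 50,000 transitions, and the agent samples mini-batches from them for training. The class below is a simplified, hypothetical stand-in (`ReplayBuffer` is not part of keras-rl). It ignores `window_length`, which is 1 in this notebook, and stores whole transitions in a bounded deque.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical minimal replay memory: keeps the `limit` most recent
    transitions and samples mini-batches uniformly at random."""

    def __init__(self, limit):
        # A deque with maxlen drops the oldest transition once it is full.
        self.buffer = deque(maxlen=limit)

    def append(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up temporal correlations.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

With `limit=3`, appending five transitions leaves only the last three in the buffer.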

In [7]:
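```python
# Replay memory holding the 50,000 most recent transitions; window_length=1
# means each state is a single observation (no stacking of past observations).
memory = SequentialMemory(limit=50000, window_length=1)
# Sample actions from a softmax (Boltzmann) distribution over the Q-values.
policy = BoltzmannQPolicy()

# nb_steps_warmup: number of steps collected before training starts.
# target_model_update < 1: soft-update the target network with rate 0.01.
dqn = DQNAgent(model=model, nb_actions=actions, memory=memory, nb_steps_warmup=10,
               target_model_update=1e-2, policy=policy)

dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
```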
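`BoltzmannQPolicy` picks each action with probability proportional to exp(Q / tau). High-value actions are therefore preferred, but every action keeps some chance of being explored. The NumPy sketch below assumes keras-rl's defaults of `tau=1.0` and `clip=(-500, 500)`. `boltzmann_probs` and `boltzmann_action` are hypothetical names. By default, this policy is used only during `fit`. Unless a `test_policy` is passed, `DQNAgent` evaluates with a greedy policy.

```python
import numpy as np


def boltzmann_probs(q_values, tau=1.0, clip=(-500.0, 500.0)):
    """Softmax over Q-values with temperature tau; clipping keeps exp() finite."""
    q = np.asarray(q_values, dtype=np.float64)
    exp_values = np.exp(np.clip(q / tau, clip[0], clip[1]))
    return exp_values / exp_values.sum()


def boltzmann_action(q_values, rng, tau=1.0):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, tau)
    return int(rng.choice(len(probs), p=probs))
```

Take Q-values of [0, ln 3]. The action probabilities are [0.25, 0.75]. A lower `tau` pushes the policy toward greedy, and a higher one pushes it toward uniform.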

### Training Loop

In [8]:
```python
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
```

```
Training for 50000 steps ...
Interval 1 (0 steps performed)
    1/10000 [..............................] - ETA: 46:42 - reward: 1.0000
108 episodes - episode_reward: 91.111 [8.000, 200.000] - loss: 3.752 - mae: 19.724 - mean_q: 39.703

Interval 2 (10000 steps performed)
51 episodes - episode_reward: 196.647 [137.000, 200.000] - loss: 8.772 - mae: 40.909 - mean_q: 82.295

Interval 3 (20000 steps performed)
51 episodes - episode_reward: 196.706 [142.000, 200.000] - loss: 9.447 - mae: 45.716 - mean_q: 91.892

Interval 4 (30000 steps performed)
50 episodes - episode_reward: 199.900 [196.000, 200.000] - loss: 8.099 - mae: 44.955 - mean_q: 90.094

Interval 5 (40000 steps performed)
done, took 988.823 seconds

<tensorflow.python.keras.callbacks.History at 0x7f6bd23e2650>
```
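On each training step, the agent regresses Q(s, a) toward a target built from the target network. With `target_model_update=1e-2`, keras-rl then moves the target network 1% of the way toward the online network, as we understand its implementation. The sketch below shows the vanilla DQN target and that soft update. It assumes a discount `gamma=0.99`, which is keras-rl's default. keras-rl can instead use a double-DQN target when `enable_double_dqn` is set. `td_targets` and `soft_update` are hypothetical helpers.

```python
import numpy as np


def td_targets(rewards, next_q_target, dones, gamma=0.99):
    """Vanilla DQN targets y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    rewards = np.asarray(rewards, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    max_next_q = np.asarray(next_q_target, dtype=np.float64).max(axis=1)
    return rewards + gamma * (1.0 - dones) * max_next_q


def soft_update(target_weights, online_weights, tau=1e-2):
    """Move each target weight array toward its online counterpart:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target_weights, online_weights)]
```

The reward is +1 per step and gamma is 0.99. Under these conditions, the true discounted return can never exceed 1 / (1 - 0.99) = 100. This bound is consistent with the `mean_q` values of roughly 90 in the later intervals of the log.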

In [9]:
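```python
import os

# After training is done, save the final weights.
# Make sure the output directory exists before saving.
os.makedirs('results', exist_ok=True)
dqn.save_weights('results/dqn_{}_weights.h5f'.format(ENV_NAME), overwrite=True)
```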

### Inference

In [10]:
```python
# Finally, evaluate the trained agent for 5 episodes.
# visualize=True opens a render window; set it to False on a headless machine.
dqn.test(env, nb_episodes=5, visualize=True)
```

```
Testing for 5 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 200.000, steps: 200
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 200.000, steps: 200

<tensorflow.python.keras.callbacks.History at 0x7f6bac248ed0>
```
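All five test episodes reached the 200-step limit. gym registers `CartPole-v0` with a reward threshold of 195.0. The environment is conventionally counted as solved when the average reward over 100 consecutive episodes reaches that threshold, so five episodes are not enough to make that claim. A minimal check, with `is_solved` as a hypothetical helper:

```python
def is_solved(episode_rewards, threshold=195.0, window=100):
    """True if the average of the most recent `window` episode rewards reaches `threshold`."""
    if len(episode_rewards) < window:
        return False  # not enough episodes to judge
    recent = episode_rewards[-window:]
    return sum(recent) / window >= threshold
```

For example, you could call `scores = dqn.test(env, nb_episodes=100, visualize=False)` and then `is_solved(scores.history['episode_reward'])`. This assumes the returned `History` object records per-episode rewards under `'episode_reward'`, as keras-rl usage examples suggest.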