___

<a href='http://www.pieriandata.com'>www.pieriandata.com</a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Keras RL DQN on Image Environment - Exercise




In this notebook you will implement a DQN agent for the classic game of Pong:
**Use the Pong-v0 environment**
(https://gym.openai.com/envs/Pong-v0/) <br />

**TASK: Import the necessary libraries and create the environment. Also extract the number of possible actions** <br />

In [15]:
import gym
import numpy as np

from PIL import Image
from gym.utils import play
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, Permute
from tensorflow.keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.core import Processor
from rl.callbacks import ModelIntervalCheckpoint

In [2]:
env = gym.make("Pong-v0")
nb_actions = env.action_space.n

**TASK: Play the game manually (keys: a and d to move the paddle)** <br />

In [5]:
play.play(env)

**TASK: Define an input size and the window length** <br />

In [6]:
IMG_SHAPE = (84, 84)
WINDOW_LENGTH = 4
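
To illustrate what the window length means, here is a minimal sketch of how the last `WINDOW_LENGTH` frames can be stacked into one state. The `frames` list is a hypothetical stand-in for preprocessed game frames, and the `deque` is only for illustration; in this notebook the stacking is assumed to be handled by the replay memory through its `window_length` argument.

```python
from collections import deque

import numpy as np

IMG_SHAPE = (84, 84)
WINDOW_LENGTH = 4

# Hypothetical stand-ins for preprocessed grayscale frames:
# frame i is filled with the value i so the frames are easy to tell apart.
frames = [np.full(IMG_SHAPE, i, dtype=np.uint8) for i in range(6)]

# Keep only the most recent WINDOW_LENGTH frames.
window = deque(maxlen=WINDOW_LENGTH)
for frame in frames:
    window.append(frame)

# Stack the window into a single state: (WINDOW_LENGTH, height, width).
state = np.stack(window)
print(state.shape)
print(state[:, 0, 0])  # which frames ended up in the state
```

A single frame does not show in which direction the ball is moving; a short stack of consecutive frames does, which is why the network receives several frames at once.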

**TASK: Create the ImageProcessor** <br />
It needs to:
1. Resize the image
2. Convert it to grayscale
3. Standardize it
4. Be memory efficient

Don't forget the reward clipping.

In [8]:
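class ImageProcessor(Processor):

    def process_observation(self, obs):
        # Resize the raw frame and convert it to grayscale
        # (the "L" stands for luminance).
        img = Image.fromarray(obs)
        img = img.resize(IMG_SHAPE)
        img = img.convert("L")
        img = np.array(img)
        # Store as uint8 so the replay memory stays small.
        return img.astype("uint8")

    def process_state_batch(self, batch):
        # Scale to [0, 1] only when a batch is fed to the network.
        processed_batch = batch.astype("float32") / 255.0
        return processed_batch

    def process_reward(self, reward):
        # Clip rewards to [-1, 1], as the task asks.
        return np.clip(reward, -1.0, 1.0)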
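
The "memory efficient" requirement can be made concrete with a rough estimate. The sketch below compares the size of one frame stored as `uint8` with the same frame stored as `float32`, scaled to a replay memory of one million frames (the limit used later in this notebook). It is a back-of-the-envelope figure that only counts the frames and ignores actions, rewards and other bookkeeping.

```python
import numpy as np

IMG_SHAPE = (84, 84)
MEMORY_LIMIT = 1_000_000  # number of frames kept in the replay memory

# One frame as it is stored, and as it looks after scaling to [0, 1].
frame_uint8 = np.zeros(IMG_SHAPE, dtype=np.uint8)
frame_float32 = frame_uint8.astype("float32") / 255.0

# Rough size of a full replay memory in gigabytes (frames only).
gb_uint8 = frame_uint8.nbytes * MEMORY_LIMIT / 1e9
gb_float32 = frame_float32.nbytes * MEMORY_LIMIT / 1e9

print(f"uint8:   {gb_uint8:.2f} GB")
print(f"float32: {gb_float32:.2f} GB")
```

Storing `uint8` frames and converting to `float32` only per batch keeps the memory at roughly a quarter of the size.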

**TASK: Design the Convolutional Neural Network** <br />
Hint: Make sure to get the right input shape!

You can try the same architecture as the one presented in the previous notebook:
1. Conv2D(filters=32, kernel_size=8, stride=4)
2. Conv2D(filters=64, kernel_size=4, stride=2)
3. Conv2D(filters=64, kernel_size=3, stride=1)
4. Dense(512)

Don't forget the activation functions.

In [9]:
input_shape = (WINDOW_LENGTH, IMG_SHAPE[0], IMG_SHAPE[1])

model = Sequential()
model.add(Permute((2, 3, 1), input_shape=input_shape))
model.add(Conv2D(32, (8, 8), strides=(4, 4), kernel_initializer="he_normal"))
model.add(Activation("relu"))
model.add(Conv2D(64, (4, 4), strides=(2, 2), kernel_initializer="he_normal"))
model.add(Activation("relu"))
model.add(Conv2D(64, (3, 3), strides=(1, 1), kernel_initializer="he_normal"))
model.add(Activation("relu"))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation("relu"))
model.add(Dense(nb_actions))
model.add(Activation("linear"))
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
permute (Permute)            (None, 84, 84, 4)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 20, 20, 32)        8224      
_________________________________________________________________
activation (Activation)      (None, 20, 20, 32)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 9, 9, 64)          32832     
_________________________________________________________________
activation_1 (Activation)    (None, 9, 9, 64)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 64)          36928     
_________________________________________
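
The output shapes and parameter counts in the summary above can be checked by hand. The sketch below assumes unpadded ("valid") convolutions with square kernels, which is the Keras default for `Conv2D`; the helper names `conv_output_size` and `conv_params` are made up for this illustration.

```python
def conv_output_size(size, kernel, stride):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    """Kernel weights plus one bias per filter, for a square kernel."""
    return kernel * kernel * in_channels * filters + filters


size, channels = 84, 4  # side length of IMG_SHAPE and WINDOW_LENGTH
layers = [(32, 8, 4), (64, 4, 2), (64, 3, 1)]  # (filters, kernel_size, stride)

sizes, params = [], []
for filters, kernel, stride in layers:
    params.append(conv_params(kernel, channels, filters))
    size = conv_output_size(size, kernel, stride)
    channels = filters
    sizes.append(size)

# Number of values that Flatten() passes on to the first Dense layer.
flattened = size * size * channels

print("spatial sizes:", sizes)
print("parameters:   ", params)
print("flattened:    ", flattened)
```

The spatial sizes 20, 9 and 7 and the parameter counts 8224, 32832 and 36928 agree with the `conv2d`, `conv2d_1` and `conv2d_2` rows of the summary.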

**TASK: Create the Replay Memory** <br />


In [10]:
memory = SequentialMemory(limit=1_000_000, window_length=WINDOW_LENGTH)
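
As a rough picture of what a replay memory does, here is a minimal sketch of a fixed-size buffer with uniform sampling. `TinyReplayBuffer` is a hypothetical class written for this illustration; it is not how `SequentialMemory` is implemented and it leaves out the frame windowing.

```python
import random
from collections import deque


class TinyReplayBuffer:
    """Hypothetical fixed-size buffer; the oldest transitions are dropped first."""

    def __init__(self, limit):
        self.transitions = deque(maxlen=limit)

    def append(self, observation, action, reward, terminal):
        self.transitions.append((observation, action, reward, terminal))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.transitions), batch_size)


buffer = TinyReplayBuffer(limit=3)
for step in range(5):
    # The step index stands in for a real observation.
    buffer.append(observation=step, action=0, reward=0.0, terminal=False)

batch = buffer.sample(batch_size=2)
print(len(buffer.transitions), batch)
```

Sampling at random from a large buffer breaks up the strong correlation between consecutive frames, which is the main reason DQN trains from a replay memory rather than from the latest transition.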

**TASK: Create the processor** <br />


In [11]:
processor = ImageProcessor()

**TASK: Define the action selection policy.** <br />
Feel free to try any policy you like. (Hint: decaying epsilon greedy also works here)

In [12]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(),
                              attr="eps",
                              value_min=0.1,
                              value_max=1.0,
                              value_test=0.05,
                              nb_steps=1_000_000)
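
To see what the annealing does to epsilon during training, here is a minimal sketch of a linear schedule with the same parameters as above. The function `annealed_eps` is written for this illustration and is assumed to mirror the policy's training-time behaviour; `value_test` is not covered, since it applies only during testing.

```python
def annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=1_000_000):
    """Linearly decay epsilon from value_max to value_min over nb_steps."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


for step in (0, 250_000, 500_000, 1_000_000, 1_500_000):
    print(f"step {step:>9,}: eps = {annealed_eps(step):.3f}")
```

The agent starts out acting almost entirely at random and, after `nb_steps` steps, still takes a random action about 10% of the time.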

**TASK: Create the agent.** <br />
Don't forget to compile!

In [13]:
dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy,
               memory=memory, processor=processor, nb_steps_warmup=50_000,
               gamma=0.99, target_model_update=10_000,
               train_interval=4, delta_clip=1)
dqn.compile(Adam(learning_rate=0.00025), metrics=["mae"])
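
The `delta_clip=1` argument is assumed here to select a Huber-style loss on the temporal-difference error: quadratic for small errors and linear for large ones. The sketch below is a NumPy illustration of that loss shape, not the library's own code.

```python
import numpy as np


def huber_loss(td_error, delta_clip=1.0):
    """Quadratic for |error| <= delta_clip, linear beyond it."""
    abs_error = np.abs(td_error)
    quadratic = 0.5 * np.square(td_error)
    linear = delta_clip * (abs_error - 0.5 * delta_clip)
    return np.where(abs_error <= delta_clip, quadratic, linear)


errors = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print(huber_loss(errors))
```

Because the loss grows only linearly for large errors, a single badly estimated Q-value cannot produce an arbitrarily large gradient, which tends to make training more stable.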

**TASK: Define a checkpoint callback to store the weights during training.** <br />
Please give it a different name than our provided checkpoint to avoid overwriting it.

In [16]:
weights_filename = "dqn_pong_weights_student.h5f"
checkpoint_weights_filename = "dqn_pong_weights_student_{step}.h5f"
checkpoint_callback = ModelIntervalCheckpoint(checkpoint_weights_filename, interval=100_000)
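
The `{step}` placeholder in the checkpoint filename is assumed to be filled in with the current step count via `str.format` each time a checkpoint is written. A small sketch of the filenames this would produce for the first three checkpoints:

```python
checkpoint_weights_filename = "dqn_pong_weights_student_{step}.h5f"
interval = 100_000

# Filenames for the first three checkpoints, one every `interval` steps.
filenames = [
    checkpoint_weights_filename.format(step=step)
    for step in range(interval, 3 * interval + 1, interval)
]
for name in filenames:
    print(name)
```

Each checkpoint therefore gets its own file instead of overwriting the previous one.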

**TASK: Train the agent.** <br />

In [21]:
dqn.fit(env, nb_steps=1_500_000, callbacks=[checkpoint_callback], log_interval=10_000, visualize=False)
# Save the final weights under the plain filename, not the "{step}" checkpoint template.
dqn.save_weights(weights_filename, overwrite=True)

**TASK: Evaluate the agent.** <br />

In [20]:
dqn.test(env, nb_episodes=5, visualize=True)

Training for 1500000 steps ...
Interval 1 (0 steps performed)
 1296/10000 [==>...........................] - ETA: 1:17 - reward: -0.0170
done, took 11.595 seconds


**TASK: Load your weights (or the provided ones) and create an agent from those** <br />

In [24]:
model.load_weights("weights_exercise/dqn_PONG_weights_1500000.h5f")

memory = SequentialMemory(limit=1_000_000, window_length=WINDOW_LENGTH)
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(0.1),
                              attr="eps",
                              value_min=0.1,
                              value_max=1.0,
                              value_test=0.05,
                              nb_steps=100_000)
processor = ImageProcessor()
dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy,
               memory=memory, processor=processor, nb_steps_warmup=50_000,
               gamma=0.99, target_model_update=10_000)
dqn.compile(Adam(learning_rate=0.00025), metrics=["mae"])


Two checkpoint references resolved to different objects (<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x00000263FCDBC748> and <tensorflow.python.keras.layers.core.Permute object at 0x00000263FDF59908>).

Two checkpoint references resolved to different objects (<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x00000263FEA13BC8> and <tensorflow.python.keras.layers.core.Activation object at 0x00000263FDEC77C8>).

Two checkpoint references resolved to different objects (<tensorflow.python.keras.layers.convolutional.Conv2D object at 0x00000263FCC372C8> and <tensorflow.python.keras.layers.core.Activation object at 0x00000263FEA138C8>).

Two checkpoint references resolved to different objects (<tensorflow.python.keras.layers.core.Dense object at 0x00000263FEA398C8> and <tensorflow.python.keras.layers.core.Flatten object at 0x00000263FEA39A88>).

Two checkpoint references resolved to different objects (<tensorflow.python.keras.layers.core.Dense object at 0x000

In [None]:
dqn.test(env, nb_episodes=5, visualize=True)