___

<a href='http://www.pieriandata.com'><img src='../Pierian_Data_Logo.png'/></a>
___
<center><em>Copyright by Pierian Data Inc.</em></center>
<center><em>For more information, visit us at <a href='http://www.pieriandata.com'>www.pieriandata.com</a></em></center>

# Keras-RL DQN Exercise - Solutions


In this exercise you will implement your first keras-rl agent for the **Acrobot** environment (https://gym.openai.com/envs/Acrobot-v1/). <br />
The goal of this environment is to maneuver the robot arm upwards above the line in as few steps as possible.

**TASK: Import necessary libraries** <br />

In [None]:
import time  # to reduce the game speed when playing manually

import gym  # Contains the game we want to play

# import necessary blocks from keras to build the Deep Learning backbone of our agent
from tensorflow.keras.models import Sequential  # To compose multiple Layers
from tensorflow.keras.layers import Dense  # Fully-Connected layer
from tensorflow.keras.layers import Activation  # Activation functions
from tensorflow.keras.layers import Flatten  # Flatten function

from tensorflow.keras.optimizers import Adam  # Adam optimizer

# Now the keras-rl2 packages. Don't get confused: the package is imported as rl, not keras-rl

from rl.agents.dqn import DQNAgent  # Use the basic Deep-Q-Network agent

# LinearAnnealedPolicy lets us decay epsilon for the epsilon-greedy strategy
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory  # Sequential memory for storing observations (optimized circular buffer)


**TASK: Create the environment** <br />
The name is: *Acrobot-v1*

In [None]:
env_name = 'Acrobot-v1'
env = gym.make(env_name)

In [None]:
num_actions = env.action_space.n
num_observations = env.observation_space.shape
print(f"Action Space: {num_actions}")
print(f"Observation Space: {num_observations}")

assert num_actions == 3 and num_observations == (6,) , "Wrong environment!"
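
For orientation: the Gym documentation describes the six Acrobot observations as the cosine and sine of the two joint angles followed by the two angular velocities, and the three actions as torques of -1, 0 and +1 applied at the joint between the links. The sketch below only illustrates that layout. `acrobot_observation` is a hypothetical helper written for this note, not a gym function.

```python
import math

def acrobot_observation(theta1, theta2, theta1_dot, theta2_dot):
    """Hypothetical helper: pack joint angles and velocities into the
    6-element layout the Gym docs describe for Acrobot-v1."""
    return (
        math.cos(theta1), math.sin(theta1),
        math.cos(theta2), math.sin(theta2),
        theta1_dot, theta2_dot,
    )

# Both links hanging straight down and at rest (all angles and velocities zero)
obs = acrobot_observation(0.0, 0.0, 0.0, 0.0)
print(len(obs), obs)
```

This is why the assertion above expects an observation shape of `(6,)` and three discrete actions.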

**TASK: Create the Neural Network for your Deep-Q-Agent** <br />
Take a look at the size of the action space and the size of the observation space.
You are free to choose any architecture you want! <br />
Hint: It already works with three layers, each having 64 neurons.

In [None]:
model = Sequential()
model.add(Flatten(input_shape=(1,) + num_observations))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))

model.add(Dense(64))
model.add(Activation('relu'))


model.add(Dense(num_actions))
model.add(Activation('linear'))

model.summary()  # summary() prints the table itself and returns None
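
As a sanity check on the summary, the number of trainable parameters can be worked out by hand. The sketch below assumes the architecture from the hint (three hidden layers of 64 neurons) with 6 inputs and 3 outputs; `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# Layer widths: 6 flattened observations -> 64 -> 64 -> 64 -> 3 Q-values
sizes = [6, 64, 64, 64, 3]
per_layer = [dense_params(a, b) for a, b in zip(sizes, sizes[1:])]
total = sum(per_layer)
print(per_layer, total)  # [448, 4160, 4160, 195] 8963
```

The `Flatten` and `Activation` layers contribute no parameters, so the total should match the "Trainable params" line of the summary if the same layer sizes are used.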

**TASK: Initialize the circular buffer**<br />
Make sure you set the limit appropriately (50000 works well)

In [None]:
memory = SequentialMemory(limit=50000, window_length=1)
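
To make the idea of a circular buffer concrete, here is a minimal sketch built on `collections.deque`. It is not how keras-rl implements `SequentialMemory`; the `RingBuffer` class and the transition tuples are assumptions made purely for illustration.

```python
from collections import deque
import random

class RingBuffer:
    """Illustrative fixed-size replay buffer (not keras-rl's implementation)."""

    def __init__(self, limit):
        # A deque with maxlen drops the oldest entry once the limit is reached
        self.data = deque(maxlen=limit)

    def append(self, transition):
        self.data.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement from the stored transitions
        return rng.sample(list(self.data), batch_size)

buffer = RingBuffer(limit=3)
for step in range(5):
    # (state, action, reward, next_state, done) with placeholder values
    buffer.append((f"s{step}", 0, -1.0, f"s{step + 1}", False))

print(len(buffer.data), buffer.data[0][0])  # 3 s2
```

With a limit of 3, the two oldest transitions have been overwritten after five appends. A limit of 50000 works the same way, just with a much longer history to sample from.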


**TASK: Use the epsilon greedy action selection strategy with *decaying* epsilon**

In [None]:
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), 
                              attr='eps',
                              value_max=1.,
                              value_min=.1,
                              value_test=.05,
                              nb_steps=150000) 
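
The schedule these parameters describe can be sketched in a few lines: epsilon falls linearly from `value_max` to `value_min` over `nb_steps` training steps and then stays at `value_min`. `linear_epsilon` is a hypothetical stand-alone function written for this note; it mirrors the idea, not necessarily the exact library code.

```python
def linear_epsilon(step, value_max=1.0, value_min=0.1, nb_steps=150000):
    """Linearly decayed epsilon, clipped at value_min."""
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, value_max + slope * step)

for step in (0, 75000, 150000, 200000):
    print(step, round(linear_epsilon(step), 3))
```

Halfway through the schedule the agent still explores with a probability of roughly 0.55. As far as I know, `value_test` is the fixed epsilon used while testing rather than training.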


**TASK: Create the DQNAgent** <br />
Feel free to play with the *nb_steps_warmup*, *target_model_update*, *batch_size* and *gamma* parameters. <br />
Hint:<br />
You can try *nb_steps_warmup*=1000, *target_model_update*=1000, *batch_size*=32 and *gamma*=0.99 as a first guess.

In [None]:
dqn = DQNAgent(model=model, nb_actions=num_actions, memory=memory, nb_steps_warmup=1000,
               target_model_update=1000, policy=policy, batch_size=32, gamma=0.99)

# Next step: compile with the Adam optimizer (learning rate 1e-3) and log the mean absolute error
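
To see what *gamma* does, consider the one-step Q-learning target that a DQN regresses towards: the immediate reward plus the discounted best Q-value of the next state, as estimated by the target network. The sketch below is a plain-Python illustration of that formula with made-up Q-values; `td_target` is a hypothetical helper, not part of keras-rl.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r + gamma * max_a' Q_target(s', a')."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_values)

# Made-up target-network Q-values for the three Acrobot actions
next_q = [-10.0, -8.0, -9.0]
print(td_target(-1.0, next_q, done=False))
print(td_target(-1.0, next_q, done=True))
```

A *gamma* close to 1 makes the agent weigh future rewards almost as heavily as immediate ones, which suits a task where the payoff only comes after many steps.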


**TASK: Compile the model** <br />
Feel free to explore the effects of different optimizers and learning rates.
You can try Adam with a learning rate of 1e-3 as a first guess 

In [None]:
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])  # `lr` is a deprecated alias of `learning_rate`

**TASK: Fit the model** <br />
150,000 steps should be a very good starting point

In [None]:
dqn.fit(env, nb_steps=150000, visualize=False, verbose=2)

**TASK: Evaluate the model**

In [None]:
dqn.test(env, nb_episodes=5, visualize=True)
env.close()
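
To compare runs, it helps to reduce the test episodes to a few numbers. The sketch below assumes you have collected the per-episode rewards in a plain list; as far as I know, `dqn.test` returns a Keras `History` object whose `history['episode_reward']` holds them, but treat that as an assumption and check it in your installation. The reward values are made up for illustration and are not results from this notebook.

```python
from statistics import mean

def summarize_rewards(episode_rewards):
    """Episode count, mean and best reward from a list of episode rewards."""
    return {
        "episodes": len(episode_rewards),
        "mean": mean(episode_rewards),
        "best": max(episode_rewards),
    }

# Made-up rewards for illustration only
example_rewards = [-90.0, -110.0, -100.0]
summary = summarize_rewards(example_rewards)
print(summary)
```

Since Acrobot gives a negative reward for every step until the arm crosses the line, a mean closer to zero indicates that the agent solves the task in fewer steps.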