# **Deep Reinforcement Learning**

In this lab, you will get an introduction to reinforcement learning by solving a real-time decision problem.

Please open [this tutorial](https://youtu.be/cO5g5qLrLSo) to follow the steps in this notebook. In this 20-minute tutorial, you will learn how to define, train, and test a reinforcement learning problem. You will also learn about some useful databases from which you can download similar problems.


**Saturn shortcuts**

Press Ctrl+Return to run each section separately. Some sections depend on earlier ones, so run them in order. You can also run the whole program at once by clicking the Run All button.

---


# Theory: The four concepts of reinforcement learning


***Check point***

What are the four main concepts that make up reinforcement learning? (Hint: Area 51)

Action, reinforcement, environment, agent
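
To make the four concepts concrete, here is a minimal sketch of how they interact in a loop. The `CoinFlipEnvironment` and `RandomAgent` classes are hypothetical toy examples invented for illustration; they are not part of Gym or of this lab.

```python
import random


class CoinFlipEnvironment:
    """Hypothetical toy environment: the agent guesses a coin flip each step."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the (single, dummy) state
        self.steps = 0
        return 0

    def step(self, action):
        # The environment reacts to the action and hands back a reward
        coin = self.rng.choice([0, 1])
        reward = 1 if action == coin else 0
        self.steps += 1
        done = self.steps >= self.episode_length
        return 0, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and acts at random."""

    def __init__(self, actions, seed=1):
        self.actions = actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(self.actions)


def run_episode(env, agent):
    # Agent picks an action, environment returns a new state and a reward
    state = env.reset()
    done = False
    total_reward = 0
    while not done:
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
    return total_reward


total = run_episode(CoinFlipEnvironment(), RandomAgent([0, 1]))
print("Total reward:", total)
```

A learning agent would use the reward signal to improve its choice of action over time; the random agent above never does, which is why it serves only as a baseline.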

# 0. Install Dependencies

In [16]:
# All the packages are available in EdStem.
# This code prevents multiple installations on the EdStem operating system
import os 
if not os.getenv("ED_COURSE_ID"):
    !pip install tensorflow gym keras keras-rl2

# 1. Test Random Environment with OpenAI Gym

Import libraries

In [17]:
import sys
sys.path.append('./.local/lib/python3.9/site-packages')
import gym 
import random

Set up environment

In [18]:
# Use the make method to generate the CartPole environment and set it to env
env = gym.make('CartPole-v0')

# Extract the available states and actions
states = env.observation_space.shape[0]
actions = env.action_space.n

# Inspect the number of actions available in this problem
print('Number of actions:', actions)


Visualize the random environment

**Note on visualization**: If the display window comes up but the environment is not displayed, click on the "..." at the top right of the window and select "Full view", then click on the "Remote App" button (with a blue dot) to view the rendering. The environment is only shown for the duration of the while loop. If the window closes, you can rerun this section (Ctrl+Enter) and click on the Remote App button to view the display again.

In [19]:
# Trigger Ed's X display
!xdpyinfo

episodes = 10
# Repeat process 10 times
for episode in range(1, episodes+1):
    # Each time, reset the environment
    state = env.reset()
    done = False
    score = 0 
    
    while not done:
        # render the environment so that it remains visible on the screen
        env.render()  
        # take a random choice to move left or right     
        action = random.choice([0,1]) 
        # apply the action to the environment and collect feedback
        n_state, reward, done, info = env.step(action) 
        # Add the reward to the cumulative score
        score += reward
    # End of the episode: print the total score
    print('Episode:{} Score:{}'.format(episode, score))

name of display:    :1.0
version number:    11.0
vendor string:    The X.Org Foundation
vendor release number:    12009000
X.Org version: 1.20.9
maximum request size:  16777212 bytes
motion buffer size:  256
bitmap unit, bit order, padding:    32, LSBFirst, 32
image byte order:    LSBFirst
number of supported pixmap formats:    6
supported pixmap formats:
    depth 1, bits_per_pixel 1, scanline_pad 32
    depth 4, bits_per_pixel 8, scanline_pad 32
    depth 8, bits_per_pixel 8, scanline_pad 32
    depth 16, bits_per_pixel 16, scanline_pad 32
    depth 24, bits_per_pixel 32, scanline_pad 32
    depth 32, bits_per_pixel 32, scanline_pad 32
keycode range:    minimum 8, maximum 255
focus:  window 0x600012, revert to PointerRoot
number of extensions:    23
    BIG-REQUESTS
    Composite
    DAMAGE
    DOUBLE-BUFFER
    GLX
    Generic Event Extension
    MIT-SCREEN-SAVER
    MIT-SHM
    Present
    RANDR
    RECORD
    RENDER
    SHAPE
    SYNC
    VNC-EXTE
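
The loop above assumes the classic Gym API, in which `env.step` returns four values and `env.reset` returns only the state. In Gym 0.26 and later (and in Gymnasium), `step` returns five values (`terminated` and `truncated` replace `done`) and `reset` returns a `(state, info)` pair. If you run this notebook outside EdStem with a newer version, a small adapter along these lines may help. The helper names `unpack_step` and `unpack_reset` are hypothetical and are not part of Gym.

```python
def unpack_step(result):
    """Normalize the output of env.step to (state, reward, done, info)."""
    if len(result) == 5:
        # Newer API: (state, reward, terminated, truncated, info)
        state, reward, terminated, truncated, info = result
        return state, reward, terminated or truncated, info
    # Classic API: (state, reward, done, info)
    state, reward, done, info = result
    return state, reward, done, info


def unpack_reset(result):
    """Normalize the output of env.reset to just the state."""
    # Newer API returns a (state, info) tuple where info is a dict
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


# Usage inside the loop (sketch):
#   state = unpack_reset(env.reset())
#   n_state, reward, done, info = unpack_step(env.step(action))
```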

# 2. Create a Deep Learning Model with Keras

In [20]:
# Import dependencies needed for this step from numpy and keras
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

In [21]:
# Define a function that builds a model so that we can reuse it multiple times
# To build a model, the function needs the available states and actions
def build_model(states, actions):
    model = Sequential()
    model.add(Flatten(input_shape=(1,states)))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(24, activation='relu'))
    model.add(Dense(actions, activation='linear'))
    return model

In [22]:
# Create an instance of a model by calling the build_model function
model = build_model(states, actions)
# Write code to inspect the built model by outputting the summary
model.summary()


Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 flatten (Flatten)           (None, 4)                 0         
                                                                 
 dense (Dense)               (None, 24)                120       
                                                                 
 dense_1 (Dense)             (None, 24)                600       
                                                                 
 dense_2 (Dense)             (None, 2)                 50        
                                                                 
Total params: 770
Trainable params: 770
Non-trainable params: 0
_________________________________________________________________
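
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The sketch below reproduces the numbers for this network, assuming 4 states and 2 actions as reported for CartPole above. The helper `dense_params` is a hypothetical name used only for illustration.

```python
def dense_params(n_inputs, n_units):
    # Weights (n_inputs * n_units) plus one bias per unit
    return n_inputs * n_units + n_units


n_states, n_actions = 4, 2
layer_sizes = [n_states, 24, 24, n_actions]

# Pair each layer's input size with its output size
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
print(counts)       # [120, 600, 50]
print(sum(counts))  # 770
```

The Flatten layer only reshapes its input, so it contributes no parameters.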


# 3. Build Agent with Keras-RL

In [23]:
# Import dependencies to build an agent
from rl.agents import DQNAgent
from rl.policy import BoltzmannQPolicy
from rl.memory import SequentialMemory

In [24]:
# Define a function to build a DQN agent given the model and the set of actions
def build_agent(model, actions):
    policy = BoltzmannQPolicy()
    memory = SequentialMemory(limit=50000, window_length=1)
    dqn = DQNAgent(model=model, memory=memory, policy=policy, 
                  nb_actions=actions, nb_steps_warmup=10, target_model_update=1e-2)
    return dqn
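
A Boltzmann policy chooses actions at random, but with probabilities that grow with the predicted Q-values, so the agent keeps exploring while still favoring promising actions. The following is a minimal sketch of the general technique in plain Python, not the Keras-RL implementation. The function names are hypothetical, and the temperature parameter `tau` with a default of 1.0 is an assumption.

```python
import math
import random


def boltzmann_probabilities(q_values, tau=1.0):
    """Softmax over Q-values; a higher tau gives more uniform probabilities."""
    scaled = [q / tau for q in q_values]
    # Subtract the maximum for numerical stability (does not change the result)
    highest = max(scaled)
    exps = [math.exp(s - highest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def boltzmann_action(q_values, tau=1.0, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


probs = boltzmann_probabilities([1.0, 2.0])
print(probs)  # the action with the larger Q-value gets the larger probability
print(boltzmann_action([1.0, 2.0]))
```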

Use the DQN agent to train the reinforcement learning model.

Note that this step takes a few minutes. Move on to the next steps once the button on the left changes back from 'stop', which shows that the run is complete. Do not worry about the 'too much output' warning halfway through the run.

**To test this step**, please use the *Run All* button instead of running this section alone.
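
During training, a DQN agent repeatedly nudges its Q-value predictions toward a target built from the observed reward and the best predicted value of the next state. When `target_model_update` is below 1, as in `build_agent` above, the target network is blended slowly toward the online network. The sketch below illustrates both ideas on plain Python lists. The function names are hypothetical, and the discount factor `gamma=0.99` is an assumption rather than a value stated in this notebook.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """Target for Q(state, action): reward plus the discounted best next value."""
    if done:
        # No future reward after a terminal state
        return reward
    return reward + gamma * max(next_q_values)


def soft_update(target_weights, online_weights, rate=1e-2):
    """Move each target weight a small fraction of the way to the online weight."""
    return [(1.0 - rate) * t + rate * o
            for t, o in zip(target_weights, online_weights)]


print(td_target(1.0, [2.0, 3.0], done=False))
print(td_target(1.0, [2.0, 3.0], done=True))
print(soft_update([0.0, 10.0], [1.0, 10.0]))
```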

In [25]:
# Create an instance of an agent 
dqn = build_agent(model, actions)
# Compile the model
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])
# Fit the model
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

2022-03-16 09:10:07.361574: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-03-16 09:10:07.362077: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
  updates=self.state_updates,


Training for 50000 steps ...
Interval 1 (0 steps performed)
107 episodes - episode_reward: 92.682 [10.000, 200.000] - loss: 2.543 - mae: 19.233 - mean_q: 38.901

Interval 2 (10000 steps performed)
   86/10000 [..............................] - ETA: 29s - reward: 1.0000

<keras.callbacks.History at 0x7f06844db310>

Test Agent

In [26]:
scores = dqn.test(env, nb_episodes=100, visualize=False)
print(np.mean(scores.history['episode_reward']))

Testing for 100 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 160.000, steps: 160
Episode 3: reward: 189.000, steps: 189
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 200.000, steps: 200
Episode 6: reward: 189.000, steps: 189
Episode 7: reward: 200.000, steps: 200
Episode 8: reward: 192.000, steps: 192
Episode 9: reward: 200.000, steps: 200
Episode 10: reward: 200.000, steps: 200
Episode 11: reward: 200.000, steps: 200
Episode 12: reward: 200.000, steps: 200
Episode 13: reward: 200.000, steps: 200
Episode 14: reward: 200.000, steps: 200
Episode 15: reward: 200.000, steps: 200
Episode 16: reward: 200.000, steps: 200
Episode 17: reward: 193.000, steps: 193
Episode 18: reward: 200.000, steps: 200
Episode 19: reward: 200.000, steps: 200
Episode 20: reward: 200.000, steps: 200
Episode 21: reward: 200.000, steps: 200
Episode 22: reward: 174.000, steps: 174
Episode 23: reward: 200.000, steps: 200
Episode 24: reward: 200.000, steps: 200
Episode 25: reward: 

Visualize the DQN model

In [27]:
_ = dqn.test(env, nb_episodes=15, visualize=True)

Testing for 15 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 196.000, steps: 196
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 200.000, steps: 200
Episode 6: reward: 200.000, steps: 200
Episode 7: reward: 200.000, steps: 200
Episode 8: reward: 200.000, steps: 200
Episode 9: reward: 200.000, steps: 200
Episode 10: reward: 200.000, steps: 200
Episode 11: reward: 200.000, steps: 200
Episode 12: reward: 200.000, steps: 200
Episode 13: reward: 200.000, steps: 200
Episode 14: reward: 200.000, steps: 200
Episode 15: reward: 200.000, steps: 200


# 4. Reload Agent from Memory

Use the save_weights method to save the RL model weights to a file in the ReinforcementLearning folder.

In [33]:
dqn.save_weights('ReinforcementLearning/dqn_weights.h5f', overwrite=True)
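
If the ReinforcementLearning folder might not exist yet, creating it before saving is harmless and avoids a possible error. Below is a minimal sketch using only the standard library. The helper `ensure_parent_dir` is a hypothetical name, and the demo writes to a temporary directory rather than to your project folder.

```python
import os
import tempfile


def ensure_parent_dir(filepath):
    """Create the folder that will contain filepath, if it is missing."""
    parent = os.path.dirname(filepath)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return filepath


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    path = ensure_parent_dir(
        os.path.join(tmp, 'ReinforcementLearning', 'dqn_weights.h5f'))
    print(os.path.isdir(os.path.dirname(path)))  # True

# Usage with the agent (sketch):
#   dqn.save_weights(ensure_parent_dir('ReinforcementLearning/dqn_weights.h5f'),
#                    overwrite=True)
```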

Delete the model, agent and environment now that the weights are saved

In [34]:
del model 
del dqn
del env

Reinstantiate the env, model and dqn

In [35]:
env = gym.make('CartPole-v0')
actions = env.action_space.n
states = env.observation_space.shape[0]
model = build_model(states, actions)
dqn = build_agent(model, actions)
dqn.compile(Adam(learning_rate=1e-3), metrics=['mae'])

Reload the weights into the model

In [40]:
dqn.load_weights('ReinforcementLearning/dqn_weights.h5f')

Testing for 5 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 182.000, steps: 182
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 174.000, steps: 174


# Earn Your Wings
Test the environment again to see whether the results are similar to those obtained before the agent was reloaded from memory.

In [41]:
# Write code to test the new model with reloaded weights
_ = dqn.test(env, nb_episodes=15, visualize=False)

Testing for 15 episodes ...
Episode 1: reward: 200.000, steps: 200
Episode 2: reward: 200.000, steps: 200
Episode 3: reward: 200.000, steps: 200
Episode 4: reward: 200.000, steps: 200
Episode 5: reward: 200.000, steps: 200
Episode 6: reward: 200.000, steps: 200
Episode 7: reward: 188.000, steps: 188
Episode 8: reward: 164.000, steps: 164
Episode 9: reward: 200.000, steps: 200
Episode 10: reward: 200.000, steps: 200
Episode 11: reward: 200.000, steps: 200
Episode 12: reward: 200.000, steps: 200
Episode 13: reward: 200.000, steps: 200
Episode 14: reward: 176.000, steps: 176
Episode 15: reward: 200.000, steps: 200
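
To judge whether the reloaded agent performs similarly, it helps to compare summary statistics instead of reading the episode lists by eye. The sketch below hard-codes the rewards printed in the two 15-episode runs in this notebook (the visualized run before saving and the run after reloading). The helper `summarize` is a hypothetical name.

```python
from statistics import mean

# Rewards copied from the two 15-episode test outputs in this notebook
before_reload = [200, 196] + [200] * 13
after_reload = [200] * 6 + [188, 164] + [200] * 5 + [176, 200]


def summarize(rewards, max_score=200):
    """Return the mean, the minimum, and the share of episodes at max_score."""
    return {
        "mean": mean(rewards),
        "min": min(rewards),
        "max_score_rate": sum(r == max_score for r in rewards) / len(rewards),
    }


print("Before:", summarize(before_reload))
print("After: ", summarize(after_reload))
```

Because the policy samples actions at random, some variation between runs is expected even with identical weights, so a small gap in the mean does not by itself indicate a problem with reloading.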
