# Doom-Health: REINFORCE Monte Carlo Policy gradients 🕹️

In this notebook we'll implement an agent <b>that tries to survive in a Doom environment by using a Policy Gradient architecture.</b> <br>
Our agent playing Doom:

<img src="assets/projectw4.gif" style="max-width: 600px;" alt="Policy Gradient with Doom"/>

# You can follow this notebook with this video tutorial 📹 that will help you understand each step:

In [None]:
from IPython.display import HTML
HTML('<iframe width="560" height="315" src="https://www.youtube.com/embed/wLTQRuizVyE?showinfo=0" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>')

# This is a notebook from [Deep Reinforcement Learning Course with Tensorflow](https://simoninithomas.github.io/Deep_reinforcement_learning_Course/)
<img src="https://raw.githubusercontent.com/simoninithomas/Deep_reinforcement_learning_Course/master/docs/assets/img/DRLC%20Environments.png" alt="Deep Reinforcement Course"/>
<br>
<p>  Deep Reinforcement Learning Course is a free series of articles and video tutorials 🆕 about Deep Reinforcement Learning, where **we'll learn the main algorithms (Q-learning, Deep Q Nets, Dueling Deep Q Nets, Policy Gradients, A2C, Proximal Policy Gradients…), and how to implement them with Tensorflow.**
<br><br>
    
📜 The articles explain the architectures from the big picture to the mathematical details behind them.
<br>
📹 The videos explain how to build the agents with Tensorflow. </p>
<br>
This course will give you a **solid foundation for understanding and implementing future state-of-the-art algorithms**. You'll also build a strong professional portfolio by creating **agents that learn to play awesome environments**: Doom© 👹, Space Invaders 👾, Outrun, Sonic the Hedgehog©, Michael Jackson’s Moonwalker, as well as agents that can navigate 3D environments with DeepMindLab (Quake) and walk with Mujoco.
<br><br>
</p> 

## 📚 The complete [Syllabus HERE](https://simoninithomas.github.io/Deep_reinforcement_learning_Course/)


## Any questions 👨‍💻
<p> If you have any questions, feel free to ask me: </p>
<p> 📧: <a href="mailto:hello@simoninithomas.com">hello@simoninithomas.com</a>  </p>
<p> Github: https://github.com/simoninithomas/Deep_reinforcement_learning_Course </p>
<p> 🌐 : https://simoninithomas.github.io/Deep_reinforcement_learning_Course/ </p>
<p> Twitter: <a href="https://twitter.com/ThomasSimonini">@ThomasSimonini</a> </p>
<p> Don't forget to <b> follow me on <a href="https://twitter.com/ThomasSimonini">twitter</a>, <a href="https://github.com/simoninithomas/Deep_reinforcement_learning_Course">github</a> and <a href="https://medium.com/@thomassimonini">Medium</a> to be alerted of the new articles that I publish </b></p>
    
## How to help  🙌
3 ways:
- **Clap our articles and like our videos a lot**: Clapping on Medium means that you really like our articles, and the more claps we have, the more our articles are shared. Liking our videos helps them become much more visible to the deep learning community.
- **Share and speak about our articles and videos**: By sharing our articles and videos you help us to spread the word. 
- **Improve our notebooks**: if you find a bug or **a better implementation**, you can send a pull request.
<br>

## Important note 🤔
<b> You can run it on your computer but it's better to run it on GPU based services</b>. Personally, I use Microsoft Azure and their Deep Learning Virtual Machine (they offer $170):
https://azuremarketplace.microsoft.com/en-us/marketplace/apps/microsoft-ads.dsvm-deep-learning
<br>
⚠️ I don't have any business relations with them. I just loved their excellent customer service.

If you have trouble using Microsoft Azure, follow the explanations in this excellent article (skip the last part about fast.ai): https://medium.com/@manikantayadunanda/setting-up-deeplearning-machine-and-fast-ai-on-azure-a22eb6bd6429

## Prerequisites 🏗️
Before diving into the notebook **you need to understand**:
- The foundations of Reinforcement Learning (MC, TD, the reward hypothesis...) [Article](https://medium.freecodecamp.org/an-introduction-to-reinforcement-learning-4339519de419)
- Policy gradients [Article](https://medium.freecodecamp.org/an-introduction-to-policy-gradients-with-cartpole-and-doom-495b5ef2207f)
- We made a [tutorial video](https://youtu.be/wLTQRuizVyE) where we implement a Policy Gradient agent with Tensorflow that learns to play Doom 👹🔫 in a Deathmatch environment.

## Step 1: Import the libraries 📚

In [1]:
%%bash
# Install the ViZDoom build dependencies listed at
# https://github.com/mwydmuch/ViZDoom/blob/master/doc/Building.md#-linux
# (-y answers the confirmation prompt, which a notebook cell cannot do interactively)

sudo apt-get install -y build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \
libopenal-dev timidity libwildmidi-dev unzip

# Boost libraries
sudo apt-get install -y libboost-all-dev

# Lua binding dependencies
sudo apt-get install -y liblua5.1-dev

sudo: apt-get: command not found
sudo: apt-get: command not found
sudo: apt-get: command not found


In [None]:
%%bash

# On yum-based distributions, where apt-get is not available
sudo yum install -y gcc gcc-c++ make
sudo yum install -y boost-devel



In [3]:
!pip install vizdoom

Collecting vizdoom
  Using cached https://files.pythonhosted.org/packages/2d/6c/23565c09387173423883e7881fce53541ff89b5209ca0904c67e577dd6ac/vizdoom-1.1.7.tar.gz
Building wheels for collected packages: vizdoom
  Building wheel for vizdoom (setup.py) ... error
  ERROR: Command errored out with exit status 1:
   command: /home/ec2-user/anaconda3/envs/tensorflow_p36/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-rs_cqn3h/vizdoom/setup.py'"'"'; __file__='"'"'/tmp/pip-install-rs_cqn3h/vizdoom/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-vju0rjq4 --python-tag cp36
       cwd: /tmp/pip-install-rs_cqn3h/vizdoom/
  Complete output (30 lines):
  running bdist_wheel
  running build
  CMake Error: No source or binary directory provided
  Installation failed, you may be missing some 

In [7]:
import tensorflow as tf      # Deep Learning library
import numpy as np           # Handle matrices
from vizdoom import *        # Doom Environment
import random                # Handling random number generation
import time                  # Handling time calculation
from skimage import transform  # Helps us preprocess the frames

from collections import deque  # Ordered collection with ends
import matplotlib.pyplot as plt  # Display graphs

import warnings  # Ignore the warning messages that skimage normally prints during training
warnings.filterwarnings('ignore')

ImportError: No module named 'vizdoom'

## Step 2: Create our environment 🎮
- Now that we have imported the libraries/dependencies, we will create our environment.
- The Doom environment takes:
    - A `configuration file` that **handles all the options** (size of the frame, possible actions...)
    - A `scenario file` that **generates the correct scenario** (in our case the health scenario described below, **but you're invited to try other scenarios**).
- Note: We have 3 possible actions `[[0,0,1], [1,0,0], [0,1,0]]`, which are already one-hot encoded, so we don't need to encode them ourselves (thanks to <a href="https://stackoverflow.com/users/2237916/silgon">silgon</a> for figuring this out).
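
As a minimal sketch of the points above, creating the environment could look like the following. The file names `health_gathering.cfg` and `health_gathering.wad` and the helper names `create_environment` and `one_hot_actions` are assumptions for illustration, not taken from this notebook. `vizdoom` is imported inside the function so that the action helper can be tried without the library installed.

```python
import numpy as np


def one_hot_actions(n_buttons):
    """Return one one-hot vector per available button."""
    return np.identity(n_buttons, dtype=int).tolist()


def create_environment(config_path="health_gathering.cfg",
                       scenario_path="health_gathering.wad"):
    """Create and initialise a Doom game (both paths are assumed, adjust to your setup)."""
    from vizdoom import DoomGame  # lazy import: requires ViZDoom to be installed

    game = DoomGame()
    game.load_config(config_path)               # options: frame size, buttons, rewards...
    game.set_doom_scenario_path(scenario_path)  # the scenario to play
    game.init()

    possible_actions = one_hot_actions(3)       # 3 buttons, so 3 one-hot actions
    return game, possible_actions


# The action list can be built without starting the game
possible_actions = one_hot_actions(3)
print(possible_actions)
```

This produces the same three vectors as the list above, only in a different order. Which vector maps to which button depends on the order of the buttons in the configuration file.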

### Our environment
<img src="assets/health_doom.jpg" style="max-width:500px;" alt="Doom health"/>

The purpose of this scenario is to teach the agent **how to survive without knowing what makes it survive.** The agent knows only that life is precious and death is bad, so it must learn what prolongs its existence and that its health is connected with it.

The map is a rectangle with a green, acidic floor which hurts the player periodically. Initially there are some medkits spread uniformly over the map. A new medkit falls from the skies every now and then. **Medkits heal some portion of the player's health, so to survive the agent needs to pick them up. The episode finishes after the player's death or on timeout.**

Further configuration:

- living_reward = 1
- 3 available buttons: turn left, turn right, move forward
- 1 available game variable: HEALTH
- death penalty = 100
<br><br>

## Step 3: Define the preprocessing functions ⚙️
### preprocess_frame 🖼️
Preprocessing is an important step, <b>because we want to reduce the complexity of our states to reduce the computation time needed for training.</b>
<br><br>
Our steps:
- Grayscale each of our frames (because <b> color does not add important information </b>). This is already done by the config file.
- Crop the screen (in our case we remove the roof because it contains no information)
- We normalize pixel values
- Finally we resize the preprocessed frame
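
The steps above can be sketched as follows. This is only an illustration under assumptions: the crop offset (`crop_top=80`), the output size (84x84) and the 240x320 test frame are assumed values, and the resize uses a simple nearest-neighbour NumPy helper (`resize_nearest`, a hypothetical name) as a stand-in for `skimage.transform.resize` so that the example runs without scikit-image.

```python
import numpy as np


def resize_nearest(image, out_height, out_width):
    """Nearest-neighbour resize with plain NumPy (stand-in for skimage.transform.resize)."""
    in_height, in_width = image.shape
    rows = np.arange(out_height) * in_height // out_height
    cols = np.arange(out_width) * in_width // out_width
    return image[rows[:, None], cols]


def preprocess_frame(frame, crop_top=80, out_size=(84, 84)):
    """Crop, normalise and resize one grayscale frame.

    crop_top and out_size are assumed values; tune them for your screen resolution.
    """
    cropped = frame[crop_top:, :]                      # remove the roof
    normalized = cropped.astype(np.float32) / 255.0    # pixel values in [0, 1]
    return resize_nearest(normalized, *out_size)       # smaller input for the network


# Try it on a random grayscale frame (assumed 240x320 resolution)
fake_frame = np.random.randint(0, 256, size=(240, 320), dtype=np.uint8)
processed = preprocess_frame(fake_frame)
print(processed.shape, processed.dtype)
```

Swapping `resize_nearest` for `skimage.transform.resize` should give smoother frames, since that function interpolates instead of picking single pixels.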

### stack_frames
👏 This part was made possible thanks to the help of <a href="https://github.com/Miffyli">Anssi</a><br>

As explained in this really <a href="https://danieltakeshi.github.io/2016/11/25/frame-skipping-and-preprocessing-for-deep-q-networks-on-atari-2600-games/">good article</a>, we stack frames.

Stacking frames is really important because it **gives our Neural Network a sense of motion.**

- First we preprocess the frame
- Then we append the frame to the deque that automatically **removes the oldest frame**
- Finally we **build the stacked state**

This is how the stack works:
- For the first frame, we feed 4 frames
- At each timestep, **we add the new frame to deque and then we stack them to form a new stacked frame**
- And so on
<img src="https://raw.githubusercontent.com/simoninithomas/Deep_reinforcement_learning_Course/master/DQN/Space%20Invaders/assets/stack_frames.png" alt="stack">
- If we're done, **we create a new stack with 4 new frames (because we are in a new episode)**.
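
A minimal sketch of this stacking logic, using a `deque` with `maxlen`. It assumes that the frames are already preprocessed to 84x84 and that, at the start of an episode, the deque is filled with copies of the first frame (only one frame exists at that point). Both are assumptions for illustration.

```python
from collections import deque

import numpy as np

STACK_SIZE = 4  # number of frames per stacked state


def stack_frames(stacked_frames, frame, is_new_episode):
    """Add a preprocessed frame and return (stacked_state, stacked_frames)."""
    if is_new_episode or stacked_frames is None:
        # Assumption: fill the deque with copies of the first frame of the episode
        stacked_frames = deque([frame] * STACK_SIZE, maxlen=STACK_SIZE)
    else:
        # maxlen makes the deque drop the oldest frame automatically
        stacked_frames.append(frame)

    # Stack along the last axis: (height, width, STACK_SIZE)
    stacked_state = np.stack(list(stacked_frames), axis=2)
    return stacked_state, stacked_frames


frame_a = np.zeros((84, 84), dtype=np.float32)
frame_b = np.ones((84, 84), dtype=np.float32)

state, frames = stack_frames(None, frame_a, is_new_episode=True)
state, frames = stack_frames(frames, frame_b, is_new_episode=False)
print(state.shape)
```

After the second call, the last channel of `state` holds the newest frame and the three older channels still hold the first one.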

### discount_and_normalize_rewards 💰
This function is important because we are in a Monte Carlo situation. <br>

We need to **discount the rewards at the end of the episode**. This function takes the rewards, discounts them, and **then normalizes them** (to avoid a big variability in rewards).
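
One way to write this function is sketched below. The default `gamma=0.95` is an assumed value, and the small `eps` term is an addition that guards against a division by zero when all discounted rewards are equal. The split into a separate `discount_rewards` helper is only for readability.

```python
import numpy as np


def discount_rewards(episode_rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    discounted = np.zeros(len(episode_rewards), dtype=np.float64)
    running_return = 0.0
    # Walk backwards so each step can reuse the return of the next step
    for t in reversed(range(len(episode_rewards))):
        running_return = episode_rewards[t] + gamma * running_return
        discounted[t] = running_return
    return discounted


def discount_and_normalize_rewards(episode_rewards, gamma=0.95, eps=1e-8):
    """Discount the rewards, then standardise them to zero mean and unit variance."""
    discounted = discount_rewards(episode_rewards, gamma)
    return (discounted - discounted.mean()) / (discounted.std() + eps)


# With gamma = 0.5 the discounted rewards are 1.75, 1.5 and 1.0
print(discount_rewards([1.0, 1.0, 1.0], gamma=0.5))
print(discount_and_normalize_rewards([1.0, 1.0, 1.0], gamma=0.5))
```

Early steps get a larger discounted reward than late ones because more future reward still lies ahead of them. After normalization, roughly half of the steps end up with a negative value, which pushes the probability of those actions down.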

## Step 4: Set up our hyperparameters ⚗️
In this part we'll set up our different hyperparameters. But when you implement a Neural Network by yourself you will **not define all the hyperparameters at once but progressively**.

- First, you begin by defining the neural network's hyperparameters when you implement the model.
- Then, you'll add the training hyperparameters when you implement the training algorithm.

Quick note: Policy gradient methods like REINFORCE **are on-policy methods, which cannot be updated from experience replay.**

## Step 5: Create our Policy Gradient Neural Network model 🧠

<img src="assets/doomPG.png" alt="Doom PG"/>
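
The figure does not show the training objective. A common choice for REINFORCE is the negative log-probability of the action taken, weighted by the discounted reward. The sketch below writes it with NumPy rather than Tensorflow so that it runs on its own. The function names and the small constant inside the logarithm are assumptions for illustration, and it computes only the loss value, not the gradients.

```python
import numpy as np


def softmax(logits):
    """Turn the network outputs (one row per step) into action probabilities."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def reinforce_loss(logits, actions, discounted_rewards):
    """Mean of -log pi(a_t | s_t) * G_t over all steps.

    logits:             (n_steps, n_actions) raw network outputs
    actions:            (n_steps, n_actions) one-hot actions that were taken
    discounted_rewards: (n_steps,) discounted (and usually normalized) rewards
    """
    probs = softmax(logits)
    neg_log_prob = -np.sum(actions * np.log(probs + 1e-12), axis=1)
    return float(np.mean(neg_log_prob * discounted_rewards))


logits = np.zeros((2, 3))                      # uniform policy over 3 actions
actions = np.array([[1, 0, 0], [0, 0, 1]])     # one-hot actions
print(reinforce_loss(logits, actions, np.array([1.0, 1.0])))
```

Minimizing this value raises the probability of actions followed by a positive discounted reward and lowers it for actions followed by a negative one.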

## Step 6: Set up Tensorboard 📊
For more information about tensorboard, please watch this <a href="https://www.youtube.com/embed/eBbEDRsCmv4">excellent 30min tutorial</a> <br><br>
To launch tensorboard : `tensorboard --logdir=/tensorboard/pg/1`

## Step 7: Train our Agent 🏃‍♂️

Here we'll create batches.<br>
These batches contain episodes (**their number depends on how many rewards we collect**: for instance, if we have episodes with only 10 rewards, we can put batch_size/10 episodes in a batch).
<br>
* Make a batch
    * For each step:
        * Choose action a
        * Perform action a
        * Store s, a, r
        * **If** done:
            * Calculate the sum of rewards
            * Calculate the discounted reward Gt (using gamma)

* Create the Neural Network
* Initialize the weights
* Init the environment
* maxReward = 0 # Keep track of maximum reward
* **For** epoch in range(num_epochs):
    * Get batches
    * Optimize
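
The "Make a batch" part of the outline above can be sketched as follows. To keep the example runnable without Doom, it uses a hypothetical toy environment (`CountdownEnv`) with an assumed `reset()` / `step(action)` interface and a random policy. Neither is part of this notebook, the discounted rewards are not normalized here, and the optimization step is left out.

```python
import numpy as np


class CountdownEnv:
    """Hypothetical toy environment: each episode lasts `length` steps, reward 1 per step."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        return self.t, 1.0, self.t >= self.length  # next state, reward, done


def discount_rewards(rewards, gamma):
    """Return the discounted reward G_t for every step of one episode."""
    discounted, running_return = [], 0.0
    for reward in reversed(rewards):
        running_return = reward + gamma * running_return
        discounted.append(running_return)
    return discounted[::-1]


def make_batch(env, choose_action, batch_size, gamma):
    """Play whole episodes until at least batch_size steps are stored."""
    states, actions, returns, episode_rewards = [], [], [], []
    n_episodes = 0
    state = env.reset()
    while True:
        action = choose_action(state)                 # choose action a
        next_state, reward, done = env.step(action)   # perform action a
        states.append(state)                          # store s, a, r
        actions.append(action)
        episode_rewards.append(reward)
        state = next_state
        if done:
            # Monte Carlo: G_t can only be computed once the episode is over
            returns.extend(discount_rewards(episode_rewards, gamma))
            episode_rewards = []
            n_episodes += 1
            if len(states) >= batch_size:
                break
            state = env.reset()
    return np.array(states), np.array(actions), np.array(returns), n_episodes


rng = np.random.default_rng(0)
states, actions, returns, n_episodes = make_batch(
    CountdownEnv(length=3), lambda s: int(rng.integers(3)), batch_size=5, gamma=0.5)
print(len(states), n_episodes)
```

Because only complete episodes are stored, a batch can hold slightly more than `batch_size` steps. Here two 3-step episodes are needed to reach 5 steps, so the batch holds 6.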

## Step 8: Watch our Agent play 👀
Now that we have trained our agent, we can test it.