<a href="https://colab.research.google.com/github/lin3372/252_ML.tutorial/blob/main/RL/03_Hands_on_RL_Tabular_SARSA_220801.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

[Hands-On Reinforcement Learning Course: Part 6 - Hyperparameters in Deep RL](https://medium.com/towards-data-science/hyperparameters-in-deep-rl-f8a9cf264cd6)

by [Pau Labarta Bajo](https://pau-labarta-bajo.medium.com/?source=post_page-----269b50e39d08--------------------------------), Mar 5, 2022.  [[github repo for this lesson]](https://github.com/Paulescu/hands-on-rl)

Summarized and Revised by Ivan H.P. Lin

Previous RL courses:
1. Part 1: Introduction to Reinforcement Learning - [datSci](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08), [Ivan's colab](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08)
2. Part 2: Tabular Q-learning - [datSci](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08), [Ivan's colab](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08)
3. Part 3: Tabular SARSA - [datSci](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08), [Ivan's colab](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-1-269b50e39d08)
4. Part 4: Linear Q-learning - [datSci](https://towardsdatascience.com/hands-on-reinforcement-learning-course-part-4-55da5eae851f), [Ivan's colab](https://drive.google.com/file/d/1Q1CWVOjmH46Gf_Xb_u4Ht13U55LsQina/view?usp=sharing)
5. Part 5: Deep Q-learning - [datSci](https://medium.com/towards-data-science/hands-on-reinforcement-learning-course-part-5-bdb2e7fa243c), [Ivan's colab](https://medium.com/towards-data-science/hands-on-reinforcement-learning-course-part-5-bdb2e7fa243c)
6. 👉🏻 Part 6: Hyperparameters in Deep RL (today) - [datSci](https://medium.com/towards-data-science/hyperparameters-in-deep-rl-f8a9cf264cd6), [Ivan's colab](https://medium.com/towards-data-science/hyperparameters-in-deep-rl-f8a9cf264cd6)


In part 5 we built a perfect agent to solve the Cart Pole environment, using Deep Q Learning.

We used a set of hyperparameters that I shared with you. However, I did not explain how I got them.

If you want to become a real PRO in Reinforcement Learning, you need to learn how to tune hyperparameters. And for that, you need to use the right tools.

Today we will use the best open-source library for hyperparameter search in the Python ecosystem: **Optuna**.
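Before reaching for Optuna, it may help to see what a hyperparameter search loop looks like in its simplest form. The sketch below is a plain random search, not Optuna's API and not a Bayesian search. The function `toy_score`, the search ranges, and the two hyperparameter names are all hypothetical stand-ins for "train an agent with these settings and measure its reward".

```python
import math
import random


def toy_score(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for 'train an agent and return its average reward'.

    The score peaks at learning_rate = 1e-3 and batch_size = 64. These numbers
    are made up for illustration and are not results from the course.
    """
    lr_penalty = (math.log10(learning_rate) + 3.0) ** 2
    batch_penalty = (math.log2(batch_size) - 6.0) ** 2
    return -(lr_penalty + batch_penalty)


def random_search(n_trials: int, seed: int = 0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = {
            # Sample the learning rate on a log scale between 1e-5 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-5, -1),
            "batch_size": rng.choice([16, 32, 64, 128, 256]),
        }
        score = toy_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


best_params, best_score = random_search(n_trials=50)
print(best_params, best_score)
```

Every trial here is independent of the previous ones. A smarter search would use the scores seen so far to decide where to sample next.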

All the code for this lesson is in [this GitHub repo](https://github.com/Paulescu/hands-on-rl). Clone it to follow along with today’s problem.

And if you like the course, please give it a ⭐ on GitHub!

# 0. Contents

1. The problem
2. The solution: Bayesian search
3. Hyperparameter search with Optuna
4. Recap ✨
5. Homework 📚
6. What’s next? ❤️

# 1. Let’s go deep! 🕹️

In the previous lesson, we used this linear parameterization to represent the optimal $q$ function.

<figure><center>
<img src="https://miro.medium.com/max/711/0*ziABdoI655gxHn3H.jpeg" width="60%">
<figcaption>linear q function (Image by the author)</figcaption>
</center></figure>

The success (or failure) of a **parametric $Q$-learning** agent strongly depends on the parameterization we use to approximate the optimal $q$ value function.

Linear models are conceptually simple, fast to train, and fast to run. However, they are not very flexible: a linear model can only capture linear relationships, so it struggles whenever the true mapping from inputs to outputs is nonlinear.
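To make this limitation concrete, here is a minimal sketch that fits the best possible linear model (in the least-squares sense) to the XOR function. XOR is not part of the course material; it is used here only as a small, well-known example of a nonlinear mapping.

```python
import numpy as np

# XOR: the output is 1 when exactly one of the two inputs is 1.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the linear model can learn a bias term.
X_bias = np.hstack([X, np.ones((4, 1))])

# Best linear fit in the least-squares sense.
weights, *_ = np.linalg.lstsq(X_bias, y, rcond=None)
predictions = X_bias @ weights

print("weights:", np.round(weights, 3))
print("predictions:", np.round(predictions, 3))
```

The best linear fit predicts 0.5 for all four inputs, so it cannot tell the two classes apart at all.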

And this is when neural networks enter into the game.

Neural network models are the most powerful function approximators we have. They are extremely flexible and can be used to uncover complex patterns between the input features and the target labels.


  > **The Universal Approximation Theorem** 📘 is a mathematical result that essentially says:

  > Neural networks are as flexible as you want them to be. If you design a sufficiently large neural network (i.e. with enough parameters), there is a setting of its parameters that gives an accurate mapping between the input features and the target values. The theorem does not tell you how to find those parameters.

<img src="https://miro.medium.com/max/1465/1*Z_0zd9ld4CaSptmAlrYcdw.png" width="50%">
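As a small illustration of that flexibility (not a proof of the theorem), the sketch below builds a tiny network with one hidden layer of two ReLU units that reproduces XOR exactly, a function that no purely linear model can represent. The weights are picked by hand for this example rather than learned.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: max(z, 0) applied element-wise."""
    return np.maximum(z, 0.0)


# All four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Hand-picked parameters (an assumption for illustration, not trained).
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])      # shape (inputs, hidden units)
b1 = np.array([0.0, -1.0])       # one bias per hidden unit
W2 = np.array([1.0, -2.0])       # hidden units -> single output

hidden = relu(X @ W1 + b1)       # h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1)
outputs = hidden @ W2            # output = h1 - 2 * h2

print(outputs)
```

The single nonlinearity in the hidden layer is what lets the network bend its output back down to 0 for the input (1, 1).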

Today we are going to replace the linear model from part 4 with the simplest neural network architecture out there: a **feed-forward neural network**.

<figure><center>
<img src="https://miro.medium.com/max/1539/1*Ibtr60PwTe51yyJqUqh8Eg.jpeg" width="50%">
<figcaption>Feed-forward neural network (Image by the author)</figcaption>
</center></figure>
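The forward pass of such a network can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the hidden size of 16, the two hidden layers, the random initialization, and the example state are all hypothetical, and the weights are untrained. The input and output sizes match Cart Pole, whose state has 4 components and which has 2 actions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

STATE_SIZE = 4    # cart position, cart velocity, pole angle, pole angular velocity
N_ACTIONS = 2     # push left, push right
HIDDEN_SIZE = 16  # hypothetical choice for this sketch


def init_layer(n_in, n_out):
    """Return randomly initialized weights and a zero bias for one dense layer."""
    weights = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
    bias = np.zeros(n_out)
    return weights, bias


layers = [
    init_layer(STATE_SIZE, HIDDEN_SIZE),
    init_layer(HIDDEN_SIZE, HIDDEN_SIZE),
    init_layer(HIDDEN_SIZE, N_ACTIONS),
]


def q_values(state):
    """Map a state to one estimated q value per action."""
    activation = np.asarray(state, dtype=float)
    # Hidden layers: affine transformation followed by a ReLU nonlinearity.
    for weights, bias in layers[:-1]:
        activation = np.maximum(activation @ weights + bias, 0.0)
    # Output layer: affine only, because q values can be any real number.
    weights, bias = layers[-1]
    return activation @ weights + bias


state = np.array([0.01, -0.02, 0.03, 0.04])  # made-up state for illustration
q = q_values(state)
greedy_action = int(np.argmax(q))
print(q, greedy_action)
```

Information flows in one direction only, from the state through the hidden layers to the q values, which is what "feed-forward" refers to. The greedy action is the index of the largest q value.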

Later in the course, we will use other neural networks to deal with more complex state spaces (e.g. **convolutional neural networks**).

Let us warm up our deep learning mastery with the following imitation learning problem I created for today.

There is quite a lot to cover, so arm yourself with deep focus.




### display setup for colab
Reference: Stack Overflow, [How to render OpenAI gym in google Colab?](https://stackoverflow.com/questions/50107530/how-to-render-openai-gym-in-google-colab)

In [None]:
!apt-get install x11-utils > /dev/null 2>&1 
!pip install pyglet > /dev/null 2>&1 
!apt-get install -y xvfb python-opengl > /dev/null 2>&1

In [None]:
!pip install gym pyvirtualdisplay > /dev/null 2>&1

In [None]:
# then import all your libraries, including *matplotlib* & *ipythondisplay*:

import gym
import numpy as np
import matplotlib.pyplot as plt
from IPython import display as ipythondisplay

In [None]:
from pyvirtualdisplay import Display
display = Display(visible=0, size=(400, 300))
display.start()

<pyvirtualdisplay.display.Display at 0x7f85f64960d0>

### download util files from github

In [None]:
import os
user = "Paulescu"
repo = "hands-on-rl"
src_dir = "03_cart_pole/src/"
pyfiles = ["agent_memory.py", "config.py", "loops.py", "model_factory.py", "optimize_hyperparameters.py",\
           "q_agent.py", "random_agent.py", "supervised_ml.py", "utils.py", "viz.py"]
### Note - "viz.py" has an error in get_action(). I fixed it and put the corrected file in my repository, so viz.py needs to be downloaded from my GitHub repository.

curr_dir=os.getcwd()
os.makedirs('src', exist_ok=True)

os.chdir('src')

for f_rl in pyfiles:
  url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{f_rl}"
  !wget --no-cache --backups=1 {url}

##########################   
user = "lin3372"
repo = "252_ML.tutorial"
src_dir = "RL/src/03_cart_pole/saved_agents/CartPole-v1/0/"
pyfiles = ["hparams.json", "model"]

os.chdir(curr_dir)
os.makedirs('saved_agents', exist_ok=True)
os.chdir('saved_agents')
os.makedirs('CartPole-v1', exist_ok=True)
os.chdir('CartPole-v1')

#saved_agents/CartPole-v1

for f_rl in pyfiles:
  url = f"https://raw.githubusercontent.com/{user}/{repo}/main/{src_dir}/{f_rl}"
  !wget --no-cache --backups=1 {url}
########################## 

os.chdir(curr_dir)

--2022-08-13 07:20:48--  https://raw.githubusercontent.com/Paulescu/hands-on-rl/main/03_cart_pole/src//agent_memory.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /Paulescu/hands-on-rl/main/03_cart_pole/src/agent_memory.py [following]
--2022-08-13 07:20:48--  https://raw.githubusercontent.com/Paulescu/hands-on-rl/main/03_cart_pole/src/agent_memory.py
Reusing existing connection to raw.githubusercontent.com:443.
HTTP request sent, awaiting response... 200 OK
Length: 581 [text/plain]
Saving to: ‘agent_memory.py’


2022-08-13 07:20:48 (44.6 MB/s) - ‘agent_memory.py’ saved [581/581]

--2022-08-13 07:20:49--  https://raw.githubusercontent.com/Paulescu/hands-on-rl/main/03_cart_pole/src//config.py
Resolving raw.githubusercontent.com (raw.github