# Solving two simple environments with a Q-table and a neural network (Deep Q-learning)

# Subproject 1

Solve [`FrozenLake8x8-v0`](https://gym.openai.com/envs/FrozenLake8x8-v0/) using a Q-table.


1. Import Necessary Packages:

In [170]:
import numpy as np
import random
import gym


2. Instantiate the Environment and Agent

In [171]:
env = gym.make("FrozenLake8x8-v0")
env.render()


SFFFFFFF
FFFFFFFF
FFFHFFFF
FFFFFHFF
FFFHFFFF
FHHFFFHF
FHFFHFHF
FFFHFFFG


3. Set up the Q-table:

In [172]:
action_size = env.action_space.n
print("Actions: ", action_size)
state_size = env.observation_space.n
print("States: ", state_size)
obs = env.reset()
print(env.action_space)

Actions:  4
States:  64
Discrete(4)


4. The Q-Learning algorithm training
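
As a rough illustration of the update rule used in the training cell, the sketch below applies a single tabular Q-learning step to a toy table. The helper name `q_update` and the 3x2 table are assumptions made only for this example; the learning rate (0.01) and discount (0.97) match the hyperparameters in this notebook.

```python
import numpy as np

def q_update(qtable, state, action, reward, state_new, lr=0.01, discount=0.97):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Target: immediate reward plus discounted best value of the next state
    td_target = reward + discount * np.max(qtable[state_new, :])
    # Move the current estimate a small step (lr) towards the target
    qtable[state, action] += lr * (td_target - qtable[state, action])
    return qtable[state, action]

# Toy table with 3 states and 2 actions (sizes chosen only for illustration)
toy_q = np.zeros((3, 2))
toy_q[2, 1] = 0.5  # pretend state 2 already has some estimated value

# Transition: in state 1 take action 0, receive reward 0, land in state 2
new_value = q_update(toy_q, state=1, action=0, reward=0.0, state_new=2)
print(new_value)  # 0.01 * (0 + 0.97 * 0.5 - 0), i.e. about 0.00485
```

With such a small learning rate, each visit moves an entry only 1% of the way towards its target, which is one reason the learned values in the table stay small.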

In [134]:
# Hyperparameters
tot_eps = 50000
tot_test_eps = 20

lr = 0.01
discount = 0.97

epsilon = 1
epsilon_max = 0.9
epsilon_min = 0.01
decay = 0.01

qtable = np.zeros((state_size, action_size))
print(qtable.size)

256
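
The `epsilon`, `epsilon_max`, `epsilon_min` and `decay` values are defined here, but the training loop in this notebook samples every action uniformly at random. A minimal sketch of how they could drive epsilon-greedy action selection follows. The exponential schedule and the helper names `choose_action` and `decayed_epsilon` are assumptions, not something the notebook itself implements.

```python
import numpy as np

def choose_action(qtable, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        return int(rng.integers(qtable.shape[1]))  # uniformly random action
    return int(np.argmax(qtable[state, :]))        # greedy action

def decayed_epsilon(ep, epsilon_max=0.9, epsilon_min=0.01, decay=0.01):
    """Assumed schedule: exponential decay from epsilon_max towards epsilon_min."""
    return epsilon_min + (epsilon_max - epsilon_min) * np.exp(-decay * ep)

rng = np.random.default_rng(0)
demo_q = np.zeros((64, 4))   # same shape as the FrozenLake8x8 Q-table
demo_q[0, 2] = 1.0           # make action 2 the best action in state 0

# With epsilon = 0 the choice is always greedy
greedy_action = choose_action(demo_q, state=0, epsilon=0.0, rng=rng)
# With epsilon = 1 the choice is always random
random_action = choose_action(demo_q, state=0, epsilon=1.0, rng=rng)
```

Inside a training loop, one would recompute `epsilon = decayed_epsilon(ep)` at the start of each episode and call `choose_action` in place of `env.action_space.sample()`.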


In [135]:
for ep in range(tot_eps):
    # Reset the environment at the start of each episode
    state = env.reset()
    done = False
    while not done:
        # Explore with a uniformly random action; the epsilon settings above are not used here
        action = env.action_space.sample()
        state_new, reward, done, _ = env.step(action)
        # Q-learning update: move Q(s, a) towards reward + discount * max_a' Q(s', a')
        td_target = reward + discount * np.max(qtable[state_new, :])
        qtable[state, action] += lr * (td_target - qtable[state, action])
        state = state_new
print("Done!")
print(qtable)

Done!
[[1.57371960e-03 1.62248291e-03 1.63482394e-03 1.63752411e-03]
 [1.66163805e-03 1.75584088e-03 1.82161302e-03 1.81161464e-03]
 [1.88142269e-03 2.04875025e-03 2.12988025e-03 2.08283855e-03]
 [2.23532919e-03 2.40802092e-03 2.60767832e-03 2.60944105e-03]
 [2.80624345e-03 3.06907531e-03 3.27801953e-03 3.13699770e-03]
 [3.63935849e-03 3.91919426e-03 4.23649108e-03 3.85392075e-03]
 [4.75905626e-03 4.92472740e-03 5.07369881e-03 4.44967203e-03]
 [5.67761888e-03 5.70032515e-03 5.59628301e-03 4.85035646e-03]
 [1.48324753e-03 1.50460534e-03 1.52135774e-03 1.58274825e-03]
 [1.52538233e-03 1.56858722e-03 1.64000092e-03 1.68695752e-03]
 [1.71478665e-03 1.70437365e-03 1.83882340e-03 1.94041130e-03]
 [1.33614231e-03 1.68848779e-03 1.90894413e-03 2.35652554e-03]
 [2.46183821e-03 2.76545263e-03 3.08062416e-03 3.02586623e-03]
 [3.40283975e-03 4.00722968e-03 4.39379026e-03 4.19848733e-03]
 [5.27545937e-03 6.30889227e-03 6.29613691e-03 5.17394834e-03]
 [7.00221673e-03 8.00516723e-03 7.55925410e-03 5.

5. Evaluate how well your agent performs
* Render output of one episode
* Give an average episode return

In [179]:
rewards = []
for ep in range(tot_test_eps):
    state = env.reset()
    done = False
    tot_rewards = 0
    while not done:
        # Act greedily with respect to the learned Q-table (no exploration)
        action = np.argmax(qtable[state, :])
        state_new, reward, done, info = env.step(action)
        tot_rewards += reward
        state = state_new
    rewards.append(tot_rewards)
env.close()
# Average return per test episode
print("Score over time: " + str(sum(rewards)/tot_test_eps))
print(rewards)

Score over time: 0.4
[0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0]


In [95]:
print(qtable)

[[5.05611214e-03 1.64281208e-03 1.39869813e-03 1.39673710e-03]
 [1.00563496e-03 5.67034456e-07 2.88536652e-06 3.07583735e-05]
 [6.55521910e-05 0.00000000e+00 6.59519855e-09 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [7.02053403e-03 1.44890168e-03 1.44697200e-03 9.53765272e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.64752434e-03 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [8.74426505e-04 1.19054506e-02 2.24297922e-03 3.14711184e-03]
 [3.31422364e-02 3.25140304e-03 4.13166005e-03 1.31475970e-03]
 [4.76239643e-02 4.69001990e-03 1.30624024e-03 1.31344633e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.49444366e-03 1.00557434e-01 3.31558901e-03 4.29033329e-03]
 [6.80956011e-03 1.38539825e-02 4.98670923e-03 2.85517985e-01]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.000000

# Subproject 2

Solve [`LunarLander-v2`](https://gym.openai.com/envs/LunarLander-v2/) using DQN.

**1. Import Necessary Packages:**


In [None]:
!pip install box2d-py
#Imports
import gym
import numpy as np
import matplotlib.pyplot as plt
from collections import deque
import tensorflow as tf
from tensorflow import keras
#from keras.models import Sequential
#from keras.layers import Dense
#from keras.optimizers import Adam
import random
from gym import wrappers

ERROR: Could not find a version that satisfies the requirement swig (from versions: none)
ERROR: No matching distribution found for swig
Collecting box2d-py
  Downloading box2d_py-2.3.8-cp37-cp37m-manylinux1_x86_64.whl (448 kB)
     |████████████████████████████████| 448 kB 4.4 MB/s
Installing collected packages: box2d-py
Successfully installed box2d-py-2.3.8


**2. Instantiate the Environment**

In [None]:
env = gym.make('LunarLander-v2')
env.seed(0)
print('State shape: ', env.observation_space.shape)
print('Number of Actions: ', env.action_space.n)

State shape:  (8,)
Number of Actions:  4


**3. Implement and instantiate the agent**
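
The notebook leaves this step empty. One building block a DQN agent typically needs is an experience replay memory, and the `deque` import above suggests that was the plan. The sketch below is only an assumption about how such a buffer might look. The class name `ReplayBuffer` and its methods are hypothetical, and the Q-network itself is not shown.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry once full
        self.memory = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up correlated transitions
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

# Tiny demonstration with integer placeholders instead of real observations
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=2)
```

After five insertions into a buffer of capacity 3, only the three most recent transitions remain, so the agent always trains on reasonably fresh experience.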



**4. Train the agent with DQN**
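
At the core of DQN training is the regression target for each sampled transition. The following is a minimal NumPy sketch of that computation, under the assumption that a network has already produced Q-values for the next states. The function name `dqn_targets` and the discount of 0.99 are illustrative choices, not values taken from this notebook.

```python
import numpy as np

def dqn_targets(rewards, next_q_values, dones, discount=0.99):
    """Bellman targets: r + discount * max_a' Q(s', a'), with no bootstrap on terminal steps."""
    max_next_q = np.max(next_q_values, axis=1)
    # (1 - done) zeroes the future term when the episode ended at this step
    return rewards + discount * max_next_q * (1.0 - dones)

# Two example transitions with 4 actions each, as in LunarLander-v2
rewards = np.array([1.0, -100.0])
next_q_values = np.array([[0.5, 2.0, 1.0, 0.0],
                          [3.0, 1.0, 0.0, 0.0]])
dones = np.array([0.0, 1.0])  # the second transition ended the episode

targets = dqn_targets(rewards, next_q_values, dones)
print(targets)  # first entry is 1 + 0.99 * 2, second is just the reward
```

The network is then fitted so that its prediction for the action actually taken moves towards the corresponding target.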

4.1 Show the episode return plot
  
  - Is the agent learning to solve the task?
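
Raw episode returns are usually noisy, so a smoothed curve makes it easier to judge whether the agent is learning. The sketch below computes a moving average with NumPy; the helper name `moving_average` and the default window of 100 episodes are assumptions. A commonly cited threshold for considering LunarLander-v2 solved is an average return of about 200 over 100 consecutive episodes.

```python
import numpy as np

def moving_average(returns, window=100):
    """Mean of each run of `window` consecutive episode returns."""
    returns = np.asarray(returns, dtype=float)
    if len(returns) < window:
        return np.array([])  # not enough episodes yet
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits
    return np.convolve(returns, kernel, mode="valid")

# Small example with a window of 2 so the result is easy to check by hand
smoothed = moving_average([1, 2, 3, 4, 5], window=2)
print(smoothed)
```

The smoothed array can be passed to `plt.plot` alongside the raw returns, using the `matplotlib.pyplot` import from the first cell of this subproject.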

4.2 Save the best model

**5. Load the model from the disk and run it in a loop**
- Hint: if you want to see the agent landing the Lunar Lander, call `env.render()` after `env.step()`.
- Because Colab does not cooperate with Gym rendering, you might want to download the trained model and run this loop on your own computer to visualise the behavior.
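
A run loop of the kind described in the hint could look like the sketch below. It assumes the older Gym API used throughout this notebook, where `env.reset()` returns the state and `env.step()` returns four values. The function name `run_episode` and the `policy` callable are hypothetical; with a trained network, `policy` would wrap the model's prediction and pick the action with the highest Q-value.

```python
def run_episode(env, policy, render=False, max_steps=1000):
    """Run one episode with the given policy and return the total reward."""
    state = env.reset()
    total_return = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done, info = env.step(action)
        if render:
            env.render()  # draw the frame right after the step, as in the hint
        total_return += reward
        if done:
            break
    return total_return
```

Calling `run_episode(env, policy, render=True)` several times in a row gives both a visual check and a list of returns to average.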

**Helper functions**

Save rendered images:

In [None]:
import imageio
import numpy as np

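images = []
# Capture the current frame as an RGB array, then store it;
# repeat these two lines after each env.step() in the run loop
img = model.env.render(mode='rgb_array')
images.append(img)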

imageio.mimwrite('./moonlander.gif',
                [np.array(img) for i, img in enumerate(images) if i%2 == 0],
                fps=29)

Display saved .gif

In [None]:
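from pathlib import Path
from IPython import display

gifPath = Path("./moonlander.gif")
# Display GIF in Jupyter, CoLab, IPython
with open(gifPath, 'rb') as f:
    # An expression inside a with-block is not rendered automatically, so call display() explicitly
    display.display(display.Image(data=f.read(), format='png'))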