# Simple Reinforcement Learning with Tensorflow: Part 0 - Q-Tables
In this IPython notebook, we implement a Q-Table learning algorithm that solves the FrozenLake problem. To learn more, read the accompanying post: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

For more reinforcement learning tutorials, see:
https://github.com/awjuliani/DeepRL-Agents

In [2]:
import gym
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

### Load the environment

In [3]:
env = gym.make('FrozenLake-v0')

### Implement Q-Table learning algorithm

In [4]:
# Initialize table with all zeros
Q = np.zeros([env.observation_space.n, env.action_space.n])
# Set learning parameters
lr = .8
y = .95
num_episodes = 2000
# Create lists to hold total reward and steps per episode (step tracking is commented out)
# jList = []
rList = []
for i in range(num_episodes):
    # Reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    # The Q-Table learning algorithm
    while j < 99:
        j += 1
        # Choose an action greedily (with noise that decays over episodes) from the Q-table
        a = np.argmax(Q[s, :] + np.random.randn(1, env.action_space.n) * (1. / (i + 1)))
        # Get new state and reward from environment
        s1, r, d, _ = env.step(a)
        # Update Q-Table with new knowledge
        Q[s, a] = Q[s, a] + lr * (r + y * np.max(Q[s1, :]) - Q[s, a])
        rAll += r
        s = s1
        if d:
            break
            
    # jList.append(j)
    rList.append(rAll)
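
To make the update line in the cell above more concrete, here is a minimal sketch that applies the same rule, `Q[s, a] + lr * (r + y * max(Q[s1, :]) - Q[s, a])`, to two hand-picked transitions. The three-state table `Q_toy`, the helper `q_update`, and the transitions themselves are hypothetical and chosen only for illustration; they are not part of FrozenLake or of the original notebook.

```python
import numpy as np

# Hypothetical toy table: 3 states x 2 actions (state 2 plays the role of a goal).
Q_toy = np.zeros([3, 2])
lr = .8   # learning rate, same value as in the notebook
y = .95   # discount factor, same value as in the notebook


def q_update(Q, s, a, r, s1, lr, y):
    """Apply one tabular Q-learning update in place and return the new Q[s, a]."""
    target = r + y * np.max(Q[s1, :])        # reward plus discounted best next value
    Q[s, a] = Q[s, a] + lr * (target - Q[s, a])
    return Q[s, a]


# Transition 1: state 1, action 0 reaches the goal (state 2) with reward 1.
v1 = q_update(Q_toy, s=1, a=0, r=1.0, s1=2, lr=lr, y=y)
# Transition 2: state 0, action 1 reaches state 1 with no reward;
# the value learned in transition 1 is propagated backwards.
v2 = q_update(Q_toy, s=0, a=1, r=0.0, s1=1, lr=lr, y=y)

print(v1, v2)
```

The first update moves `Q_toy[1, 0]` 80% of the way from 0 toward the target of 1. The second update has no immediate reward, so its target is only the discounted value of the next state (0.95 times 0.8), which is how reward information spreads back from the goal over repeated episodes.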


In [5]:
print("Score over time: " + str(sum(rList) / num_episodes))

Score over time: 0.4895
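
The single number above averages over all episodes, including the early ones in which the table was still mostly zeros. One possible way to see how performance changes during training is to average the rewards over consecutive windows of episodes. The sketch below is an assumption about how one might do this; the helper `windowed_mean` and the short `demo_rewards` list are hypothetical and are not the `rList` produced by the run above.

```python
import numpy as np


def windowed_mean(rewards, window):
    """Mean reward over consecutive, non-overlapping windows of episodes."""
    rewards = np.asarray(rewards, dtype=float)
    n_full = (len(rewards) // window) * window   # drop an incomplete final window
    return rewards[:n_full].reshape(-1, window).mean(axis=1)


# Hypothetical reward history for illustration only (1 = reached the goal).
demo_rewards = [0, 0, 0, 1, 0, 1, 1, 1]
demo_means = windowed_mean(demo_rewards, window=4)
print(demo_means)  # → [0.25 0.75]
```

With the notebook's own data, a call such as `windowed_mean(rList, 100)` would give one success rate per block of 100 episodes, which could then be passed to `plt.plot`.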


In [6]:
print("Final Q-Table Values")
print(Q)

Final Q-Table Values
[[1.27673841e-01 8.04769169e-03 1.07087894e-02 1.03659145e-02]
 [4.26412069e-03 2.34803535e-03 2.08965604e-03 2.32469796e-01]
 [2.33713394e-03 2.36690374e-03 7.47586550e-03 1.91454626e-01]
 [1.62357245e-03 0.00000000e+00 4.22164452e-04 9.48680773e-02]
 [1.36003833e-01 1.82566149e-03 1.95903470e-04 1.63207895e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [4.86535662e-05 8.30438492e-05 9.36531064e-02 4.11042728e-12]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.70082830e-03 1.29262951e-03 2.02433998e-03 2.02818454e-01]
 [0.00000000e+00 2.88175069e-01 2.01890506e-03 0.00000000e+00]
 [1.64207533e-01 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.82982387e-03 1.12408986e-03 4.96217950e-01 1.78379617e-03]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 9.13545965e-01]
 [0.00000000e+00 0.00000000e+00 0.