# Simple Reinforcement Learning with Tensorflow: Part 0 - Q-Tables
In this IPython notebook we implement a Q-Table learning algorithm that solves the FrozenLake problem. For background, see the accompanying post: https://medium.com/@awjuliani/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

For more reinforcement learning tutorials, see:
https://github.com/awjuliani/DeepRL-Agents

In [1]:
import gym
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline

### Load the environment

In [2]:
env = gym.make('FrozenLake-v0')

### Implement Q-Table learning algorithm

In [3]:
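#Initialize table with all zeros (one row per state, one column per action)
Q = np.zeros([env.observation_space.n,env.action_space.n])
# Set learning parameters
lr = .8   # learning rate: step size of each Q-value update
y = .95   # discount factor applied to the best next-state value
num_episodes = 2000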
#create lists to contain total rewards and steps per episode
#jList = []
rList = []
for i in range(num_episodes):
    #Reset environment and get first new observation
    s = env.reset()
    rAll = 0
    d = False
    j = 0
    #The Q-Table learning algorithm (at most 99 steps per episode)
    while j < 99:
        j+=1
        #Choose the greedy action from the Q table, with Gaussian noise that shrinks as 1/(i+1)
        a = np.argmax(Q[s,:] + np.random.randn(1,env.action_space.n)*(1./(i+1)))
        #Get new state and reward from environment
        s1,r,d,_ = env.step(a)
        #Update Q-Table: move Q[s,a] toward r + y*max(Q[s1,:]) by a step of size lr
        Q[s,a] = Q[s,a] + lr*(r + y*np.max(Q[s1,:]) - Q[s,a])
        rAll += r
        s = s1
        #Stop the episode once the environment reports it is done
        if d:
            break
    #jList.append(j)
    rList.append(rAll)
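
To make the update rule in the loop above concrete, here is a minimal sketch that applies it once by hand. The table `Q_demo`, its size, and the single transition are hypothetical values chosen for illustration; they are not taken from FrozenLake. Only `lr` and `y` match the cell above.

```python
import numpy as np

# Hypothetical toy table: 3 states, 2 actions (not the FrozenLake table)
Q_demo = np.zeros((3, 2))
Q_demo[2] = [0.0, 1.0]  # pretend state 2 already has a learned value

lr, y = 0.8, 0.95  # same learning rate and discount factor as above

# One hypothetical transition: in state 1, action 0 gives reward 0 and leads to state 2
s, a, r, s1 = 1, 0, 0.0, 2

# Target is the reward plus the discounted best value of the next state: 0 + 0.95 * 1.0
td_target = r + y * np.max(Q_demo[s1, :])

# Move Q_demo[s, a] toward the target by a fraction lr: 0 + 0.8 * (0.95 - 0), about 0.76
Q_demo[s, a] = Q_demo[s, a] + lr * (td_target - Q_demo[s, a])
print(Q_demo[s, a])
```

Even though this step earned no reward, the value of state 2 leaks back into state 1. Repeated over many episodes, this is how the reward at the goal can propagate toward the start state.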

In [4]:
print("Score over time: " +  str(sum(rList)/num_episodes))

Score over time: 0.469
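
The single number above averages over all episodes, including the early ones in which the table was still mostly zeros. One way to see how performance changes during training is to average the rewards in consecutive windows. The sketch below assumes a hypothetical helper `windowed_mean` and a short made-up reward list standing in for `rList`.

```python
import numpy as np

def windowed_mean(rewards, window):
    """Mean reward over consecutive, non-overlapping windows (hypothetical helper)."""
    rewards = np.asarray(rewards, dtype=float)
    # Drop trailing episodes that do not fill a complete window
    n_full = (len(rewards) // window) * window
    return rewards[:n_full].reshape(-1, window).mean(axis=1)

# Made-up reward history standing in for rList
demo_rewards = [0, 0, 0, 1, 0, 1, 1, 1]
print(windowed_mean(demo_rewards, 4))  # prints [0.25 0.75]
```

With the notebook's own data, a call such as `windowed_mean(rList, 100)` would give one average per block of 100 episodes, which can then be passed to `plt.plot`.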


In [5]:
print("Final Q-Table Values")
print(Q)

Final Q-Table Values
[[4.41880451e-01 7.89006809e-03 1.31148722e-02 5.32467539e-03]
 [2.22880837e-03 9.99044496e-04 1.72326339e-04 1.32121578e-01]
 [2.99159841e-03 6.14358973e-02 1.38689729e-03 8.18069763e-03]
 [1.43860601e-04 8.78675541e-04 1.28558586e-04 3.89363463e-02]
 [5.25385601e-01 1.25159023e-03 7.98981121e-03 2.22032232e-03]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.27946967e-02 1.14706562e-05 6.53322998e-04 2.93130747e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [2.64086109e-04 1.47597602e-03 1.36636650e-03 6.68535535e-01]
 [4.59607560e-04 3.15004127e-01 0.00000000e+00 0.00000000e+00]
 [7.39345266e-02 3.69055572e-04 8.02935478e-04 1.26739469e-03]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [6.66311736e-04 2.85780684e-05 7.46638074e-01 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 9.55780430e-01]
 [0.00000000e+00 0.00000000e+00 0.