Tutorial from: https://medium.com/emergent-future/simple-reinforcement-learning-with-tensorflow-part-0-q-learning-with-tables-and-neural-networks-d195264329d0

# Reinforcement Learning tutorial

## Q-Table Learning

In [2]:
import gym
import numpy as np

In [4]:
env = gym.make('FrozenLake-v0')



In [7]:
env.render()


[41mS[0mFFF
FHFH
FFFH
HFFG


Implement the Q-Table learning algorithm

In [8]:
# Initialize the table with all zeros
# The table has shape NS x NA, where NS is the number of possible states and NA the number of possible actions
# Element (i, j) is the estimated value (expected discounted future reward) of taking the jth action in the ith state
# In this example, the number of states is the number of positions on the lake, that is 4x4 = 16
Q = np.zeros((env.observation_space.n, env.action_space.n))
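
To make the link between a row of the table and a position on the lake concrete, here is a minimal sketch of the index conversion. It assumes states are numbered row by row from the top-left corner of the 4x4 grid, which is consistent with the all-zero rows of the final Q-table lining up with the holes on the map. The helper names `state_to_cell` and `cell_to_state` are made up for this illustration and are not part of gym.

```python
def state_to_cell(state, ncols=4):
    """Convert a flat state index into (row, col), assuming row-major numbering."""
    return divmod(state, ncols)


def cell_to_state(row, col, ncols=4):
    """Inverse of state_to_cell: convert (row, col) back into a flat state index."""
    return row * ncols + col


# State 0 is the start tile S, state 5 is the hole in the second row,
# and state 15 is the goal tile G in the bottom-right corner.
for state in (0, 5, 15):
    print(state, state_to_cell(state))
```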

In [10]:
# Set learning parameters
lr = 0.8 # learning rate
y = 0.95 # discount factor for future possible rewards

# Number of episodes (full runs of the game)
num_episodes = 2000

In [16]:
# Create a list to hold the total reward of each episode
rList = list()

In [17]:
for i in range(num_episodes):
    
    # Initialization
    s = env.reset() # reset the game. s is the state (in this case, the initial one)
    rAll = 0
    d = False
    j = 0
    
    # The Q-Table learning algorithm
    while j < 99:
        j += 1
        
        # Choose an action by greedily (with noise) picking from Q table 
        a = np.argmax(Q[s, :] + np.random.randn(1, env.action_space.n)*(1.0/(i+1)))
        
        # Get new state and reward from environment
        s1, r, d, _ = env.step(a)
        
        # Update Q-table
        Q[s, a] = Q[s, a] + lr * (r + y*np.max(Q[s1, :]) - Q[s, a])
        
        rAll += r
        s = s1
        
        if d:
            break
            
    rList.append(rAll)
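
To see what a single update inside the loop does, the sketch below applies the same rule to a tiny hand-made table. The 2x2 table `Q_demo`, its values, and the helper `q_update` are illustrative assumptions; only the update formula and the values of `lr` and `y` come from the code above.

```python
import numpy as np

lr = 0.8   # learning rate, as above
y = 0.95   # discount factor, as above

# Hypothetical table with 2 states and 2 actions.
# State 1 already has some value estimates; state 0 has none yet.
Q_demo = np.zeros((2, 2))
Q_demo[1] = [0.5, 1.0]


def q_update(Q, s, a, r, s1, lr, y):
    """Apply one Q-learning update in place and return the new Q[s, a]."""
    target = r + y * np.max(Q[s1, :])   # reward plus discounted best next value
    Q[s, a] = Q[s, a] + lr * (target - Q[s, a])
    return Q[s, a]


# Take action 1 in state 0, receive no reward, and land in state 1.
new_value = q_update(Q_demo, s=0, a=1, r=0.0, s1=1, lr=lr, y=y)
print(new_value)  # approximately 0.8 * 0.95 * 1.0 = 0.76
```

Even though the step itself earned no reward, `Q_demo[0, 1]` becomes positive because the next state already looks valuable. This is how the reward at the goal tile gradually propagates back toward the start tile.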

In [18]:
print("Score over time: " +  str(sum(rList)/num_episodes))

Score over time: 0.5555
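
A single average hides whether the agent improved during training. One possible way to look at this is to average the rewards over consecutive blocks of episodes. The sketch below uses a made-up reward list `demo_rewards` rather than the real `rList`, and the helper `windowed_success_rate` is a hypothetical name.

```python
import numpy as np


def windowed_success_rate(rewards, window):
    """Average reward over consecutive, non-overlapping blocks of `window` episodes.

    Trailing episodes that do not fill a whole block are dropped.
    """
    rewards = np.asarray(rewards, dtype=float)
    n_full = (len(rewards) // window) * window
    return rewards[:n_full].reshape(-1, window).mean(axis=1)


# Made-up per-episode rewards (1 = reached the goal, 0 = did not).
demo_rewards = [0, 0, 0, 1, 0, 1, 1, 1]
rates = windowed_success_rate(demo_rewards, window=4)
print(rates)  # one value per block of 4 episodes
```

With the real data, something like `windowed_success_rate(rList, 100)` would give one success rate per 100 episodes.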


In [19]:
print("Final Q-Table values:")
print(Q)

Final Q-Table values:
[[5.63268372e-03 3.64510085e-03 2.02619665e-01 3.73829262e-03]
 [1.68449911e-04 1.19712932e-03 3.49793031e-03 1.01781600e-01]
 [1.19090346e-03 1.52830655e-03 1.03781887e-03 4.52994473e-02]
 [7.40634408e-04 6.21576544e-04 3.29059242e-04 3.45818983e-02]
 [3.65596727e-01 6.87201221e-04 7.00902279e-04 3.10110101e-03]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [6.98519779e-02 1.75029707e-06 4.85218564e-05 2.53041929e-05]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [3.31988999e-03 1.45406152e-03 2.93965118e-04 2.60922349e-01]
 [7.48278331e-04 6.37288518e-01 1.45423626e-03 3.12537899e-04]
 [9.13279285e-01 3.33543133e-04 1.18214033e-03 6.68039655e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 6.08815563e-01 0.00000000e+00]
 [0.00000000e+00 9.99414440e-01 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0
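
Once the table is learned, the greedy policy is simply the action with the largest value in each row. The sketch below does this for the first four rows of the table printed above, rounded to four decimals. The action names assume the FrozenLake numbering 0 = left, 1 = down, 2 = right, 3 = up.

```python
import numpy as np

# First four rows (states 0 to 3) of the final Q-table above, rounded.
Q_head = np.array([
    [0.0056, 0.0036, 0.2026, 0.0037],
    [0.0002, 0.0012, 0.0035, 0.1018],
    [0.0012, 0.0015, 0.0010, 0.0453],
    [0.0007, 0.0006, 0.0003, 0.0346],
])

# Assumed action numbering for FrozenLake.
ACTION_NAMES = ["LEFT", "DOWN", "RIGHT", "UP"]

# For each state, pick the column (action) with the highest estimated value.
greedy_actions = np.argmax(Q_head, axis=1)
for state, action in enumerate(greedy_actions):
    print(state, ACTION_NAMES[action])
```

Rows that are entirely zero (the holes and the goal) have no meaningful greedy action, since the episode ends as soon as the agent reaches them.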

In [21]:
import sys
import time

a = 0
for x in range(3):
    a = a + 1
    b = "Loading" + "." * a
    # '\r' moves the cursor back to the start of the line, so `b` overwrites the previous output.
    sys.stdout.write('\r' + b)
    time.sleep(0.5)
print(a)

Loading...3
