<h1 align = 'center'>Guessing Games</h1>
<h3 align = 'center'>machine learning, one step at a time</h3>
<h3 align = 'center'>Step 8. A Random Walk While Paying Attention</h3>

**8. A random walk while paying attention.**

When we took our random walk, we ignored everything that happened except for one event: stumbling across the exit. What if we paid attention and learned from our mistakes?

For example, what if our random walk remembered which first steps caused an immediate failure, and stopped taking them?
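Here is a minimal sketch of that idea. It makes several assumptions. The notebook's `Maze` class isn't shown, so `ToyMaze` is a hypothetical stand-in that uses the wall layout from the maze printout later in this step. It also assumes that stepping into a wall or off the grid ends the attempt with reward -1, that reaching the exit ends it with +1, and that the directions are `N, S, E, W = 0, 1, 2, 3`, as in `QPlayer`. The hypothetical walker bans any first step that failed immediately and never tries it again.

```python
import random

N, S, E, W = 0, 1, 2, 3
MOVES = {N: (-1, 0), S: (1, 0), E: (0, 1), W: (0, -1)}

class ToyMaze:
    """Hypothetical stand-in for Maze: 4x4 grid, '+' marks a wall.
    Assumed rules: wall or off-grid -> reward -1, exit -> +1, otherwise 0."""
    LAYOUT = ["...+",
              ".+..",
              "..+.",
              "++.."]
    EXIT = (3, 3)

    def __init__(self, rng):
        self.rng = rng
        self.reset()

    def reset(self):
        self.pos = (0, 0)          # the entrance
        return self.pos

    def sample(self):
        return self.rng.choice([N, S, E, W])

    def step(self, action):
        dr, dc = MOVES[action]
        r, c = self.pos[0] + dr, self.pos[1] + dc
        if not (0 <= r < 4 and 0 <= c < 4) or self.LAYOUT[r][c] == '+':
            return self.pos, -1, True      # bumped into something: attempt over
        self.pos = (r, c)
        if self.pos == self.EXIT:
            return self.pos, 1, True       # found the exit
        return self.pos, 0, False

def walk_remembering_first_steps(attempts, seed=0):
    """Random walk that never repeats a first step that failed immediately."""
    rng = random.Random(seed)
    maze = ToyMaze(rng)
    banned = set()                 # first actions known to fail on step one
    completions = 0
    for _ in range(attempts):
        maze.reset()
        first = rng.choice([a for a in (N, S, E, W) if a not in banned])
        _, reward, done = maze.step(first)
        if done and reward < 0:
            banned.add(first)      # learn from the mistake
            continue
        while not done:            # after the first step, walk randomly as before
            _, reward, done = maze.step(maze.sample())
        if reward > 0:
            completions += 1
    return banned, completions

banned, completions = walk_remembering_first_steps(attempts=1000)
print(sorted(banned), completions)
```

Only the first step is remembered here, so every later step is still a blind guess.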

For starters, let's make a __RandomWalker__ class that measures how many attempts it needs before it finds the exit. It's not a complicated class: it has a handful of variables that track its success rate, and a `__str__` method that produces a pretty printout.

```python
from maze import *

class RandomWalker():
    def __init__(self):
        self.reset()

    def reset(self):
        self.max_attempts = 0               # ====================== #
        self.min_attempts = float('inf')    # Create variables to    #
        self.total_steps = 0                # keep track of success  #
        self.total_attempts = 0             # rate across attempts.  #
        self.total_completions = 0          # ====================== #

    def go_for_a_walk(self, goal=1):
        self.reset()
        maze = Maze()
        attempts = 0
        while self.total_completions < goal:
            done = False
            maze.reset()
            while not done:
                observation, reward, done = maze.step(maze.sample())
            self.total_attempts += 1                                   # ============================ #
            attempts += 1                                              #                              #
            if reward > 0:                                             # This is just a bunch of      #
                self.total_completions += 1                            # accounting... no logic here, #
                self.max_attempts = max(self.max_attempts, attempts)   # move along, move along.      #
                self.min_attempts = min(self.min_attempts, attempts)   #                              #
                attempts = 0                                           # ============================ #

    def __str__(self):
        out =  '\nAttempts     = ' + str(self.total_attempts)
        out += '\nCompletions  = ' + str(self.total_completions)
        out += '\nSuccess rate = ' + str(self.total_completions / self.total_attempts)
        out += '\nAvg Attempts = ' + str(self.total_attempts / self.total_completions)
        out += '\nMax Attempts = ' + str(self.max_attempts)
        out += '\nMin Attempts = ' + str(self.min_attempts)
        return out
```

That's a lot of code just to track some simple statistics, but now we can ask: what happens if we solve the maze randomly once, versus 100 times?

```python
r = RandomWalker()
r.go_for_a_walk()
print(r)

r.go_for_a_walk(goal=100)
print(r)
```

```
Attempts     = 2655
Completions  = 1
Success rate = 0.0003766478342749529
Avg Attempts = 2655.0
Max Attempts = 2655
Min Attempts = 2655

Attempts     = 313472
Completions  = 100
Success rate = 0.0003190077582686811
Avg Attempts = 3134.72
Max Attempts = 16231
Min Attempts = 12
```
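Two of those statistics are really one number viewed two ways. The success rate is completions divided by attempts, and the average number of attempts per completion is its reciprocal. Recomputing both from the 100-completion run printed above:

```python
# Figures copied from the 100-completion run above.
attempts, completions = 313472, 100

success_rate = completions / attempts     # fraction of attempts that found the exit
avg_attempts = attempts / completions     # attempts needed per success, on average

print(success_rate, avg_attempts)
# → 0.0003190077582686811 3134.72
```

Put plainly, a blind walker finds the exit roughly once every 3,100 tries. The min and max (12 versus 16,231) show how widely individual runs vary around that average.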


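Now let's actually pay attention. The `QPlayer` keeps a table with one row per square of the 4x4 maze and one column per direction. After each move it writes what it learned into that table, and it consults the table before choosing its next move.

```python
import numpy as np
from maze import *

class QPlayer():

    EXPLORE = 0.01        # chance of a random move instead of the best-known one

    N, S, E, W = 0, 1, 2, 3   # direction labels for the four actions

    def __init__(self):
        # One row per maze square (4x4 = 16 states), one column per direction.
        self.q_table = np.zeros([4*4, 4])

    def run(self, environment, explore=EXPLORE):
        observation = environment.reset()
        done = False
        while not done:
            # Flatten the (row, col) observation into a single state index.
            state = observation[0] * 4 + observation[1]
            action = np.argmax(self.q_table[state])
            # np.argmax returns 0 whenever column 0 ties for the maximum, e.g. for a
            # row that is still all zeros, so 0 is treated as "no information" and
            # the move is random. Side effect: N (0) is never chosen greedily.
            if action == 0 or np.random.random() < explore:
                action = environment.sample()
            observation, reward, done = environment.step(action)
            if done:
                # Final move of the attempt: record its reward directly.
                self.q_table[state][action] = reward
                return reward
            else:
                # Pass value back from the next square. There is no learning rate
                # or discount, so entries keep growing on repeated visits.
                future_state = observation[0] * 4 + observation[1]
                self.q_table[state][action] += reward + np.max(self.q_table[future_state])
```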

```python
q = QPlayer()
e = Maze()
complete = 0
for n in range(100000):
    complete += 1 if q.run(e) > 0 else 0
    # Stop once the values in the table sum to more than 100.
    if np.sum(q.q_table) > 100:
        print(q.q_table)
        break
```

```
[[-1.  0. 21. -1.]
 [-1. -1. 35.  0.]
 [-1. 35. -1.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0. -1. -1.]
 [ 0.  0.  0.  0.]
 [ 0. -1. 21. -1.]
 [-1.  7. -1.  0.]
 [ 0. -1.  0. -1.]
 [-1. -1. -1.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  1. -1. -1.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
```
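The update in `run` is simpler than textbook Q-learning. On every non-final move it adds the next square's best value to the current entry, with no learning rate and no discount. Each repetition of the same route therefore makes the entries larger, which is how the table above reaches values as large as 35. The sketch below sets that rule beside the standard Q-learning update for comparison. The standard rule is a swapped-in technique, and `alpha` and `gamma` are illustrative values, not from the notebook. Both helper functions are hypothetical.

```python
import numpy as np

def qplayer_update(q, s, a, r, s_next, done):
    """QPlayer's rule: overwrite on the final move, otherwise accumulate."""
    if done:
        q[s, a] = r
    else:
        q[s, a] += r + np.max(q[s_next])

def textbook_q_update(q, s, a, r, s_next, done, alpha=0.5, gamma=0.9):
    """Standard Q-learning: move q[s, a] part of the way toward r + gamma * max q[s_next]."""
    target = r if done else r + gamma * np.max(q[s_next])
    q[s, a] += alpha * (target - q[s, a])

# Replay one transition many times: from state 7 move S (1) into state 11,
# whose best value is already 1.0 (the move onto the exit in the table above).
q_mine = np.zeros((16, 4))
q_mine[11, 1] = 1.0
q_text = q_mine.copy()
for _ in range(50):
    qplayer_update(q_mine, 7, 1, 0, 11, False)
    textbook_q_update(q_text, 7, 1, 0, 11, False)

print(q_mine[7, 1])   # grows by 1 per repetition: 50.0
print(q_text[7, 1])   # settles near gamma * 1.0 = 0.9
```

Both rules rank the route's moves above the -1 moves into walls, which is all the greedy run needs. Only the textbook rule keeps the numbers bounded.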


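With exploration turned off (`explore=0.0`), the player follows the best move in its table from the entrance straight to the exit:

```python
q.run(e, explore=0.0)
print(e)
```

```
         ...  ...  ...  +++ 
enter->  (1)  (2)  (3)  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  (4)  (5) 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  (6) 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  (7)  <-exit
         +++  +++  ...  ... 
```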
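You can also read the learned route directly out of the table. Start at the entrance, then repeatedly take the column with the highest value in the current square's row. This sketch copies the Q-table printed earlier. Like `run`, it assumes that an observation is `(row, col)` with state `row * 4 + col` and that the directions are `N, S, E, W = 0, 1, 2, 3`. It also takes the exit to be square (3, 3), as the printout shows. `greedy_path` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

# Q-table copied from the training output above (rows = states, columns = N, S, E, W).
q_table = np.array([
    [-1,  0, 21, -1], [-1, -1, 35,  0], [-1, 35, -1,  0], [0, 0,  0,  0],
    [ 0,  0, -1, -1], [ 0,  0,  0,  0], [ 0, -1, 21, -1], [-1, 7, -1, 0],
    [ 0, -1,  0, -1], [-1, -1, -1,  0], [ 0,  0,  0,  0], [0, 1, -1, -1],
    [ 0,  0,  0,  0], [ 0,  0,  0,  0], [ 0,  0,  0,  0], [0, 0,  0,  0],
], dtype=float)

N, S, E, W = 0, 1, 2, 3
MOVES = {N: (-1, 0), S: (1, 0), E: (0, 1), W: (0, -1)}

def greedy_path(q_table, start=(0, 0), exit_=(3, 3), max_steps=16):
    """Follow the highest-valued action from each square until the exit."""
    pos, path = start, [start]
    for _ in range(max_steps):            # guard against loops in a bad table
        if pos == exit_:
            break
        action = int(np.argmax(q_table[pos[0] * 4 + pos[1]]))
        dr, dc = MOVES[action]
        pos = (pos[0] + dr, pos[1] + dc)
        path.append(pos)
    return path

print(greedy_path(q_table))
# → [(0, 0), (0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 3)]
```

Those seven squares are the ones marked (1) through (7) in the greedy run.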



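With `explore=1.0`, every move is random regardless of the table, and this walk ended after its very first move:

```python
q.run(e, explore=1.0)
print(e)
```

```
         ...  ...  ...  +++ 
enter->  (1)  ...  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 
```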

