<h1 align = 'center'>Guessing Games</h1>
<h3 align = 'center'>machine learning, one step at a time</h3>
<h3 align = 'center'>Step 8. A Random Walk While Paying Attention</h3>

**8. A random walk while paying attention.**

When we took our random walk, we ignored everything that happened except for one event: stumbling across the exit. What if we paid attention, and learned from our mistakes?

For starters, let's make a __RandomWalker__ class that gives a clear measure of how many attempts are required before it finds the exit. It's not a complicated class: it just has a handful of variables that keep track of its success rate, and a `__str__` method to make a pretty printout.

In [1]:
from maze import *

class RandomWalker():
    def __init__(self, empty_maze = False): 
        self.reset(empty_maze)
        
    def reset(self, empty_maze = False):
        self.empty_maze = empty_maze
        self.max_attempts = 0       # ====================== #
        self.min_attempts = 2**18   # Create variables to    #
        self.total_steps = 0        # keep track of success  #
        self.total_attempts = 0     # rate across attempts.  #
        self.total_completions = 0  # ====================== #

    def go_for_a_walk(self, goal=1):
        self.reset(self.empty_maze)  # keep the empty_maze setting chosen at construction
        maze = Maze(self.empty_maze)
        attempts = 0
        while self.total_completions < goal:       
            done = False
            maze.reset()
            while not done:
                observation, reward, done = maze.step(maze.sample())
            self.total_attempts += 1                                   # ============================ #
            attempts += 1                                              #                              #
            if reward > 0:                                             # This is just a bunch of      #
                self.total_completions += 1                            # accounting... no logic here, #
                self.max_attempts = max(self.max_attempts, attempts)   # move along, move along.      #
                self.min_attempts = min(self.min_attempts, attempts)   #                              #
                attempts = 0                                           # ============================ #
                      
    def __str__(self):
        out =  '\nAttempts     = ' + str(self.total_attempts)
        out += '\nCompletions  = ' + str(self.total_completions)
        out += '\nSuccess rate = ' + str(self.total_completions / self.total_attempts)
        out += '\nAvg Attempts = ' + str(self.total_attempts / self.total_completions)
        out += '\nMax Attempts = ' + str(self.max_attempts)
        out += '\nMin Attempts = ' + str(self.min_attempts)
        return out

All that code just tracks some simple statistics. For example: what happens if we solve the maze randomly once, versus 100 times?

In [7]:
r = RandomWalker(empty_maze=True)
r.go_for_a_walk()
print(r)

r.go_for_a_walk(goal=100)
print(r)


Attempts     = 160
Completions  = 1
Success rate = 0.00625
Avg Attempts = 160.0
Max Attempts = 160
Min Attempts = 160

Attempts     = 27893
Completions  = 100
Success rate = 0.0035851288853834297
Avg Attempts = 278.93
Max Attempts = 1599
Min Attempts = 3
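
The success rate and the average number of attempts are two views of the same ratio. As a quick sanity check, here is a small sketch that recomputes both from the totals printed above (the variable names are just for illustration):

```python
# Totals copied from the 100-completion run printed above.
attempts, completions = 27893, 100

success_rate = completions / attempts   # fraction of attempts that reached the exit
avg_attempts = attempts / completions   # attempts needed per completion, on average

print(success_rate, avg_attempts)
# The two numbers are reciprocals, so their product is 1 (up to rounding).
print(success_rate * avg_attempts)
```

A single completion (160 attempts in the first run) is a very noisy estimate; averaging over 100 completions gives a steadier number, even though individual trips still ranged from 3 to 1599 attempts.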


In machine learning terms, the maze is an __environment__. Calling reset() on the environment returns an initial __observation__, which, in the case of the maze, describes the position of the player, initially (0,0).

To advance the __environment__ one time step, call step() and provide an __action__; the __environment__ will respond with 3-part feedback: an __observation__, a __reward__, and a __done__ flag.

The __action__ is any entry from the environment's collection of possible actions (the __action space__). In the case of the maze, the __action space__ includes four actions: move north, move south, move east, or move west. Each __action__ is represented by a constant: Maze.N, Maze.S, Maze.E, Maze.W.

The __reward__ indicates whether the outcome was positive (+1 for reaching the exit), negative (-1 for moving onto a blocked space or moving out of bounds), or zero for moving to an open space. Note that there is no reward for making progress... the only positive reward is achieved by completing the trip and exiting the maze.

The __done__ flag indicates that the attempt to traverse the maze is complete, either because the player succeeded in reaching the exit or failed by moving onto a blocked space or out of bounds.
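
To make the reset/step/sample cycle concrete without depending on the maze module, here is a minimal sketch using a made-up stand-in environment. `TinyCorridor` and `run_episode` are hypothetical names invented for this example; the real Maze class may differ in its details, but the loop has the same shape as the one inside RandomWalker.

```python
import random

class TinyCorridor:
    """Hypothetical stand-in for Maze: a 1x3 corridor, start at column 0, exit at column 2."""
    E, W = 0, 1  # the two actions in this toy action space

    def reset(self):
        self.col = 0
        return (0, self.col)             # initial observation

    def sample(self):
        return random.choice([self.E, self.W])   # a random action from the action space

    def step(self, action):
        self.col += 1 if action == self.E else -1
        if self.col < 0:                 # out of bounds: the attempt fails
            return (0, self.col), -1, True
        if self.col == 2:                # reached the exit: the attempt succeeds
            return (0, self.col), 1, True
        return (0, self.col), 0, False   # open space: no reward, keep going


def run_episode(env, choose_action):
    """Run one attempt and return (final reward, number of steps taken)."""
    observation = env.reset()
    done = False
    steps = 0
    while not done:
        observation, reward, done = env.step(choose_action(env, observation))
        steps += 1
    return reward, steps


env = TinyCorridor()
reward, steps = run_episode(env, lambda env, obs: env.sample())
print(reward, steps)
```

Passing the action chooser in as a function keeps the loop identical whether the player acts randomly or consults something it has learned.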


In [3]:
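from maze import *
import numpy as np  # used below; imported explicitly rather than relying on `from maze import *` to supply it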

class QPlayer():
    
    EXPLORE = 0.01
    
    N,S,E,W = 0,1,2,3
    
    def __init__(self):
        super().__init__()
        self.q_table = np.zeros([4*4,4])
        
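    def run(self, environment, explore=EXPLORE):
        """Play one attempt, updating q_table along the way; return the final reward."""
        observation = environment.reset()
        done = False
        while not done:
            # Flatten the (row, column) observation into a q_table row index.
            state = observation[0] * 4 + observation[1]
            action = np.argmax(self.q_table[state])
            # argmax returns 0 when action 0 (N) holds the highest value, which includes
            # an all-zero row where nothing has been learned yet. In that case, or
            # occasionally just to explore, fall back to a random move.
            if action == 0 or np.random.random() < explore:
                action = environment.sample()
            observation, reward, done = environment.step(action)
            if done:
                # Terminal step: record the outcome (+1 exit, -1 blocked or out of bounds).
                self.q_table[state][action] = reward
                return reward
            else:
                # Non-terminal step: add the reward plus the best value known for the next state.
                future_state = observation[0] * 4 + observation[1]
                self.q_table[state][action] += reward + np.amax(self.q_table[future_state])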
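
The update in `QPlayer.run` adds the reward and the best future value straight into the table, so an entry keeps growing each time it is revisited. For comparison, here is a sketch of the textbook Q-learning update, which is a different rule from the one above: it blends new information in with a learning rate and discounts future value. The names `encode_state`, `q_update`, `alpha`, and `gamma`, and the values 0.1 and 0.9, are assumptions chosen for illustration.

```python
import numpy as np

SIZE = 4  # assumed maze width and height, matching the 4*4 rows of q_table

def encode_state(observation, size=SIZE):
    """Flatten a (row, col) observation into a single q_table row index."""
    row, col = observation
    return row * size + col

def q_update(q_table, state, action, reward, future_state, done, alpha=0.1, gamma=0.9):
    """Textbook Q-learning: move the estimate a fraction alpha toward the target."""
    # On a terminal step there is no future, so the target is just the reward.
    target = reward if done else reward + gamma * np.max(q_table[future_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table[state, action]

q_table = np.zeros([SIZE * SIZE, 4])
S = 1                                   # action index for "south", as in QPlayer
above_exit = encode_state((2, 3))       # the cell just above the bottom-right corner
q_update(q_table, above_exit, S, 1, encode_state((3, 3)), done=True)
print(above_exit, q_table[above_exit])
```

With this rule the values stay bounded by the size of the reward, at the cost of needing more visits before a value propagates back to the entrance.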

In [4]:
q = QPlayer()
e = Maze()
complete = 0
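for n in range(100000):
    complete += 1 if q.run(e) > 0 else 0   # count the attempts that reached the exit
    if np.sum(q.q_table) > 100:            # stop once enough positive value has built up in the table
        print(q.q_table)
        break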

[[ 0.  0. 21.  0.]
 [ 0. -1. 35.  0.]
 [ 0. 35. -1.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0. -1.  0.]
 [ 0.  0.  0.  0.]
 [ 0. -1. 21. -1.]
 [-1.  7.  0.  0.]
 [ 0. -1.  0.  0.]
 [-1. -1. -1.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
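
Each row of the table is one state and each column is one action (N, S, E, W). To see what the table has learned, we can follow the highest-valued action from the entrance. The sketch below copies the printed table and assumes the movement rules implied by the state encoding (state = row * 4 + column, with the exit in the bottom-right cell); `greedy_path` is a hypothetical helper, not part of the maze module.

```python
import numpy as np

# The Q-table printed above (rows are states, columns are N, S, E, W).
q_table = np.array([
    [ 0,  0, 21,  0], [ 0, -1, 35,  0], [ 0, 35, -1,  0], [ 0,  0,  0,  0],
    [ 0,  0, -1,  0], [ 0,  0,  0,  0], [ 0, -1, 21, -1], [-1,  7,  0,  0],
    [ 0, -1,  0,  0], [-1, -1, -1,  0], [ 0,  0,  0,  0], [ 0,  1,  0,  0],
    [ 0,  0,  0,  0], [ 0,  0,  0,  0], [ 0,  0,  0,  0], [ 0,  0,  0,  0],
], dtype=float)

# Assumed movement rules: N/S change the row, E/W change the column.
MOVES = {0: (-1, 0), 1: (1, 0), 2: (0, 1), 3: (0, -1)}
NAMES = "NSEW"
EXIT_STATE = 15  # assumed: the exit is the bottom-right cell, (3, 3)

def greedy_path(q_table, start=0, max_steps=16):
    """Follow the best-valued action from each state until the exit (or max_steps)."""
    state, path, actions = start, [start], []
    while state != EXIT_STATE and len(actions) < max_steps:
        action = int(np.argmax(q_table[state]))
        d_row, d_col = MOVES[action]
        row, col = divmod(state, 4)
        state = (row + d_row) * 4 + (col + d_col)
        path.append(state)
        actions.append(NAMES[action])
    return path, actions

path, actions = greedy_path(q_table)
print(path)     # [0, 1, 2, 6, 7, 11, 15]
print(actions)  # ['E', 'E', 'S', 'E', 'S', 'S']
```

The helper does no bounds checking, so it is only meant for a table like this one where the best actions already lead to the exit.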


In [5]:
q.run(e, explore=0.0)
print(e)

         ...  ...  ...  +++ 
enter->  (1)  (2)  (3)  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  (4)  (5) 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  (6) 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  (7)  <-exit
         +++  +++  ...  ... 



In [6]:
q.run(e, explore=1.0)
print(e)

         ...  ...  ...  +++ 
enter->  (3)  ...  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         (4)  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 

