<h1 align = 'center'>Guessing Games</h1>
<h3 align = 'center'>machine learning, one step at a time</h3>
<h3 align = 'center'>Step 7. A Random Walk Through a Maze</h3>

#### 7. A random walk through a maze

Let's take a walk through a 4x4 maze, starting with (1) in the upper left corner:

In [1]:
from maze import Maze

maze = Maze()
print(maze)

         ...  ...  ...  +++ 
enter->  (1)  ...  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 



We can move N,S,E,W by calling the __step()__ function:

In [2]:
maze.step(Maze.E) # take one step to the East
print(maze)

         ...  ...  ...  +++ 
enter->  (1)  (2)  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 



Let's walk through the maze by taking the correct sequence of steps...

In [3]:
maze.step(Maze.E) # (3) take another step to the East
maze.step(Maze.S) # (4) and then one step to the South
maze.step(Maze.E) # (5) then one more step to the East
maze.step(Maze.S) # (6) then South
maze.step(Maze.S) # (7) and South again, onto the exit
print(maze)

         ...  ...  ...  +++ 
enter->  (1)  (2)  (3)  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  (4)  (5) 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  (6) 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  (7)  <-exit
         +++  +++  ...  ... 



That's easy enough, because we know it's a maze. We know the rules (stay in bounds, don't step on blocked spaces), and we are smart enough to answer the question: what is the only path through the maze?

_But here's the amazing thing: none of that context is really necessary. We don't need to know it's a maze. We don't need to know explicit rules. We don't even need to know we are seeking a path to an exit. We just need to know (a) what state are we in? (b) what actions can we take? and (c) when we take an action in a given state, did anything good or bad happen? If we know those things, we could learn to traverse the maze by trial and error._

The maze tells us those things. When you take a step, the maze provides feedback. How does that work?

Let's start again by calling __reset()__, and noticing that __reset()__ returns an __observation__, which is just our coordinates in the maze, given as [row, column]:

In [4]:
observation = maze.reset()           # store the observation returned by the maze
print(maze)
print('observation =', observation)

         ...  ...  ...  +++ 
enter->  (1)  ...  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 

observation = [0 0]


That makes sense. Our initial coordinates are (0,0).

Calling step() returns more feedback... we receive an updated __observation__, along with a __reward__ and a __done__ flag.

Let's take a look at those:

In [5]:
observation, reward, done = maze.step(Maze.E)
print(maze)
print('observation =', observation, 'reward =', reward, ' done =', done)

         ...  ...  ...  +++ 
enter->  (1)  (2)  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 

observation = [0 1] reward = 0  done = False


Those values mean:
<pre>
observation = [0 1]  # the player is now located at (0,1), having moved east from (0,0)
reward = 0           # that move resulted in neither success nor failure (so the reward is zero)
done = False         # and the attempt to traverse the maze isn't over yet
</pre>
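
To make that feedback concrete, here is a minimal sketch of an environment with the same reset()/step() shape. It is not the lesson's `maze` module: the name `ToyMaze`, the action numbering, the plain-list observations, and the rule that an illegal move ends the attempt with a reward of -1 are all assumptions made for illustration. Only the blocked cells and the exit are copied from the printout above.

```python
# Hypothetical stand-in for the lesson's Maze class (NOT the real maze.py).
# Observations are plain [row, column] lists; the real class prints NumPy arrays.

class ToyMaze:
    N, S, E, W = 0, 1, 2, 3                                # assumed action encoding
    MOVES = {N: (-1, 0), S: (1, 0), E: (0, 1), W: (0, -1)}  # (row, column) offsets
    BLOCKED = {(0, 3), (1, 1), (2, 2), (3, 0), (3, 1)}      # the +++ cells in the printout
    EXIT = (3, 3)
    SIZE = 4

    def reset(self):
        """Put the player back at the entrance and return the observation."""
        self.position = (0, 0)
        return list(self.position)

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        d_row, d_col = self.MOVES[action]
        row, col = self.position[0] + d_row, self.position[1] + d_col
        in_bounds = 0 <= row < self.SIZE and 0 <= col < self.SIZE
        if not in_bounds or (row, col) in self.BLOCKED:
            # Assumption: an illegal move ends the attempt with a penalty of -1.
            return list(self.position), -1, True
        self.position = (row, col)
        if self.position == self.EXIT:
            return list(self.position), 1, True             # success
        return list(self.position), 0, False                # nothing good or bad yet


toy = ToyMaze()
print(toy.reset())           # [0, 0]
print(toy.step(ToyMaze.E))   # ([0, 1], 0, False)
```

The point of the sketch is the shape of the conversation: the caller sends an action, and the environment answers with where you are, how that went, and whether the attempt is over.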

That's all the feedback we need to guess at what __actions__ we should take. But what if we are so unaware of the details of our task that we don't even know what __actions__ are possible?

The maze has a convenient function called __sample()__, which returns a random selection from the available actions. In this case, there are four: move N,S,E, or W. That information alone allows us to create a really silly, inefficient algorithm to conquer the maze, just by taking random __actions__.
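
A function like that can be very small. The sketch below is a guess at the idea, not the real implementation: the action numbering is hypothetical, and it assumes each of the four actions is equally likely, which the lesson does not actually state.

```python
import random

N, S, E, W = 0, 1, 2, 3          # assumed action encoding, for illustration only
ACTIONS = (N, S, E, W)

def sample(rng=random):
    """Return one of the four actions, each with probability 1/4 (an assumption)."""
    return rng.choice(ACTIONS)

# Tally a few thousand samples to see that no direction is favored.
rng = random.Random(0)           # seeded so the tally is repeatable
counts = {action: 0 for action in ACTIONS}
for _ in range(4000):
    counts[sample(rng)] += 1
print(counts)                    # each action should turn up roughly 1000 times
```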

What happens if we ignore the feedback, and just use __sample()__ to take a random walk through the maze?

Here is code that takes one random walk, showing each step along the way. Run it several times. How far can you go?

In [9]:
# take one random walk through a 4x4 maze, printing the maze after every step
maze = Maze()
done = False
while not done:
    observation, reward, done = maze.step(maze.sample())  # sample() provides a random action (N,S,E,W)
    print(maze,observation,reward,done)

         ...  ...  ...  +++ 
enter->  (1)  (2)  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 

         ...  ...  ...  +++ 
enter->  (3)  (2)  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++  ...  ... 
         +++  +++  ...  ...  <-exit
         +++  +++  ...  ... 

         ...  ...  ...  +++ 
enter->  (3)  (4)  ...  +++ 
         ...  ...  ...  +++ 

         ...  +++  ...  ... 
         ...  +++  ...  ... 
         ...  +++  ...  ... 

         ...  ...  +++  ... 
         ...  ...  +++  ... 
         ...  ...  +++  ... 

         +++  +++ 

Here is code that repeats the random walk until three attempts have reached the exit:

In [12]:
# take random walks through a 4x4 maze until three attempts succeed in reaching the exit
maze = Maze()
attempts = 0             # keep track of attempts
completions = 0          # keep track of completions
while completions < 3:   # stop after three complete trips through the maze
    attempts += 1
    observation = maze.reset()
    done = False
    while not done:
        observation, reward, done = maze.step(maze.sample())  # sample() provides a random action (N,S,E,W)
    if observation[0] == 3 and observation[1] == 3:           # did the player reach the exit?
        completions += 1
print('attempts =',attempts, 'completions =', completions, ' rate =', completions/attempts)

attempts = 10400 completions = 3  rate = 0.0002884615384615385


The number of attempts varies widely from run to run: a single trip through the maze might take anywhere from 100 to 10,000 attempts. Run it a few more times and watch the results.
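
Why such a wide spread? A rough way to think about it: if every attempt succeeds with the same small probability p, independently of the others, then the number of attempts until the first success follows a geometric distribution with mean 1/p. The sketch below plugs in the figures from the run above. Treat it as a back-of-the-envelope estimate only, since a rate measured from just three completions is very noisy.

```python
completions, attempts = 3, 10400     # figures from the run above
p = completions / attempts           # estimated chance that a single attempt succeeds
expected_attempts = 1 / p            # mean of a geometric distribution

def chance_within(n, p=p):
    """Probability of at least one success in n independent attempts."""
    return 1 - (1 - p) ** n

print(round(expected_attempts))      # 3467 attempts per completion, on average
for n in (100, 1000, 10000):
    print(n, 'attempts -> chance of at least one completion =', round(chance_within(n), 3))
```

With a success rate this low, getting lucky within the first hundred attempts is possible but uncommon, and still having no completion after several thousand attempts is not unusual either.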

Just for comparison, let's go back to cheating, but this time let's keep an eye on the __observation__, __reward__, and __done__ return values. Here is the series of moves, including the feedback:

In [13]:
print('initial position =', maze.reset())
print('observation, reward, done =', maze.step(Maze.E))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.E))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.S))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.E))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.S))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.S))  # move to the exit

initial position = [0 0]
observation, reward, done = (array([0, 1]), 0, False)
observation, reward, done = (array([0, 2]), 0, False)
observation, reward, done = (array([1, 2]), 0, False)
observation, reward, done = (array([1, 3]), 0, False)
observation, reward, done = (array([2, 3]), 0, False)
observation, reward, done = (array([3, 3]), 1, True)


Way more efficient! ...except we cheated.

There must be a better way... a way to use trial and error to learn the path through the maze, without cheating.

_(there are other better ways, too, like a recursive search... but that's not part of this lesson)_

For this lesson, a __machine learning__ approach creates an __agent__ to discover that sequence on its own, with as little supervision as possible. In fact, we should be able to create an __agent__ that has nothing to do with the specific task of solving a maze... it should just learn from its mistakes, regardless of the context, by receiving __observations__ and __rewards__. That's the next lesson.

<hr>
***Exercises***<p>

Alter the random walk to start at different positions in the maze. What is the relationship between the starting position and the average number of attempts required to reach the exit?

In [None]:
# take random walks through a 4x4 maze until one attempt succeeds in reaching the exit
maze = Maze()
attempts = 0             
completions = 0          
while completions == 0:  
    attempts += 1
    observation = maze.reset()
    
    # once the maze is reset, you can move the player
    # before starting the random walk by inserting some
    # lines here.
      
    done = False
    while not done:
        observation, reward, done = maze.step(maze.sample())  
    if observation[0] == 3 and observation[1] == 3:           
        completions += 1
print('attempts =',attempts, 'completions =', completions, ' rate =', completions/attempts)