<h1 align = 'center'>Guessing Games</h1>
<h3 align = 'center'>machine learning, one step at a time</h3>
<h3 align = 'center'>Step 7. A Random Walk Through a Maze</h3>

#### 4. A random walk through a maze

Let's take a walk through a 4x4 maze, starting with (1) in the upper left corner:

In [None]:
from maze import Maze

maze = Maze()
print(maze)

We can move north, south, east, or west (N, S, E, W) by calling the step() function:

In [None]:
maze.step(Maze.E) # take one step to the East
print(maze)

In [None]:
maze.step(Maze.E) # (3) take another step to the East
maze.step(Maze.S) # (4) and then one step to the South
maze.step(Maze.E) # (5) then East
maze.step(Maze.S) # (6) South
maze.step(Maze.S) # (7) and South again
print(maze)

When you take a step, the maze provides feedback. Let's start again by calling reset(), and print the values returned by reset() and step():

In [None]:
# reset() returns an initial observation, which is the coordinates of the player's position
observation = maze.reset()
print(maze)
print('observation =', observation)

In [None]:
# step() returns an updated observation, along with a reward and the 'done' flag
# an 'observation' is just the player's coordinates, starting at (0,0)
observation, reward, done = maze.step(Maze.E)
print(maze)
print('observation =', observation, 'reward =', reward, ' done =', done)

The maze has a convenient function called sample(), which returns a random selection from the four available actions: move N,S,E, or W. What happens if we ignore the feedback, and just use sample() to take a random walk through the maze?

In [None]:
# take random walks through a 4x4 maze until one attempt succeeds in reaching the exit
maze = Maze()
attempts = 0             # keep track of attempts
completions = 0          # keep track of completions
while completions == 0:  # stop upon first complete trip through the maze
    attempts += 1
    observation = maze.reset()
    done = False
    while not done:
        observation, reward, done = maze.step(maze.sample())  # sample() provides a random action (N,S,E,W)
    if observation[0] == 3 and observation[1] == 3:           # did the player reach the exit?
        completions += 1
print('attempts =',attempts, 'completions =', completions, ' rate =', completions/attempts)

The random walk might take anywhere from 100 to 10,000 attempts to exit the maze... run it a few more times and watch the results.
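Why do the results swing so widely? Each attempt is an independent random walk, so the number of attempts needed for the first success follows a geometric distribution: if one attempt succeeds with probability p, the average is 1/p attempts, but individual runs scatter far above and below that. The sketch below simulates this without the maze. The value of `p` is an assumed figure chosen for illustration, not something measured from the Maze class, and `attempts_until_first_success` is a hypothetical helper.

```python
import random

def attempts_until_first_success(p, rng):
    """Count independent attempts until one succeeds; each succeeds with probability p."""
    attempts = 1
    while rng.random() >= p:   # this attempt failed, so try again
        attempts += 1
    return attempts

rng = random.Random(0)   # fixed seed so the run is repeatable
p = 0.001                # ASSUMED success rate per attempt, not measured from the maze
runs = [attempts_until_first_success(p, rng) for _ in range(200)]
print('min =', min(runs), ' mean =', sum(runs) / len(runs), ' max =', max(runs))
```

The mean should land near 1/p, while the minimum and maximum sit far apart, which is the same kind of spread the random walk shows from run to run.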

Maybe it would be better to apply __machine learning__ to exploit the feedback returned by reset() and step().

In machine learning terms, the maze is an __environment__. When the environment is reset(), it provides an initial __observation__, which, in the case of the maze, describes the position of the player, initially (0,0).

To advance the __environment__ by one time step, call step() and provide an __action__; the __environment__ will respond with three-part feedback: an __observation__, a __reward__, and a __done__ flag.

The __action__ is any entry from the environment's collection of possible actions (the __action space__). In the case of the maze, the __action space__ includes four actions: move north, move south, move east, or move west. Each __action__ is represented by a constant: Maze.N, Maze.S, Maze.E, Maze.W.

The __reward__ indicates whether the outcome was positive (+1 for reaching the exit), negative (-1 for moving onto a blocked space or moving out of bounds), or zero (for moving to an open space). Note that there is no reward for making progress... the only positive reward is achieved by completing the trip and exiting the maze.

The __done__ flag indicates that the attempt to traverse the maze is complete, either because the player succeeded in reaching the exit or failed by moving onto a blocked space or out of bounds.
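The reset/step protocol described above can be sketched as a small loop that works with any environment exposing those two functions. So that the sketch runs on its own, it uses `TinyMaze`, a hypothetical stand-in that is NOT the real Maze class: its single blocked cell at (2,2), its (x, y) coordinate order, and its choice to return the last valid position after a failed move are all assumptions. `run_episode` is likewise a hypothetical helper.

```python
import random

class TinyMaze:
    """Hypothetical stand-in for Maze, only so this sketch is self-contained."""
    N, S, E, W = 0, 1, 2, 3
    # (dx, dy) per action; y grows toward the South (an assumption)
    MOVES = {N: (0, -1), S: (0, 1), E: (1, 0), W: (-1, 0)}

    def __init__(self, size=4, blocked=((2, 2),)):
        self.size = size
        self.blocked = set(blocked)          # assumed layout, not the real maze
        self.exit = (size - 1, size - 1)
        self.position = (0, 0)

    def reset(self):
        self.position = (0, 0)
        return self.position                 # the initial observation

    def sample(self):
        return random.choice([self.N, self.S, self.E, self.W])

    def step(self, action):
        dx, dy = self.MOVES[action]
        x, y = self.position[0] + dx, self.position[1] + dy
        out_of_bounds = not (0 <= x < self.size and 0 <= y < self.size)
        if out_of_bounds or (x, y) in self.blocked:
            return self.position, -1, True   # failed: the attempt is over
        self.position = (x, y)
        if self.position == self.exit:
            return self.position, 1, True    # reached the exit: the attempt is over
        return self.position, 0, False       # open space: keep going


def run_episode(env, choose_action):
    """Run one attempt; return a list of (observation, action, reward) tuples."""
    history = []
    observation = env.reset()
    done = False
    while not done:
        action = choose_action(observation)
        next_observation, reward, done = env.step(action)
        history.append((observation, action, reward))
        observation = next_observation
    return history


# usage: replay a fixed route, then take one random walk
env = TinyMaze()
route = iter([env.E, env.E, env.S, env.E, env.S, env.S])
print(run_episode(env, lambda observation: next(route))[-1])
print(len(run_episode(env, lambda observation: env.sample())), 'random steps')
```

In this stand-in, the fixed route ends with a reward of +1. Passing the action choice in as a function keeps the loop unchanged whether the actions come from a script, from sample(), or later from something that learns.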

Here is a series of moves, including the feedback:

In [None]:
print('initial position =', maze.reset())
print('observation, reward, done =', maze.step(Maze.E))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.E))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.S))  # move to an adjacent open space
print('observation, reward, done =', maze.step(Maze.S))  # move to a blocked space

That's it! ...that's all our agent needs to know to find a solution to the maze.

The idea is that we can:
- keep track of the player's unique state (that's just the coordinates)
- explore the action space at each state, using trial and error (that's just moving N,S,E, or W)
- remember when an action produces a good or bad result for a given state

If we can do that, and find a way to chain the states together, we can find our way through the maze.
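As a rough illustration of the "remember" part, here is a minimal sketch of a memory keyed by state. It only records the last reward seen for each (state, action) pair and filters out actions known to end badly; it does not chain states together. The names `remember` and `allowed_actions` are hypothetical, and the string labels 'N', 'S', 'E', 'W' stand in for the Maze.N, Maze.S, Maze.E, Maze.W constants.

```python
from collections import defaultdict

def remember(memory, state, action, reward):
    """Record the reward received for taking `action` from `state`."""
    memory[state][action] = reward

def allowed_actions(memory, state, actions):
    """Return the actions that are untried or not known to end badly from `state`."""
    return [a for a in actions if memory[state].get(a, 0) >= 0]

memory = defaultdict(dict)            # state -> {action: last reward seen}
actions = ['N', 'S', 'E', 'W']        # stand-in labels for the maze's action constants

remember(memory, (0, 0), 'N', -1)     # stepping North from the start went out of bounds
remember(memory, (0, 0), 'E', 0)      # stepping East reached an open space
print(allowed_actions(memory, (0, 0), actions))
```

With those two memories stored, 'N' is filtered out at (0,0) and the other three actions remain available for further trial and error.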

<hr>
***Exercises***<p>

Alter the random walk to start at different positions in the maze. What is the relationship between the starting position and the average number of attempts required to reach the exit?

In [None]:
# take random walks through a 4x4 maze until one attempt succeeds in reaching the exit
maze = Maze()
attempts = 0             
completions = 0          
while completions == 0:  
    attempts += 1
    observation = maze.reset()
    
    # once the maze is reset, you can move the player
    # before starting the random walk by inserting some
    # lines here.
      
    done = False
    while not done:
        observation, reward, done = maze.step(maze.sample())  
    if observation[0] == 3 and observation[1] == 3:           
        completions += 1
print('attempts =',attempts, 'completions =', completions, ' rate =', completions/attempts)