<h1 align = 'center'>Guessing Games</h1>
<h3 align = 'center'>machine learning, one step at a time</h3>
<h3 align = 'center'>Step 8. A Random Walk While Paying Attention</h3>

**8. A random walk while paying attention.**

When we took our random walk, we ignored everything that happened except for one event: stumbling across the exit. What if we paid attention, and learned from our mistakes?

Let's start thinking about the maze in terms of machine learning: _the art of accumulating knowledge by learning from mistakes_.

In machine learning terms, the maze is an __environment__. An __environment__ presents a problem, and gives feedback as we seek a solution.

For any given problem, a basic machine learning __environment__ allows you to:
- start over (and be informed of your starting point)
- examine examples of available actions (...can I move? jump? push a button? play a card?)
- take one step (and be informed of your progress, for better or worse) 
- be informed of the end of your attempt to solve the problem (i.e., __game over__)

(You can find lots of environments organized along these lines, like at the OpenAI Gym: https://gym.openai.com/envs/#classic_control)
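The four capabilities listed above can be sketched as a tiny stand-in environment. Everything here is hypothetical and exists only for illustration: `CorridorEnv`, its two actions, and its reward of 1 at the exit are assumptions, not the `Maze` class used in this notebook.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of cells with the exit at the far end."""

    ACTIONS = (0, 1)  # assumed encoding: 0 = step left, 1 = step right

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start over, and report the starting point.
        self.position = 0
        return self.position

    def sample(self):
        # Provide one example of an available action.
        return random.choice(self.ACTIONS)

    def step(self, action):
        # Take one step; the wall at cell 0 stops us from walking off the left end.
        move = 1 if action == 1 else -1
        self.position = max(0, self.position + move)
        done = self.position == self.length - 1  # game over once we reach the exit
        reward = 1 if done else 0
        return self.position, reward, done


env = CorridorEnv()
position = env.reset()
done = False
steps = 0
while not done:
    # A plain random walk: sample an action, take it, read the feedback.
    position, reward, done = env.step(env.sample())
    steps += 1

print("reached the exit at cell", position, "after", steps, "steps")
```

The number of steps changes from run to run, because every action is chosen at random.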

Back to the __Maze__, which provides these functions:<br>
> __reset()__ lets you start over.<br>
> __sample()__ provides an example of an action (move north, move south...)<br>
> __step(action)__ lets you take an action and get feedback: move east, and let me know how that goes.<br>

Let's try those one at a time:

In [None]:
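import numpy as np
from maze import *

m = Maze()

# One row per cell of the 4x4 maze (row * 4 + column), one column per action.
q = np.zeros([16, 4])

for n in range(100000):
    # Assumes reset() returns the starting position, e.g. (0, 0).
    position = m.reset()
    done = False
    while not done:
        if np.random.random() < 0.3:
            # Explore: 30% of the time, try a random action.
            action = m.sample()
        else:
            # Exploit: pick the best-scoring action recorded for the current cell.
            action = np.argmax(q[position[0] * 4 + position[1]])
        position, reward, done = m.step(action)
        # Only record feedback for positions that are inside the maze.
        if min(position) >= 0 and max(position) <= 3:
            q[position[0] * 4 + position[1], action] += reward

print(q)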


In other words, __you are here: 0,0__.

Great.

The next reasonable question might be: _what can I do from here?_

Or, in typical machine learning terms: what __actions__ can I take?

For the __Maze__, an __action__ is any entry from the environment's collection of possible actions (that's called the __action space__, which includes four actions: move north, move south, move east, or move west).

But let's say we don't know it's a maze, so we are not aware of the possibilities of moving north, south, east, or west. How do we discover what __actions__ we can take?
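One way to answer that question, sketched under assumptions: ask for sample actions over and over, and keep track of the distinct ones that come back. `StandInEnv` below is a hypothetical placeholder with four integer-coded actions; it is not the real `Maze`, whose `sample()` may return actions in a different form.

```python
import random


class StandInEnv:
    """Hypothetical placeholder for an environment whose actions we do not know."""

    def sample(self):
        # Assumption: actions are coded as the integers 0 to 3.
        return random.choice(range(4))


def discover_actions(env, draws=200):
    """Collect the distinct actions seen across repeated calls to env.sample()."""
    seen = set()
    for _ in range(draws):
        seen.add(env.sample())
    return seen


random.seed(0)
actions = discover_actions(StandInEnv())
print(sorted(actions))
```

With only four actions and 200 draws, missing one of them is extremely unlikely, but sampling can only ever show what it happened to return; it does not prove the list is complete.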



In [None]:
import numpy as np

a = [1, 2, 3, 4, 5, 6, 5, 4, 3]
# np.argmax returns the index of the largest value: here 5, because a[5] == 6.
np.argmax(a)
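To connect `np.argmax` back to the maze, here is a minimal sketch of how it might pick an action from one row of a score table like `q`. The helper `choose_action`, the example scores, and the use of `np.random.default_rng` are illustrative assumptions, not code from the `maze` module.

```python
import numpy as np


def choose_action(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-scoring one."""
    if rng.random() < epsilon:
        # Explore: any action index, chosen uniformly at random.
        return int(rng.integers(len(q_row)))
    # Exploit: index of the largest score (the first one, if there is a tie).
    return int(np.argmax(q_row))


rng = np.random.default_rng(0)
q_row = np.array([0.0, 2.5, -1.0, 0.5])  # made-up scores for four actions

greedy = choose_action(q_row, epsilon=0.0, rng=rng)
print(greedy)  # 1, because q_row[1] is the largest score
```

Setting `epsilon` to 0 always takes the best-known action, while setting it to 1 ignores the scores entirely and behaves like the original random walk.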