# FROZENLAKE - 1

> We'll implement classical dynamic programming algorithms and Q-learning to figure out the best action to take in a toy environment called Slippery Frozen Lake 👩

# Slippery Frozen Lake Environment

[Modified version of the description from OpenAI Gym](https://gym.openai.com/envs/FrozenLake-v0/)

> The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others make the agent fall into a hole with an exploding 💥 bomb 💥 and die (I know, right!). Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile.

> You want to get to the target 🎯. The lake is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water (where there is a BOMB 💥 and you will EXPLODE 💥 and DIE 💥). Your job is to navigate across the lake to the target 🎯. **However, the ice is slippery, so you won't always move in the direction you intend.**

> The episode ends when you reach the goal or fall into a hole with a bomb and die! You receive a reward of **+1** if you reach the goal 🎯, and **zero** otherwise.

## STATES

```
S - 👩🏼 (S: starting point, safe, REWARD = 0)
F - ▫️ (F: frozen surface, safe, REWARD = 0)
H - 💥 (H: hole with bomb, fall to your doom, terminal state, REWARD = 0)
G - 🎯 (G: goal, dartboard target, safe, terminal state, REWARD = +1) 


+-----------------+-----------+-------+-------------------+--------+-------+
| State Condition | Character | Safe? | Episode ends?     | Reward | Icon  |
+-----------------+-----------+-------+-------------------+--------+-------+
| Starting point  | 'S'       | Yes   | No                | 0      | 👩    |
| You start here  |           |       |                   |        |       |
+-----------------+-----------+-------+-------------------+--------+-------+
| Frozen surface  | 'F'       | Yes   | No                | 0      | ▫️    |
+-----------------+-----------+-------+-------------------+--------+-------+
| Hole with bomb  | 'H'       | No    | Yes               | 0      | 💥    |
| Fall and die    |           |       |                   |        |       |
+-----------------+-----------+-------+-------------------+--------+-------+
| Goal / Target   | 'G'       | Yes   | Yes               | +1     | 🎯    |
| You should end  |           |       |                   |        |       |
| up here.        |           |       |                   |        |       |
+-----------------+-----------+-------+-------------------+--------+-------+
```

## 4x4 Grid World
```
👩▫️▫️▫️ | SFFF 
▫️💥▫️💥 | FHFH 
▫️▫️▫️💥 | FFFH
💥▫️▫️🎯 | HFFG
```

In [1]:
import numpy as np
from frozen_lake import SlipperyFrozenLake, FrozenLakeState, a_few_tests
from pprint import pprint

In [2]:
a_few_tests()

PASSED! :) 


In [3]:
frozen_lake_map = [
    ['S', 'F', 'F', 'F'], 
    ['F', 'H', 'F', 'H'],
    ['F', 'F', 'F', 'H'],
    ['H', 'F', 'F', 'G']]

lake_environment = SlipperyFrozenLake(frozen_lake_map)

# Explore Frozen Lake Environment

## chars ( S, F, H, G) - state condition

```
  0   1   2   3
  +---+---+---+---+
0 | S | F | F | F |
  +---+---+---+---+
1 | F | H | F | H |
  +---+---+---+---+
2 | F | F | F | H |
  +---+---+---+---+
3 | H | F | F | G |
  +---+---+---+---+
```

## icons ( 👩 ▫️  💥🎯)

```

  0   1   2   3
  +---+---+---+---+
0 |👩 |▫️ |▫️ |▫️|
  +---+---+---+---+
1 |▫️ |💥 |▫️ |💥|
  +---+---+---+---+
2 |▫️ |▫️ |▫️ |💥|
  +---+---+---+---+
3 |💥 |▫️ |▫️ |🎯|
  +---+---+---+---+

```

## terminal ( y / n )

```
  0   1   2   3
  +---+---+---+---+
0 | n | n | n | n |
  +---+---+---+---+
1 | n | y | n | y |
  +---+---+---+---+
2 | n | n | n | y |
  +---+---+---+---+
3 | y | n | n | y |
  +---+---+---+---+
```

## state ID 
```
  0   1   2   3
  +---+---+---+---+
0 | 0 | 1 | 2 | 3 |
  +---+---+---+---+
1 | 4 | 5 | 6 | 7 |
  +---+---+---+---+
2 | 8 | 9 |10 |11 |
  +---+---+---+---+
3 |12 |13 |14 |15 |
  +---+---+---+---+
```
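
The numbering above is row-major: IDs grow left to right, then top to bottom. A minimal sketch of that convention follows. The helper names `to_state_id` and `to_location` are made up for illustration and are not part of the `frozen_lake` module.

```python
N_COLS = 4  # width of the 4x4 grid used in this notebook

def to_state_id(row, col, n_cols=N_COLS):
    """Row-major numbering: IDs increase left to right, then top to bottom."""
    return row * n_cols + col

def to_location(state_id, n_cols=N_COLS):
    """Inverse mapping: recover (row, col) from a state ID."""
    return divmod(state_id, n_cols)

print(to_state_id(3, 2))  # 14
print(to_location(14))    # (3, 2)
```

So the tile just left of the goal, at row 3 and column 2, is state 14, which is the state used in several of the cells further down.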

## rewards

```
  0   1   2   3
  +---+---+---+---+
0 | 0 | 0 | 0 | 0 |
  +---+---+---+---+
1 | 0 | 0 | 0 | 0 |
  +---+---+---+---+
2 | 0 | 0 | 0 | 0 |
  +---+---+---+---+
3 | 0 | 0 | 0 |+1 |
  +---+---+---+---+
```
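
The terminal and reward grids above follow directly from the character map and the per-character table in the STATES section. Here is a small sketch that rebuilds them with plain Python lists. The dictionaries `REWARD` and `IS_TERMINAL` are illustrative names filled in by hand from that table, not attributes of `SlipperyFrozenLake`.

```python
frozen_lake_map = [
    ['S', 'F', 'F', 'F'],
    ['F', 'H', 'F', 'H'],
    ['F', 'F', 'F', 'H'],
    ['H', 'F', 'F', 'G']]

# Per-character lookup tables, copied by hand from the STATES table.
REWARD = {'S': 0.0, 'F': 0.0, 'H': 0.0, 'G': 1.0}
IS_TERMINAL = {'S': False, 'F': False, 'H': True, 'G': True}

# One entry per tile, in the same layout as the map.
reward_grid = [[REWARD[c] for c in row] for row in frozen_lake_map]
terminal_grid = [['y' if IS_TERMINAL[c] else 'n' for c in row]
                 for row in frozen_lake_map]

for row in terminal_grid:
    print(' '.join(row))
for row in reward_grid:
    print(row)
```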

In [4]:
print()
print("-->A 4x4 Grid World")
pprint(lake_environment.map)

print()
print("-->Number of states:", lake_environment.number_of_states)
print("-->And potential actions to take:", lake_environment.actions)

print()
print("-->With respective states numbered as follows:")
print()
for r in lake_environment.n_map:
    for c in r: 
        print('{:4d}'.format(c), end="")
    print()

print()
print("-->Total number of states:", lake_environment.number_of_states)
print()

print("--------------------")
print("Possible Conditions for each state")
print("--------------------")

for c in ['S', 'H', 'F', 'G']:
    print()
    print('state condition (char):', c)
    print("-->Reward:", lake_environment.reward[c])
    print("-->Is terminal?:", lake_environment.is_terminal[c])
    print("-->Icon:", lake_environment.icons[c])
    print()



-->A 4x4 Grid World
[['S', 'F', 'F', 'F'],
 ['F', 'H', 'F', 'H'],
 ['F', 'F', 'F', 'H'],
 ['H', 'F', 'F', 'G']]

-->Number of states: 16
-->And potential actions to take: ['left', 'down', 'right', 'up']

-->With respective states numbered as follows:

   0   1   2   3
   4   5   6   7
   8   9  10  11
  12  13  14  15

-->Total number of states: 16

--------------------
Possible Conditions for each state
--------------------

state condition (char): S
-->Reward: 0.0
-->Is terminal?: False
-->Icon: 👩


state condition (char): H
-->Reward: 0.0
-->Is terminal?: True
-->Icon: 💥


state condition (char): F
-->Reward: 0.0
-->Is terminal?: False
-->Icon: ▫️


state condition (char): G
-->Reward: 1.0
-->Is terminal?: True
-->Icon: 🎯



# Transition probability and one-step dynamics

Recall that the surface of the frozen lake is slippery, so the agent can slide to a location other than the one it intended.

Dynamic programming assumes that the agent has full knowledge of the Markov Decision Process (MDP). Here that means we know the one-step dynamics of every state: for each state and action, every possible next state together with its probability and reward.

For example, you can run the following:

```
possibilities = lake_environment.get_possibilities(state_id, action)
```

This returns a `list` of the possible next states when you take a particular `action` (`left`, `right`, `up`, `down`) from the state identified by `state_id`, which is an `int`.

Each element of the `list` is a `FrozenLakeState` object, which contains:
- `state_id` (an `int`) - The unique identification number of the possible next state (read as `state_info.n` in cell 7 below)
- `probability` (a `float`) - The probability of transitioning to this next state, given that you took the action from the state identified by `state_id`
- `reward` (a `float`) - The reward for landing in this next state from your current state
- `is_terminal` (a boolean: `True` or `False`) - Whether this next state is a terminal state
- Other useful information, such as the icon, the character representation and the grid location
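
To make the shape of that list concrete, here is a hypothetical stand-in. `NextState` is an illustrative dataclass, not the notebook's `FrozenLakeState` class, and its values are copied by hand from the output printed further down for state 14 and action `right`. It also checks a property any valid set of possibilities should have: the probabilities sum to 1.

```python
from dataclasses import dataclass
import math

@dataclass
class NextState:
    """Illustrative stand-in for the fields listed above (not the real class)."""
    state_id: int
    probability: float
    reward: float
    is_terminal: bool

# Values copied by hand from the notebook output for state_id=14, action='right'.
possibilities = [
    NextState(state_id=15, probability=1 / 3, reward=1.0, is_terminal=True),
    NextState(state_id=10, probability=1 / 3, reward=0.0, is_terminal=False),
    NextState(state_id=14, probability=1 / 3, reward=0.0, is_terminal=False),
]

# The outcomes of one (state, action) pair form a probability distribution.
total = sum(p.probability for p in possibilities)
print(math.isclose(total, 1.0))  # True
```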

> DEFINITION: the **Transition Probability** for a state `s` (the current `state_id`) and an action `a` at timestep `t`, and a possible next state `s'` (the possible `state_id`), is the probability that the state at timestep `t+1` is `s'`, given that you take action `a` in state `s`.


## Formally,  

```
*

transition(s, a, s') = probability[state(t+1) = s'| state(t) = s, action(t) = a]

*
```

In [5]:
_ = lake_environment.get_possibilities(
    state_id=14, action='down', debug=True)

***
From state ID:  14  do action:  down !
***

# 1
--> next state ID:  14
--> reward: 0.0
--> probability:  0.3333333333333333
--> is terminal:  False


# 2
--> next state ID:  13
--> reward: 0.0
--> probability:  0.3333333333333333
--> is terminal:  False


# 3
--> next state ID:  15
--> reward: 1.0
--> probability:  0.3333333333333333
--> is terminal:  True
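
Since every outcome carries a probability and a reward, this is what a dynamic programming update consumes. Below is a rough sketch of a single one-step backup for state 14 and action `down`, with the three outcomes copied by hand from the output above. The function name `one_step_backup`, the discount `gamma` and the all-zero starting `values` are assumptions made for illustration only.

```python
# (next_state_id, probability, reward, is_terminal), copied by hand from the output above
outcomes = [
    (14, 1 / 3, 0.0, False),
    (13, 1 / 3, 0.0, False),
    (15, 1 / 3, 1.0, True),
]

def one_step_backup(outcomes, values, gamma=1.0):
    """Probability-weighted sum of reward plus discounted value of the next state."""
    total = 0.0
    for next_id, prob, reward, is_terminal in outcomes:
        # No future value is collected after a terminal state.
        future = 0.0 if is_terminal else gamma * values[next_id]
        total += prob * (reward + future)
    return total

values = [0.0] * 16  # assumed starting estimate: every state is worth 0
print(one_step_backup(outcomes, values))  # about 1/3: only the goal outcome pays
```

With all values at zero, the result is just the expected immediate reward: one outcome in three lands on the goal and pays +1.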



In [6]:
possibilities = lake_environment.get_possibilities(
    state_id=0, action='up', debug=False)

for i, state_info in enumerate(possibilities):
    
    print("--")
    print("#", i + 1)
    print("--")

    print(state_info)

--
# 1
--

*FrozenLakeState (TYPE) 
--> State ID: 0
--> Reward: 0.0
--> Not a terminal state. 
--> (Transition) Probability (given state ID and action): 0.3333333333333333
--> Icon: 👩
--> Character representation of state condition: 'S'
--> Location: (0,0) 


--
# 2
--

*FrozenLakeState (TYPE) 
--> State ID: 0
--> Reward: 0.0
--> Not a terminal state. 
--> (Transition) Probability (given state ID and action): 0.3333333333333333
--> Icon: 👩
--> Character representation of state condition: 'S'
--> Location: (0,0) 


--
# 3
--

*FrozenLakeState (TYPE) 
--> State ID: 1
--> Reward: 0.0
--> Not a terminal state. 
--> (Transition) Probability (given state ID and action): 0.3333333333333333
--> Icon: ▫️
--> Character representation of state condition: 'F'
--> Location: (0,1) 




In [7]:
possibilities = lake_environment.get_possibilities(
    state_id=14, action='right', debug=False)

print("You are in state id: 14, and you do action: right")
for i, state_info in enumerate(possibilities):
    
    print("--")
    print("#", i + 1)
    print("--")

    print("state id of possible state we end up in: ", state_info.n)
    print("reward: ", state_info.reward)
    print("this is a terminal state?", state_info.is_terminal)
    print("probability of transitioning to this state", end=" ")
    print("coming from state number 14 and going right: ", state_info.probability)

You are in state id: 14, and you do action: right
--
# 1
--
state id of possible state we end up in:  15
reward:  1.0
this is a terminal state? True
probability of transitioning to this state coming from state number 14 and going right:  0.3333333333333333
--
# 2
--
state id of possible state we end up in:  10
reward:  0.0
this is a terminal state? False
probability of transitioning to this state coming from state number 14 and going right:  0.3333333333333333
--
# 3
--
state id of possible state we end up in:  14
reward:  0.0
this is a terminal state? False
probability of transitioning to this state coming from state number 14 and going right:  0.3333333333333333


In [8]:
_ = lake_environment.get_possibilities(state_id=7, action='left', debug=True)

***
From state ID:  7  do action:  left !
***

# 1
--> next state ID:  7
--> reward: 0.0
--> probability:  1.0
--> is terminal:  True
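
The four outputs above are consistent with a simple slip model: from a non-terminal tile, the agent moves in the intended direction or in one of the two perpendicular directions, each with probability 1/3; a move off the grid leaves it in place; and a terminal tile loops back to itself with probability 1. The sketch below implements that guess. It is inferred from the printed results only and is not the actual `get_possibilities` code, and every name in it is illustrative.

```python
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_ROWS, N_COLS = len(GRID), len(GRID[0])

# (row delta, col delta) for each action name used in the notebook
MOVES = {'left': (0, -1), 'down': (1, 0), 'right': (0, 1), 'up': (-1, 0)}
# Assumed slip model: the two directions perpendicular to the intended one
SLIPS = {'left': ('up', 'down'), 'right': ('up', 'down'),
         'up': ('left', 'right'), 'down': ('left', 'right')}

def tile(state_id):
    """Character ('S', 'F', 'H' or 'G') of the tile with this row-major ID."""
    row, col = divmod(state_id, N_COLS)
    return GRID[row][col]

def step(state_id, direction):
    """Move one tile; stay in place if the move would leave the grid."""
    row, col = divmod(state_id, N_COLS)
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return new_row * N_COLS + new_col
    return state_id

def sketch_possibilities(state_id, action):
    """Return a list of (next_state_id, probability, reward, is_terminal)."""
    if tile(state_id) in "HG":
        # Terminal tiles are absorbing. Reward 0.0 matches the printed output
        # for hole state 7; for the goal tile it is an assumption.
        return [(state_id, 1.0, 0.0, True)]
    outcomes = []
    for direction in (action, *SLIPS[action]):
        next_id = step(state_id, direction)
        reward = 1.0 if tile(next_id) == "G" else 0.0
        outcomes.append((next_id, 1 / 3, reward, tile(next_id) in "HG"))
    return outcomes

print([o[0] for o in sketch_possibilities(14, 'down')])   # [14, 13, 15]
print([o[0] for o in sketch_possibilities(0, 'up')])      # [0, 0, 1]
print(sketch_possibilities(7, 'left'))                    # [(7, 1.0, 0.0, True)]
```

Note how state 0 with action `up` lists state 0 twice: both `up` and `left` bump into the edge of the grid, so staying put has a combined probability of 2/3.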

