# Frozen Lake on Gym
Gym provides pre-built environments, including Frozen Lake. If ```is_slippery``` is set to ```True```, the agent moves in the chosen direction with probability 1/3 and slips to each of the two perpendicular directions with probability 1/3. For example, choosing left results in moving left, up or down with equal probability.
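
The slippery behaviour can be sketched in a few lines of plain Python. This is only an illustration of the rule described above, not Gym's own code: `slippery_outcomes` and `ACTION_NAMES` are hypothetical names, and the sketch relies on the Frozen Lake action encoding (0 = left, 1 = down, 2 = right, 3 = up).

```python
# Action encoding used by Frozen Lake: 0 = left, 1 = down, 2 = right, 3 = up.
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}


def slippery_outcomes(action):
    """Return (probability, actual_action) pairs for an intended action.

    Illustrative helper (not a Gym function): the intended direction and the
    two perpendicular directions are each taken with probability 1/3.
    """
    return [(1 / 3, (action + offset) % 4) for offset in (-1, 0, 1)]


# Intended action: right. The agent never moves in the opposite direction.
for prob, actual in slippery_outcomes(2):
    print(f"{ACTION_NAMES[actual]}: {prob:.2f}")
```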

You can render the environment by passing ```render_mode='human'``` to ```gym.make```. When running many episodes, omit this argument to skip rendering and speed things up.

In [None]:
import gym
import numpy as np
import random
from gym.envs.toy_text.frozen_lake import generate_random_map

## Random Agent
We start with random actions.

In [None]:
# create Frozen Lake environment
env = gym.make('FrozenLake-v1', is_slippery=True, render_mode='human')

num_episodes = 10
goals_reached = 0

for episode in range(num_episodes):
    env.reset()
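    terminated = False
    truncated = False
    print(f'Episode: {episode + 1}/{num_episodes}')

    # stop when the episode ends (hole or goal) or the step limit is reached
    while not (terminated or truncated):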
        
        # sample a random action from the list of available actions
        action = env.action_space.sample()

        # perform this action on the environment
        obs, reward, terminated, truncated, info = env.step(action)
        
        if terminated and obs == 15:
            # The observation encodes the agent's position as
            # current_row * ncols + current_col (rows and columns start at 0).
            # The goal of the 4x4 map is therefore 3 * 4 + 3 = 15.
            
            print('Success!')
            goals_reached += 1
            
# end this instance of the environment
env.close()
print(f'\nSuccess rate:\t{goals_reached / num_episodes * 100}%')

## Keyboard controlled Agent
Now let's try moving the agent ourselves and see if we can do better than the random agent.

Use the WASD keys to move the agent. Remember to press them inside the pygame window, not here!

In [None]:
# create Frozen Lake environment
env = gym.make('FrozenLake-v1', is_slippery=True, render_mode='human')

num_episodes = 10
goals_reached = 0

for episode in range(num_episodes):
    env.reset()
    terminated = False
    print(f'Episode: {episode + 1}/{num_episodes}')

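    # NOTE: simulate() is not part of the standard Gym API; this cell assumes an
    # environment extended with a keyboard-control loop that returns the final observation
    final_obs = env.simulate()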
        
    if final_obs == 15:
        # The observation encodes the agent's position as
        # current_row * ncols + current_col (rows and columns start at 0).
        # The goal of the 4x4 map is therefore 3 * 4 + 3 = 15.
        print('Success!')
        goals_reached += 1
            
# end this instance of the environment
env.close()
print(f'\nSuccess rate:\t{goals_reached / num_episodes * 100}%')

## Heuristic Agent
The next step is to develop a heuristic algorithm to solve the Frozen Lake environment. One is proposed here, but you can also come up with your own 😀

In the Frozen Lake environment, the observation is calculated as ```current_row * ncols + current_col```.

We can therefore build a table that represents the observation value for each possible cell:
```
0   1   2   3
4   5   6   7
8   9   10  11
12  13  14  15
```
Where:

Starting point: ```0```

Goal: ```15```

Holes: ```5, 7, 11, 12```
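
As a quick check of the formula above, the conversion between grid coordinates and observation values can be sketched as follows. The helper names `to_obs` and `to_cell` are hypothetical and only serve this illustration.

```python
def to_obs(row, col, ncols=4):
    """Convert a (row, col) cell into the observation value."""
    return row * ncols + col


def to_cell(obs, ncols=4):
    """Convert an observation value back into its (row, col) cell."""
    return divmod(obs, ncols)


print(to_obs(3, 3))           # goal of the 4x4 map: 15
print(to_cell(11))            # the hole at observation 11: (2, 3)
print(to_obs(7, 7, ncols=8))  # bottom-right cell of an 8x8 map: 63
```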

We can create a lookup table that maps each possible observation to an action that lets the agent avoid the holes as much as possible:
```
▶   ▶   ▼   ◀
▼   ✖   ▼   ✖
▶   ▼   ▼   ✖
✖   ▶   ▶   ●      
```
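
When editing the lookup table by hand, it can help to print it back as arrows and compare it with the drawing above. The sketch below assumes the Frozen Lake action encoding (0 = left, 1 = down, 2 = right, 3 = up), uses -1 for cells where no action is needed, and assumes the goal sits in the bottom-right cell. `render_policy` and `ARROWS` are hypothetical names.

```python
# Symbols for each action code; -1 marks holes (no action needed there).
ARROWS = {0: "◀", 1: "▼", 2: "▶", 3: "▲", -1: "✖"}


def render_policy(actions, ncols=4):
    """Return the lookup table as a grid of arrows, one text line per row."""
    goal = len(actions) - 1  # assumption: the goal is the bottom-right cell
    rows = []
    for start in range(0, len(actions), ncols):
        symbols = []
        for obs in range(start, start + ncols):
            symbols.append("●" if obs == goal else ARROWS[actions[obs]])
        rows.append("   ".join(symbols))
    return "\n".join(rows)


actions = [2, 2, 1, 0,
           1, -1, 1, -1,
           2, 1, 1, -1,
           -1, 2, 2, -1]
print(render_policy(actions))
```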


In [None]:
# create Frozen Lake environment
env = gym.make('FrozenLake-v1', is_slippery=True, render_mode='human')

# lookup table: 0 = left, 1 = down, 2 = right, 3 = up; -1 marks holes and the goal
actions = [2, 2, 1, 0,
           1, -1, 1, -1,
           2, 1, 1, -1,
           -1, 2, 2, -1]

num_episodes = 10
goals_reached = 0

for episode in range(num_episodes):
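    # reset() returns the initial observation (the starting cell)
    obs, info = env.reset()
    terminated = False
    truncated = False
    print(f'Episode: {episode + 1}/{num_episodes}')

    # stop when the episode ends (hole or goal) or the step limit is reached
    while not (terminated or truncated):

        # choose the action based on the current observation
        action = actions[obs]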

        # perform this action on the environment
        obs, reward, terminated, truncated, info = env.step(action)
        
        if terminated and obs == 15:
            # The observation encodes the agent's position as
            # current_row * ncols + current_col (rows and columns start at 0).
            # The goal of the 4x4 map is therefore 3 * 4 + 3 = 15.
            
            print('Success!')
            goals_reached += 1
            
# end this instance of the environment
env.close()
print(f'\nSuccess rate:\t{goals_reached / num_episodes * 100}%')

You can try changing the lookup table and running 100 episodes to see whether you can achieve better results.
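
To compare lookup tables quickly without opening a window, one option is a small pure-Python stand-in for the 4x4 slippery map. This is a rough sketch, not the Gym environment itself: `MAP_4X4`, `step` and `success_rate` are hypothetical names, the layout is copied from the hole positions listed earlier, and the cap of 100 steps per episode is an assumption.

```python
import random

# 4x4 layout: S = start, F = frozen, H = hole, G = goal.
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]
# (row, col) offsets for actions 0 = left, 1 = down, 2 = right, 3 = up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def step(desc, obs, action, rng):
    """Apply one slippery move and return (new_obs, done, success)."""
    nrows, ncols = len(desc), len(desc[0])
    # The actual direction is the intended one or one of the two perpendicular ones.
    actual = (action + rng.choice((-1, 0, 1))) % 4
    row, col = divmod(obs, ncols)
    d_row, d_col = MOVES[actual]
    # A move that would leave the grid keeps the agent on the border.
    row = min(max(row + d_row, 0), nrows - 1)
    col = min(max(col + d_col, 0), ncols - 1)
    tile = desc[row][col]
    return row * ncols + col, tile in "HG", tile == "G"


def success_rate(desc, actions, num_episodes=1000, max_steps=100, seed=0):
    """Estimate how often the lookup table reaches the goal."""
    rng = random.Random(seed)
    goals_reached = 0
    for _ in range(num_episodes):
        obs = 0  # every episode starts in the top-left cell
        for _ in range(max_steps):
            obs, done, success = step(desc, obs, actions[obs], rng)
            if done:
                goals_reached += int(success)
                break
    return goals_reached / num_episodes


actions = [2, 2, 1, 0,
           1, -1, 1, -1,
           2, 1, 1, -1,
           -1, 2, 2, -1]
print(f"Estimated success rate: {success_rate(MAP_4X4, actions):.1%}")
```

Because the simulator is seeded, two lookup tables can be compared on the same sequence of random slips; the numbers are only estimates and should be confirmed on the real environment.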

## Let's increase the difficulty!
What if we increase the map size to 8x8? Can you build another heuristic agent?
```
■   ■   ■   ■   ■   ■   ■   ■
■   ■   ■   ■   ■   ■   ■   ■
■   ■   ■       ■   ■   ■   ■  
■   ■   ■   ■   ■       ■   ■
■   ■   ■       ■   ■   ■   ■
■           ■   ■   ■       ■
■       ■   ■       ■       ■
■   ■   ■       ■   ■   ■   ●
```

In [None]:
# create Frozen Lake environment
env = gym.make('FrozenLake-v1', is_slippery=True, map_name='8x8', render_mode='human')

'''
TODO
actions = ???
'''

num_episodes = 10
goals_reached = 0

for episode in range(num_episodes):
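    # reset() returns the initial observation (the starting cell)
    obs, info = env.reset()
    terminated = False
    truncated = False
    print(f'Episode: {episode + 1}/{num_episodes}')

    # stop when the episode ends (hole or goal) or the step limit is reached
    while not (terminated or truncated):

        # choose the action based on the current observation
        action = actions[obs]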

        # perform this action on the environment
        obs, reward, terminated, truncated, info = env.step(action)
        
        if terminated and obs == 63:
            # The observation encodes the agent's position as
            # current_row * ncols + current_col (rows and columns start at 0).
            # The goal of the 8x8 map is therefore 7 * 8 + 7 = 63.
            
            print('Success!')
            goals_reached += 1
            
# end this instance of the environment
env.close()
print(f'\nSuccess rate:\t{goals_reached / num_episodes * 100}%')

## What if the map is random?
As you can see, every time the map changes, we have to rebuild our lookup table.

- Is this a reliable approach?

- How do we deal with random maps that change at each episode? 

- If the problem complexity increases, would we be able to create a valid heuristic policy?
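
To make the questions above concrete, here is one possible way to derive a lookup table automatically from a map description instead of writing it by hand. It is only a sketch under stated assumptions: `shortest_path_policy` is a hypothetical helper, `desc` is assumed to be a list of strings such as the one returned by `generate_random_map`, and the search ignores slipperiness entirely, so the resulting table is not guaranteed to be a good policy on a slippery lake.

```python
from collections import deque

# (row, col) offsets for actions 0 = left, 1 = down, 2 = right, 3 = up.
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}


def shortest_path_policy(desc):
    """Point every safe cell along a shortest hole-free path to the goal.

    Holes, the goal and cells that cannot reach the goal are set to -1.
    """
    nrows, ncols = len(desc), len(desc[0])
    actions = [-1] * (nrows * ncols)
    goal = next((r, c) for r in range(nrows) for c in range(ncols)
                if desc[r][c] == "G")
    queue = deque([goal])
    visited = {goal}
    # Breadth-first search backwards from the goal.
    while queue:
        row, col = queue.popleft()
        for action, (d_row, d_col) in MOVES.items():
            # Cell from which taking `action` leads to (row, col).
            p_row, p_col = row - d_row, col - d_col
            if not (0 <= p_row < nrows and 0 <= p_col < ncols):
                continue
            if (p_row, p_col) in visited or desc[p_row][p_col] == "H":
                continue
            visited.add((p_row, p_col))
            actions[p_row * ncols + p_col] = action
            queue.append((p_row, p_col))
    return actions


# The default 4x4 layout, written in the same format as a generated map.
table = shortest_path_policy(["SFFF", "FHFH", "FFFH", "HFFG"])
for start in range(0, len(table), 4):
    print(table[start:start + 4])
```

Comparing such a table with a hand-made one on a slippery map is a useful exercise: a path that is shortest on paper may pass right next to several holes.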

In [None]:
# create Frozen Lake environment
env = gym.make('FrozenLake-v1', desc=generate_random_map(size=8), is_slippery=True, render_mode='human')

num_episodes = 10
goals_reached = 0

for episode in range(num_episodes):
    env.reset()
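    terminated = False
    truncated = False
    print(f'Episode: {episode + 1}/{num_episodes}')

    # stop when the episode ends (hole or goal) or the step limit is reached
    while not (terminated or truncated):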
        
        # sample a random action from the list of available actions
        action = env.action_space.sample()

        # perform this action on the environment
        obs, reward, terminated, truncated, info = env.step(action)
        
        if terminated and obs == 63:
            # The observation encodes the agent's position as
            # current_row * ncols + current_col (rows and columns start at 0).
            # The random 8x8 map places the goal in the bottom-right cell: 7 * 8 + 7 = 63.
            
            print('Success!')
            goals_reached += 1
            
# end this instance of the environment
env.close()
print(f'\nSuccess rate:\t{goals_reached / num_episodes * 100}%')