# Playing Frozen Lake with Q-learning

Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you'll fall into the freezing water. 

At this time, there's an international frisbee shortage, so it's absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won't always move in the direction you intend.

The surface is described using a grid like the following:

SFFF       
FHFH       
FFFH       
HFFG

S: starting point, safe  
F: frozen surface, safe  
H: hole, fall to your doom  
G: goal, where the frisbee is located

The episode ends when you reach the goal or fall in a hole.  
You receive a reward of 1 if you reach the goal, and 0 otherwise.

https://gym.openai.com/envs/FrozenLake-v0/
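
To make the grid concrete, here is a minimal sketch that maps each tile to the integer state the environment reports. It assumes the default 4x4 map shown above and the row-major numbering (state = row * 4 + column) that FrozenLake uses. The names `GRID`, `state_index`, and `tiles_of` are made up for this illustration and are not part of gym.

```python
# The default 4x4 map from the description above, one string per row.
GRID = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]
N_COLS = len(GRID[0])

def state_index(row, col):
    """Row-major state number for the tile at (row, col)."""
    return row * N_COLS + col

def tiles_of(kind):
    """State numbers of every tile marked with the given letter."""
    return [state_index(r, c)
            for r, line in enumerate(GRID)
            for c, tile in enumerate(line)
            if tile == kind]

print("start:", tiles_of("S"))   # [0]
print("holes:", tiles_of("H"))   # [5, 7, 11, 12]
print("goal: ", tiles_of("G"))   # [15]
```

These are the row indices of the Q-table built below: 16 states, of which four are holes and one is the goal.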

In [None]:
import numpy as np
import gym
import random
import time
from IPython.display import clear_output

In [None]:
env = gym.make("FrozenLake-v1")

In [None]:
num_actions = env.action_space.n
num_states = env.observation_space.n

q_table = np.zeros((num_states, num_actions))
print(f'q_table has shape: {q_table.shape}')

## Hyper-parameters for training

In [None]:
num_episodes = 10000
max_steps_per_episode = 100

lr = 0.05
discount_rate = 0.99

exploration_rate = 1
exploration_decay_rate = 0.00005

### Helper functions

In [None]:
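def compute_new_Q_value(newState, reward):
    # TD target: immediate reward plus the discounted value of the best
    # action available from the next state.
    return reward + discount_rate * np.max(q_table[newState, :])

def render():
    # Clear the previous notebook output, then draw the current frame.
    clear_output(wait=True)
    env.render()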
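
The helper above returns the target that the training loop blends into the table: $Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,\bigl(r + \gamma \max_{a'} Q(s',a')\bigr)$, with $\alpha$ = `lr` and $\gamma$ = `discount_rate`. As a worked example, the sketch below applies that update twice to the same transition onto the goal tile. The function `q_update` and its local table are illustrative only and are not part of the notebook's code.

```python
import numpy as np

def q_update(q, state, action, reward, next_state, lr=0.05, discount_rate=0.99):
    """Apply one Q-learning update in place and return the new entry."""
    target = reward + discount_rate * np.max(q[next_state, :])
    q[state, action] = (1 - lr) * q[state, action] + lr * target
    return q[state, action]

q = np.zeros((16, 4))   # 16 states, 4 actions, as in the 4x4 FrozenLake
# Moving right (action 2) from state 14 onto the goal (state 15), reward 1.
first = q_update(q, state=14, action=2, reward=1.0, next_state=15)
second = q_update(q, state=14, action=2, reward=1.0, next_state=15)
print(first, second)
```

With `lr = 0.05` the entry moves only 5% of the way toward the target on each visit (roughly 0.05, then 0.0975), which is why many episodes are needed before the table settles.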

In [None]:
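# Sanity check: reset, then take one step (action 0 = left) and inspect the result.
env.reset()
newState, reward, terminated, truncated, info = env.step(0)
newState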

## Training

In [None]:
ep_rewards = []        # rewards of the episodes in the current block of 1000
ep1000_rewards = []    # total reward per block of 1000 episodes
for ep in range(num_episodes):
    rewards = 0.
    exploration_rate = 1.   # restart from full exploration every episode
    curState, _ = env.reset()
    for step in range(max_steps_per_episode):
        # Decay within the episode; the decrement grows with both step and episode index.
        exploration_rate -= exploration_decay_rate * step * ep
        # Epsilon-greedy action selection
        if np.random.uniform(0, 1) < exploration_rate:
            action = env.action_space.sample()
        else:
            action = np.argmax(q_table[curState, :])

        newState, reward, terminated, truncated, _ = env.step(action)
        # Move Q(s, a) a fraction lr of the way toward the TD target.
        newQVal = compute_new_Q_value(newState, reward)
        q_table[curState, action] = (1 - lr) * q_table[curState, action] + lr * newQVal
        rewards += reward
        curState = newState
        if terminated or truncated or (step + 1) == max_steps_per_episode:
            break

    # Record every episode, including those that hit the step limit.
    ep_rewards.append(rewards)
    if (ep + 1) % 1000 == 0:
        ep1000_rewards.append(np.sum(ep_rewards))
        print(f'ep {ep+1}: {ep1000_rewards[-1]}')
        ep_rewards.clear()
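
The exploration schedule in the loop above is unusual: the rate restarts at 1 every episode, and the amount subtracted at each step grows with both the step index and the episode index. The sketch below reproduces just that arithmetic to show how quickly exploration shuts off in later episodes. `exploration_after` is a hypothetical helper written for this illustration.

```python
def exploration_after(ep, steps, decay_rate=0.00005):
    """Exploration rate seen at each of the first `steps` steps of episode `ep`."""
    rate = 1.0
    history = []
    for step in range(steps):
        rate -= decay_rate * step * ep   # same decrement as the training loop
        history.append(rate)
    return history

for ep in (0, 100, 1000, 9999):
    rates = exploration_after(ep, steps=5)
    print(ep, [round(r, 3) for r in rates])
```

In episode 0 the rate never leaves 1, so every action is random. By the final episodes it is already negative at the third step, so from then on the agent always takes the greedy action.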

In [None]:
import matplotlib.pyplot as plt

plt.plot(ep1000_rewards)
plt.xlabel('block of 1000 episodes')
plt.ylabel('episodes won')
plt.show()

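## Let's see how the game plays out

Now we run through a game, picking at each step the action with the highest value in the Q-table we trained. Because the ice is slippery, even a well-trained table can still end up in a hole, so a single game may be lost.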

In [None]:
env.close()
env = gym.make("FrozenLake-v1", render_mode='human')

In [None]:
curState, _ = env.reset()
for step in range(max_steps_per_episode):
    render()
    time.sleep(1)
    # Get best action from q-table for current state
    action = np.argmax(q_table[curState, :])
    newState, reward, terminated, truncated, _ = env.step(action)
    curState = newState
    if terminated or truncated or (step + 1) == max_steps_per_episode:
        render()
        if reward == 1:
            print(f'\nyou won after {step + 1} steps')
        else:
            print(f'\nyou lost after {step + 1} steps')
        break
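
A single game says little about the table, since one slip can sink a good policy. One way to judge it more fairly is to play many greedy episodes without rendering and count the wins. The sketch below assumes an environment with the five-value `step` API used in this notebook and a `q_table` like the one trained above. `evaluate_policy` and its parameters are illustrative names, not part of gym.

```python
import numpy as np

def evaluate_policy(env, q_table, n_episodes=1000, max_steps=100):
    """Fraction of greedy episodes that end with reward 1."""
    wins = 0
    for _ in range(n_episodes):
        state, _ = env.reset()
        for _ in range(max_steps):
            action = int(np.argmax(q_table[state, :]))
            state, reward, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                wins += int(reward == 1)
                break
    return wins / n_episodes

# Usage, assuming a non-rendering environment so the loop runs quickly:
# eval_env = gym.make("FrozenLake-v1")
# print(evaluate_policy(eval_env, q_table))
```

Using a separate environment without `render_mode='human'` keeps the evaluation fast, since nothing is drawn between steps.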

In [None]:
# close the env after playing
env.close()