<a href="https://colab.research.google.com/github/krshnapriy/Q-Learning/blob/main/Frozen_lake.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Q-LEARNING WITH FROZEN LAKE**

In this Notebook, we'll implement an agent <b>that plays Frozen Lake.</b>

![alt text](http://simoninithomas.com/drlc/Qlearning/frozenlake4x4.png)

The goal of this game is <b>to go from the starting state (S) to the goal state (G) by walking only on frozen tiles (F) and avoiding holes (H)</b>. However, the ice is slippery, **so you won't always move in the direction you intend (stochastic environment)**.
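
To make the slipperiness concrete, here is a small sketch that needs no Gym install. It assumes the usual FrozenLake behaviour: the move actually executed is the intended one or one of its two perpendicular neighbours, each with probability 1/3. The helper `slippery_action` is a hypothetical name used only for this illustration.

```python
import random

# Action encoding used by FrozenLake: 0=Left, 1=Down, 2=Right, 3=Up.
ACTION_NAMES = ["Left", "Down", "Right", "Up"]

def slippery_action(intended, rng):
    """Return the action actually executed on slippery ice.

    Assumption: the intended action and its two perpendicular
    neighbours are equally likely (1/3 each); the opposite
    direction is never taken.
    """
    return rng.choice([(intended - 1) % 4, intended, (intended + 1) % 4])

rng = random.Random(0)
counts = {name: 0 for name in ACTION_NAMES}
for _ in range(3000):
    # The agent always *intends* to go Right (action 2).
    counts[ACTION_NAMES[slippery_action(2, rng)]] += 1

print(counts)
```

Under this assumption the agent that asks for "Right" ends up going Down or Up about two thirds of the time, and never goes Left.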


# **Libraries**
First, we import all the libraries we'll be using.

In [2]:
import numpy as np
import gym
import random
import time
from IPython.display import clear_output

# **Creating The Environment**


OpenAI Gym is a library <b> composed of many environments that we can use to train our agents.</b>

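We'll be using the environment "FrozenLake-v0". Newer Gym releases register it as "FrozenLake-v1"; the code in this notebook follows the older API.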

In [3]:
env = gym.make("FrozenLake-v0")

## **Creating The Q-Table**

Now we'll create our Q-table. To know how many rows (states) and columns (actions) it needs, we read the sizes of the state space and the action space.
OpenAI Gym gives us a way to do that:
**env.observation_space.n** and **env.action_space.n**

In [4]:
action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))

print(q_table)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
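
Each of the 16 rows corresponds to one tile of the 4x4 lake. The sketch below shows how a state index can be mapped back to a tile, assuming states are numbered row by row from the top-left corner. `LAKE_MAP`, `state_to_position` and `tile_at` are hypothetical names for this illustration; the layout is copied from the board rendered at the end of this notebook.

```python
# Map layout copied from the rendered board in this notebook.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAKE_MAP[0])

def state_to_position(state):
    """Convert a state index to (row, col), assuming row-major numbering."""
    return divmod(state, N_COLS)

def tile_at(state):
    """Return the tile letter (S, F, H or G) for a state index."""
    row, col = state_to_position(state)
    return LAKE_MAP[row][col]

# States that are holes under this numbering.
holes = [s for s in range(16) if tile_at(s) == "H"]
print(holes)
```

Holes and the goal end the episode, so no action is ever taken from them and their rows in the Q-table stay at zero.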


## **Initializing Q-Learning Parameters**

In [5]:
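num_episodes = 10000            # total number of training episodes
max_steps_per_episode = 100     # step cap for a single episode

learning_rate = 0.1             # alpha: weight given to each new estimate
discount_rate = 0.99            # gamma: weight given to future rewards

exploration_rate = 1            # epsilon: initial probability of a random action
max_exploration_rate = 1        # upper bound of epsilon
min_exploration_rate = 0.01     # lower bound of epsilon

exploration_decay_rate = 0.001  # exponential decay of epsilon per episode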
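
To get a feel for these values, the sketch below evaluates the exploration rate at a few episodes using the same exponential decay formula as the training loop. The helper name `exploration_rate_at` is hypothetical; the constants are the ones defined above.

```python
import numpy as np

min_exploration_rate = 0.01
max_exploration_rate = 1
exploration_decay_rate = 0.001

def exploration_rate_at(episode):
    """Exploration rate after `episode` episodes of exponential decay."""
    return min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

for episode in (0, 1000, 5000, 9999):
    print(episode, round(float(exploration_rate_at(episode)), 4))
```

The rate starts at 1 (always explore) and approaches the 0.01 floor well before the last episode, so the late episodes are almost entirely greedy.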

# **Coding The Q-Learning Algorithm Training Loop**

In [6]:
rewards_all_episodes = []

# Q-Learning algorithm
for episode in range(num_episodes):
    # initialize new episode params
    state = env.reset()
    done = False
    rewards_current_episode = 0
    
    for step in range(max_steps_per_episode):
        
        # Exploration-exploitation trade-off
        exploration_rate_threshold = random.uniform(0,1)
        if exploration_rate_threshold > exploration_rate: 
            action = np.argmax(q_table[state,:])
        else:
            action = env.action_space.sample()

        # Taking action    
        new_state, reward, done, info = env.step(action)
        
        # Update Q-table for Q(s,a)
        q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state,:]))

        # Transition to the next state    
        state = new_state
        rewards_current_episode += reward
        
        if done:
            break
            
    # Exploration rate decay
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)
    
    rewards_all_episodes.append(rewards_current_episode)
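
To see what the update line does, here is a worked example on an empty table, using the same learning rate and discount rate. The function `q_update` is a hypothetical wrapper around the update from the loop above, and the transitions are made up for illustration rather than taken from the environment.

```python
import numpy as np

learning_rate = 0.1
discount_rate = 0.99

def q_update(q_table, state, action, reward, new_state):
    """Apply one Q-learning update in place and return the new Q(s, a)."""
    target = reward + discount_rate * np.max(q_table[new_state, :])
    q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
        learning_rate * target
    return q_table[state, action]

q = np.zeros((16, 4))

# Stepping from state 14 into the goal (state 15) with reward 1.
first = q_update(q, 14, 2, 1.0, 15)    # 0.9 * 0 + 0.1 * 1
second = q_update(q, 14, 2, 1.0, 15)   # 0.9 * 0.1 + 0.1 * 1

# One step earlier: no reward, but state 14 now has value to pass back.
earlier = q_update(q, 13, 2, 0.0, 14)  # 0.1 * 0.99 * max(Q[14])

print(first, second, earlier)
```

The reward at the goal only reaches earlier states through the `np.max` term, which is why it takes many episodes for useful values to spread back to the start.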


# **After All Episodes Complete**

In [7]:
# Calculate and print the average reward per thousand episodes
rewards_per_thousand_episodes = np.split(np.array(rewards_all_episodes), num_episodes // 1000)
count = 1000
print("********** Average  reward per thousand episodes **********\n")

for r in rewards_per_thousand_episodes:
    print(count, ": ", str(sum(r / 1000)))
    count += 1000

********** Average  reward per thousand episodes **********

1000 :  0.05500000000000004
2000 :  0.20300000000000015
3000 :  0.3970000000000003
4000 :  0.5580000000000004
5000 :  0.6080000000000004
6000 :  0.6300000000000004
7000 :  0.6490000000000005
8000 :  0.6450000000000005
9000 :  0.6750000000000005
10000 :  0.6560000000000005
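
The same per-block averages can also be computed with a reshape, which avoids the manual counter. This is only a sketch: `average_per_block` is a hypothetical helper, and the `demo` list is synthetic data, not rewards from the run above.

```python
import numpy as np

def average_per_block(rewards, block_size=1000):
    """Mean reward of each consecutive block; a trailing partial block is dropped."""
    rewards = np.asarray(rewards, dtype=float)
    usable = len(rewards) - len(rewards) % block_size
    return rewards[:usable].reshape(-1, block_size).mean(axis=1)

# Synthetic rewards: 1000 failures, then a block with 500 successes,
# then 250 extra episodes that do not fill a whole block.
demo = [0] * 1000 + [1] * 500 + [0] * 500 + [1] * 250
print(average_per_block(demo))
```

Since the reward is 1 only when the goal is reached, each average is simply the fraction of episodes in that block that ended at the goal.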


## **Updated Q-table**

In [8]:
# Print updated Q-table
print("\n\n********** Q-table **********\n")
print(q_table)



********** Q-table **********

[[0.52829507 0.46685627 0.50206932 0.4711143 ]
 [0.31459182 0.30310353 0.21217845 0.46564173]
 [0.38380821 0.27603089 0.26820703 0.24461804]
 [0.08668233 0.08361181 0.05532128 0.14610036]
 [0.55050092 0.29175352 0.38617253 0.24677306]
 [0.         0.         0.         0.        ]
 [0.19663999 0.12618528 0.17744843 0.12538836]
 [0.         0.         0.         0.        ]
 [0.39009384 0.37403464 0.36233695 0.58040087]
 [0.5409068  0.56643213 0.44155062 0.27189508]
 [0.53977855 0.33058426 0.31805379 0.30970483]
 [0.         0.         0.         0.        ]
 [0.         0.         0.         0.        ]
 [0.42488923 0.47906703 0.75508444 0.29723675]
 [0.69636126 0.88682903 0.75868539 0.68581278]
 [0.         0.         0.         0.        ]]
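
The raw numbers are hard to read, so the sketch below turns the table printed above into the greedy action for every tile. It assumes the usual FrozenLake action encoding (0 = Left, 1 = Down, 2 = Right, 3 = Up) and row-by-row state numbering; `LAKE_MAP`, `ARROWS` and `policy_grid` are hypothetical names for this illustration.

```python
import numpy as np

# Q-table values copied from the output above.
q_table = np.array([
    [0.52829507, 0.46685627, 0.50206932, 0.4711143],
    [0.31459182, 0.30310353, 0.21217845, 0.46564173],
    [0.38380821, 0.27603089, 0.26820703, 0.24461804],
    [0.08668233, 0.08361181, 0.05532128, 0.14610036],
    [0.55050092, 0.29175352, 0.38617253, 0.24677306],
    [0.0, 0.0, 0.0, 0.0],
    [0.19663999, 0.12618528, 0.17744843, 0.12538836],
    [0.0, 0.0, 0.0, 0.0],
    [0.39009384, 0.37403464, 0.36233695, 0.58040087],
    [0.5409068, 0.56643213, 0.44155062, 0.27189508],
    [0.53977855, 0.33058426, 0.31805379, 0.30970483],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.42488923, 0.47906703, 0.75508444, 0.29723675],
    [0.69636126, 0.88682903, 0.75868539, 0.68581278],
    [0.0, 0.0, 0.0, 0.0],
])

LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
ARROWS = "<v>^"  # index 0=Left, 1=Down, 2=Right, 3=Up (assumed encoding)

greedy_actions = np.argmax(q_table, axis=1)

policy_grid = []
for row in range(4):
    line = ""
    for col in range(4):
        tile = LAKE_MAP[row][col]
        state = row * 4 + col
        # Holes and the goal are terminal, so show the tile letter instead.
        line += tile if tile in "HG" else ARROWS[greedy_actions[state]]
    policy_grid.append(line)

print("\n".join(policy_grid))
```

Some choices look odd at first, such as going Left from the start tile. On slippery ice this can be reasonable: under the 1/3 slip assumption, an intended Left from the start can only result in Left, Up or Down, so the agent never slides to the right along the top row.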


# **The Code To Watch The Agent Play The Game**

In [9]:
#For Each Episode
for episode in range(3):
    state = env.reset()
    done = False
    print("***** EPISODE ", episode + 1, " *****\n\n\n")
    time.sleep(1)
    
    #For Each Time-Step
    for step in range(max_steps_per_episode):
        clear_output(wait = True)
        env.render()
        time.sleep(0.3)
        
        action = np.argmax(q_table[state,:])
        new_state, reward, done, info = env.step(action)
        
        if done: 
            clear_output(wait = True)
            env.render()
            if reward == 1: 
                print("*****You reached your goal!*****")
                time.sleep(3)
            else:
                print("*****You fell through a hole!*****")
                time.sleep(3)
            clear_output(wait = True)
            break
            
        state = new_state
        
env.close()


  (Down)
SFFF
FHFH
FFFH
HFFG
*****You reached your goal!*****
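
Watching three episodes says little about how reliable the agent is. The sketch below counts successes over many greedy episodes without rendering. It assumes the same older Gym interface used in this notebook (`reset()` returns the state, `step()` returns four values). `evaluate_greedy` is a hypothetical helper, and `CorridorEnv` is a made-up three-cell stand-in so the block runs without Gym.

```python
import numpy as np

def evaluate_greedy(env, q_table, episodes=100, max_steps=100):
    """Fraction of episodes in which the greedy policy ends with reward 1."""
    successes = 0
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = int(np.argmax(q_table[state, :]))
            state, reward, done, _info = env.step(action)
            if done:
                successes += int(reward == 1)
                break
    return successes / episodes

class CorridorEnv:
    """Hypothetical stand-in: cells 0-1-2, action 1 moves right, action 0 stays."""

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        if action == 1:
            self.state += 1
        done = self.state == 2
        return self.state, float(done), done, {}

# A toy Q-table that prefers "move right" in the two non-terminal cells.
toy_q = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 0.0]])
print(evaluate_greedy(CorridorEnv(), toy_q, episodes=5))
```

With the real environment, the call would be `evaluate_greedy(env, q_table)` before `env.close()`.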
