# Lab 4: Q-table-based reinforcement learning



Solve [`FrozenLake8x8-v1`](https://www.gymlibrary.dev/environments/toy_text/frozen_lake/) using a Q-table.


1. Import Necessary Packages (e.g. `gym`, `numpy`):

In [22]:
import gym
import numpy as np
import random
from gym.utils.play import play


2. Instantiate the Environment and Agent

In [23]:
env = gym.make('FrozenLake8x8-v1', desc=None, map_name="8x8", is_slippery=False)
action_size = env.action_space.n
state_size = env.observation_space.n
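With `is_slippery=False`, FrozenLake is a deterministic 8x8 grid. `state_size` is 64 because states are numbered row by row (`state = row * 8 + col`), and `action_size` is 4. The gym-free sketch below illustrates that encoding and shows how a move is clamped at the grid edges. The action numbering (0 = left, 1 = down, 2 = right, 3 = up) follows the FrozenLake documentation. The `move` helper is a hypothetical illustration that ignores holes and the goal tile.

```python
# Gym-free sketch of FrozenLake's state/action encoding (hypothetical helper, not gym's API).
N_ROWS, N_COLS = 8, 8
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3  # action indices, so action_size == 4

def move(state, action):
    """Return the next state for a non-slippery move; hole and goal tiles are ignored here."""
    row, col = divmod(state, N_COLS)  # inverse of state = row * N_COLS + col
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == DOWN:
        row = min(row + 1, N_ROWS - 1)
    elif action == RIGHT:
        col = min(col + 1, N_COLS - 1)
    elif action == UP:
        row = max(row - 1, 0)
    return row * N_COLS + col

print(N_ROWS * N_COLS)                                # 64 states, matching state_size
print(move(0, RIGHT), move(0, DOWN), move(0, LEFT))   # 1 8 0: walking into a wall leaves the agent in place
```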

3. Set up the Q-table:

In [24]:
learning_rate = 0.1
discount_rate = 0.9

q_table = np.zeros((state_size, action_size))

def update_q_table(reward, prev_state, action, state):
    q_table[prev_state, action] = q_table[prev_state, action] + learning_rate * \
                                   (reward + discount_rate * np.max(q_table[state]) - q_table[prev_state, action])

def get_best_action(state):
    return np.argmax(q_table[state])
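`update_q_table` implements the standard Q-learning rule Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], where α is `learning_rate` and γ is `discount_rate`. The sketch below applies that same rule to a hypothetical four-state corridor. It shows how a reward received only on entering the goal propagates backward through the table. The corridor, its single action, and the local `update` helper are illustrative assumptions, not part of the lab.

```python
import numpy as np

# Hypothetical corridor 0 -> 1 -> 2 -> 3, where state 3 is a terminal goal and there is one action.
learning_rate, discount_rate = 0.1, 0.9
q = np.zeros((4, 1))

def update(reward, prev_state, action, state):
    # Same rule as update_q_table: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    q[prev_state, action] += learning_rate * (reward + discount_rate * np.max(q[state]) - q[prev_state, action])

for episode in range(500):
    for s in range(3):
        reward = 1.0 if s + 1 == 3 else 0.0  # reward only on entering the goal
        update(reward, s, 0, s + 1)

print(np.round(q[:3, 0], 3))  # approaches [gamma**2, gamma, 1] = [0.81, 0.9, 1.0]
```

Each step away from the goal multiplies the value by `discount_rate`, which is why shorter routes to the goal end up with higher Q-values.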

4. Train the agent with Q-learning

In [28]:

def train_q_table(env, max_steps=40, episodes_count=10000, exploration=0.9, explore_reduction=0.99, explore_min=0.01):

    for i in range(0, episodes_count):
        last_observation,_ = env.reset()
        terminated = False
        steps = 0
    
        while not terminated:
        
            # Get action either randomly or from the q table:
            if exploration < random.uniform(0, 1):
                action = get_best_action(last_observation)
            else:
                action = env.action_space.sample()
    
            # Take action in the environment
            observation, reward, terminated, truncated, info = env.step(action)
    
            if terminated or truncated or steps >= max_steps:

                if reward == 0:
                    # Reward 0 at the end of an episode means the agent fell into a hole
                    # or ran out of steps: penalize the last action with reward -1
                    update_q_table(-1, last_observation, action, observation)
                else:
                    # The agent reached the goal: store the reward so it propagates back through the table
                    update_q_table(reward, last_observation, action, observation)

                # Reduce the exploration rate after each episode
                if exploration > explore_min:
                    exploration *= explore_reduction

                break
            
            update_q_table(reward, last_observation, action, observation)
    
            last_observation = observation
            steps += 1

train_q_table(env)
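`train_q_table` uses ε-greedy exploration. With probability `exploration` it samples a random action, and otherwise it takes the greedy one. After every episode, `exploration` is multiplied by `explore_reduction` until it drops to `explore_min`. The sketch below replays only that schedule with the default arguments, without any environment, to show how quickly exploration fades. `episodes_until_min` is a hypothetical helper.

```python
import math

def episodes_until_min(exploration=0.9, explore_reduction=0.99, explore_min=0.01):
    # Mirrors the decay rule in train_q_table: one multiplication per episode while above the floor
    episodes = 0
    while exploration > explore_min:
        exploration *= explore_reduction
        episodes += 1
    return episodes, exploration

n, final = episodes_until_min()
# Closed form: smallest n with 0.9 * 0.99**n <= 0.01
closed_form = math.ceil(math.log(0.01 / 0.9) / math.log(0.99))
print(n, closed_form, round(final, 5))
```

With the defaults, exploration reaches the 0.01 floor after 448 of the 10,000 episodes, so the vast majority of training is close to greedy.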

5. Evaluate how well your agent performs
* Render output of one episode
* Give an average episode return

In [30]:
def run(env):
    state, _ = env.reset()
    terminated = False
    truncated = False
    while not terminated and not truncated:
        # Evaluation is purely greedy: always take the best action from the Q-table
        action = get_best_action(state)

        # Take action in the environment
        state, reward, terminated, truncated, _ = env.step(action)
    return reward
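`run` follows the greedy policy, so it can help to inspect that policy directly. `np.argmax` returns the first index among ties. Any state whose Q-values are all equal (for example, one that was never updated) therefore maps to action 0, i.e. left. The sketch below prints the greedy action of every state on the 8x8 grid. It uses a hypothetical `policy_grid` helper and assumes the 0 = left, 1 = down, 2 = right, 3 = up encoding. Passing it the trained `q_table` shows what the agent learned.

```python
import numpy as np

# One character per action index; assumes FrozenLake's encoding 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP
ARROWS = np.array(["<", "v", ">", "^"])

def policy_grid(q_table, n_rows=8, n_cols=8):
    """Hypothetical helper: show the greedy action of each state as an n_rows x n_cols grid."""
    greedy = np.argmax(q_table, axis=1)  # the same per-state choice get_best_action makes
    return "\n".join("".join(row) for row in ARROWS[greedy].reshape(n_rows, n_cols))

untrained = np.zeros((64, 4))
print(policy_grid(untrained))  # all ties, so np.argmax picks index 0 (LEFT) everywhere
```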


# Render one episode, then report the average return over episode_count greedy episodes
def q_table_statistics(episode_count=100):
    env2 = gym.make('FrozenLake8x8-v1', desc=None, map_name="8x8", is_slippery=False, render_mode='human')
    run(env2)
    env2.close()

    score_sum = 0

    for i in range(episode_count):
        score = run(env)
        score_sum += score
    
    print("Average episode score: ", float(score_sum)/ float(episode_count))

q_table_statistics()

Average episode score:  1.0
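`is_slippery=False` makes the environment deterministic, and `run` always takes the greedy action. Every evaluation episode therefore follows the same path, and the average over 100 episodes can only be 0.0 or 1.0. A finer measure is how many moves the agent needs compared with the shortest possible route. The sketch below computes that lower bound with breadth-first search. `MAP_8X8` is assumed to match gym's default `"8x8"` layout (compare with `env.unwrapped.desc`), and `shortest_path_length` is a hypothetical helper.

```python
from collections import deque

# Assumed to match gym's default "8x8" FrozenLake map (S=start, F=frozen, H=hole, G=goal)
MAP_8X8 = [
    "SFFFFFFF",
    "FFFFFFFF",
    "FFFHFFFF",
    "FFFFFHFF",
    "FFFHFFFF",
    "FHHFFFHF",
    "FHFFHFHF",
    "FFFHFFFG",
]

def shortest_path_length(grid):
    """Breadth-first search over non-hole tiles; returns the minimum number of moves from S to G."""
    n_rows, n_cols = len(grid), len(grid[0])
    start = next((r, c) for r in range(n_rows) for c in range(n_cols) if grid[r][c] == "S")
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if grid[r][c] == "G":
            return dist
        for dr, dc in ((0, -1), (1, 0), (0, 1), (-1, 0)):  # LEFT, DOWN, RIGHT, UP
            nr, nc = r + dr, c + dc
            if 0 <= nr < n_rows and 0 <= nc < n_cols and (nr, nc) not in seen and grid[nr][nc] != "H":
                seen.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None  # goal unreachable

print(shortest_path_length(MAP_8X8))  # 14 moves
```

Counting the steps inside `run` and comparing them with this value shows whether the learned route is optimal or merely successful.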


6. (*Optional*) Adapt the code to one of the continuous [Classic Control](https://www.gymlibrary.dev/environments/classic_control/) problems. Think about (or discuss) how you could use our `Model` class from last Thursday to decide actions.
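One common way to reuse a Q-table when the observation space is continuous is to split each observation dimension into bins. The bin indices are then combined into a single row index, the same way FrozenLake numbers states as `row * 8 + col`. The sketch below assumes a CartPole-style observation with 4 dimensions and 2 discrete actions. The bounds and `BINS_PER_DIM` are illustrative guesses, and `discretize` is a hypothetical helper. This sketch shows only the discretization alternative. It is not the `Model`-based approach the task asks you to think about.

```python
import numpy as np

# Hypothetical bin layout for a 4-dimensional observation (bounds are assumptions, not gym values)
BINS_PER_DIM = 6
LOWS = np.array([-2.4, -3.0, -0.21, -3.5])
HIGHS = np.array([2.4, 3.0, 0.21, 3.5])
# Interior edges only, so np.digitize yields bin indices 0..BINS_PER_DIM-1 (values outside clip to the ends)
EDGES = [np.linspace(lo, hi, BINS_PER_DIM + 1)[1:-1] for lo, hi in zip(LOWS, HIGHS)]

def discretize(observation):
    """Map a continuous observation to a single Q-table row index (hypothetical helper)."""
    idx = tuple(int(np.digitize(x, edges)) for x, edges in zip(observation, EDGES))
    # Combine the per-dimension bins into one integer, like row * n_cols + col in FrozenLake
    return int(np.ravel_multi_index(idx, (BINS_PER_DIM,) * len(idx)))

n_states = BINS_PER_DIM ** len(LOWS)   # 6**4 = 1296 rows
q_table_cont = np.zeros((n_states, 2))  # assumes 2 discrete actions
print(n_states, discretize([0.1, -0.5, 0.05, 1.0]))
```

The rest of the Q-learning loop stays the same; only `discretize(observation)` replaces the raw state index. Finer bins describe the state more precisely but multiply the number of rows to fill.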