### What is Q-Learning?

Q-learning is a model-free, value-based, off-policy algorithm that learns the best action to take in each state, and from that the best series of actions starting from the agent's current state. The “Q” stands for quality: how valuable an action is, in a given state, for maximizing future rewards.
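
To make "quality" concrete, here is a minimal sketch of a single tabular Q-learning update. The function name `q_update`, the learning rate `alpha=0.7`, the discount factor `gamma=0.95`, and the example transition are assumptions chosen for illustration, not values taken from this notebook.

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, alpha=0.7, gamma=0.95):
    """Apply one tabular Q-learning update in place and return the new value."""
    # Off-policy target: immediate reward plus the discounted value of the
    # best action available in the next state
    td_target = reward + gamma * np.max(q_table[next_state])
    td_error = td_target - q_table[state, action]
    # Move the current estimate a fraction alpha towards the target
    q_table[state, action] += alpha * td_error
    return q_table[state, action]

# Hypothetical transition: from state 14, action 2 reaches state 15 with reward 1
q = np.zeros((16, 4))
new_value = q_update(q, state=14, action=2, reward=1.0, next_state=15)
print(new_value)
```

The update is off-policy because the target uses the maximum over next actions, regardless of which action the agent actually takes next.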


In [1]:
import numpy as np
import gym
import random
import imageio
# Progress bars
from tqdm.notebook import trange

### Frozen Lake Gym Environment 
We are going to create a non-slippery 4x4 environment using Gym's Frozen Lake environment.

Frozen Lake comes in two grid sizes, “4x4” and “8x8”.
If `is_slippery=True`, the agent may not move in the intended direction because the ice is slippery.
After initializing the environment, we will inspect its observation and action spaces.

In [2]:
# Create our environment
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)
print("Observation Space ", env.observation_space)
print("Sample observation", env.observation_space.sample()) #Display a random observation

Observation Space  Discrete(16)
Sample observation 15
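
The environment above uses `is_slippery=False`, so every action moves the agent exactly as intended. For contrast, here is a rough sketch of what slipperiness means, assuming the behaviour described in the Gym documentation for FrozenLake: the agent moves in the intended direction or in one of the two perpendicular directions, each with probability 1/3. The helper names are hypothetical and this is not Gym's own code.

```python
import random

# Assumed FrozenLake action indices: 0 = left, 1 = down, 2 = right, 3 = up

def slippery_outcomes(action):
    """Return the three equally likely directions for an intended action."""
    # With this index ordering, the perpendicular directions are the
    # neighbours of `action` modulo 4
    return [(action - 1) % 4, action, (action + 1) % 4]

def sample_slippery_action(action, rng=random):
    """Sample the direction actually taken on slippery ice."""
    return rng.choice(slippery_outcomes(action))

# Intending to move right (2) may result in down (1), right (2) or up (3)
print(slippery_outcomes(2))
print(sample_slippery_action(2))
```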




In [3]:
# Print the number of actions and a sample action
print("Action Space Shape", env.action_space.n)
print("Action Space Sample", env.action_space.sample())

Action Space Shape 4
Action Space Sample 3
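
The integers printed above are easier to read once decoded. As a small sketch, assuming FrozenLake's convention that a state is `row * n_cols + col` and that actions are numbered 0 = left, 1 = down, 2 = right, 3 = up, the hypothetical helper below converts them to something readable:

```python
N_COLS = 4  # width of the "4x4" map
ACTION_NAMES = {0: "left", 1: "down", 2: "right", 3: "up"}

def state_to_position(state, n_cols=N_COLS):
    """Convert a flat state index into a (row, col) grid position."""
    return divmod(state, n_cols)

print(state_to_position(15))  # (3, 3), the bottom-right cell
print(ACTION_NAMES[3])        # up
```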


### Create and Initialize the Q-table

The Q-table has one row per state and one column per action. We can read the sizes of the state space and the action space from the Gym environment, and then use them to create the Q-table.

In [4]:
state_space = env.observation_space.n
print("There are ", state_space, " possible states")

action_space = env.action_space.n
print("There are ", action_space, " possible actions")

There are  16  possible states
There are  4  possible actions


In [5]:
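# Create a Q-table of zeros with shape (state_space, action_space), here 16 x 4
def initialize_q_table(state_space, action_space):
    """Return a Q-table with one row per state and one column per action."""
    Qtable = np.zeros((state_space, action_space))
    return Qtable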

Qtable_frozenlake = initialize_q_table(state_space, action_space)
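
As a quick illustration of how such a table is read, the sketch below indexes a row by state and picks the action with the highest value. The helper `greedy_action` and the value 0.5 are made up for the example; a freshly initialized table contains only zeros.

```python
import numpy as np

q_table = np.zeros((16, 4))  # same shape as the table created above
q_table[14, 2] = 0.5         # hypothetical learned value for state 14, action 2

def greedy_action(q_table, state):
    """Return the index of the highest-valued action for a state."""
    return int(np.argmax(q_table[state]))

print(q_table.shape)               # (16, 4)
print(greedy_action(q_table, 14))  # 2
print(greedy_action(q_table, 0))   # 0, since np.argmax returns the first index on ties
```

Because an all-zero row always resolves to action 0, a purely greedy agent would never explore, which is why Q-learning is usually paired with an exploration strategy.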