# Q-Learning

This notebook implements the Q-Learning algorithm for the [FrozenLake](https://gym.openai.com/envs/FrozenLake-v0/) game.
See `../ReinforcementLearning_Guide.md` for theory and intuition.

According to the OpenAI environment page of FrozenLake: "The agent controls the movement of a character in a grid world. Some tiles of the grid are walkable, and others lead to the agent falling into the water. Additionally, the movement direction of the agent is uncertain and only partially depends on the chosen direction. The agent is rewarded for finding a walkable path to a goal tile."

The surface is described using a grid:

    SFFF       (S: starting point, safe)
    FHFH       (F: frozen surface, safe)
    FFFH       (H: hole, fall to your doom)
    HFFG       (G: goal, where the frisbee is located)

An episode ends when the agent reaches the goal or falls into a hole. The agent receives a reward of 1 for reaching the goal and 0 otherwise.
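
To make the grid concrete, here is a minimal sketch in plain Python (no gym required). It assumes that states are numbered row by row, starting from 0 in the top-left corner, which to our knowledge is how FrozenLake reports observations. The helper names (`LAKE_MAP`, `state_to_pos`, `tile_at`, `is_terminal`, `reward_for`) are ours for illustration and are not part of gym.

```python
# Illustrative sketch only: these names are hypothetical helpers, not gym API.
LAKE_MAP = ["SFFF", "FHFH", "FFFH", "HFFG"]
N_COLS = len(LAKE_MAP[0])
N_STATES = len(LAKE_MAP) * N_COLS


def state_to_pos(state):
    """Convert a flat state index (assumed row-major) to (row, col)."""
    return divmod(state, N_COLS)


def tile_at(state):
    """Return the map letter (S, F, H or G) for a flat state index."""
    row, col = state_to_pos(state)
    return LAKE_MAP[row][col]


def is_terminal(state):
    """An episode ends on a hole (H) or on the goal (G)."""
    return tile_at(state) in "HG"


def reward_for(state):
    """Reward of 1 for reaching the goal, 0 for every other tile."""
    return 1.0 if tile_at(state) == "G" else 0.0


holes = [s for s in range(N_STATES) if tile_at(s) == "H"]
print(holes)           # [5, 7, 11, 12]
print(reward_for(15))  # 1.0
```

Under this numbering the start tile is state 0 and the goal is state 15, so a Q-table for this map would need 16 rows.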

We are going to disable the slippery property; otherwise the chosen actions are not necessarily carried out, and learning takes longer. The change follows this post:

[https://github.com/openai/gym/issues/565](https://github.com/openai/gym/issues/565)
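
The effect of the slippery property can be sketched without gym. The code below assumes the behavior we understand FrozenLake to have: with `is_slippery=True`, the agent moves in the chosen direction or in one of the two perpendicular directions, each with probability 1/3. The functions `move` and `step` are our own illustration, not gym's API, and they ignore holes and episode termination.

```python
import random

N_ROWS, N_COLS = 4, 4
# Action encoding assumed to match FrozenLake: 0=Left, 1=Down, 2=Right, 3=Up
LEFT, DOWN, RIGHT, UP = 0, 1, 2, 3
DELTAS = {LEFT: (0, -1), DOWN: (1, 0), RIGHT: (0, 1), UP: (-1, 0)}


def move(state, action):
    """Apply an action deterministically; moves off the grid leave the agent in place."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = DELTAS[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)
    col = min(max(col + d_col, 0), N_COLS - 1)
    return row * N_COLS + col


def step(state, action, is_slippery, rng=random):
    """Hypothetical stand-in for the movement part of env.step()."""
    if is_slippery:
        # The executed action is the chosen one or a perpendicular one,
        # all three equally likely (assumption stated above).
        action = rng.choice([(action - 1) % 4, action, (action + 1) % 4])
    return move(state, action)


# Without slipping, moving right from the start always lands on state 1.
print(step(0, RIGHT, is_slippery=False))

# With slipping, the same action can end in several different states.
outcomes = {step(0, RIGHT, is_slippery=True) for _ in range(1000)}
print(sorted(outcomes))
```

With slipping enabled, the same state-action pair leads to different next states, so the agent needs many more samples to estimate its value; that is why the deterministic variant learns faster.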

Overview of sections:

1. Basic setup of FrozenLake
2. 

## 1. Basic setup of FrozenLake

In [6]:
import numpy as np
import time
import matplotlib.pyplot as plt
%matplotlib notebook

In [2]:
import gym


In [3]:
# In order to remove the slippery tiles, we need to create/register a new environment
# with custom properties.
# That can be done as explained in this issue:
# https://github.com/openai/gym/issues/565
from gym.envs.registration import register
try:
    register(
        id='FrozenLakeNotSlippery-v0', # our custom name
        entry_point='gym.envs.toy_text:FrozenLakeEnv', # take the FrozenLakeEnv as the template
        kwargs={'map_name' : '4x4', 'is_slippery': False}, # changes we apply; see the GitHub issue above
        max_episode_steps=100, # default 100; 100 steps allowed in an episode
        # the reward_threshold makes sense for games with continuous rewards
        # such as the cart pole; but not really here
        # we leave the default, though
        reward_threshold=.8196, # optimum = .8196
    )
except Exception:  # avoid a bare except; raised e.g. when the id is already registered
    print('A new env can be registered only once.')

In [21]:
# We create an env and play in it
# Note that the environment is completely rendered after each step/action
env = gym.make('FrozenLakeNotSlippery-v0')
env.reset()

for step in range(5):
    env.render()
    action = env.action_space.sample()
    env.step(action)
    #time.sleep(0.5)
env.close()


[S]FFF
FHFH
FFFH
HFFG
  (Up)
[S]FFF
FHFH
FFFH
HFFG
  (Down)
SFFF
[F]HFH
FFFH
HFFG
  (Left)
SFFF
[F]HFH
FFFH
HFFG
  (Up)
[S]FFF
FHFH
FFFH
HFFG


In [16]:
# To clear/flush the text display we can use clear_output on Jupyter
from IPython.display import clear_output
# On python scripts:
# import os
# os.system('clear') # 'cls' on Windows

In [20]:
# We create an env and play in it
env = gym.make('FrozenLakeNotSlippery-v0')
env.reset()

for step in range(10):
    env.render()
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    time.sleep(0.2)
    clear_output(wait=True)
    if done:
        env.reset()
env.close()

  (Left)
SFFF
FHFH
[F]FFH
HFFG
