<a href="https://colab.research.google.com/github/maviverosp/PUC-Rio/blob/main/Q_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
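# Gym library
# Install the dependencies on Google Colab
# gym is pinned to 0.25.2 because the code below uses the pre-0.26 API:
# env.reset() returns only the state and env.step() returns 4 values.

!pip install numpy
!pip install gym==0.25.2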



## Step 0: Import the dependencies
We use three libraries:
- **NumPy** for our Q-table
- **OpenAI Gym** for our FrozenLake environment
- **random** to generate random numbers

In [3]:
import numpy as np
import gym
import random


## Step 1: Create the environment
1. Here we create the 4x4 FrozenLake environment (the default map of `FrozenLake-v1`).
2. OpenAI Gym is a library of environments that we can use to train our agents.
3. In our case, we chose FrozenLake.
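FrozenLake numbers its states row by row, from the top-left cell (0) to the bottom-right cell. The sketch below assumes the default 4x4 map that `FrozenLake-v1` loads when no `map_name` is given. `MAP_4X4`, `to_state` and `to_row_col` are hypothetical helpers for illustration, not part of Gym:

```python
# Assumed layout of the default 4x4 FrozenLake map:
# S = start, F = frozen (safe), H = hole, G = goal.
MAP_4X4 = ["SFFF",
           "FHFH",
           "FFFH",
           "HFFG"]

N_ROWS = len(MAP_4X4)
N_COLS = len(MAP_4X4[0])


def to_state(row: int, col: int) -> int:
    """Flatten a (row, col) grid position into a state index (row-major order)."""
    return row * N_COLS + col


def to_row_col(state: int) -> tuple:
    """Inverse of to_state: recover (row, col) from a state index."""
    return divmod(state, N_COLS)


# Collect the state indices of every hole and of the goal.
holes = [to_state(r, c)
         for r, line in enumerate(MAP_4X4)
         for c, tile in enumerate(line) if tile == "H"]
goal = next(to_state(r, c)
            for r, line in enumerate(MAP_4X4)
            for c, tile in enumerate(line) if tile == "G")

print("number of states:", N_ROWS * N_COLS)  # 16
print("holes:", holes)                         # [5, 7, 11, 12]
print("goal:", goal)                           # 15
```

Under this layout there are 16 states, and the goal is state 15 (the value the evaluation cell checks). States 5, 7, 11, 12 and 15 end the episode, which is why their rows stay at zero in the trained Q-table.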

In [4]:
env = gym.make("FrozenLake-v1")



## Step 2: Create and initialize the Q-table
- Now we create our Q-table. To know how many rows (states) and columns (actions) it needs, we compute `state_size` and `action_size`.
- OpenAI Gym gives us a way to do that: `env.action_space.n` and `env.observation_space.n`.
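To make the table's shape concrete, here is a dependency-free sketch. The sizes are hard-coded as assumptions matching the 16x4 table printed below, and `greedy_action` is a hypothetical helper that mimics `np.argmax` on one row:

```python
# Sketch: a Q-table as a plain list of lists, sized like the Gym spaces
# of the default 4x4 FrozenLake (assumed: 16 states, 4 actions).
STATE_SIZE = 16   # assumed value of env.observation_space.n
ACTION_SIZE = 4   # assumed value of env.action_space.n

# One row per state, one column per action, all initialized to zero.
# The comprehension builds independent rows ([[0.0] * 4] * 16 would alias one row).
qtable = [[0.0] * ACTION_SIZE for _ in range(STATE_SIZE)]


def greedy_action(qtable, state):
    """Return the index of the largest Q-value in a row.

    Like np.argmax, this returns the FIRST index on ties, so with an
    all-zero row the greedy choice is always action 0.
    """
    row = qtable[state]
    return max(range(len(row)), key=row.__getitem__)


print(len(qtable), "x", len(qtable[0]))   # 16 x 4
print(greedy_action(qtable, 0))           # 0 (all values tie)
qtable[0][2] = 0.5
print(greedy_action(qtable, 0))           # 2
```

Early in training the table is all zeros, so a purely greedy agent would always pick action 0. The epsilon-greedy rule in the training loop avoids that by choosing random actions most of the time while epsilon is high.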

In [5]:
action_size = env.action_space.n
state_size = env.observation_space.n

In [6]:
# Create our Q-table with state_size rows and action_size columns (16x4)
qtable = np.zeros((state_size, action_size))
print(qtable)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]


## Step 3: Define the hyperparameters
Here we specify the hyperparameters.



In [7]:
total_episodes = 50000  # Total episodes
learning_rate = 0.7     # Learning rate
max_steps = 99          # Max steps per episode
gamma = 0.95            # Discount rate

# Exploration parameters
epsilon = 1.0           # Exploration rate
max_epsilon = 1.0       # Exploration probability at start
min_epsilon = 0.01      # Minimum exploration probability
decay_rate = 0.005      # Exponential decay rate for exploration probability
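
The exploration rate starts at `max_epsilon` and decays exponentially toward `min_epsilon`; the training loop applies this formula at the end of each episode. The standard-library sketch below evaluates the schedule for a few episode numbers, reusing the values above. `epsilon_at` is a hypothetical helper:

```python
import math

# Values copied from the hyperparameter cell above.
max_epsilon = 1.0
min_epsilon = 0.01
decay_rate = 0.005


def epsilon_at(episode: int) -> float:
    """Exploration rate computed after `episode` (same formula as the training loop)."""
    return min_epsilon + (max_epsilon - min_epsilon) * math.exp(-decay_rate * episode)


for ep in (0, 100, 200, 500, 1000, 2000):
    print(f"episode {ep:>5}: epsilon = {epsilon_at(ep):.4f}")
```

With `decay_rate = 0.005`, epsilon drops to about 0.017 by episode 1,000 and is essentially `min_epsilon` by episode 2,000. The vast majority of the 50,000 training episodes are therefore played almost greedily.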

## Step 4: The Q-learning algorithm
Now we implement the Q-learning algorithm:
  ![Q-learning algorithm](http://simoninithomas.com/drlc/Qlearning//qtable_algo.png)
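At the core of the algorithm is the update rule that the training cell applies at every step. Below is a minimal, dependency-free sketch of a single update, using the `learning_rate` (0.7) and `gamma` (0.95) values above as defaults. `q_update` is a hypothetical helper name:

```python
def q_update(q_sa: float, reward: float, max_q_next: float,
             learning_rate: float = 0.7, gamma: float = 0.95) -> float:
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
    """
    td_target = reward + gamma * max_q_next   # bootstrapped estimate of the return
    td_error = td_target - q_sa               # how far the current estimate is off
    return q_sa + learning_rate * td_error


# Stepping onto the goal: reward 1, and the goal row of the table is all zeros.
print(q_update(q_sa=0.0, reward=1.0, max_q_next=0.0))   # 0.7

# An ordinary step: no reward, but the next state already looks promising.
print(q_update(q_sa=0.2, reward=0.0, max_q_next=0.5))   # ≈ 0.3925
```

FrozenLake only pays a reward of 1 on the goal, so values spread backward from the goal one update at a time. `gamma` shrinks that value by a factor of 0.95 per step of distance.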

In [8]:
# List of rewards (one total per episode)
rewards = []

# 2. For life or until learning is stopped
for episode in range(total_episodes):
  # Reset the environment
  state = env.reset()
  step = 0
  done = False
  total_rewards = 0

  for step in range(max_steps):
    # 3. Choose an action a in the current world state (s)
    ## First we draw a random number
    exp_exp_tradeoff = random.uniform(0, 1)

    ## If this number is greater than epsilon --> exploitation (take the biggest Q value for this state)
    if exp_exp_tradeoff > epsilon:
      action = np.argmax(qtable[state,:])
      #print(exp_exp_tradeoff, "action", action)

    # Else make a random choice --> exploration
    else:
      action = env.action_space.sample()
      #print("action random", action)

    # Take the action (a) and observe the outcome state (s') and reward (r)
    new_state, reward, done, info = env.step(action)

    # Update Q(s,a) := Q(s,a) + lr * [R(s,a) + gamma * max Q(s',a') - Q(s,a)]
    # qtable[new_state,:] : all the actions we can take from the new state
    qtable[state, action] = qtable[state, action] + learning_rate * (reward + gamma * np.max(qtable[new_state,:]) - qtable[state, action])

    total_rewards += reward

    # The new state becomes the current state
    state = new_state

    # If done (fell into a hole or reached the goal): finish the episode
    if done:
      break

  # Reduce epsilon (because we need less and less exploration)
  epsilon = min_epsilon + (max_epsilon - min_epsilon) * np.exp(-decay_rate*episode)
  rewards.append(total_rewards)

print("Score over time: " + str(sum(rewards)/total_episodes))

print(qtable)





Score over time: 0.50328
[[2.06281740e-01 1.10534205e-01 1.18549231e-01 1.39500812e-01]
 [1.07521374e-02 2.37662494e-02 1.02187560e-02 7.45196401e-02]
 [9.94001123e-03 4.28311598e-01 2.18054973e-02 2.80760114e-02]
 [2.02938987e-03 4.09266244e-03 1.31207876e-02 1.05352466e-01]
 [2.62133301e-01 6.43447753e-02 4.64037477e-02 3.87782741e-02]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [2.21100320e-04 2.94774127e-06 1.36709504e-01 3.44607818e-05]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [2.25027229e-02 1.58870643e-02 2.65886648e-02 2.89130058e-01]
 [1.17004770e-02 5.72184580e-01 9.37820393e-04 1.53286727e-02]
 [6.05082751e-03 8.46973483e-01 3.89494362e-03 5.60471728e-04]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]
 [1.14875726e-01 1.37896293e-04 4.93883418e-01 1.60445654e-01]
 [5.33320225e-01 6.23059641e-01 9.36968086e-01 2.91523354e-01]
 [0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00]]
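A trained Q-table is easier to read as a policy: take the action with the largest Q-value in each row. The sketch below copies the rows for the non-terminal states from the output above and prints one arrow per cell. It assumes the default 4x4 map and Gym's FrozenLake action order (0=LEFT, 1=DOWN, 2=RIGHT, 3=UP); `learned_rows` and `greedy` are illustrative names:

```python
# Rows copied from the learned Q-table printed above (non-terminal states only).
# Columns follow the assumed FrozenLake action order 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP.
learned_rows = {
    0:  [2.06281740e-01, 1.10534205e-01, 1.18549231e-01, 1.39500812e-01],
    1:  [1.07521374e-02, 2.37662494e-02, 1.02187560e-02, 7.45196401e-02],
    2:  [9.94001123e-03, 4.28311598e-01, 2.18054973e-02, 2.80760114e-02],
    3:  [2.02938987e-03, 4.09266244e-03, 1.31207876e-02, 1.05352466e-01],
    4:  [2.62133301e-01, 6.43447753e-02, 4.64037477e-02, 3.87782741e-02],
    6:  [2.21100320e-04, 2.94774127e-06, 1.36709504e-01, 3.44607818e-05],
    8:  [2.25027229e-02, 1.58870643e-02, 2.65886648e-02, 2.89130058e-01],
    9:  [1.17004770e-02, 5.72184580e-01, 9.37820393e-04, 1.53286727e-02],
    10: [6.05082751e-03, 8.46973483e-01, 3.89494362e-03, 5.60471728e-04],
    13: [1.14875726e-01, 1.37896293e-04, 4.93883418e-01, 1.60445654e-01],
    14: [5.33320225e-01, 6.23059641e-01, 9.36968086e-01, 2.91523354e-01],
}

ARROWS = "<v>^"                                  # LEFT, DOWN, RIGHT, UP
MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]       # assumed default 4x4 layout


def greedy(row):
    """Index of the largest value (first one on ties, like np.argmax)."""
    return max(range(len(row)), key=row.__getitem__)


grid = []
for r, line in enumerate(MAP_4X4):
    cells = []
    for c, tile in enumerate(line):
        state = r * 4 + c
        if tile in "HG":
            cells.append(tile)                   # terminal cell: no action needed
        else:
            cells.append(ARROWS[greedy(learned_rows[state])])
    grid.append("".join(cells))

print("\n".join(grid))
```

Some choices, such as LEFT at the start, look odd on a fixed grid. A plausible reading is that `FrozenLake-v1` is slippery by default, so the agent only sometimes moves in the chosen direction. The learned policy then prefers moves whose slips are less likely to end in a hole.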

In [9]:
env.reset()

for episode in range(10):
    state = env.reset()
    step = 0
    done = False
    print("*************************************************")
    print("Episode: ", episode)

    for step in range(max_steps):

        # Take the action (index) that has the maximum expected future reward in this state
        action = np.argmax(qtable[state,:])

        new_state, reward, done, info = env.step(action)

        if done:
            # Here we only print the last state (to see whether our agent reached the goal or fell into a hole)
            env.render()
            if new_state == 15:
                print("We reached our Goal 🏆 ")
            else:
                print("We fell into a hole ☠️")

            # Print the number of steps it took (note: `step` is 0-based)
            print("Number of steps", step)

            break
        state = new_state
env.close()


*************************************************
Episode:  0




We reached our Goal 🏆 
Number of steps 15
*************************************************
Episode:  1
We fell into a hole ☠️
Number of steps 30
*************************************************
Episode:  2
We fell into a hole ☠️
Number of steps 39
*************************************************
Episode:  3
We fell into a hole ☠️
Number of steps 21
*************************************************
Episode:  4
We fell into a hole ☠️
Number of steps 51
*************************************************
Episode:  5
We fell into a hole ☠️
Number of steps 29
*************************************************
Episode:  6
We reached our Goal 🏆 
Number of steps 21
*************************************************
Episode:  7
We reached our Goal 🏆 
Number of steps 33
*************************************************
Episode:  8
We fell into a hole ☠️
Number of steps 8
*************************************************
Episode:  9
We fell into a hole ☠️
Number of steps 23
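The ten evaluation episodes above (3 goals, 7 holes) are a small sample. `FrozenLake-v1` is also slippery by default, so the same greedy policy can succeed or fail by chance. The self-contained sketch below estimates the success rate over many episodes. It assumes Gym's slippery dynamics (the intended move or one of the two perpendicular moves, each with probability 1/3) and the default 4x4 map. `step` and `success_rate` are hypothetical helpers, not Gym functions, and the result is a simulation estimate, not a reproduction of the notebook's run:

```python
import random

MAP_4X4 = ["SFFF", "FHFH", "FFFH", "HFFG"]   # assumed default 4x4 layout
N = 4
# (row, col) offsets for actions 0=LEFT, 1=DOWN, 2=RIGHT, 3=UP
MOVES = [(0, -1), (1, 0), (0, 1), (-1, 0)]

# Greedy action per state, read off the learned Q-table above
# (argmax of each row; entries for holes and the goal are never used).
POLICY = [0, 3, 1, 3, 0, 0, 2, 0, 3, 1, 1, 0, 0, 2, 2, 0]


def step(state, action, rng):
    """Assumed slippery transition: intended or either perpendicular direction, 1/3 each."""
    direction = (action + rng.choice((-1, 0, 1))) % 4
    r, c = divmod(state, N)
    dr, dc = MOVES[direction]
    r = min(max(r + dr, 0), N - 1)   # bumping into an edge leaves the agent in place
    c = min(max(c + dc, 0), N - 1)
    tile = MAP_4X4[r][c]
    return r * N + c, float(tile == "G"), tile in "HG"


def success_rate(policy, episodes=10_000, max_steps=99, seed=0):
    """Fraction of episodes (starting from state 0) that reach the goal."""
    rng = random.Random(seed)
    wins = 0
    for _episode in range(episodes):
        state = 0
        for _t in range(max_steps):
            state, reward, done = step(state, policy[state], rng)
            if done:
                if reward == 1.0:
                    wins += 1
                break
    return wins / episodes


print(f"estimated success rate: {success_rate(POLICY):.3f}")
```

The same check can be run against the real environment by extending the evaluation cell to many more episodes and dividing the number of goals by the number of episodes.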
