# Activity - Practical project
> The activity is carried out in predefined groups of 2-3 students. Names must be listed in alphabetical order (by surname). Remember that this activity accounts for 30% of the final grade for the course. The work must be submitted in this notebook.
*   Student 1: Granizo, Mateo
*   Student 2: Maiolo, Pablo
*   Student 3: Miglino, Diego

## **PART 1** - Installation and prerequisites

### 1.2. Detect the working environment: Google Colab or local

In [1]:
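# Detect whether the notebook is running in Google Colab or locally
try:
  from google.colab import drive
  IN_COLAB=True
except ImportError:
  # The google.colab package is only available inside Colab
  IN_COLAB=False
print(IN_COLAB)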

False


### 1.4. Install the required libraries

In [2]:
# %pip install -r requirements.txt

## **PART 3**. Development and questions

#### Import libraries

In [3]:
from __future__ import division
import numpy as np
import gym
import os
import tensorflow as tf
from PIL import Image
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Flatten, Conv2D, Permute
from tensorflow.keras.optimizers import Adam
import tensorflow.keras.backend as K
from rl.agents.dqn import DQNAgent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy
from rl.memory import SequentialMemory
from rl.callbacks import FileLogger, ModelIntervalCheckpoint
from rl.core import Processor



#### Base configuration

In [4]:
INPUT_SHAPE = (84, 84)
WINDOW_LENGTH = 4
env_name = 'SpaceInvaders-v0'
env = gym.make(env_name)
np.random.seed(123)
env.seed(123)
nb_actions = env.action_space.n
print(f"Number of available actions: {nb_actions}")

Number of available actions: 6


In [5]:
class AtariProcessor(Processor):
    def process_observation(self, observation):
        assert observation.ndim == 3
        img = Image.fromarray(observation)
        img = img.resize(INPUT_SHAPE).convert('L')
        processed_observation = np.array(img)
        assert processed_observation.shape == INPUT_SHAPE
        return processed_observation.astype('uint8')

    def process_state_batch(self, batch):
        processed_batch = batch.astype('float32') / 255.
        return processed_batch

    def process_reward(self, reward):
        return np.clip(reward, -1., 1.)
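
The effect of `process_state_batch` and `process_reward` can be checked in isolation with NumPy alone. The following is a minimal sketch, not part of the original notebook: the random frame stack and the raw score values are hypothetical stand-ins, and only the scaling and clipping expressions are taken from `AtariProcessor`.

```python
import numpy as np

# Hypothetical stand-in for a batch of stacked, preprocessed frames:
# shape (batch, WINDOW_LENGTH, height, width), stored as uint8.
rng = np.random.default_rng(0)
batch_uint8 = rng.integers(0, 256, size=(2, 4, 84, 84), dtype=np.uint8)

# Same scaling as process_state_batch: float32 values in [0, 1].
batch_float = batch_uint8.astype('float32') / 255.0

# Same clipping as process_reward; the raw scores are illustrative only.
raw_rewards = np.array([0.0, 5.0, 30.0, 200.0])
clipped_rewards = np.clip(raw_rewards, -1.0, 1.0)

print("uint8 bytes:  ", batch_uint8.nbytes)
print("float32 bytes:", batch_float.nbytes)
print("clipped rewards:", clipped_rewards)
```

Keeping observations as `uint8` until a batch is sampled makes each stored frame four times smaller than its `float32` version, which matters for a large replay memory. Because every positive reward is clipped to 1, the episode rewards reported by the agent most likely count rewarded events rather than the in-game score.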

### 1. Neural network implementation

In [6]:
input_shape = (WINDOW_LENGTH,) + INPUT_SHAPE
model = Sequential()
if K.image_data_format() == 'channels_last':
    model.add(Permute((2, 3, 1), input_shape=input_shape))
elif K.image_data_format() == 'channels_first':
    model.add(Permute((1, 2, 3), input_shape=input_shape))
else:
    raise RuntimeError('Unknown image_dim_ordering.')
model.add(Conv2D(32, (8, 8), strides=(4, 4), activation='relu'))
model.add(Conv2D(64, (4, 4), strides=(2, 2), activation='relu'))
model.add(Conv2D(64, (3, 3), strides=(1, 1), activation='relu'))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dense(nb_actions, activation='linear'))
model.summary()  # summary() prints the table itself and returns None

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
permute (Permute)            (None, 84, 84, 4)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 20, 20, 32)        8224      
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 9, 9, 64)          32832     
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 64)          36928     
_________________________________________________________________
flatten (Flatten)            (None, 3136)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               1606144   
_________________________________________________________________
dense_1 (Dense)              (None, 6)                 3
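
The shapes and parameter counts in the summary can be reproduced by hand. The sketch below assumes the Keras defaults used above (`'valid'` padding and one bias per filter or unit); the helper names `conv_out`, `conv_params` and `dense_params` are introduced here purely for illustration.

```python
def conv_out(size, kernel, stride):
    # Output side length with 'valid' padding: floor((size - kernel) / stride) + 1
    return (size - kernel) // stride + 1


def conv_params(kernel, in_channels, filters):
    # One kernel x kernel x in_channels weight block plus one bias per filter
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_in, n_out):
    # Full weight matrix plus one bias per output unit
    return n_in * n_out + n_out


size, channels = 84, 4  # 84x84 frames, WINDOW_LENGTH = 4 stacked as channels
conv_layers = [(8, 4, 32), (4, 2, 64), (3, 1, 64)]  # (kernel, stride, filters)

shapes, counts = [], []
for kernel, stride, filters in conv_layers:
    counts.append(conv_params(kernel, channels, filters))
    size, channels = conv_out(size, kernel, stride), filters
    shapes.append((size, size, channels))

flat = size * size * channels
counts.append(dense_params(flat, 512))  # hidden dense layer
counts.append(dense_params(512, 6))     # output layer, nb_actions = 6

print("conv output shapes:", shapes)
print("flattened size:", flat)
print("params per layer:", counts)
print("total params:", sum(counts))
```

The three convolutional layers and the hidden dense layer come out at 8224, 32832, 36928 and 1606144 parameters, matching the summary. The same formula gives 512 * 6 + 6 = 3078 parameters for the output layer.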

### 2. DQN solution implementation

In [7]:
memory = SequentialMemory(limit=500000, window_length=WINDOW_LENGTH)
processor = AtariProcessor()
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps',
                              value_max=1.0, value_min=0.1, value_test=0.01,
                              nb_steps=150000)

dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy,
               memory=memory, processor=processor,
               nb_steps_warmup=50000, gamma=0.99, train_interval=4, delta_clip=1.0)
dqn.compile(Adam(learning_rate=1e-4), metrics=['mae'])
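
To see what the exploration schedule configured above implies, the linear decay can be written out directly. This is a minimal sketch that assumes epsilon is a linear function of the global step count, clamped at `value_min`; the function `linear_annealed_eps` is a hypothetical helper, not part of keras-rl.

```python
def linear_annealed_eps(step, value_max=1.0, value_min=0.1, nb_steps=150000):
    # Straight line from value_max at step 0 down to value_min at nb_steps,
    # then held constant at value_min.
    slope = -(value_max - value_min) / nb_steps
    return max(value_min, slope * step + value_max)


# Midpoints of some 10,000-step intervals, plus points past the end of annealing
for step in (0, 55000, 65000, 75000, 150000, 250000):
    print(step, round(linear_annealed_eps(step), 3))
```

Under this assumption, the midpoints of the intervals starting at 50,000, 60,000 and 70,000 steps give 0.67, 0.61 and 0.55, which agree with the `mean_eps` values in the training log of section 3. It also suggests that epsilon keeps decaying during the 50,000 warm-up steps, and that the last 100,000 of the 250,000 training steps run at the minimum value of 0.1.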

### 3. Training the agent

In [None]:
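# File names for the final weights, the periodic checkpoints, and the training log
weights_filename = 'dqn_{}_weights.h5f'.format(env_name)
checkpoint_weights_filename = 'dqn_' + env_name + '_weights_{step}.h5f'
log_filename = 'dqn_{}_log.json'.format(env_name)
# Save a checkpoint every 25,000 steps and write the JSON log periodically
callbacks = [ModelIntervalCheckpoint(checkpoint_weights_filename, interval=25000)]
callbacks += [FileLogger(log_filename, interval=1000)]

# Train for 250,000 environment steps without rendering
dqn.fit(env, callbacks=callbacks, nb_steps=250000, log_interval=10000, visualize=False)

# Save the final weights, replacing any existing file
dqn.save_weights(weights_filename, overwrite=True)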

Training for 250000 steps ...
Interval 1 (0 steps performed)
   10/10000 [..............................] - ETA: 59s - reward: 0.0000e+00  



15 episodes - episode_reward: 8.000 [3.000, 16.000] - ale.lives: 2.145

Interval 2 (10000 steps performed)
15 episodes - episode_reward: 9.533 [4.000, 19.000] - ale.lives: 2.083

Interval 3 (20000 steps performed)
14 episodes - episode_reward: 9.643 [5.000, 15.000] - ale.lives: 2.120

Interval 4 (30000 steps performed)
13 episodes - episode_reward: 12.077 [5.000, 23.000] - ale.lives: 2.175

Interval 5 (40000 steps performed)
13 episodes - episode_reward: 12.615 [5.000, 23.000] - ale.lives: 2.050

Interval 6 (50000 steps performed)
15 episodes - episode_reward: 8.533 [3.000, 17.000] - loss: 0.007 - mae: 0.032 - mean_q: 0.045 - mean_eps: 0.670 - ale.lives: 1.898

Interval 7 (60000 steps performed)
16 episodes - episode_reward: 9.250 [1.000, 22.000] - loss: 0.007 - mae: 0.052 - mean_q: 0.071 - mean_eps: 0.610 - ale.lives: 2.019

Interval 8 (70000 steps performed)
11 episodes - episode_reward: 14.727 [4.000, 25.000] - loss: 0.007 - mae: 0.073 - mean_q: 0.099 - mean_eps: 0.550 - ale.lives: 
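
The per-interval lines above can be condensed into a quick summary. The following sketch parses the eight interval lines shown in this log, copied up to the reward range, with a regular expression. The pattern is an assumption about this particular output format, not a keras-rl utility.

```python
import re

# Interval summaries copied from the log above (truncated after the reward range)
log_lines = [
    "15 episodes - episode_reward: 8.000 [3.000, 16.000]",
    "15 episodes - episode_reward: 9.533 [4.000, 19.000]",
    "14 episodes - episode_reward: 9.643 [5.000, 15.000]",
    "13 episodes - episode_reward: 12.077 [5.000, 23.000]",
    "13 episodes - episode_reward: 12.615 [5.000, 23.000]",
    "15 episodes - episode_reward: 8.533 [3.000, 17.000]",
    "16 episodes - episode_reward: 9.250 [1.000, 22.000]",
    "11 episodes - episode_reward: 14.727 [4.000, 25.000]",
]

pattern = re.compile(
    r"(\d+) episodes - episode_reward: ([\d.]+) \[([\d.]+), ([\d.]+)\]"
)

rows = []
for line in log_lines:
    match = pattern.match(line)
    if match:
        episodes = int(match.group(1))
        mean, low, high = (float(match.group(i)) for i in (2, 3, 4))
        rows.append((episodes, mean, low, high))

total_episodes = sum(r[0] for r in rows)
# Weight each interval mean by its episode count to get an overall mean
weighted_mean = sum(r[0] * r[1] for r in rows) / total_episodes
best_episode = max(r[3] for r in rows)

print("episodes:", total_episodes)
print("weighted mean reward:", round(weighted_mean, 3))
print("best single episode:", best_episode)
```

With only eight intervals, three of them after the warm-up period, this summary says little about a trend. The same approach applies to the full log once training finishes.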

### 4. Testing and visualization

In [None]:
# Test over n episodes to compute the final reward
# NOTE: 'visualize=True' will try to open a pop-up window.
# If it does not work, make sure the rendering dependencies are installed.
# For Atari environments, try running in your terminal: pip install pyglet

weights_filename = 'dqn_{}_weights.h5f'.format(env_name)
dqn.load_weights(weights_filename)
dqn.test(env, nb_episodes=10, visualize=True)