# S1. Convolutional network for MNIST



As in the previous lab, we first load the MNIST dataset and normalize it, but this time without flattening the images into one-dimensional vectors, since we will work with convolutional networks that exploit the 2D structure of the images.

In [2]:
## Import and normalize the data

from tensorflow import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print('training set', x_train.shape)
print('test set', x_test.shape)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize [0..255]-->[0..1]
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
num_classes=10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

print('training set', x_train.shape)
print('val set', x_val.shape)

training set (60000, 28, 28)
test set (10000, 28, 28)
training set (48000, 28, 28)
val set (12000, 28, 28)
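
The preprocessing above can be illustrated without Keras. The following is a minimal sketch, not the library implementation: `to_one_hot`, `normalize_pixel` and `split_sizes` are hypothetical helper names introduced only for this example. They mimic what `to_categorical`, the division by 255, and the 80/20 split do to a single label, a single pixel, and the sample count.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def normalize_pixel(value):
    """Map an 8-bit intensity in [0, 255] to a float in [0, 1]."""
    return value / 255


def split_sizes(n_samples, test_size=0.2):
    """Sizes of the (train, validation) partitions for a fractional split."""
    n_val = round(n_samples * test_size)
    return n_samples - n_val, n_val


print(to_one_hot(3))
print(normalize_pixel(0), normalize_pixel(255))
print(split_sizes(60000))
```

The sizes returned by `split_sizes(60000)` match the shapes printed by the cell above (48000 training and 12000 validation images).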


## Baseline model
We start from a baseline topology that takes the MLP from the last session of the first lab (two dense layers of 1024 neurons) and puts a pair of convolutional layers in front of it, each followed by average pooling, inspired by the LeNet architecture (1998) proposed by Yann LeCun for MNIST.
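
To see what this topology implies, the sketch below traces the feature-map size and the number of trainable parameters layer by layer. It assumes the settings used in the following cell (5x5 kernels, no padding, stride 1 for the convolutions, and 2x2 pooling with stride 2); `conv_out` is a hypothetical helper written for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of a convolution or pooling without padding."""
    return (size - kernel) // stride + 1


side, channels = 28, 1   # MNIST images: 28x28 pixels, one channel
params = 0

# Two Conv2D(5x5) layers, each followed by 2x2 average pooling.
for filters in (6, 16):
    params += (5 * 5 * channels + 1) * filters   # kernel weights + one bias per filter
    side = conv_out(side, kernel=5)              # convolution
    side = conv_out(side, kernel=2, stride=2)    # average pooling (no parameters)
    channels = filters
    print(f"after conv+pool with {filters} filters: {side}x{side}x{channels}")

flat = side * side * channels                    # size of the Flatten output
units_in = flat
for units in (1024, 1024, 10):
    params += (units_in + 1) * units             # weights + one bias per unit
    units_in = units

print("flattened features:", flat)
print("trainable parameters:", params)
```

Almost all parameters sit in the dense layers; the two convolutional layers contribute only a few thousand.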
 

In [4]:
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

epochs=25
batch_size=128
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])  

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')

Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.96925, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.96925 to 0.97875, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy improved from 0.97875 to 0.98475, saving model to best_model.h5
Epoch 4/25
Epoch 4: val_accuracy improved from 0.98475 to 0.98500, saving model to best_model.h5
Epoch 5/25
Epoch 5: val_accuracy improved from 0.98500 to 0.98675, saving model to best_model.h5
Epoch 6/25
Epoch 6: val_accuracy improved from 0.98675 to 0.98817, saving model to best_model.h5
Epoch 7/25
Epoch 7: val_accuracy improved from 0.98817 to 0.98942, saving model to best_model.h5
Epoch 8/25
Epoch 8: val_accuracy did not improve from 0.98942
Epoch 9/25
Epoch 9: val_accuracy improved from 0.98942 to 0.99058, saving model to best_model.h5
Epoch 10/25
Epoch 10: val_accuracy improved from 0.99058 to 0.99183, saving model to best_model.h5
Epoch 11/25
Epoch 11: val_accuracy improved from 0.99183 to 0.9920

## Exercise:

Try the techniques presented in lab 1 (dropout, batchnorm and data augmentation) to reach a test accuracy above 99%, better than the one obtained with MLP networks.

## Solution:

## L2 (or L1) regularization

L2 regularization adds to the cost function a penalty proportional to the squared L2 norm of the model's weights. Weights with large values are penalized, which pushes the weights toward small values. The same can be done with an L1 penalty, or with both at once (known as *Elastic net*).
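
As a rough illustration of the penalty term, the sketch below computes L2 and L1 penalties for two small weight vectors. The weight values and the data loss of 0.30 are made up for the example, and `l2_penalty` / `l1_penalty` are hypothetical helpers; the factor 0.01 is the one passed to `l2(0.01)` in the next cell.

```python
def l2_penalty(weights, lam=0.01):
    """lam times the sum of squared weights (squared L2 norm)."""
    return lam * sum(w * w for w in weights)


def l1_penalty(weights, lam=0.01):
    """lam times the sum of absolute weights (L1 norm)."""
    return lam * sum(abs(w) for w in weights)


small_weights = [0.1, -0.2, 0.05]   # made-up values
large_weights = [1.0, -2.0, 0.5]    # the same weights scaled by 10
data_loss = 0.30                    # hypothetical cross-entropy value

for name, weights in (("small", small_weights), ("large", large_weights)):
    total = data_loss + l2_penalty(weights)
    print(f"{name}: L2 penalty={l2_penalty(weights):.6f}, "
          f"L1 penalty={l1_penalty(weights):.6f}, total loss={total:.6f}")
```

Scaling the weights by 10 multiplies the L2 penalty by 100 but the L1 penalty only by 10, which is why L2 punishes large weights much harder.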


In [5]:
## Starting from the baseline model, add L2 regularization to the dense layers
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint
from keras.regularizers import l2

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dense(units=1024, activation='relu', kernel_regularizer=l2(0.01)))
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

epochs=25
batch_size=128
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])  

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')

Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.93367, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.93367 to 0.95400, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy improved from 0.95400 to 0.96458, saving model to best_model.h5
Epoch 4/25
Epoch 4: val_accuracy improved from 0.96458 to 0.97208, saving model to best_model.h5
Epoch 5/25
Epoch 5: val_accuracy did not improve from 0.97208
Epoch 6/25
Epoch 6: val_accuracy improved from 0.97208 to 0.97650, saving model to best_model.h5
Epoch 7/25
Epoch 7: val_accuracy did not improve from 0.97650
Epoch 8/25
Epoch 8: val_accuracy improved from 0.97650 to 0.97850, saving model to best_model.h5
Epoch 9/25
Epoch 9: val_accuracy did not improve from 0.97850
Epoch 10/25
Epoch 10: val_accuracy did not improve from 0.97850
Epoch 11/25
Epoch 11: val_accuracy improved from 0.97850 to 0.97933, saving model to best_model.h5
Epoch 12/25
Epoch 12: val_accuracy did not improve from 0.97933
Epoch 13

## Dropout

Dropout is a regularization technique that randomly deactivates a fraction of the network's neurons at each training step; at inference time all neurons stay active. This keeps the network from overfitting the training data and improves the model's generalization.
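
The mechanism can be sketched in a few lines of plain Python. This is an illustrative version of "inverted" dropout, in which the surviving activations are rescaled by 1/(1 - rate) during training so that their expected value is unchanged; the `dropout` function and its arguments are hypothetical names for this example, not the Keras implementation.

```python
import random


def dropout(activations, rate, training, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)        # inference: nothing is dropped
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)                  # fixed seed so the run is repeatable
activations = [1.0] * 10000

train_out = dropout(activations, rate=0.5, training=True, rng=rng)
test_out = dropout(activations, rate=0.5, training=False, rng=rng)

print("fraction dropped in training:", train_out.count(0.0) / len(train_out))
print("mean activation in training:", sum(train_out) / len(train_out))
print("mean activation at inference:", sum(test_out) / len(test_out))
```

With a rate of 0.5, about half of the activations are zeroed and the rest are doubled, so the mean stays close to the original value.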


In [19]:
## Starting from the baseline model, add dropout regularization to the dense layers
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense, Dropout
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=1024, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

epochs=25
batch_size=128
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])  

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')


Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.96983, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.96983 to 0.97633, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy did not improve from 0.97633
Epoch 4/25
Epoch 4: val_accuracy improved from 0.97633 to 0.97950, saving model to best_model.h5
Epoch 5/25
Epoch 5: val_accuracy did not improve from 0.97950
Epoch 6/25
Epoch 6: val_accuracy did not improve from 0.97950
Epoch 7/25
Epoch 7: val_accuracy improved from 0.97950 to 0.98508, saving model to best_model.h5
Epoch 8/25
Epoch 8: val_accuracy improved from 0.98508 to 0.98725, saving model to best_model.h5
Epoch 9/25
Epoch 9: val_accuracy improved from 0.98725 to 0.98792, saving model to best_model.h5
Epoch 10/25
Epoch 10: val_accuracy did not improve from 0.98792
Epoch 11/25
Epoch 11: val_accuracy did not improve from 0.98792
Epoch 12/25
Epoch 12: val_accuracy improved from 0.98792 to 0.98892, saving model to best_model.h5
Epoch 13/25
Epoch 13: val_accuracy did not improve from 0.98892
Epoch 14/25
Epoch 14: val_accuracy improved from 

## Batch normalization (BatchNorm)

Batch normalization normalizes the output of a layer so that, over each mini-batch, it has mean 0 and variance 1, and then applies a learnable scale and shift. This lets the network train faster and makes it more robust to changes in the weights of the preceding layers.
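
A minimal sketch of the training-time computation for a single feature is shown below. It is an illustration only: `batch_norm` is a hypothetical helper, the batch values are made up, and the running averages that the layer keeps for inference are left out. The value `eps=1e-3` is the default epsilon of the Keras `BatchNormalization` layer.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature over a mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n   # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


batch = [2.0, 4.0, 6.0, 8.0]        # made-up activations of one neuron
normalized = batch_norm(batch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
print("normalized:", [round(x, 3) for x in normalized])
print("mean:", round(out_mean, 6), "variance:", round(out_var, 4))
```

The output variance is slightly below 1 because of the `eps` term added for numerical stability.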


In [20]:
## Starting from the baseline model, add BatchNorm normalization
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense, BatchNormalization
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=1024, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

epochs=25
batch_size=128
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])  

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')

Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.97050, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.97050 to 0.97633, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy improved from 0.97633 to 0.98158, saving model to best_model.h5
Epoch 4/25
Epoch 4: val_accuracy improved from 0.98158 to 0.98425, saving model to best_model.h5
Epoch 5/25
Epoch 5: val_accuracy did not improve from 0.98425
Epoch 6/25
Epoch 6: val_accuracy did not improve from 0.98425
Epoch 7/25
Epoch 7: val_accuracy improved from 0.98425 to 0.99150, saving model to best_model.h5
Epoch 8/25
Epoch 8: val_accuracy improved from 0.99150 to 0.99167, saving model to best_model.h5
Epoch 9/25
Epoch 9: val_accuracy improved from 0.99167 to 0.99208, saving model to best_model.h5
Epoch 10/25
Epoch 10: val_accuracy improved from 0.99208 to 0.99292, saving model to best_model.h5
Epoch 11/25
Epoch 11: val_accuracy did not improve from 0.99292
Epoch 12/25
Epoch 12: val_accuracy di

## Data augmentation

Data augmentation generates new training samples from the original training data. This makes the model more robust and helps it generalize better to data it has not seen during training.

For the MNIST digits we apply the following augmentations:

- Random rotation of the image between -30 and 30 degrees.
- Random translation of the image of up to about 3 pixels (10% of the 28-pixel side) horizontally and vertically.
- Random scaling of the image between 0.8 and 1.2.
- Random horizontal and vertical flips: **NO!** A flipped digit is no longer a valid sample of its class, so both flips are disabled.

Data augmentation runs on the CPU and slows down training.

In addition, more epochs are usually needed to train the model.
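
To make the translation step concrete, here is a simplified sketch in plain Python. It is not how `ImageDataGenerator` works internally: the generator draws continuous offsets and interpolates, while this version uses whole-pixel shifts. `shift_image` and `random_shift` are hypothetical helpers, and the edge replication imitates the `fill_mode='nearest'` option used in the next cell.

```python
import random


def shift_image(image, dx, dy):
    """Shift a 2D image (list of rows) by dx columns and dy rows.

    Pixels that would come from outside the image repeat the nearest edge pixel.
    """
    height, width = len(image), len(image[0])

    def clamp(value, upper):
        return max(0, min(value, upper))

    return [[image[clamp(r - dy, height - 1)][clamp(c - dx, width - 1)]
             for c in range(width)]
            for r in range(height)]


def random_shift(image, max_shift, rng):
    """Apply a random whole-pixel shift of at most max_shift in each direction."""
    dx = rng.randint(-max_shift, max_shift)
    dy = rng.randint(-max_shift, max_shift)
    return shift_image(image, dx, dy)


# A fractional range of 0.1 on a 28-pixel side is about 3 pixels.
max_shift = round(0.1 * 28)

tiny = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(shift_image(tiny, dx=1, dy=0))   # one pixel to the right
print(random_shift(tiny, 1, random.Random(0)))
```

Each call to `random_shift` produces a different variant of the same digit while its label stays unchanged, which is the whole point of the technique.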

In [22]:
## Add data augmentation to the baseline example (this cell also keeps the BatchNorm layers)
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense, BatchNormalization
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=False,
    vertical_flip=False,
    fill_mode='nearest')

## Important: ImageDataGenerator expects rank-4 input (samples, height, width, channels), so we add an explicit channel axis
x_train = x_train.reshape(-1, 28, 28, 1)
x_val = x_val.reshape(-1, 28, 28, 1)
x_test = x_test.reshape(-1, 28, 28, 1)

## Fit the data generator (only required for featurewise statistics or ZCA whitening; harmless here)
datagen.fit(x_train)

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=1024, activation='relu'))
model.add(BatchNormalization())
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)


epochs=25
batch_size=128
## Train with the data generator instead of the raw dataset
history = model.fit(datagen.flow(x_train, y_train, batch_size=batch_size),
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')
print(f'Test accuracy: {score[1]*100:.2f}%')



Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.95550, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.95550 to 0.97275, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy improved from 0.97275 to 0.97750, saving model to best_model.h5
Epoch 4/25
Epoch 4: val_accuracy did not improve from 0.97750
Epoch 5/25
Epoch 5: val_accuracy improved from 0.97750 to 0.97867, saving model to best_model.h5
Epoch 6/25
Epoch 6: val_accuracy improved from 0.97867 to 0.98150, saving model to best_model.h5
Epoch 7/25
Epoch 7: val_accuracy improved from 0.98150 to 0.98392, saving model to best_model.h5
Epoch 8/25
Epoch 8: val_accuracy did not improve from 0.98392
Epoch 9/25
Epoch 9: val_accuracy did not improve from 0.98392
Epoch 10/25
Epoch 10: val_accuracy improved from 0.98392 to 0.98883, saving model to best_model.h5
Epoch 11/25
Epoch 11: val_accuracy did not improve from 0.98883
Epoch 12/25
Epoch 12: val_accuracy improved from 0.98883 to 0.98900, sav