# S1. Convolutional network for MNIST



As in the previous lab, we first import the MNIST dataset and normalize it, but this time we do not flatten the images into one-dimensional vectors, because convolutional networks exploit the 2D structure of the images.

In [2]:
## Import and normalize data

from tensorflow import keras
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

print('training set', x_train.shape)
print('test set', x_test.shape)

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')

# Normalize [0..255]-->[0..1]
x_train /= 255
x_test /= 255

# convert class vectors to binary class matrices
num_classes=10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2, random_state=42)

print('training set', x_train.shape)
print('val set', x_val.shape)

training set (60000, 28, 28)
test set (10000, 28, 28)
training set (48000, 28, 28)
val set (12000, 28, 28)
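
`Conv2D` layers operate on tensors of shape (height, width, channels), and the model in the next section declares an input of `(28, 28, 1)`, whereas the arrays printed above have no channel axis. The notebook passes them to `fit` as they are. If a given Keras version rejects that, one possible fix (an assumption, not part of the original notebook) is to add the axis explicitly. The sketch below uses a random dummy array, `x_dummy`, in place of the real data:

```python
import numpy as np

# Dummy stand-in for a few grayscale images; the real x_train has shape (N, 28, 28).
x_dummy = np.random.rand(5, 28, 28).astype("float32")

# Append a trailing channel axis so each image becomes (28, 28, 1).
x_dummy_4d = np.expand_dims(x_dummy, axis=-1)

print(x_dummy.shape, "->", x_dummy_4d.shape)
```

The same call could be applied to `x_train`, `x_val`, and `x_test` before training; the pixel values are unchanged, only the shape differs.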


## Baseline model
We start from a baseline topology that takes the MLP from the last session of the first lab (two dense layers of 1024 neurons) and places two convolutional layers in front of it, each followed by average pooling. The design is inspired by the LeNet architecture (1998) that Yann LeCun proposed for MNIST.
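
To see how this topology transforms a 28x28 image, the following sketch works through the shape arithmetic by hand. It assumes the layer settings used in the cell that follows (5x5 kernels with 6 and 16 filters, 2x2 pooling with stride 2) and the Keras default of no padding (`'valid'`). The helper `conv_out` is a hypothetical name introduced only for this illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output side length of an unpadded ('valid') convolution or pooling."""
    return (size - kernel) // stride + 1

side = 28
side = conv_out(side, 5)       # Conv2D 5x5, 6 filters   -> 24
side = conv_out(side, 2, 2)    # AveragePooling2D 2x2    -> 12
side = conv_out(side, 5)       # Conv2D 5x5, 16 filters  -> 8
side = conv_out(side, 2, 2)    # AveragePooling2D 2x2    -> 4
flat = side * side * 16        # Flatten: 4 * 4 * 16 features

# Trainable parameters per layer: weights + biases.
params = {
    "conv1": 5 * 5 * 1 * 6 + 6,
    "conv2": 5 * 5 * 6 * 16 + 16,
    "dense1": flat * 1024 + 1024,
    "dense2": 1024 * 1024 + 1024,
    "output": 1024 * 10 + 10,
}
total = sum(params.values())

print(side, flat, total)  # 4 256 1325590
```

Almost all of the parameters sit in the two dense layers; the convolutional layers together contribute fewer than 3,000.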
 

In [4]:
from keras import Sequential
from keras.layers import Input, Conv2D, AveragePooling2D, Flatten, Dense
from keras.optimizers import Adam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint

model = Sequential()

model.add(Input((28,28,1)))
model.add(Conv2D(filters=6, kernel_size=(5,5), activation='relu'))  # input shape is already set by the Input layer
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Conv2D(filters=16, kernel_size=(5,5), activation='relu'))
model.add(AveragePooling2D(pool_size=(2,2), strides=2))
model.add(Flatten())
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=1024, activation='relu'))
model.add(Dense(units=10, activation = 'softmax'))

opt=Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy',
            optimizer=opt,
            metrics=['accuracy'])

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2, patience=2, min_lr=0.00001)
checkpoint = ModelCheckpoint(filepath='best_model.h5', monitor='val_accuracy', save_best_only=True, verbose=1)

epochs=25
batch_size=128
history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_val, y_val),
                    callbacks=[reduce_lr,checkpoint])  

## Load the best model and evaluate it on the test set
model = keras.models.load_model('best_model.h5')
score = model.evaluate(x_test, y_test, verbose=0)
print(f'Test loss: {score[0]:.4f}')  # cross-entropy loss, not a percentage
print(f'Test accuracy: {score[1]*100:.2f}%')

Epoch 1/25
Epoch 1: val_accuracy improved from -inf to 0.97050, saving model to best_model.h5
Epoch 2/25
Epoch 2: val_accuracy improved from 0.97050 to 0.98158, saving model to best_model.h5
Epoch 3/25
Epoch 3: val_accuracy improved from 0.98158 to 0.98533, saving model to best_model.h5
Epoch 4/25
Epoch 4: val_accuracy improved from 0.98533 to 0.98783, saving model to best_model.h5
Epoch 5/25
Epoch 5: val_accuracy did not improve from 0.98783
Epoch 6/25
Epoch 6: val_accuracy improved from 0.98783 to 0.98917, saving model to best_model.h5
Epoch 7/25
Epoch 7: val_accuracy did not improve from 0.98917
Epoch 8/25
Epoch 8: val_accuracy improved from 0.98917 to 0.99050, saving model to best_model.h5
Epoch 9/25
Epoch 9: val_accuracy did not improve from 0.99050
Epoch 10/25
Epoch 10: val_accuracy did not improve from 0.99050
Epoch 11/25
Epoch 11: val_accuracy did not improve from 0.99050
Epoch 12/25
Epoch 12: val_accuracy improved from 0.99050 to 0.99150, saving model to best_model.h5
Epoch 13
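
The `ReduceLROnPlateau` callback configured above multiplies the learning rate by `factor` once `val_loss` has failed to improve for `patience` consecutive epochs, and never goes below `min_lr`. The sketch below is a simplified pure-Python simulation of that rule with the same settings (0.001, 0.2, 2, 0.00001). It ignores the callback's `min_delta` and `cooldown` options, and the validation losses are made-up numbers, not values from this training run.

```python
def simulate_reduce_lr(val_losses, lr=1e-3, factor=0.2, patience=2, min_lr=1e-5):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but not below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation losses, for illustration only.
made_up_losses = [0.10, 0.06, 0.05, 0.055, 0.052, 0.049, 0.050, 0.051]
lrs = simulate_reduce_lr(made_up_losses)
for epoch, lr in enumerate(lrs, start=1):
    print(f"epoch {epoch}: lr = {lr:.6f}")
```

With these invented losses the rate drops after the fifth and eighth epochs, each time following two consecutive epochs without a new best `val_loss`.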

## Exercise:

Try the techniques presented in lab 1 (dropout, batch normalization, and data augmentation) to reach a test accuracy above 99%, improving on the result obtained with MLP networks.
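
As a starting point for the data augmentation part, here is a minimal NumPy sketch of one common augmentation for digits: translating each image by a few pixels. The function name `random_shift` and the 2-pixel limit are illustrative assumptions, not settings from lab 1, and the batch is a dummy array in place of `x_train`. Keras also provides preprocessing layers such as `RandomTranslation` that serve a similar purpose.

```python
import numpy as np

def random_shift(images, max_shift=2, rng=None):
    """Translate each image by a random offset of up to max_shift pixels, filling with zeros."""
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape
    # Zero-pad the borders, then crop an h x w window at a random offset.
    padded = np.pad(images, ((0, 0), (max_shift, max_shift), (max_shift, max_shift)))
    out = np.empty_like(images)
    for i in range(n):
        dy, dx = rng.integers(0, 2 * max_shift + 1, size=2)
        out[i] = padded[i, dy:dy + h, dx:dx + w]
    return out

# Dummy batch: four blank 28x28 images with a bright 8x8 square in the middle.
rng = np.random.default_rng(0)
batch = np.zeros((4, 28, 28), dtype="float32")
batch[:, 10:18, 10:18] = 1.0

augmented = random_shift(batch, max_shift=2, rng=rng)
print(augmented.shape, augmented.sum())
```

Because the square sits well inside the image, shifting by at most 2 pixels moves it without cutting it off, so the total pixel intensity is preserved.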