<font color="#CA3532"><h1 align="left">Deep Learning</h1></font>
<font color="#6E6E6E"><h2 align="left">Introduction to Keras</h2></font>

### <font color="#CA3532">Resources</font>

- Official site: https://keras.io/
- Getting started with the Keras Sequential model: https://keras.io/getting-started/sequential-model-guide/
- Keras guide (from the TensorFlow site): https://www.tensorflow.org/guide/keras
- François Chollet's book, *Deep Learning with Python*: https://www.manning.com/books/deep-learning-with-python

### <font color="#CA3532">Solving MNIST with Keras</font>

In this notebook we build a neural network for the MNIST problem (http://yann.lecun.com/exdb/mnist/) using Keras. As always, the first step is to import the required libraries:

In [None]:
import tensorflow as tf
from tensorflow import keras

import numpy as np
import matplotlib.pyplot as plt
from time import time
import shutil

We load the MNIST data:

In [None]:
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

print(train_images.shape)
print(train_labels.shape)
print(train_labels)

print(test_images.shape)
print(test_labels.shape)
print(test_labels)

We plot some of the images:

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(train_labels[i])

Before building the models, we normalize the images by dividing by the maximum pixel value (255), so that every pixel lies between 0 and 1:

In [None]:
train_images = train_images / 255
test_images = test_images / 255

In [None]:
plt.imshow(train_images[0], cmap=plt.cm.binary)
plt.colorbar()
plt.show()

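Next we subtract the mean image. It is computed over the training set only and then subtracted from both the training and the test images: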

In [None]:
mean_img = train_images.mean(axis=0)
train_images = train_images - mean_img
test_images = test_images - mean_img

In [None]:
plt.imshow(mean_img, cmap=plt.cm.binary)
plt.colorbar()
plt.show()

In [None]:
plt.imshow(train_images[0], cmap=plt.cm.binary, vmin=-1, vmax=1)
plt.colorbar()
plt.show()

We are going to create a very simple network for this problem. It has a single hidden layer with **30 sigmoid units**. The output activation is **softmax**, and the loss function is **cross-entropy**.
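
To make the roles of softmax and cross-entropy concrete, here is a minimal sketch in plain Python (no Keras) of what the output layer and the loss compute for a single example. The three-class scores are made-up values chosen only for illustration, and the helper names `softmax` and `sparse_cross_entropy` are hypothetical, not Keras functions.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum does not change the result but avoids overflow.
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_entropy(probs, label):
    """Cross-entropy for one example whose true class is the integer `label`."""
    return -math.log(probs[label])


logits = [2.0, 1.0, 0.1]  # hypothetical scores for a 3-class toy problem
probs = softmax(logits)

loss_if_class_0 = sparse_cross_entropy(probs, 0)  # true class has the highest score
loss_if_class_2 = sparse_cross_entropy(probs, 2)  # true class has the lowest score

print(probs)
print(loss_if_class_0, loss_if_class_2)
```

The loss is small when the network assigns a high probability to the true class and grows as that probability shrinks, which is the behaviour training tries to exploit.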

In Keras, a model is defined as a stack of layers placed one after another sequentially. The basic class for implementing such a model is <a href="https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential">tf.keras.Sequential</a>.

In [None]:
model = keras.Sequential()

We stack the layers onto this model one by one. The first layer "flattens" the input, turning each 28x28-pixel image into a vector with 784 components. Note that, because this first layer is connected directly to the input, the input size must be specified with the *input_shape* argument.

In [None]:
model.add(keras.layers.Flatten(input_shape=(28, 28), name="entrada"))

Next we add the hidden layer with 30 sigmoid units:

In [None]:
model.add(keras.layers.Dense(30, activation="sigmoid", name="oculta"))

Finally we add the output layer with 10 units and a *softmax* activation:

In [None]:
model.add(keras.layers.Dense(10, activation="softmax", name="salida"))

The *summary()* method prints a summary of the model:

In [None]:
model.summary()
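
The parameter counts reported by the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit, and the flatten layer has no parameters. The following sketch does that arithmetic; `dense_params` is a hypothetical helper written for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters of a fully connected layer."""
    return n_inputs * n_units + n_units  # weights + biases


n_pixels = 28 * 28                          # size of the flattened input
hidden_params = dense_params(n_pixels, 30)  # hidden layer with 30 units
output_params = dense_params(30, 10)        # output layer with 10 units
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)
```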

We can also display the model as a graph of connections between layers (this requires the pydot package and Graphviz to be installed):

In [None]:
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)

Once the model is built, we have to tell Keras how to train it. To do so we call the *compile()* method, specifying the optimizer, the loss function and the metrics used to evaluate the model. Here we use standard gradient descent (SGD) as the optimizer, cross-entropy as the loss function and *accuracy* as the metric. The loss is the *sparse* variant of cross-entropy because the labels are integers rather than one-hot vectors:

In [None]:
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])
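
With its default arguments (no momentum), SGD applies the plain update w ← w − learning_rate · gradient after every batch. The sketch below runs that rule on a toy one-dimensional loss, f(w) = w², instead of a network, to show how the learning rate alone decides between slow convergence, fast convergence, oscillation and divergence. The function `sgd_minimize` and the learning rates tried are assumptions made for this illustration.

```python
def sgd_minimize(learning_rate, steps=20, w0=1.0):
    """Run plain gradient descent on f(w) = w**2 and return the final w."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w                # derivative of w**2
        w = w - learning_rate * grad  # the gradient descent update
    return w


for lr in (0.01, 0.4, 1.0, 1.5):
    print(lr, sgd_minimize(lr))
```

On this toy loss a small rate moves slowly toward the minimum at 0, a moderate rate gets there quickly, a rate of 1.0 jumps back and forth between +1 and −1 without progress, and a larger rate makes |w| grow at every step.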

Finally we train the model by calling the *fit()* method. Among others, the following arguments can be specified:

- The training data together with the corresponding labels.
- The number of epochs (50 in the example).
- The batch size (1000 in the example).
- The validation data (here, the test set).

In [None]:
nepochs = 50
history = model.fit(train_images,
                    train_labels,
                    epochs=nepochs,
                    validation_data=(test_images, test_labels),
                    batch_size=1000)
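
The number of epochs and the batch size together determine how many weight updates are performed: each epoch goes through the training set once, one batch per update. A short sketch of that arithmetic for the settings used above, with the MNIST training set size of 60000 images (`updates_per_epoch` is a hypothetical helper):

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of batches (and therefore weight updates) in one epoch."""
    # The last batch may be smaller than batch_size, hence the ceiling.
    return math.ceil(n_samples / batch_size)


n_train = 60000  # number of MNIST training images
steps = updates_per_epoch(n_train, batch_size=1000)
total_updates = steps * 50  # 50 epochs, as in the call to fit()

print(steps, total_updates)
```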

The *fit()* method returns an object of the *History* class. Its *history* attribute holds the loss and the evaluation metrics at each epoch for the training set. If validation data was used, the loss and the metrics are also available for it. We can use these values to plot how the loss and the accuracy evolve:

In [None]:
hd = history.history

epochs = range(1, nepochs+1)

plt.figure(figsize=(12,6))

plt.subplot(1,2,1)
plt.plot(epochs, hd['acc'], "r", label="train")
plt.plot(epochs, hd['val_acc'], "b", label="valid")
plt.grid(True)
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.title("Accuracy")
plt.legend()

plt.subplot(1,2,2)
plt.plot(epochs, hd['loss'], "r", label="train")
plt.plot(epochs, hd['val_loss'], "b", label="valid")
plt.grid(True)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Loss")
plt.legend()

plt.show()
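
Besides plotting, the same dictionary can be queried directly, for instance to find the epoch with the lowest validation loss. The sketch below uses a small stand-in dictionary, `hd_example`, whose numbers are placeholders invented for the illustration and are not results from this notebook; with a real run, `history.history` would take its place.

```python
# Placeholder values standing in for history.history (not real results).
hd_example = {
    "val_loss": [0.90, 0.60, 0.45, 0.47, 0.50],
    "val_acc": [0.75, 0.84, 0.88, 0.88, 0.87],
}

val_loss = hd_example["val_loss"]
# Index of the smallest validation loss; epochs are numbered from 1.
best_index = min(range(len(val_loss)), key=lambda i: val_loss[i])
best_epoch = best_index + 1

print(best_epoch, val_loss[best_index], hd_example["val_acc"][best_index])
```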

### <font color="#CA3532">TensorBoard</font>

TensorBoard (https://www.tensorflow.org/tensorboard) is a visualization tool that lets us, among other things, monitor the training of the network. To use TensorBoard from Google Colab we do the following:

In [None]:
## NOTE: This does not work in some versions of Firefox. Google Chrome is recommended

In [None]:
%load_ext tensorboard

In [None]:
%tensorboard --logdir logs

This launches TensorBoard inside Colab. We have specified a log directory from which TensorBoard reads its data. We have to tell Keras to save the information for each epoch under that same directory, which we do with a *callback*:

In [None]:
# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=(28, 28), name="entrada"))
model.add(keras.layers.Dense(30, activation="sigmoid", name="oculta"))
model.add(keras.layers.Dense(10, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir="logs/prueba-1", histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=100,
                    validation_data=(test_images, test_labels),
                    batch_size=1000,
                    callbacks=callbacks)

### <font color="#CA3532">Model evaluation and predictions</font>

Once the model is trained, we can call *evaluate()* to evaluate it on a dataset (test) or *predict()* to obtain its predictions.

The *evaluate()* method computes the evaluation metrics on a dataset:

In [None]:
loss_test, acc_test = model.evaluate(test_images, test_labels)
print("Loss on test set = %f" % (loss_test))
print("Accuracy on test set = %f" % (acc_test))

The *predict()* method returns the model's predictions, one row of class probabilities per image:

In [None]:
predictions = model.predict(test_images)
print(predictions.shape)

We can compare the predictions with the true classes to obtain an accuracy that should match the one above:

In [None]:
y_test = np.argmax(predictions, axis=1)
aciertos = y_test == test_labels
acc_v2 = np.mean(aciertos)
print("Accuracy on test set (v2)= %f" % (acc_v2))

Let's see how well the network does (each image is labelled with its true class followed by the predicted class):

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(test_images[i], cmap=plt.cm.binary)
    plt.xlabel("%d, %d" % (test_labels[i], y_test[i]))

### Confusion matrix:

In [None]:
from sklearn.metrics import confusion_matrix

confusion_matrix(test_labels, y_test)
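
In the matrix returned by scikit-learn, rows correspond to the true classes and columns to the predicted ones, so the diagonal counts the correct predictions. The following plain-Python sketch builds the same kind of matrix for a made-up three-class example and derives the fraction of each class that is recognised correctly. The helper names and the toy labels are assumptions made for this illustration.

```python
def confusion_counts(true_labels, predicted_labels, n_classes):
    """Count how often each true class (row) is predicted as each class (column)."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def recall_per_class(matrix):
    """Fraction of the examples of each class that were predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


true_toy = [0, 0, 1, 1, 2, 2, 2]  # made-up true classes
pred_toy = [0, 1, 1, 1, 2, 0, 2]  # made-up predictions
cm = confusion_counts(true_toy, pred_toy, n_classes=3)

for row in cm:
    print(row)
print(recall_per_class(cm))
```

Off-diagonal entries show which pairs of classes get confused, which an overall accuracy figure cannot reveal.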

### <font color="#CA3532">How to save a trained model</font>

To save the weights of the trained model, we use the *save_weights()* method:

In [None]:
# Note: recent Keras versions (Keras 3) expect the file name to end in ".weights.h5".
model.save_weights('./logs/keras/modelo')
!ls logs/keras

To load the weights later, we use the *load_weights()* method. For the weights to load, the model must have the same architecture as the one the weights were saved from:

In [None]:
# Create a new model with the same structure as the previous one:
model2 = keras.Sequential()
model2.add(keras.layers.Flatten(input_shape=(28, 28)))
model2.add(keras.layers.Dense(30, activation="sigmoid"))
model2.add(keras.layers.Dense(10, activation="softmax"))

# Compile the new model:
model2.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
               loss='sparse_categorical_crossentropy',
               metrics=['acc'])

# Load the weights from the file saved above:
model2.load_weights('./logs/keras/modelo')

# Evaluate the model on the test set:
loss_test, acc_test = model2.evaluate(test_images, test_labels)
print("Loss on test set = %f" % (loss_test))
print("Accuracy on test set = %f" % (acc_test))

It is also possible to save a complete model, including both the architecture and the weights:

In [None]:
model.save('./logs/keras/modelo.h5')

To load the complete model, we use *keras.models.load_model()*:

In [None]:
model3 = keras.models.load_model('./logs/keras/modelo.h5')
model3.summary()
loss_test, acc_test = model3.evaluate(test_images, test_labels)
print("Loss on test set = %f" % (loss_test))
print("Accuracy on test set = %f" % (acc_test))

### <font color="#CA3532">Some examples of layers available in Keras</font>

Fully connected layer with 30 ReLU units:

In [None]:
keras.layers.Dense(30, activation="relu")

This can also be written as:

In [None]:
keras.layers.Dense(30, activation=tf.nn.relu)

Fully connected layer with 30 sigmoid units:

In [None]:
keras.layers.Dense(30, activation="sigmoid")

This can also be written as:

In [None]:
keras.layers.Dense(30, activation=tf.nn.sigmoid)

Linear layer (no activation function specified) with L1 regularization applied to the weights:

In [None]:
keras.layers.Dense(30, kernel_regularizer=keras.regularizers.l1(0.01))

Linear layer (no activation function specified) with L2 regularization applied to the bias:

In [None]:
keras.layers.Dense(30, bias_regularizer=keras.regularizers.l2(0.01))
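
A regularizer adds a penalty term to the loss: the L1 penalty is the factor times the sum of absolute values of the regularized parameters, and the L2 penalty is the factor times the sum of their squares. The sketch below evaluates both by hand on a made-up parameter vector; `l1_penalty`, `l2_penalty` and `toy_weights` are hypothetical names used only here.

```python
def l1_penalty(weights, factor):
    """L1 penalty: factor times the sum of absolute values."""
    return factor * sum(abs(w) for w in weights)


def l2_penalty(weights, factor):
    """L2 penalty: factor times the sum of squares."""
    return factor * sum(w * w for w in weights)


toy_weights = [0.5, -1.0, 2.0]  # made-up parameter values
penalty_l1 = l1_penalty(toy_weights, 0.01)
penalty_l2 = l2_penalty(toy_weights, 0.01)

print(penalty_l1, penalty_l2)
```

Because the L2 penalty squares each value, large parameters dominate it, while the L1 penalty grows linearly and tends to push small parameters all the way to zero.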

Linear layer (no activation function specified) with the weights initialized from a normal distribution with mean 0 and standard deviation 0.1:

In [None]:
keras.layers.Dense(30, kernel_initializer=keras.initializers.RandomNormal(mean=0.0, stddev=0.1))

Linear layer (no activation function specified) with the weights initialized using Xavier (Glorot) initialization:

In [None]:
keras.layers.Dense(30, kernel_initializer=keras.initializers.glorot_normal())
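
The Glorot normal initializer draws weights from a zero-centred truncated normal distribution whose standard deviation depends on the layer's size: sqrt(2 / (fan_in + fan_out)), where fan_in and fan_out are the number of inputs and outputs of the layer. A small sketch of that formula, applied to the layer sizes of the first network in this notebook (`glorot_normal_stddev` is a hypothetical helper):

```python
import math


def glorot_normal_stddev(fan_in, fan_out):
    """Standard deviation used by Glorot (Xavier) normal initialization."""
    return math.sqrt(2.0 / (fan_in + fan_out))


std_hidden = glorot_normal_stddev(784, 30)  # 784 inputs -> 30 hidden units
std_output = glorot_normal_stddev(30, 10)   # 30 hidden units -> 10 outputs

print(std_hidden, std_output)
```

Wider layers get smaller initial weights, which keeps the scale of the activations roughly constant from layer to layer.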

Layer that only applies an activation function (ReLU):

In [None]:
keras.layers.Activation(tf.nn.relu)

Dropout layer that zeroes each input unit with probability 0.2:

In [None]:
keras.layers.Dropout(0.2)
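
During training, dropout sets each input to zero with the given probability and scales the surviving inputs by 1 / (1 - rate), so that the expected sum is unchanged; at inference time the layer does nothing. The sketch below mimics the training-time behaviour in plain Python. The function `dropout`, the seed and the all-ones input are assumptions made for the illustration.

```python
import random


def dropout(activations, rate, rng):
    """Zero each value with probability `rate` and rescale the survivors."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)      # fixed seed so the sketch is reproducible
activations = [1.0] * 10000
dropped = dropout(activations, rate=0.2, rng=rng)

fraction_zero = dropped.count(0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)

print(fraction_zero, mean_after)
```

About one value in five is zeroed, and the mean stays close to the original value of 1 thanks to the rescaling.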

Batch normalization layer:

In [None]:
keras.layers.BatchNormalization()
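
In training mode, batch normalization standardizes each feature using the mean and variance of the current batch, then applies a learned scale (gamma) and offset (beta). The sketch below does this for a single feature over a made-up batch of four values, with gamma = 1 and beta = 0 (their initial values) and eps = 1e-3 (the Keras default for this layer). The function `batch_norm` is a hypothetical helper, and the moving averages that the layer uses at inference time are left out.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one feature across a batch, then scale and shift it."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


batch = [2.0, 4.0, 6.0, 8.0]  # one feature observed on four examples
normalized = batch_norm(batch)

norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)

print(normalized)
print(norm_mean, norm_var)
```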

The following model combines some of the layers above:

In [None]:
model = keras.Sequential()

model.add(keras.layers.Flatten(input_shape=(28, 28)))
model.add(keras.layers.Dense(256, kernel_regularizer=keras.regularizers.l2(1.0)))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation(tf.nn.relu))
model.add(keras.layers.Dropout(rate=0.5))
model.add(keras.layers.Dense(64, kernel_regularizer=keras.regularizers.l2(1.0)))
model.add(keras.layers.BatchNormalization())
model.add(keras.layers.Activation(tf.nn.relu))
model.add(keras.layers.Dense(10, kernel_regularizer=keras.regularizers.l2(1.0)))
model.add(keras.layers.Activation(tf.nn.softmax))

Let's print a summary of the model:

In [None]:
model.summary()

In [None]:
model.compile(optimizer=keras.optimizers.Adam(),
              loss='sparse_categorical_crossentropy',
              metrics=['acc'])

In [None]:
nepochs = 30
history = model.fit(train_images,
                    train_labels,
                    epochs=nepochs,
                    validation_data=(test_images, test_labels),
                    batch_size=1000)

In [None]:
hd = history.history

epochs = range(1, nepochs+1)

plt.figure(figsize=(12,6))

plt.subplot(1,2,1)
plt.plot(epochs, hd['acc'], "r", label="train")
plt.plot(epochs, hd['val_acc'], "b", label="valid")
plt.grid(True)
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.title("Accuracy")
plt.legend()

plt.subplot(1,2,2)
plt.plot(epochs, hd['loss'], "r", label="train")
plt.plot(epochs, hd['val_loss'], "b", label="valid")
plt.grid(True)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.title("Loss")
plt.legend()

plt.show()

## <font color="#CA3532">Exercise</font>

Train a good model for MNIST.

In [None]:
# Variables that we will not modify
log_dir = "./models/"
input_shape = (28, 28)
num_clases = 10
n_epochs = 10

LEARNING_RATE_BASE = 0.01
LEARNING_RATE_SMALL = 0.0001
LEARNING_RATE_BIG = 1.0

BATCH_SIZE_BASE = 400
BATCH_SIZE_BIG = 1000
BATCH_SIZE_SMALL = 50
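
The constants above define three learning rates and three batch sizes, and the experiments in this exercise train the same network under different combinations of them for a fixed number of epochs. As a rough guide to what each combination implies, the sketch below enumerates the full grid and counts the weight updates of each run. The run names are only modelled on the ones used in this notebook, and the dictionaries and the `runs` list are hypothetical; no model is trained here.

```python
import math
from itertools import product

N_TRAIN = 60000  # number of MNIST training images
N_EPOCHS = 10    # same value as n_epochs above

learning_rates = {"small": 0.0001, "base": 0.01, "big": 1.0}
batch_sizes = {"small": 50, "base": 400, "big": 1000}

runs = []
for (bs_name, bs), (lr_name, lr) in product(batch_sizes.items(), learning_rates.items()):
    runs.append({
        "name": f"batchSize-{bs_name}-learningRate-{lr_name}",
        "batch_size": bs,
        "learning_rate": lr,
        # One update per batch, N_EPOCHS passes over the training set.
        "updates": math.ceil(N_TRAIN / bs) * N_EPOCHS,
    })

for run in runs:
    print(run["name"], run["updates"])
```

With the number of epochs fixed, a smaller batch size means many more updates, so a difference between runs may come from the number of updates as much as from the noise of the gradient estimate.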

### <font color="#CA3532">Baseline model</font>

In [None]:
# Baseline case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'base'

In [None]:
# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

### <font color="#CA3532">Trying different learning rates</font>

In [None]:
# Small learning rate:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_SMALL
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'learningRate-small'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Big learning rate:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'learningRate-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Very big learning rate:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BIG * 10000
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'learningRate-verybig'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### <font color="#CA3532">Trying different batch sizes</font>

In [None]:
# Big batch size:
batch_size = BATCH_SIZE_BIG
learning_rate = LEARNING_RATE_BASE
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Small batch size:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BASE
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-small'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### <font color="#CA3532">Trying different learning rates with different batch sizes</font>

In [None]:
# Small batch size, small learning rate:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_SMALL
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-small-learningRate-small'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Small batch size, big learning rate:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-small-learningRate-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Small batch size, very big learning rate:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG * 10000
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-small-learningRate-verybig'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Big batch size, small learning rate:
batch_size = BATCH_SIZE_BIG
learning_rate = LEARNING_RATE_SMALL
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-big-learningRate-small'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Big batch size, big learning rate:
batch_size = BATCH_SIZE_BIG
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-big-learningRate-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Big batch size, very big learning rate:
batch_size = BATCH_SIZE_BIG
learning_rate = LEARNING_RATE_BIG * 10000
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
nombre = 'batchSize-big-learningRate-verybig'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### <font color="#CA3532">Trying different loss functions</font>

To use the ``mean_squared_error`` and ``categorical_hinge`` loss functions, the labels must be in one-hot format (also known as categorical labels). See the Keras API:

**All available losses**: https://keras.io/api/losses/

**MSE**: https://keras.io/api/losses/regression_losses/#mean_squared_error-function

**HINGE**: https://keras.io/api/losses/hinge_losses/#categorical_hinge-function

In [None]:
from tensorflow.keras.utils import to_categorical

In [None]:
train_labels_categorical = to_categorical(train_labels)
test_labels_categorical = to_categorical(test_labels)
print(train_labels[:10])
print(train_labels_categorical[:10])
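
To see what these two losses compute once the labels are one-hot vectors, here is a plain-Python sketch for a single example with four classes. Mean squared error averages the squared differences between the target and the output, and the categorical hinge loss is max(0, 1 + best wrong-class score − true-class score). All helper names and numbers below are made up for the illustration and do not call Keras.

```python
def one_hot(label, n_classes):
    """Encode an integer label as a one-hot vector."""
    return [1.0 if i == label else 0.0 for i in range(n_classes)]


def mean_squared_error(y_true, y_pred):
    """Average squared difference between target and prediction."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def categorical_hinge(y_true, y_pred):
    """Hinge loss on the margin between the true class and the best wrong class."""
    pos = sum(t * p for t, p in zip(y_true, y_pred))          # score of the true class
    neg = max((1.0 - t) * p for t, p in zip(y_true, y_pred))  # best wrong-class score
    return max(0.0, neg - pos + 1.0)


target = one_hot(2, 4)                 # true class is 2
softmax_output = [0.1, 0.2, 0.6, 0.1]  # made-up probabilities
raw_scores = [-1.0, 1.5, 2.0, 0.0]     # made-up linear outputs

mse = mean_squared_error(target, softmax_output)
hinge = categorical_hinge(target, raw_scores)

print(target, mse, hinge)
```

The hinge loss is zero only when the true-class score beats every other score by at least 1, which is why it is normally applied to unbounded linear outputs rather than to probabilities.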

In [None]:
# MSE case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'sigmoid'
loss = 'mean_squared_error' ### IMPORTANT: the loss changes here to mean_squared_error
nombre = 'mse'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels_categorical, ### IMPORTANT: use train_labels_categorical here
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels_categorical), ### IMPORTANT: use test_labels_categorical here
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# MSE with big learning rate:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'mean_squared_error' ### IMPORTANT: the loss changes here to mean_squared_error
nombre = 'mse-learningRate-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels_categorical, ### IMPORTANT: use train_labels_categorical here
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels_categorical), ### IMPORTANT: use test_labels_categorical here
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Hinge case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'sigmoid'
loss = 'categorical_hinge' ### IMPORTANT: the loss changes here to categorical_hinge
nombre = 'hinge'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation=None, name="salida")) ## IMPORTANT: activation=None (linear output)

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# TensorBoard callback:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Train the model:
history = model.fit(train_images,
                    train_labels_categorical, ### IMPORTANT: use train_labels_categorical here
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels_categorical), ### IMPORTANT: use test_labels_categorical here
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Hinge with big learning rate:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'categorical_hinge' ### IMPORTANT: the loss changes here to categorical_hinge
nombre = 'hinge-learningRate-big'

# Recreate the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation=None, name="salida")) ## IMPORTANTE activation = None

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels_categorical, ### IMPORTANT: pass train_labels_categorical here
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels_categorical), ### IMPORTANT: pass test_labels_categorical here
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### <font color="#CA3532">Trying different activation functions in the hidden layer</font>
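
Before running the experiments, it can help to see how the activations compared in this section differ. The following is a minimal NumPy sketch of their standard definitions (tanh, ReLU, softplus, ELU and SELU); the helper names and the `ACTIVATIONS` dictionary are our own, and in the notebook itself Keras applies these functions through the `activation` argument.

```python
import numpy as np

# Standard SELU constants (Klambauer et al.)
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(x):
    return np.maximum(0.0, x)

def softplus(x):
    # log(1 + exp(x)), written in a numerically stable form
    return np.logaddexp(0.0, x)

def elu(x, alpha=1.0):
    # Identity for x > 0, alpha * (exp(x) - 1) otherwise
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))

def selu(x):
    # Scaled ELU with fixed alpha and scale
    return SELU_SCALE * elu(x, SELU_ALPHA)

ACTIVATIONS = {"tanh": np.tanh, "relu": relu, "softplus": softplus,
               "elu": elu, "selu": selu}

x = np.linspace(-3.0, 3.0, 7)
for name, fn in ACTIVATIONS.items():
    print(f"{name:>8}:", np.round(fn(x), 3))
```

Unlike tanh and the sigmoid of the base case, ReLU, ELU and SELU do not saturate for large positive inputs.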

In [None]:
# tanh case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'tanh'
loss = 'sparse_categorical_crossentropy'
nombre = 'tanh'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# relu case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'relu'
loss = 'sparse_categorical_crossentropy'
nombre = 'relu'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# softplus case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'softplus'
loss = 'sparse_categorical_crossentropy'
nombre = 'softplus'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# elu case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'elu'
loss = 'sparse_categorical_crossentropy'
nombre = 'elu'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# selu case:
batch_size = BATCH_SIZE_BASE
learning_rate = LEARNING_RATE_BASE
activation = 'selu'
loss = 'sparse_categorical_crossentropy'
nombre = 'selu'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# tanh case with the best batch size (SMALL) and learning rate (BIG):
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'tanh'
loss = 'sparse_categorical_crossentropy'
nombre = 'tanh-batchSize-small-learningRate-big'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# relu case with the best batch size (SMALL) and learning rate (BIG):
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'relu'
loss = 'sparse_categorical_crossentropy'
nombre = 'relu-batchSize-small-learningRate-big'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# softplus case with the best batch size (SMALL) and learning rate (BIG):
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'softplus'
loss = 'sparse_categorical_crossentropy'
nombre = 'softplus-batchSize-small-learningRate-big'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# elu case with the best batch size (SMALL) and learning rate (BIG):
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'elu'
loss = 'sparse_categorical_crossentropy'
nombre = 'elu-batchSize-small-learningRate-big'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# selu case with the best batch size (SMALL) and learning rate (BIG):
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'selu'
loss = 'sparse_categorical_crossentropy'
nombre = 'selu-batchSize-small-learningRate-big'

# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### <font color="#CA3532">Experimenting with ways to counter overfitting</font>

How can we avoid overfitting?

- Using more training data.

- Using a validation set and **early stopping**.

- Applying some form of **regularization** (L1, L2 or Dropout).
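
Early stopping is not tried in the cells that follow, so here is a minimal, framework-free sketch of the usual patience rule: stop once the validation loss has failed to improve for `patience` consecutive epochs. The function name and the loss values are hypothetical. In Keras the equivalent behaviour is provided by the `keras.callbacks.EarlyStopping` callback (with arguments such as `monitor` and `patience`), which could be appended to the `callbacks` list.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the epoch index at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: they improve until epoch 3 and then degrade
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.47]
stop_epoch = early_stopping_epoch(val_losses, patience=3)
best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
print("stop at epoch", stop_epoch, "| best epoch", best_epoch)
```

Note that this notebook passes the test set as `validation_data`; for early stopping, a separate validation split would normally be held out so that the test set stays untouched.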

#### L1 regularization

In [None]:
# We re-run the base model for a new experiment. We want to sum the absolute
# values of the weights of the hidden dense layer. We also want to display its
# incoming weights (from the input pixels), both globally and for each neuron

# Base case:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l1reg = 0.0 # A regularization factor of 0.0 is equivalent to no regularization
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big'


# Re-create the model so that training starts from scratch:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l1(l1reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

In [None]:
print(model.layers[1].weights[0].shape)
plt.imshow(np.abs(model.layers[1].weights[0]).sum(axis=1).reshape(28,-1))
plt.colorbar()
plt.show()

In [None]:
kernel = model.layers[1].weights[0].numpy()  # one column of 784 weights per hidden neuron
max_value = np.abs(kernel).max()
plt.figure(figsize=(15,15))
for i, neuron_weights in enumerate(kernel.T):
    plt.subplot(8,8,i+1)
    plt.title("Neuron "+str(i))
    # Symmetric color scale so that zero maps to white in the "bwr" colormap
    plt.imshow(neuron_weights.reshape(28,28), vmin=-max_value, vmax=max_value, cmap="bwr")
    plt.xticks([], [])
    plt.yticks([], [])
plt.show()

In [None]:
# Now we add regularization for comparison. We will analyze the weights
# in the same way

# Base case with a small L1 regularization:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l1reg = 0.0001
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-l1reg-small'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l1(l1reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

In [None]:
print(model.layers[1].weights[0].shape)
plt.imshow(np.abs(model.layers[1].weights[0]).sum(axis=1).reshape(28,-1))
plt.colorbar()
plt.show()

In [None]:
kernel = model.layers[1].weights[0].numpy()  # one column of 784 weights per hidden neuron
max_value = np.abs(kernel).max()
plt.figure(figsize=(15,15))
for i, neuron_weights in enumerate(kernel.T):
    plt.subplot(8,8,i+1)
    plt.title("Neuron "+str(i))
    # Symmetric color scale so that zero maps to white in the "bwr" colormap
    plt.imshow(neuron_weights.reshape(28,28), vmin=-max_value, vmax=max_value, cmap="bwr")
    plt.xticks([], [])
    plt.yticks([], [])
plt.show()

In [None]:
# We try a stronger L1 regularization and repeat the weight analysis:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l1reg = 0.1
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-l1reg-big'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l1(l1reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

In [None]:
print(model.layers[1].weights[0].shape)
plt.imshow(np.abs(model.layers[1].weights[0]).sum(axis=1).reshape(28,-1))
plt.colorbar()
plt.show()

In [None]:
kernel = model.layers[1].weights[0].numpy()  # one column of 784 weights per hidden neuron
max_value = np.abs(kernel).max()
plt.figure(figsize=(15,15))
for i, neuron_weights in enumerate(kernel.T):
    plt.subplot(8,8,i+1)
    plt.title("Neuron "+str(i))
    # Symmetric color scale so that zero maps to white in the "bwr" colormap
    plt.imshow(neuron_weights.reshape(28,28), vmin=-max_value, vmax=max_value, cmap="bwr")
    plt.xticks([], [])
    plt.yticks([], [])
plt.show()

**Discuss the results**

* What is happening with L1 regularization?
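
To support the discussion with numbers as well as images, one option is to measure how sparse the hidden-layer kernel has become. The sketch below is only an illustration: `sparsity_report` and its tolerance `tol` are hypothetical names and values, and the small array stands in for a trained kernel of shape (784, 64).

```python
import numpy as np

def sparsity_report(kernel, tol=1e-3):
    """Summarize how many weights of a (n_inputs, n_units) kernel are close to zero."""
    abs_k = np.abs(kernel)
    return {
        # Same quantity that the notebook prints after each training run
        "abs_sum": float(abs_k.sum()),
        # Fraction of individual weights with magnitude below tol
        "frac_near_zero": float((abs_k < tol).mean()),
        # Inputs (pixels) whose weights to every hidden unit are below tol
        "dead_inputs": int((abs_k.max(axis=1) < tol).sum()),
    }

# Tiny stand-in kernel: 3 inputs, 2 hidden units
toy_kernel = np.array([[0.5, -0.2],
                       [0.0, 0.0],
                       [0.0002, 0.3]])
report = sparsity_report(toy_kernel)
print(report)
```

With a trained model, the same function could be called as `sparsity_report(model.layers[1].weights[0].numpy())` after each of the three L1 runs and the results compared.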

In [None]:
# Now we try the ReLU activation, which does overfit more, to see whether with L1
# regularization the train and test learning curves no longer differ as much

# ReLU case with a mid-sized L1 regularization:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'relu'
loss = 'sparse_categorical_crossentropy'
l1reg = 0.001
nombre = 'OVERFITTING-relu-batchSize-small-learningRate-big-l1reg-mid'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l1(l1reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

**What is happening with L1?**

The gradient of the L1 penalty does not depend on the magnitude of a weight, only on its sign, so every nonzero weight decays toward zero at a constant rate. With a high learning rate each update subtracts a large fixed amount, so the weights are penalized aggressively.
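
The constant decay can be checked numerically. The following minimal sketch applies one plain SGD step using only the gradient of the L1 penalty, ignoring the data loss; the learning rate, the L1 factor and the weights are hypothetical values chosen for illustration.

```python
import numpy as np

lr = 0.5    # hypothetical learning rate
l1 = 0.01   # hypothetical L1 regularization factor
w = np.array([2.0, 0.5, -0.02, -1.0])

# Gradient of l1 * sum(|w|) with respect to w is l1 * sign(w)
penalty_grad = l1 * np.sign(w)
w_new = w - lr * penalty_grad

print("before:", w)
print("after: ", w_new)
print("change:", np.abs(w_new - w))  # the same amount, lr * l1, for every weight
```

Large and small weights lose the same amount per step, so small weights reach zero quickly, and a larger `lr` or `l1` makes that fixed step larger.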

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

#### L2 regularization

In [None]:
# Now we use L2 instead of L1 for regularization

# Base case with a small L2 regularization:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l2reg = 0.001
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-l2reg-small'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l2(l2reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

In [None]:
# We try a somewhat stronger regularization

# Base case with a mid-sized L2 regularization:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l2reg = 0.01
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-l2reg-mid'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l2(l2reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

**What is happening with L2?**

The L2 penalty is gentler: its gradient is proportional to the weight itself, so large weights are shrunk strongly while weights that are already small are barely affected.
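
This can be made explicit with the SGD update rule. The derivation below is a sketch under the assumption that the penalty added to the loss is $\lambda \sum_i w_i^2$ (which is how we understand `keras.regularizers.l2`, with $\lambda$ playing the role of `l2reg`); $\eta$ denotes the learning rate and $L$ the data loss, symbols introduced here only for the derivation.

```latex
\tilde{L}(w) = L(w) + \lambda \sum_i w_i^2
\quad\Longrightarrow\quad
w_i \leftarrow w_i - \eta \left( \frac{\partial L}{\partial w_i} + 2 \lambda w_i \right)
= (1 - 2 \eta \lambda)\, w_i - \eta\, \frac{\partial L}{\partial w_i}
```

Each step multiplies the weight by the factor $(1 - 2\eta\lambda)$, so the shrinkage is proportional to the current size of the weight, unlike the fixed step of L1.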

#### L1 + L2 regularization
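
As a reference for the experiment, here is a minimal NumPy sketch of the quantity that a combined L1 + L2 regularizer adds to the loss for one kernel. It assumes the penalty has the form `l1 * sum(|w|) + l2 * sum(w**2)`, which is how we understand `keras.regularizers.l1_l2`; the function name and the toy kernel are hypothetical.

```python
import numpy as np

def l1_l2_penalty(kernel, l1=0.0, l2=0.0):
    """Combined penalty: l1 * sum(|w|) + l2 * sum(w^2)."""
    return l1 * np.sum(np.abs(kernel)) + l2 * np.sum(np.square(kernel))

# Toy 2x2 kernel, with the same factors used in the experiment of this section
toy_kernel = np.array([[0.5, -1.0],
                       [0.0, 2.0]])
penalty = l1_l2_penalty(toy_kernel, l1=0.0001, l2=0.01)
print(penalty)
```

With these factors the L2 term dominates the total, so the L1 term mainly adds a small constant pull toward zero on top of the proportional L2 shrinkage.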

In [None]:
# We add L1 and L2 regularization at the same time with tf.keras.regularizers.l1_l2

# Base case with small L1 and mid-sized L2 regularization:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
l1reg = 0.0001
l2reg = 0.01
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-l1reg-small-l2reg-mid'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dense(64, activation=activation, name="oculta",
                             kernel_regularizer=keras.regularizers.l1_l2(l1reg, l2reg)))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
print(model.layers)
print(model.layers[1])
print("\nSum of absolute values of the weights:", np.abs(model.layers[1].weights[0].numpy()).sum())

#### Dropout
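
Before the experiments, a minimal NumPy sketch of what a dropout layer does during training may be useful. It assumes the usual "inverted dropout" scheme, in which a fraction `rate` of the inputs is zeroed and the surviving ones are scaled by `1 / (1 - rate)`; at inference time the layer is the identity. The function below is our own illustration, not the Keras implementation.

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: zero a fraction `rate` of x and rescale the rest."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True for the units that are kept
    return x * keep_mask / (1.0 - rate)       # rescale to preserve the expected value

rng = np.random.default_rng(0)
x = np.ones((1000, 784))  # stand-in for a batch of flattened images
for rate in (0.2, 0.5, 0.8):
    y = dropout(x, rate, rng)
    print(f"rate={rate}: zeroed fraction={np.mean(y == 0):.3f}, mean={y.mean():.3f}")
```

Because dropout is placed right after `Flatten` in the cells below, it removes input pixels; with a rate of 0.8 the hidden layer sees only about a fifth of each image in every training step.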

In [None]:
# We try adding dropout between the Flatten layer and the hidden dense layer

# Base case with a dropout rate of 0.2:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
dropout = 0.2
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-dropout-02'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dropout(dropout))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Same experiment (dropout between the Flatten layer and the hidden dense layer)
# with a higher dropout rate

# Base case with a dropout rate of 0.5:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
dropout = 0.5
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-dropout-05'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dropout(dropout))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
# Same experiment (dropout between the Flatten layer and the hidden dense layer)
# with a very high dropout rate

# Base case with a dropout rate of 0.8:
batch_size = BATCH_SIZE_SMALL
learning_rate = LEARNING_RATE_BIG
activation = 'sigmoid'
loss = 'sparse_categorical_crossentropy'
dropout = 0.8
nombre = 'OVERFITTING-base-batchSize-small-learningRate-big-dropout-08'


# Volvemos a crear el modelo para que se empiece a entrenar desde 0:
model = keras.Sequential()
model.add(keras.layers.Flatten(input_shape=input_shape, name="entrada"))
model.add(keras.layers.Dropout(dropout))
model.add(keras.layers.Dense(64, activation=activation, name="oculta"))
model.add(keras.layers.Dense(num_clases, activation="softmax", name="salida"))

model.compile(optimizer=keras.optimizers.SGD(learning_rate=learning_rate),
              loss=loss,
              metrics=['acc'])

# Callback a TensorBoard:
callbacks = [keras.callbacks.TensorBoard(log_dir=log_dir+"prueba-"+nombre, histogram_freq=1, write_images=True)]

# Entrenamiento del modelo:
history = model.fit(train_images,
                    train_labels,
                    epochs=n_epochs,
                    validation_data=(test_images, test_labels),
                    batch_size=batch_size,
                    callbacks=callbacks)

In [None]:
%reload_ext tensorboard
%tensorboard --logdir $log_dir

### Homework exercise:

What would happen if we added regularization to the output layer? Try it and carry out a similar analysis of the weights. We will discuss the results in the next class.