![image](https://drive.google.com/u/0/uc?id=15DUc09hFGqR8qcpYiN1OajRNaASmiL6d&export=download)

# **Workshop No. 12 - ISIS4825**

## **Convolutional Neural Networks, Neural Architectures, and Deep Learning**
## **Contents**
1. [**Objectives**](#id1)
2. [**Problem**](#id2)
3. [**Importing the Libraries Needed for the Lab**](#id3)
4. [**Visualization and Exploratory Analysis**](#id4)
5. [**Data Preparation**](#id5)
6. [**Modeling**](#id6)
7. [**Prediction**](#id7)
8. [**Validation**](#id8)
9. [**Asynchronous Work**](#id9)

## **Objectives**<a name="id1"></a>
- Get started with Convolutional Neural Networks.
- Learn training strategies and techniques that improve training.
- Become familiar with transfer learning.
- Dig deeper into the tooling of `TensorFlow` and `Keras`.
- Learn about neural architectures and their advantages and disadvantages.

## **Problem**<a name="id2"></a>
- In a large crop field, we want to classify the 9 species that grow there, since they damage the ecosystem. The goal of the classification is to identify each plant correctly so that it can be removed.

## **Notebook Configuration**

In [None]:
!shred -u setup_colab.py
!shred -u setup_colab_general.py
!wget -q "https://github.com/jpcano1/python_utils/raw/main/setup_colab_general.py" -O setup_colab_general.py
!wget -q "https://github.com/jpcano1/python_utils/raw/main/ISIS_4825/setup_colab.py" -O setup_colab.py
import setup_colab as setup
setup.setup_workshop_12()

## **Importing the Libraries Needed for the Lab**<a name="id3"></a>

In [None]:
import os

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import backend as K
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import tensorflow_datasets as tfds

from skimage import io

from utils import general as gen
from utils import tf_utils

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt
plt.style.use("seaborn-deep")
import seaborn as sns

from sklearn.model_selection import ShuffleSplit
from sklearn.utils import resample
from sklearn.metrics import confusion_matrix

### **Data Loading**

In [None]:
data_dir = gen.read_listdir("data")
labels = pd.read_csv("data/labels.csv")

In [None]:
labels.head()

## **Visualization and Exploratory Analysis**<a name="id4"></a>
- This time we will work with the *DeepWeeds* dataset, which is used to classify different species of wild weeds so that weed control can be more precise. The classes are the following:
    - Chinee apple
    - Lantana
    - Negative
    - Parkinsonia
    - Parthenium
    - Prickly acacia
    - Rubber vine
    - Siam weed
    - Snake weed
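
The class names above are listed alphabetically. According to the Keras documentation, `flow_from_dataframe` infers class indices in alphanumeric order when no explicit `classes` argument is given, so the position of a name in the sorted list is the index the network's output uses. The following is a minimal sketch of that mapping in plain Python; the names `species`, `class_indices`, `index_to_class`, and `decode` are illustrative and are not part of the workshop utilities.

```python
# Class names as listed above (illustrative variable name).
species = [
    "Chinee apple", "Lantana", "Negative", "Parkinsonia", "Parthenium",
    "Prickly acacia", "Rubber vine", "Siam weed", "Snake weed",
]

# Sorting gives an alphanumeric ordering; enumerate assigns the indices.
class_indices = {name: idx for idx, name in enumerate(sorted(species))}
index_to_class = {idx: name for name, idx in class_indices.items()}


def decode(scores):
    """Return the class name for a one-hot or probability vector."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return index_to_class[best]
```

Note that this inferred index can differ from the numeric `Label` column in `labels.csv`: the notebook treats `Label == 8` as the negative class, whereas `"Negative"` sits at position 2 in the sorted list.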


In [None]:
np.random.seed(5678)
random_sample = np.random.choice(len(data_dir), 9)

In [None]:
imgs = []
for i in random_sample:
    img = io.imread(data_dir[i])
    imgs.append(img)

In [None]:
gen.visualize_subplot(imgs, labels.loc[random_sample, "Species"].values, 
                      (3, 3), (10, 10))

In [None]:
distribution = labels["Species"].value_counts().sort_index()

In [None]:
ax = sns.barplot(x=distribution.index, y=distribution.values, palette="Set1")
ax.set_xticklabels(distribution.index, rotation=45)
plt.show()

In [None]:
n_samples = 1100

In [None]:
pos_class = labels.query("Label != 8")
neg_class = labels.query("Label == 8")

In [None]:
neg_class_downsampled = resample(neg_class, replace=False, 
                                  n_samples=n_samples, 
                                  random_state=1234)

In [None]:
labels_resampled = pd.concat([pos_class, neg_class_downsampled])

In [None]:
labels_resampled.reset_index(drop=True, inplace=True)

In [None]:
labels = labels_resampled.copy()

In [None]:
labels["Species"].value_counts().sort_index()

## **Data Preparation**<a name="id5"></a>

In [None]:
# One split is enough: the loops below keep a single train/test partition
shuffle_split = ShuffleSplit(n_splits=1, test_size=0.2, random_state=1234)

In [None]:
for full_train_index, test_index in shuffle_split.split(labels):
    full_train_set = labels.loc[full_train_index]
    test_set = labels.loc[test_index]

In [None]:
full_train_set.reset_index(drop=True, inplace=True)
test_set.reset_index(drop=True, inplace=True)

In [None]:
for train_index, val_index in shuffle_split.split(full_train_set):
    train_set = full_train_set.loc[train_index]
    val_set = full_train_set.loc[val_index]

In [None]:
train_set.reset_index(drop=True, inplace=True)
val_set.reset_index(drop=True, inplace=True)

In [None]:
train_datagen = ImageDataGenerator(horizontal_flip=True,
                                   vertical_flip=True, 
                                   zoom_range=0.5,
                                   rescale=1/255.,
                                   rotation_range=10,
                                   brightness_range=[1, 1.5], 
                                   fill_mode="wrap")

val_dataget = ImageDataGenerator(rescale=1/255.)

In [None]:
size = (128, 128)

In [None]:
train_generator = train_datagen.flow_from_dataframe(train_set, 
                                                    directory="data",
                                                    x_col="Filename", 
                                                    y_col="Species", 
                                                    target_size=size)

valid_generator = val_dataget.flow_from_dataframe(val_set, directory="data",
                                                  x_col="Filename", 
                                                  y_col="Species", 
                                                  target_size=size, 
                                                  shuffle=False, 
                                                  batch_size=16)

In [None]:
np.random.seed(1234)
random_batch = np.random.randint(0, len(train_generator))
X_batch, y_batch = train_generator[random_batch]

In [None]:
np.random.seed(1234)
random_sample = np.random.choice(len(X_batch), 9)
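# Decode with the generator's own class mapping (sorted names), not the order of appearance in `labels`
class_names = np.array(list(train_generator.class_indices))
y_batch = class_names[y_batch[random_sample].argmax(axis=1)]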

In [None]:
gen.visualize_subplot(X_batch[random_sample], 
                      y_batch, (3, 3), (10, 10))

## **Modeling**<a name="id6"></a>
- Inception:

![image](https://cloud.google.com/tpu/docs/images/inceptionv3onc--oview.png?hl=es)

In [None]:
if not os.path.exists("models"):
    os.makedirs("models")

weights_dir = "models/weights.h5"

In [None]:
base_model = keras.applications.InceptionV3(include_top=False,
                                            weights="imagenet", 
                                            input_shape=(*size, 3))
global_avg = keras.layers.GlobalAveragePooling2D()(base_model.output)
dense_1 = tf_utils.DenseBlock(128)(global_avg)
dense_2 = tf_utils.DenseBlock(64)(dense_1)

output = keras.layers.Dense(9, activation="softmax")(dense_2)
model = keras.Model(inputs=base_model.inputs, outputs=output)

lr = 1e-3

In [None]:
for layer in base_model.layers:
    layer.trainable = False

In [None]:
model.summary()

In [None]:
optimizer = keras.optimizers.Adam(learning_rate=lr)  # `lr` is a deprecated alias
model.compile(optimizer=optimizer, loss="categorical_crossentropy", 
              metrics=["acc"])
params = {
    "steps_per_epoch": train_generator.samples // train_generator.batch_size,
    "validation_steps": valid_generator.samples // valid_generator.batch_size,
    "epochs": 5,
    "validation_data": valid_generator
}

In [None]:
history = model.fit(train_generator, **params)

In [None]:
for layer in base_model.layers:
    layer.trainable = True

In [None]:
model.summary()

### **Adam Optimizer**
- Adam stands for *adaptive moment estimation*. It is a fast optimizer that generally converges well, it is widely used, and it is one more variation of SGD.
- Here is a table with the characteristics of the best optimizers (more asterisks means better).

|Class|Convergence Speed|Convergence Quality|
|---|---|---|
|SGD|*|***|
|SGD(momentum=...)|**|***|
|SGD(momentum=..., nesterov=True)|**|***|
|Adagrad|***|* (stops too early)|
|RMSprop|***|** or ***|
|Adam|***|** or ***|
|Nadam|***|** or ***|
|AdaMax|***|** or ***|
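
To make "adaptive moment estimation" more concrete, here is a minimal sketch of the standard Adam update applied to a single scalar parameter, written in plain Python. The function name `adam_minimize`, the quadratic test function, and the hyperparameter values are illustrative assumptions; the workshop itself relies on `keras.optimizers.Adam`.

```python
import math


def adam_minimize(grad_fn, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Run `steps` Adam updates on a scalar parameter and return it."""
    m = 0.0  # first moment: running mean of the gradients
    v = 0.0  # second moment: running mean of the squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction compensates for m and v starting at zero.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        # The step is scaled per parameter by the gradient's recent magnitude.
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
theta = adam_minimize(lambda x: 2.0 * (x - 3.0), theta=0.0)
```

The first moment acts like momentum, while dividing by the square root of the second moment normalizes the step size, which is why Adam tends to converge quickly without much learning-rate tuning.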

In [None]:
optimizer = keras.optimizers.Adam(learning_rate=lr)  # `lr` is a deprecated alias
metrics = [keras.metrics.Precision(name="Precision"), 
           keras.metrics.Recall(name="Recall"), "accuracy"]

callbacks = [tf_utils.CustomCallback(weights_dir, patience=5)]

model.compile(optimizer=optimizer, loss="categorical_crossentropy", 
              metrics=metrics)

In [None]:
params = {
    "steps_per_epoch": train_generator.samples // train_generator.batch_size,
    "validation_steps": valid_generator.samples // valid_generator.batch_size,
    "callbacks": callbacks,
    "epochs": 10,
    "validation_data": valid_generator
}

In [None]:
history = model.fit(train_generator, **params)

## **Prediction**<a name="id7"></a>



In [None]:
inception_dir = "models/inception.h5"

if os.path.exists(inception_dir):
    model.load_weights(inception_dir)
    print("Weights Loaded!!")
    lr = 1e-4

In [None]:
test_datagen = ImageDataGenerator(rescale=1/255.)

test_generator = test_datagen.flow_from_dataframe(test_set, directory="data", 
                                                  x_col="Filename", 
                                                  y_col="Species", 
                                                  target_size=size, 
                                                  shuffle=False, 
                                                  batch_size=16)

In [None]:
np.random.seed(1234)
random_batch = np.random.randint(0, len(test_generator))
X_batch, y_batch = test_generator[random_batch]

In [None]:
np.random.seed(5678)
random_sample = np.random.choice(len(X_batch), 9)

In [None]:
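y_pred = model.predict(X_batch[random_sample])
# Decode indices with the class mapping the model was trained on (sorted names)
class_names = np.array(list(train_generator.class_indices))
y_pred = class_names[y_pred.argmax(axis=1)]

In [None]:
# Assumes every class appears in both splits, so both generators share the same mapping
y_batch = class_names[y_batch[random_sample].argmax(axis=1)]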

In [None]:
titles = [f"{y_t} - {y_p}" for y_t, y_p in zip(y_batch, y_pred)]

In [None]:
gen.visualize_subplot(
    X_batch[random_sample], 
    titles, (3, 3), (10, 10)
)

## **Validation**<a name="id8"></a>

In [None]:
loss, precision, recall, acc = model.evaluate(test_generator)

In [None]:
loss

In [None]:
precision

In [None]:
recall

In [None]:
acc

In [None]:
y_pred = model.predict(test_generator)

In [None]:
y_pred = y_pred.argmax(axis=1)

In [None]:
conf_matrix = confusion_matrix(test_generator.labels, y_pred)

In [None]:
plt.matshow(conf_matrix, cmap="gray")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

In [None]:
norm_conf_mat = conf_matrix / conf_matrix.sum(axis=1, keepdims=True)
np.fill_diagonal(norm_conf_mat, 0)

In [None]:
plt.matshow(norm_conf_mat, cmap="gray")
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()

## **Asynchronous Work**<a name="id9"></a>
1. Research the pre-trained convolutional networks included in `keras.applications` and select one to solve the problem posed in this workshop (the DeepWeeds dataset). Justify your choice (one paragraph at most) and compare its performance metrics on the test set against those obtained with the workshop's architecture.
2. Now take one of the datasets included in [TensorFlow](https://www.tensorflow.org/datasets/catalog/overview?hl=es-419) and solve the associated classification problem using transfer learning (datasets from the MNIST family and the project dataset are not allowed).
3. Develop your own [callback](https://www.tensorflow.org/guide/keras/custom_callback?hl=es-419) and use it to solve the problem from the previous item again.