<img src="https://github.com/hernancontigiani/ceia_memorias_especializacion/raw/master/Figures/logoFIUBA.jpg" width="250" align="center">

*FINAL PROJECT - COMPUTER VISION II - JUAN I. MUNAR*

#### **SKIN CANCER: HAM10000**
#### PART 2 OF 2

##### **3.2. UNDERSAMPLING AND OVERSAMPLING**

This part continues the analysis started in Part 1, changing how the imbalanced data is handled: the majority classes are undersampled and the minority classes are oversampled.
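As a rough illustration of the idea, the sketch below caps large classes and then repeats items of small classes until all classes have the same size. It uses plain random duplication on hypothetical toy labels, not the SMOTEN resampling applied later in this notebook, and `rebalance` is a name invented for this example.

```python
import random
from collections import Counter

def rebalance(labels, max_size, seed=0):
    """Return indices that cap each class at max_size, then upsample to the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Undersampling: keep at most max_size random items per class
    capped = {c: rng.sample(ix, min(len(ix), max_size)) for c, ix in by_class.items()}

    # Oversampling: draw with replacement until every class matches the largest one
    target = max(len(ix) for ix in capped.values())
    balanced = []
    for c, ix in capped.items():
        extra = [rng.choice(ix) for _ in range(target - len(ix))]
        balanced.extend(ix + extra)
    return balanced

# Hypothetical, strongly imbalanced toy labels
labels = ["NV"] * 1000 + ["MEL"] * 120 + ["DF"] * 15
selected = rebalance(labels, max_size=600)
counts = Counter(labels[i] for i in selected)
print(counts)
```

After rebalancing, every class in the toy example has 600 entries, but the 15 original `DF` items are each repeated many times, which is the main limitation of oversampling by duplication.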

In [None]:
# Basic libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import matplotlib.image as mpimg
import os

In [None]:
# Paths and dataframe with the one-hot encoded class of each image
image_dir = "/kaggle/input/ham1000-segmentation-and-classification/images"
mask_dir = "/kaggle/input/ham1000-segmentation-and-classification/masks"
df = pd.read_csv("/kaggle/input/ham1000-segmentation-and-classification/GroundTruth.csv")

In [None]:
# Derive the label of each image from the one-hot columns
df['label'] = df.drop(columns=["image"]).idxmax(axis=1)

In [None]:
# Reorder the columns
orden_columnas = ['AKIEC', 'BCC', 'BKL', 'DF', 'MEL', 'NV', 'VASC', 'image', 'label']
clases = ['AKIEC', 'BCC', 'BKL', 'DF', 'MEL', 'NV', 'VASC']
df = df[orden_columnas]

In [None]:
# Create the train/test directories on Kaggle, one folder per class
!mkdir -p datasets/train datasets/test
%cd /kaggle/working/datasets/train
!mkdir AKIEC BCC BKL DF MEL NV VASC
%ls

%cd /kaggle/working/datasets/test
!mkdir AKIEC BCC BKL DF MEL NV VASC
%ls

%cd /kaggle/working

In [None]:
# Undersample: cap every class at a maximum number of images
max_size = 600

for clase in clases:
    n_0 = int(df[clase].sum())
    if n_0 > max_size:
        n_filas_eliminar = n_0 - max_size
        # Fixed seed so the same rows are dropped on every run
        indices_eliminar = df[df[clase] == 1].sample(n_filas_eliminar, random_state=42).index
        df = df.drop(indices_eliminar)
df.label.value_counts()

In [None]:
# Split into X (image names) and y (full dataframe: one-hot columns, image, label)
from sklearn.model_selection import train_test_split
X = df['image']
y = df
X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.1,
                                                    random_state=42,
                                                    stratify=df['label'])
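The `stratify` argument keeps the class proportions the same in both splits. The sketch below illustrates the idea with the standard library only. It is not scikit-learn's implementation; `stratified_split` and the toy labels are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size, seed=42):
    """Split indices so each class sends about test_size of its items to the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx

# Hypothetical label list after capping the large classes
labels = ["NV"] * 600 + ["MEL"] * 600 + ["DF"] * 110
train_idx, test_idx = stratified_split(labels, test_size=0.1)
test_counts = Counter(labels[i] for i in test_idx)
print(test_counts)
```

Each class contributes 10% of its items to the test set, so a small class such as `DF` is still represented there.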

Next, the minority classes of the training split are oversampled.

In [None]:
# Categorical resampling with SMOTEN (training split only).
# Every column, including the image name, is treated as a categorical feature.
from imblearn.over_sampling import SMOTEN
sampler = SMOTEN(random_state=0)
X = y_train.drop(columns=['label'])
y = y_train['label']

X_res, y_res = sampler.fit_resample(X, y)

In [None]:
# Update X_train with the resampled image names
X_train = X_res['image']

In [None]:
# Copy the images into the per-class train/test folders
import shutil
for clase in clases:
    imgs_name = X_train[X_res[clase] == 1].to_list()
    for i, img in enumerate(imgs_name):
        # Resampled rows may repeat an image name; the index suffix keeps
        # repeated copies from overwriting each other
        shutil.copy(f'{image_dir}/{img}.jpg',
                    f'/kaggle/working/datasets/train/{clase}/{img}_{i}.jpg')

for clase in clases:
    imgs_name = X_test[y_test[clase] == 1].to_list()
    for img in imgs_name:
        shutil.copy(f'{image_dir}/{img}.jpg',
                    f'/kaggle/working/datasets/test/{clase}')
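One detail matters when oversampled rows repeat an image name: `shutil.copy` to an existing destination overwrites the file, so repeated names only add files on disk if each copy gets its own name. The sketch below shows both behaviors with hypothetical file names in a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "images")
    dst_plain = os.path.join(tmp, "train_plain")
    dst_unique = os.path.join(tmp, "train_unique")
    for d in (src_dir, dst_plain, dst_unique):
        os.makedirs(d)

    # Two hypothetical source images
    for name in ("img_a", "img_b"):
        with open(os.path.join(src_dir, f"{name}.jpg"), "wb") as f:
            f.write(b"fake image bytes")

    # An oversampled list in which img_a appears three times
    resampled = ["img_a", "img_b", "img_a", "img_a"]

    for i, name in enumerate(resampled):
        src = os.path.join(src_dir, f"{name}.jpg")
        # Same destination name: later copies overwrite earlier ones
        shutil.copy(src, dst_plain)
        # Index suffix: every resampled row gets its own file
        shutil.copy(src, os.path.join(dst_unique, f"{name}_{i}.jpg"))

    n_plain = len(os.listdir(dst_plain))
    n_unique = len(os.listdir(dst_unique))

print(n_plain, n_unique)
```

Counting the files in each class folder after copying is a quick way to confirm that the oversampling actually reached the disk.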

In [None]:
# Check the split: count the files in train and test
%ls /kaggle/working/datasets/train/*/* | wc -l
%ls /kaggle/working/datasets/test/*/* | wc -l

##### **3.2.1. TRANSFER LEARNING**

Several classic classification architectures are tested below using transfer learning. The idea is to reuse the pretrained layers that capture edges, colors, and textures for our problem.

Data augmentation is applied in every case, both to obtain more data and to account for differences in color, rotation, size, etc. The step is repeated in each subsection so that the images match the input each pretrained model expects.
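Part of adapting the images to the pretrained models is the normalization with the ImageNet channel statistics. As a worked example, the sketch below applies the same arithmetic to a single pixel: scale to [0, 1], then subtract the mean and divide by the standard deviation per channel. The pixel value and the `normalize_pixel` helper are hypothetical.

```python
# ImageNet channel statistics used by the Normalize step (R, G, B)
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [v / 255 for v in rgb_uint8]
    return [(s - m) / sd for s, m, sd in zip(scaled, mean, std)]

# Hypothetical skin-tone pixel
pixel = normalize_pixel([200, 150, 140])
print([round(v, 3) for v in pixel])
```

Using the same statistics at training and test time keeps the inputs in the range the pretrained layers were fitted on.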

##### *3.2.1.1. VGG19*

The network is described in [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556).

In [None]:
# Load the previously downloaded model
import torch
from torchvision import models

# Path to the pretrained weights file on Kaggle
modelo_ruta = "/kaggle/input/modelos-tf/vgg19-dcbb9e9d.pth"

# Build the architecture and load the weights from the file
vgg19 = models.vgg19(weights=None)
state_dict = torch.load(modelo_ruta)
vgg19.load_state_dict(state_dict)

In [None]:
import torchvision
from torchvision.transforms import v2
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Training transforms (torchvision.transforms v2 API)
transform_train = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(0.5),
    v2.ColorJitter(saturation=0.1, hue=0.1),
    v2.RandomRotation(45),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Test transforms
transform_test = v2.Compose([
    v2.Resize(256),
    v2.CenterCrop(224),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Load the datasets
train_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/train',
    transform=transform_train
)
test_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/test',
    transform=transform_test
)

# Build the DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=32,
    shuffle=True,  # shuffle=True cannot be combined with a sampler
    num_workers=4,
    pin_memory=True
)
test_loader = DataLoader(
    test_dataset,
    batch_size=32,
    shuffle=False,  # evaluation does not need shuffling
    num_workers=4
)

In [None]:
# Freeze all layers of the network
for param in vgg19.parameters():
    param.requires_grad = False

In [None]:
# Replace the last layer to match the number of classes in the dataset
import torch.nn as nn

num_classes = len(train_dataset.classes)
last_layer_in_features = vgg19.classifier[-1].in_features
vgg19.classifier[-1] = torch.nn.Linear(
    in_features=last_layer_in_features,
    out_features=num_classes
)

In [None]:
# Define the training function
def train(model, optimizer, criterion, metric, data, epochs, tb_writer=None):

    train_loader = data["train"]
    valid_loader = data["valid"]

    # tb_writer is optional: only look up the writers when one was passed
    train_writer = tb_writer["train"] if tb_writer else None
    valid_writer = tb_writer["valid"] if tb_writer else None

    if tb_writer:
        # Dummy input in (N, C, H, W) order to trace the model graph
        dummy_input = torch.zeros((1, 3, data["image_height"], data["image_width"]))
        train_writer.add_graph(model, dummy_input)
        valid_writer.add_graph(model, dummy_input)

    if torch.cuda.is_available():
        model.to("cuda")
        metric.to("cuda")

    train_loss = []
    train_met = []
    valid_loss = []
    valid_met = []

    best_val_loss = float('inf')

    for epoch in range(epochs):

        # Set the model to training mode
        model.train()

        epoch_train_loss = 0.0
        epoch_train_metric = 0.0

        for train_data, train_target in train_loader:

            if torch.cuda.is_available():
                train_data = train_data.to("cuda")
                train_target = train_target.to("cuda")

            optimizer.zero_grad()
            output = model(train_data.float())
            loss = criterion(output, train_target)
            epoch_train_loss += loss.item()
            loss.backward()
            optimizer.step()

            metric_ = metric(output, train_target)
            epoch_train_metric += metric_.item()

        epoch_train_loss = epoch_train_loss / len(train_loader)
        epoch_train_metric = epoch_train_metric / len(train_loader)
        train_loss.append(epoch_train_loss)
        train_met.append(epoch_train_metric)

        # Set the model to evaluation mode
        model.eval()

        epoch_valid_loss = 0.0
        epoch_valid_metric = 0.0

        # No gradients are needed for validation
        with torch.no_grad():
            for valid_data, valid_target in valid_loader:
                if torch.cuda.is_available():
                    valid_data = valid_data.to("cuda")
                    valid_target = valid_target.to("cuda")

                output = model(valid_data.float())
                epoch_valid_loss += criterion(output, valid_target).item()
                epoch_valid_metric += metric(output, valid_target).item()

        epoch_valid_loss = epoch_valid_loss / len(valid_loader)
        epoch_valid_metric = epoch_valid_metric / len(valid_loader)
        valid_loss.append(epoch_valid_loss)
        valid_met.append(epoch_valid_metric)

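        if epoch_valid_loss < best_val_loss:
            best_val_loss = epoch_valid_loss
            # Clone the tensors: state_dict() returns references that later
            # optimizer steps would otherwise keep updating
            best_params = {k: v.detach().clone() for k, v in model.state_dict().items()}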

        print("Epoch: {}/{} - Train loss {:.6f} - Train metric {:.6f} - Valid Loss {:.6f} - Valid metric {:.6f}".format(
        epoch+1, epochs, epoch_train_loss, epoch_train_metric, epoch_valid_loss, epoch_valid_metric))

        if tb_writer:
            train_writer.add_scalar("loss", epoch_train_loss, epoch)
            valid_writer.add_scalar("loss", epoch_valid_loss, epoch)
            train_writer.add_scalar("metric", epoch_train_metric, epoch)
            valid_writer.add_scalar("metric", epoch_valid_metric, epoch)
            train_writer.flush()
            valid_writer.flush()

    history = {}
    history["train_loss"] = train_loss
    history["train_met"] = train_met
    history["valid_loss"] = valid_loss
    history["valid_met"] = valid_met

    torch.save(best_params, 'best_model_params.pth')

    return history
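When keeping the weights of the best epoch, note that `model.state_dict()` returns tensors that share storage with the live parameters, so a stored reference keeps changing as training continues. The sketch below is a plain-Python analogy, with lists standing in for tensors and a hypothetical `training_step`, that shows the difference between a reference and a snapshot.

```python
import copy

# Plain-Python stand-in for model weights that are updated in place
params = {"weight": [0.0, 0.0]}

def training_step(p):
    """Mutate the parameters in place, as an optimizer step does."""
    p["weight"][0] += 1.0
    p["weight"][1] += 1.0

training_step(params)

# Reference vs. independent snapshot, both taken after the first step
best_alias = dict(params)              # new dict, but the same inner list
best_snapshot = copy.deepcopy(params)  # fully independent copy

training_step(params)  # training continues after the "best" epoch

print(best_alias["weight"])
print(best_snapshot["weight"])
```

Only the deep copy still holds the values from the moment it was taken. With PyTorch, cloning each tensor or deep-copying the state dict plays the same role.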

In [None]:
# Image height and width (used only for the TensorBoard graph input)
H = 256
W = 256

In [None]:
# Run the training
import torchmetrics
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim

optimizer = torch.optim.Adam(vgg19.parameters(), lr=0.0001)
loss = torch.nn.CrossEntropyLoss()
metric = torchmetrics.F1Score(task='multiclass', num_classes=num_classes)
data = {"train": train_loader,
        "valid": test_loader,
        "image_width": W,
        "image_height": H}
epochs = 50
writer = {"train": SummaryWriter(log_dir="transfer_learning_vgg/train"),
          "valid": SummaryWriter(log_dir="transfer_learning_vgg/valid")}

history = train(vgg19,
                optimizer,
                loss,
                metric,
                data,
                epochs,
                writer)

In [None]:
# Plot the training curves
fig, axs = plt.subplots(2, 1, figsize=(10, 10))

axs[0].plot(history["train_loss"])
axs[0].plot(history["valid_loss"])
axs[0].title.set_text('Training vs. Validation Loss')
axs[0].legend(['Train', 'Valid'])

axs[1].plot(history["train_met"])
axs[1].plot(history["valid_met"])
axs[1].title.set_text('Training vs. Validation F1')
axs[1].legend(['Train', 'Valid'])

In [None]:
# Load the best parameters (lowest validation loss)
best_model_params = torch.load('best_model_params.pth')
vgg19.load_state_dict(best_model_params)

In [None]:
# Collect the test predictions and compute the confusion matrix
from torchmetrics.classification import MulticlassConfusionMatrix

all_labels = []
all_preds = []

# Model and data must live on the same device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Make sure the model is in evaluation mode
vgg19.to(device)
vgg19.eval()

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = vgg19(inputs)
        _, preds = torch.max(outputs, 1)

        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(preds.cpu().numpy())

confmat = MulticlassConfusionMatrix(num_classes=num_classes)
confmat(torch.tensor(all_preds),
        torch.tensor(all_labels))

In [None]:
# Plot the confusion matrix
fig_, ax_ = confmat.plot()

plt.xlabel("Predicted")
plt.ylabel("True")

ax_.set_xticklabels(clases)
plt.xticks(rotation=45)
ax_.set_yticklabels(clases)
plt.yticks(rotation=45)

plt.title("Confusion Matrix")
plt.show()

In [None]:
# Overall accuracy on the test set
vgg19.eval()

correctas = 0
total = 0
device = 'cuda' if torch.cuda.is_available() else 'cpu'

with torch.no_grad():
    for imagenes, etiquetas in test_loader:
        imagenes, etiquetas = imagenes.to(device), etiquetas.to(device)
        salidas = vgg19(imagenes)
        _, predicciones = torch.max(salidas, 1)
        total += etiquetas.size(0)
        correctas += (predicciones == etiquetas).sum().item()

# Compute the accuracy
accuracy = correctas / total
print(f'Test set accuracy: {accuracy * 100:.2f}%')

In [None]:
# Per-class accuracy
vgg19.eval()  # make sure the model is in evaluation mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vgg19.to(device)

correct_predictions_per_class = {i: 0 for i in range(num_classes)}
total_samples_per_class = {i: 0 for i in range(num_classes)}

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = vgg19(inputs)
        _, predicted_classes = torch.max(outputs, 1)

        for i in range(len(labels)):
            label = labels[i].item()
            prediction = predicted_classes[i].item()
            total_samples_per_class[label] += 1
            correct_predictions_per_class[label] += int(label == prediction)

class_accuracies = {}
for class_label, correct_predictions in correct_predictions_per_class.items():
    total_samples = total_samples_per_class[class_label]
    accuracy = correct_predictions / total_samples if total_samples > 0 else 0.0
    class_accuracies[class_label] = accuracy

for class_label, accuracy in class_accuracies.items():
    print(f'Accuracy for Class {clases[class_label]}: {accuracy:.2%}')
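The per-class accuracy computed above is the diagonal of the confusion matrix divided by the row totals, when rows are the true classes. The sketch below reproduces that arithmetic on a hypothetical 3-class matrix. The numbers are invented for illustration and are not results from this notebook.

```python
# Hypothetical 3-class confusion matrix: rows are true classes, columns are predictions
class_names = ["BCC", "MEL", "NV"]
confusion = [
    [8, 1, 1],
    [2, 5, 3],
    [0, 2, 18],
]

per_class_accuracy = {}
for i, name in enumerate(class_names):
    row_total = sum(confusion[i])
    per_class_accuracy[name] = confusion[i][i] / row_total if row_total else 0.0

# Overall accuracy: all correct predictions over all samples
overall = sum(confusion[i][i] for i in range(3)) / sum(map(sum, confusion))

for name, acc in per_class_accuracy.items():
    print(f"{name}: {acc:.2%}")
print(f"overall: {overall:.2%}")
```

In the toy matrix the overall accuracy is pulled up by the largest class, which is why the per-class view is the more informative one on an imbalanced test set.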

In [None]:
# Save the model and optimizer state
ruta_modelo_completo = '/kaggle/working/vgg19'
torch.save({
    'modelo_estado_dict': vgg19.state_dict(),
    'optimizador_estado_dict': optimizer.state_dict(),
}, ruta_modelo_completo)

The result is not good.

##### *3.2.1.2. ResNet50*

Let's try fine-tuning ResNet50, this time using directly downloaded weights (loaded below from a local file).

The network is described in [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).

In [None]:
# Import ResNet50
from torchvision.models import resnet50

# Path to the pretrained weights file on Kaggle
modelo_ruta = "/kaggle/input/modelos-tf/resnet50-11ad3fa6.pth"

# Build the architecture and load the weights from the file
resnet50_model = resnet50(weights=None)
state_dict = torch.load(modelo_ruta)
resnet50_model.load_state_dict(state_dict)

In [None]:
import torchvision
from torchvision.transforms import v2
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Training transforms (torchvision.transforms v2 API)
transform_train = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(0.5),
    v2.ColorJitter(saturation=0.1, hue=0.1),
    v2.RandomRotation(45),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Test transforms
transform_test = v2.Compose([
    v2.Resize(232),
    v2.CenterCrop(224),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Load the datasets
train_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/train',
    transform=transform_train
)
test_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/test',
    transform=transform_test
)

# Build the DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,  # shuffle=True cannot be combined with a sampler
    num_workers=4,
    pin_memory=True
)
test_loader = DataLoader(
    test_dataset,
    batch_size=64,
    shuffle=False,  # evaluation does not need shuffling
    num_workers=4
)

In [None]:
# Freeze all layers of the network
for param in resnet50_model.parameters():
    param.requires_grad = False

In [None]:
# Replace the last layer to match the number of classes in the dataset
import torch.nn as nn

num_classes = len(train_dataset.classes)
last_layer_in_features = resnet50_model.fc.in_features
resnet50_model.fc = torch.nn.Linear(
    in_features=last_layer_in_features,
    out_features=num_classes
)

In [None]:
# Image height and width (used only for the TensorBoard graph input)
H = 232
W = 232

In [None]:
# Run the training
import torchmetrics
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim

optimizer = torch.optim.Adam(resnet50_model.parameters(), lr=0.00001)
loss = torch.nn.CrossEntropyLoss()
metric = torchmetrics.F1Score(task='multiclass', num_classes=num_classes)
data = {"train": train_loader,
        "valid": test_loader,
        "image_width": W,
        "image_height": H}
epochs = 50
writer = {"train": SummaryWriter(log_dir="transfer_learning_RN50/train"),
          "valid": SummaryWriter(log_dir="transfer_learning_RN50/valid")}

history = train(resnet50_model,
                optimizer,
                loss,
                metric,
                data,
                epochs,
                writer)

In [None]:
# Plot the training curves
fig, axs = plt.subplots(2, 1, figsize=(10, 10))

axs[0].plot(history["train_loss"])
axs[0].plot(history["valid_loss"])
axs[0].title.set_text('Training vs. Validation Loss')
axs[0].legend(['Train', 'Valid'])

axs[1].plot(history["train_met"])
axs[1].plot(history["valid_met"])
axs[1].title.set_text('Training vs. Validation F1')
axs[1].legend(['Train', 'Valid'])

In [None]:
# Load the best parameters (lowest validation loss)
best_model_params = torch.load('best_model_params.pth')
resnet50_model.load_state_dict(best_model_params)

In [None]:
# Collect the test predictions and compute the confusion matrix
from torchmetrics.classification import MulticlassConfusionMatrix

all_labels = []
all_preds = []

# Model and data must live on the same device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Make sure the model is in evaluation mode
resnet50_model.to(device)
resnet50_model.eval()

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = resnet50_model(inputs)
        _, preds = torch.max(outputs, 1)

        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(preds.cpu().numpy())

confmat = MulticlassConfusionMatrix(num_classes=num_classes)
confmat(torch.tensor(all_preds),
        torch.tensor(all_labels))

In [None]:
# Plot the confusion matrix
fig_, ax_ = confmat.plot()

plt.xlabel("Predicted")
plt.ylabel("True")

ax_.set_xticklabels(clases)
plt.xticks(rotation=45)
ax_.set_yticklabels(clases)
plt.yticks(rotation=45)

plt.title("Confusion Matrix")
plt.show()

In [None]:
# Overall accuracy on the test set
resnet50_model.eval()

correctas = 0
total = 0
device = 'cuda' if torch.cuda.is_available() else 'cpu'

with torch.no_grad():
    for imagenes, etiquetas in test_loader:
        imagenes, etiquetas = imagenes.to(device), etiquetas.to(device)
        salidas = resnet50_model(imagenes)
        _, predicciones = torch.max(salidas, 1)
        total += etiquetas.size(0)
        correctas += (predicciones == etiquetas).sum().item()

# Compute the accuracy
accuracy = correctas / total
print(f'Test set accuracy: {accuracy * 100:.2f}%')

In [None]:
# Per-class accuracy
resnet50_model.eval()  # make sure the model is in evaluation mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
resnet50_model.to(device)

correct_predictions_per_class = {i: 0 for i in range(num_classes)}
total_samples_per_class = {i: 0 for i in range(num_classes)}

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = resnet50_model(inputs)
        _, predicted_classes = torch.max(outputs, 1)

        for i in range(len(labels)):
            label = labels[i].item()
            prediction = predicted_classes[i].item()
            total_samples_per_class[label] += 1
            correct_predictions_per_class[label] += int(label == prediction)

class_accuracies = {}
for class_label, correct_predictions in correct_predictions_per_class.items():
    total_samples = total_samples_per_class[class_label]
    accuracy = correct_predictions / total_samples if total_samples > 0 else 0.0
    class_accuracies[class_label] = accuracy

for class_label, accuracy in class_accuracies.items():
    print(f'Accuracy for Class {clases[class_label]}: {accuracy:.2%}')

In [None]:
# Save the model and optimizer state
ruta_modelo_completo = '/kaggle/working/resnet50'
torch.save({
    'modelo_estado_dict': resnet50_model.state_dict(),
    'optimizador_estado_dict': optimizer.state_dict(),
}, ruta_modelo_completo)

##### *3.2.1.3. Vision Transformer*

Let's try fine-tuning the Vision Transformer.

The network is described in [An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale](https://arxiv.org/abs/2010.11929).

In [None]:
from torchvision.models import vit_b_16
# Path to the pretrained weights file on Kaggle
modelo_ruta = "/kaggle/input/modelos-tf/vit_b_16-c867db91.pth"

# Build the architecture and load the weights from the file
# (the model is moved to the right device later, before training and evaluation)
vitb16_model = vit_b_16(weights=None)
state_dict = torch.load(modelo_ruta)
vitb16_model.load_state_dict(state_dict)

In [None]:
import torchvision
from torchvision.transforms import v2
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Training transforms (torchvision.transforms v2 API)
transform_train = v2.Compose([
    v2.RandomResizedCrop(224),
    v2.RandomHorizontalFlip(0.5),
    v2.ColorJitter(saturation=0.1, hue=0.1),
    v2.RandomRotation(45),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Test transforms
transform_test = v2.Compose([
    v2.Resize(224),
    v2.CenterCrop(224),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Load the datasets
train_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/train',
    transform=transform_train
)
test_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/test',
    transform=transform_test
)

# Build the DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,  # shuffle=True cannot be combined with a sampler
    num_workers=4,
    pin_memory=True
)
test_loader = DataLoader(
    test_dataset,
    batch_size=64,
    shuffle=False,  # evaluation does not need shuffling
    num_workers=4
)

In [None]:
# Freeze all layers of the network
for param in vitb16_model.parameters():
    param.requires_grad = False

In [None]:
# Replace the classification head
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
num_classes = len(train_dataset.classes)
vitb16_model.heads = nn.Linear(in_features=768, out_features=num_classes).to(device)

In [None]:
# Image height and width (used only for the TensorBoard graph input)
H = 224
W = 224

In [None]:
# Run the training
import torchmetrics
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim

optimizer = torch.optim.Adam(vitb16_model.parameters(), lr=0.00001)
loss = torch.nn.CrossEntropyLoss()
metric = torchmetrics.F1Score(task='multiclass', num_classes=num_classes)
data = {"train": train_loader,
        "valid": test_loader,
        "image_width": W,
        "image_height": H}
epochs = 50
writer = {"train": SummaryWriter(log_dir="transfer_learning_ViT/train"),
          "valid": SummaryWriter(log_dir="transfer_learning_ViT/valid")}

history = train(vitb16_model.to('cpu'),
                optimizer,
                loss,
                metric,
                data,
                epochs,
                writer)

In [None]:
# Plot the training curves
fig, axs = plt.subplots(2, 1, figsize=(10, 10))

axs[0].plot(history["train_loss"])
axs[0].plot(history["valid_loss"])
axs[0].title.set_text('Training vs. Validation Loss')
axs[0].legend(['Train', 'Valid'])

axs[1].plot(history["train_met"])
axs[1].plot(history["valid_met"])
axs[1].title.set_text('Training vs. Validation F1')
axs[1].legend(['Train', 'Valid'])

In [None]:
# Load the best parameters (lowest validation loss)
best_model_params = torch.load('best_model_params.pth')
vitb16_model.load_state_dict(best_model_params)

In [None]:
# Collect the test predictions and compute the confusion matrix
from torchmetrics.classification import MulticlassConfusionMatrix

all_labels = []
all_preds = []

# Model and data must live on the same device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Make sure the model is in evaluation mode
vitb16_model.to(device)
vitb16_model.eval()

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = vitb16_model(inputs)
        _, preds = torch.max(outputs, 1)

        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(preds.cpu().numpy())

confmat = MulticlassConfusionMatrix(num_classes=num_classes)
confmat(torch.tensor(all_preds),
        torch.tensor(all_labels))

In [None]:
# Plot the confusion matrix
fig_, ax_ = confmat.plot()

plt.xlabel("Predicted")
plt.ylabel("True")

ax_.set_xticklabels(clases)
plt.xticks(rotation=45)
ax_.set_yticklabels(clases)
plt.yticks(rotation=45)

plt.title("Confusion Matrix")
plt.show()

In [None]:
# Overall accuracy on the test set
vitb16_model.eval()

correctas = 0
total = 0
device = 'cuda' if torch.cuda.is_available() else 'cpu'

with torch.no_grad():
    for imagenes, etiquetas in test_loader:
        imagenes, etiquetas = imagenes.to(device), etiquetas.to(device)
        salidas = vitb16_model(imagenes)
        _, predicciones = torch.max(salidas, 1)
        total += etiquetas.size(0)
        correctas += (predicciones == etiquetas).sum().item()

# Compute the accuracy
accuracy = correctas / total
print(f'Test set accuracy: {accuracy * 100:.2f}%')

In [None]:
# Per-class accuracy
vitb16_model.eval()  # make sure the model is in evaluation mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
vitb16_model.to(device)

correct_predictions_per_class = {i: 0 for i in range(num_classes)}
total_samples_per_class = {i: 0 for i in range(num_classes)}

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = vitb16_model(inputs)
        _, predicted_classes = torch.max(outputs, 1)

        for i in range(len(labels)):
            label = labels[i].item()
            prediction = predicted_classes[i].item()
            total_samples_per_class[label] += 1
            correct_predictions_per_class[label] += int(label == prediction)

class_accuracies = {}
for class_label, correct_predictions in correct_predictions_per_class.items():
    total_samples = total_samples_per_class[class_label]
    accuracy = correct_predictions / total_samples if total_samples > 0 else 0.0
    class_accuracies[class_label] = accuracy

for class_label, accuracy in class_accuracies.items():
    print(f'Accuracy for Class {clases[class_label]}: {accuracy:.2%}')

In [None]:
# Save the model and optimizer state
ruta_modelo_completo = '/kaggle/working/ViTb16'
torch.save({
    'modelo_estado_dict': vitb16_model.state_dict(),
    'optimizador_estado_dict': optimizer.state_dict(),
}, ruta_modelo_completo)

##### **3.2.2. FULL TRAINING**

A simple convolutional architecture is trained from scratch to evaluate its performance. The idea is to keep it shallow enough that it needs neither residual connections nor a long training time.

In [None]:
CANTIDAD_CLASES = len(clases)
ANCHO_IMAGENES = 256
ALTO_IMAGENES = 256

In [None]:
class ConvModel(torch.nn.Module):
    def __init__(self, output_units):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding='same')
        self.pool1 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = torch.nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, stride=1, padding='same')
        self.pool2 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = torch.nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, stride=1, padding='same')
        self.pool3 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv4 = torch.nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, stride=1, padding='same')
        self.pool4 = torch.nn.MaxPool2d(kernel_size=2, stride=2)
        self.fc1 = torch.nn.Linear(in_features=32768, out_features=512)
        self.fc2 = torch.nn.Linear(in_features=512, out_features=output_units)

    def forward(self, x):
        x = self.pool1(torch.relu(self.conv1(x)))
        x = self.pool2(torch.relu(self.conv2(x)))
        x = self.pool3(torch.relu(self.conv3(x)))
        x = self.pool4(torch.relu(self.conv4(x)))
        x = torch.flatten(x, 1)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

conv_model = ConvModel(CANTIDAD_CLASES)
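The `in_features=32768` of `fc1` follows from the input size: each of the four 2x2 max-pooling stages halves the height and width, while the `'same'`-padded convolutions leave them unchanged. The sketch below reproduces that arithmetic. `flattened_features` is a helper written for this example, not part of the model.

```python
def flattened_features(height, width, out_channels, n_pools, pool=2):
    """Feature count after n_pools stages of 'same'-padded conv followed by pooling."""
    for _ in range(n_pools):
        height //= pool
        width //= pool
    return out_channels * height * width

# 256x256 input, four conv+pool stages, 128 channels after the last convolution
n_features = flattened_features(256, 256, out_channels=128, n_pools=4)
print(n_features)
```

For a 256x256 input the spatial size goes 256, 128, 64, 32, 16, so the flattened vector has 128 x 16 x 16 = 32768 values. Changing the input resolution would require changing `fc1` accordingly.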

In [None]:
# Print a short summary of the architecture
for name, layer in conv_model.named_children():
    print(name, layer)

In [None]:
import torchvision
from torchvision.transforms import v2
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

# Training transforms (torchvision.transforms v2 API)
transform_train = v2.Compose([
    v2.RandomResizedCrop(256),
    v2.RandomHorizontalFlip(0.5),
    v2.ColorJitter(saturation=0.1, hue=0.1),
    v2.RandomRotation(45),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Test transforms
transform_test = v2.Compose([
    v2.Resize(256),
    v2.CenterCrop(256),
    v2.ToTensor(),
    v2.Normalize(mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]),
])

# Load the datasets
train_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/train',
    transform=transform_train
)
test_dataset = torchvision.datasets.ImageFolder(
    root='/kaggle/working/datasets/test',
    transform=transform_test
)

# Build the DataLoaders
train_loader = DataLoader(
    train_dataset,
    batch_size=64,
    shuffle=True,  # shuffle=True cannot be combined with a sampler
    num_workers=4,
    pin_memory=True
)
test_loader = DataLoader(
    test_dataset,
    batch_size=64,
    shuffle=False,  # evaluation does not need shuffling
    num_workers=4
)

In [None]:
# Run the training
import torchmetrics
from torch.utils.tensorboard import SummaryWriter
import torch.optim as optim

optimizer = torch.optim.Adam(conv_model.parameters(), lr=0.00025)
loss = torch.nn.CrossEntropyLoss()
metric = torchmetrics.F1Score(task='multiclass', num_classes=num_classes)
data = {"train": train_loader,
        "valid": test_loader,
        "image_width": 256,
        "image_height": 256}
epochs = 50
writer = {"train": SummaryWriter(log_dir="transfer_learning_conv/train"),
          "valid": SummaryWriter(log_dir="transfer_learning_conv/valid")}

history = train(conv_model.to('cpu'),
                optimizer,
                loss,
                metric,
                data,
                epochs,
                writer)

In [None]:
# Plot the training curves
fig, axs = plt.subplots(2, 1, figsize=(10, 10))

axs[0].plot(history["train_loss"])
axs[0].plot(history["valid_loss"])
axs[0].title.set_text('Training vs. Validation Loss')
axs[0].legend(['Train', 'Valid'])

axs[1].plot(history["train_met"])
axs[1].plot(history["valid_met"])
axs[1].title.set_text('Training vs. Validation F1')
axs[1].legend(['Train', 'Valid'])

In [None]:
# Load the best parameters (lowest validation loss)
best_model_params = torch.load('best_model_params.pth')
conv_model.load_state_dict(best_model_params)

In [None]:
# Collect the test predictions and compute the confusion matrix
from torchmetrics.classification import MulticlassConfusionMatrix

all_labels = []
all_preds = []

# Model and data must live on the same device
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Make sure the model is in evaluation mode
conv_model.to(device)
conv_model.eval()

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = conv_model(inputs)
        _, preds = torch.max(outputs, 1)

        all_labels.extend(labels.cpu().numpy())
        all_preds.extend(preds.cpu().numpy())

confmat = MulticlassConfusionMatrix(num_classes=num_classes)
confmat(torch.tensor(all_preds),
        torch.tensor(all_labels))

In [None]:
# Plot the confusion matrix
fig_, ax_ = confmat.plot()

plt.xlabel("Predicted")
plt.ylabel("True")

ax_.set_xticklabels(clases)
plt.xticks(rotation=45)
ax_.set_yticklabels(clases)
plt.yticks(rotation=45)

plt.title("Confusion Matrix")
plt.show()

In [None]:
# Overall accuracy on the test set
conv_model.eval()

correctas = 0
total = 0
device = 'cuda' if torch.cuda.is_available() else 'cpu'

with torch.no_grad():
    for imagenes, etiquetas in test_loader:
        imagenes, etiquetas = imagenes.to(device), etiquetas.to(device)
        salidas = conv_model(imagenes)
        _, predicciones = torch.max(salidas, 1)
        total += etiquetas.size(0)
        correctas += (predicciones == etiquetas).sum().item()

# Compute the accuracy
accuracy = correctas / total
print(f'Test set accuracy: {accuracy * 100:.2f}%')

In [None]:
# Per-class accuracy
conv_model.eval()  # make sure the model is in evaluation mode
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
conv_model.to(device)

correct_predictions_per_class = {i: 0 for i in range(num_classes)}
total_samples_per_class = {i: 0 for i in range(num_classes)}

with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)

        outputs = conv_model(inputs)
        _, predicted_classes = torch.max(outputs, 1)

        for i in range(len(labels)):
            label = labels[i].item()
            prediction = predicted_classes[i].item()
            total_samples_per_class[label] += 1
            correct_predictions_per_class[label] += int(label == prediction)

class_accuracies = {}
for class_label, correct_predictions in correct_predictions_per_class.items():
    total_samples = total_samples_per_class[class_label]
    accuracy = correct_predictions / total_samples if total_samples > 0 else 0.0
    class_accuracies[class_label] = accuracy

for class_label, accuracy in class_accuracies.items():
    print(f'Accuracy for Class {clases[class_label]}: {accuracy:.2%}')

In [None]:
# Save the model and optimizer state
ruta_modelo_completo = '/kaggle/working/conv'
torch.save({
    'modelo_estado_dict': conv_model.state_dict(),
    'optimizador_estado_dict': optimizer.state_dict(),
}, ruta_modelo_completo)

##### **4. CONCLUSIONS**

- The models evaluated with transfer learning performed poorly. Getting good performance would require retraining the weights of the whole network, because there appears to be negative transfer: the models were pretrained on datasets very different from the one used here, so the pretrained layers contribute little to the predictions, and the linear layer ends up reproducing the majority classes. A simple convolutional network performs better than the deep networks.
- The dataset imbalance is too pronounced. Even when the sampler is adjusted, there is a large difference in how many times the model sees the images of each class, and this ends up affecting the result.
- Good performance requires balancing the dataset and training a model from scratch, with the computational cost that entails. The ideal approach appears to be semantic segmentation, which would remove the errors caused by the background and by hair.
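To make the remark about the sampler concrete, the sketch below computes inverse-frequency sample weights, the kind of per-sample weight that could be passed to `torch.utils.data.WeightedRandomSampler`. The label list and the `inverse_frequency_weights` helper are hypothetical, and this is not code that was run in this work.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Per-sample weights proportional to 1 / (count of the sample's class)."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

# Hypothetical imbalanced label list
labels = ["NV"] * 90 + ["MEL"] * 9 + ["DF"] * 1
weights = inverse_frequency_weights(labels)

# Total weight per class is equal, so each class is drawn equally often in expectation
mass = Counter()
for label, w in zip(labels, weights):
    mass[label] += w
print(dict(mass))
```

The classes become equally likely to be drawn, but the single `DF` image would be shown about as often as all 90 `NV` images together, so the model still sees far less variety in the minority classes.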