### **SOURCES**:

PetFinder Kaggle:

https://www.kaggle.com/competitions/petfinder-adoption-prediction/data

First Tutorial:

https://towardsdatascience.com/how-to-train-an-image-classifier-in-pytorch-and-use-it-to-perform-basic-inference-on-single-images-99465a1e9bf5

Second Deep Tutorial:

https://rumn.medium.com/part-1-ultimate-guide-to-fine-tuning-in-pytorch-pre-trained-model-and-its-configuration-8990194b71e

Logo Recognition API:

https://heartbeat.comet.ml/logo-recognition-ios-application-using-machine-learning-and-flask-api-aec4eff3be11

Hybrid (multimodal) neural network architecture : Combination of tabular, textual and image inputs:

https://medium.com/@dave.cote.msc/hybrid-multimodal-neural-network-architecture-combination-of-tabular-textual-and-image-inputs-7460a4f82a2e



### **PRELIMINARY INSTRUCTIONS**:

+ **Git**:
    + Clone the repo: from the folder that holds all your repos, run git clone "url_repo"
    + Create and switch to a working branch: git checkout -b new-branch

+ **Poetry**:
    + Install Poetry: https://python-poetry.org/docs/
    + Update the dependencies declared in pyproject: poetry update
    + Activate the environment created by Poetry: poetry shell
    + Try running a cell; if it asks you to select the environment and it is not in the list, close and reopen VS Code

+ **Torch and CUDA**:
    + Check which CUDA version torch requires:
        + Installed torch version: poetry show (1.13.1 in my case)
        + Look up the matching CUDA version in the documentation: https://pytorch.org/get-started/previous-versions/  (11.7 in my case)
    + Install CUDA for Torch (use the matching CUDA version): https://developer.nvidia.com/cuda-11-7-0-download-archive
    + Check that CUDA works: run torch.cuda.is_available() in a cell
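The Torch and CUDA checks above can be sketched as one small script. It assumes that CUDA builds of torch carry a `+cuXYZ` suffix in their version string (for example `1.13.1+cu117`); `cuda_tag` is a hypothetical helper written for this note, not part of torch.

```python
import re


def cuda_tag(torch_version):
    """Return the CUDA version encoded in a torch version string, or None.

    Assumes the '+cuXYZ' suffix used by CUDA builds, e.g. '1.13.1+cu117'.
    CPU builds ('+cpu') and plain versions return None.
    """
    match = re.search(r"\+cu(\d+)$", torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version: 'cu117' -> '11.7'
    return f"{digits[:-1]}.{digits[-1]}"


try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    print("torch version:", torch.__version__)
    print("CUDA tag in version string:", cuda_tag(torch.__version__))
    print("CUDA version torch was built with:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
```

If the last line prints `False` on a machine with an NVIDIA GPU, the usual suspects are a CPU-only torch build or a driver that does not support the CUDA version torch was built with.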

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [2]:
! pip install optuna
! pip install -U kaleido

Collecting optuna
  Downloading optuna-3.6.1-py3-none-any.whl (380 kB)
Collecting alembic>=1.5.0 (from optuna)
  Downloading alembic-1.13.2-py3-none-any.whl (232 kB)
Collecting colorlog (from optuna)
  Downloading colorlog-6.8.2-py3-none-any.whl (11 kB)
Collecting Mako (from alembic>=1.5.0->optuna)
  Downloading Mako-1.3.5-py3-none-any.whl (78 kB)
Installing collected packages: Mako, colorlog, alembic, optuna
Successfully installed Mako-1.3.5 alembic-1.13.2 colorlog-6.8.2 optuna-3.6.1
Collecting kaleido
  Downloading kaleido-0.2.1-py2.py3-none-manylinux1_x86_64.whl (79.9 MB)

In [3]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, cohen_kappa_score
import os
import shutil
import time
import copy
import datetime
from tqdm import tqdm
import matplotlib.pyplot as plt  # used by visualize_pet and visualize_image
#import seaborn as sns
import cv2  # used by the image loading and resizing helpers
#from PIL import Image
#from pathlib import Path

import optuna
from optuna.artifacts import FileSystemArtifactStore, upload_artifact

import torch
import torchvision.models as models
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
import torch.nn.functional as F

from joblib import load, dump

from google.colab import files
import sys
sys.path.append("/content/drive/MyDrive/labo2/UA_MDM_LDI_II/tutoriales")
import utils
#from utils import plot_confusion_matrix
# Check that CUDA is available
torch.cuda.is_available()

True

**Model setup**

ResNet theory: https://towardsdatascience.com/introduction-to-resnets-c0a830a288a4

In [4]:
# Load a ResNet-50 pretrained on ImageNet
resnet50 = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
# Replace the last layer to fit this problem
num_ftrs = resnet50.fc.in_features
resnet50.fc = torch.nn.Linear(num_ftrs, 5) # 5-class classification
# Use CUDA if it is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet50 = resnet50.to(device)
# Loss criterion: CrossEntropyLoss
criterion = nn.CrossEntropyLoss()
# Stochastic Gradient Descent (SGD): the learning rate sets the step size of each weight update, and the momentum adds inertia to the descent direction so it does not swing around in local minima
optimizer = optim.SGD(resnet50.parameters(), lr=0.001, momentum=0.9) # Common starting values (PyTorch's own default momentum is 0)


Downloading: "https://download.pytorch.org/models/resnet50-11ad3fa6.pth" to /root/.cache/torch/hub/checkpoints/resnet50-11ad3fa6.pth
100%|██████████| 97.8M/97.8M [00:00<00:00, 149MB/s]
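To make the roles of the learning rate and the momentum concrete, here is a minimal sketch of the update rule in plain Python. It follows the form PyTorch documents for `optim.SGD` with `dampening=0` and no Nesterov term; the toy loss f(w) = w² and the values `lr=0.1`, `momentum=0.9` are assumptions chosen only so the numbers are easy to follow.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update (dampening=0, no Nesterov)."""
    # The velocity accumulates past gradients, which gives the update its inertia
    velocity = momentum * velocity + grad
    # The learning rate scales how far the weight moves along the velocity
    w = w - lr * velocity
    return w, velocity


# Toy problem: minimise f(w) = w**2, whose gradient is 2*w
w, v = 1.0, 0.0
history = []
for _ in range(2):
    grad = 2.0 * w
    w, v = sgd_momentum_step(w, grad, v, lr=0.1, momentum=0.9)
    history.append(w)

print(history)  # approximately [0.8, 0.46]
```

The second step moves further than the first even though the gradient is smaller, because the velocity still carries the first gradient.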


**Parameters, directories and functions**

In [5]:
# Paths
import os
BASE_DIR = "/content/drive/MyDrive/labo2"
PATH_TO_TRAIN = os.path.join(BASE_DIR, "input/petfinder-adoption-prediction/train/train.csv")
PATH_TO_IMAGES_DIR = os.path.join(BASE_DIR, "input/petfinder-adoption-prediction/train_images")
PATH_TO_TEMP_FILES = os.path.join(BASE_DIR, "UA_MDM_LDI_II/work/optuna_temp_artifacts")
PATH_TO_OPTUNA_ARTIFACTS = os.path.join(BASE_DIR, "UA_MDM_LDI_II/work/optuna_artifacts")

MODEL_NAME = '07 ResNet'

MODEL_VERSION = '1.0.0'

# Parameters and variables
CREATE_PYTORCH_DIRECTORIES = 0
SEED = 42
BATCH_SIZE = 80
TEST_SIZE = 0.2
IMAGE_SIZE = 299
CPU_CORES = os.cpu_count()

# Create the new training directory
new_train_directory = os.path.join(BASE_DIR, 'UA_MDM_LDI_II/work/train_images_classes')
os.makedirs(new_train_directory, exist_ok=True) # if it already exists, leave it as is

# Create the new validation directory
new_val_directory = os.path.join(BASE_DIR, 'UA_MDM_LDI_II/work/val_images_classes')
os.makedirs(new_val_directory, exist_ok=True)

# Define the ordered classes
class_names = ['0', '1', '2', '3', '4']

# Map the class labels to consecutive integers
class_to_idx = {class_name: i for i, class_name in enumerate(class_names)}

# Create one subfolder per class inside each directory
for clase in class_names:
   os.makedirs(os.path.join(new_train_directory, str(clase)), exist_ok=True)
   os.makedirs(os.path.join(new_val_directory, str(clase)), exist_ok=True)




# Loading and preprocessing functions
def resize_to_square(im):
    old_size = im.shape[:2] # old_size is in (height, width) format
    # Scale factor that makes the longest side equal to the target size
    ratio = float(IMAGE_SIZE)/max(old_size)
    # New image dimensions
    new_size = tuple([int(x*ratio) for x in old_size])
    # Resize the image (cv2.resize expects (width, height))
    im = cv2.resize(im, (new_size[1], new_size[0]))
    # Compute the size differences and pad with black pixels so the image ends up centred and square
    delta_w = IMAGE_SIZE - new_size[1]
    delta_h = IMAGE_SIZE - new_size[0]
    top, bottom = delta_h//2, delta_h-(delta_h//2)
    left, right = delta_w//2, delta_w-(delta_w//2)
    color = [0, 0, 0]
    new_image = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT,value=color)
    return new_image


def load_image(pet_id):
    path_to_image = os.path.join(PATH_TO_IMAGES_DIR, f'{pet_id}-1.jpg') # Uses the pet's first image
    image = cv2.imread(path_to_image)
    if image is None: # cv2.imread returns None when the file is missing or unreadable
        raise FileNotFoundError(path_to_image)
    # Convert the image to 8-bit unsigned integers (CV_8U)
    image = cv2.convertScaleAbs(image)
    # Convert from BGR to RGB because these models expect that channel order
    image= cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    new_image = resize_to_square(image)
    return new_image
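The arithmetic inside `resize_to_square` can be checked without OpenCV. The sketch below reproduces only the size and padding computation; `square_padding` is a hypothetical helper for illustration, and the 598x300 input is an assumed example.

```python
def square_padding(height, width, target=299):
    """Return the resized (height, width) and the (top, bottom, left, right) padding.

    Mirrors the arithmetic of resize_to_square without touching any pixels.
    """
    # Scale so that the longest side becomes `target`
    ratio = float(target) / max(height, width)
    new_h, new_w = int(height * ratio), int(width * ratio)
    # Split the remaining pixels between both sides; the extra pixel, if any, goes bottom/right
    delta_h, delta_w = target - new_h, target - new_w
    top, bottom = delta_h // 2, delta_h - delta_h // 2
    left, right = delta_w // 2, delta_w - delta_w // 2
    return (new_h, new_w), (top, bottom, left, right)


print(square_padding(598, 300))  # ((299, 150), (0, 0, 74, 75))
```

A portrait image of 598x300 is halved to 299x150 and then padded with 74 black columns on the left and 75 on the right, which gives the 299x299 square.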


In [6]:

def visualize_pet(pet_id):
    path_to_image = os.path.join(PATH_TO_IMAGES_DIR, f'{pet_id}-1.jpg') # Uses the pet's first image
    # Load the image
    image_to_show = cv2.imread(path_to_image)
    # Convert to RGB
    image_to_show = cv2.cvtColor(image_to_show, cv2.COLOR_BGR2RGB)
    # Show the image
    plt.imshow(image_to_show)
    plt.axis('off')  # Hide the axes
    plt.show()

def visualize_image(image):
    # Convert the image to 8-bit unsigned integers (CV_8U)
    image = cv2.convertScaleAbs(image)
    # Assumes a BGR input, as returned by cv2.imread
    image= cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Show the image
    plt.imshow(image.astype(np.uint8))
    plt.axis('off')  # Hide the axes
    plt.show()


**Load and process the data**

Note: PyTorch's ImageFolder needs the images placed in separate directories by split (train or validation) and by class

In [7]:
# Load
train_df = pd.read_csv(PATH_TO_TRAIN)

# Split for validation
train_data, val_data = train_test_split(train_df,
                               test_size = TEST_SIZE,
                               random_state = SEED,
                               stratify = train_df.AdoptionSpeed)




if CREATE_PYTORCH_DIRECTORIES == 1: # Set to 0 if train_images_classes and val_images_classes already contain the copied images
    # Copy the images to the matching directories
    def copy_imag(data, directorio_destino):
        for index, row in data.iterrows():
            petID = row['PetID']
            adoption_speed = row['AdoptionSpeed']

            # Image file name
            nombre_archivo = f"{petID}-1.jpg"

            # Full path of the source image
            ruta_origen = os.path.join(PATH_TO_IMAGES_DIR, nombre_archivo)

            # Full path of the destination file, inside its class folder
            ruta_destino = os.path.join(directorio_destino, str(adoption_speed), nombre_archivo)

            # Pets without a first image are skipped
            if os.path.exists(ruta_origen):
                # Copy the source file to the destination directory
                shutil.copy2(ruta_origen, ruta_destino)
        print("Finished copying to: ",str(directorio_destino))

    # Copy the images to the train directory
    copy_imag(train_data, new_train_directory)

    # Copy the images to the val directory
    copy_imag(val_data, new_val_directory)

    print("Process completed.")

In [8]:
# Build the DataLoaders
def create_dataloaders(train_directory, val_directory, batch_size, num_workers):
    # Image transforms for the training set
    train_transforms = transforms.Compose([
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

    # Image transforms for the validation set (no data augmentation)
    val_transforms = transforms.Compose([
        transforms.Resize((IMAGE_SIZE, IMAGE_SIZE)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])

    # Create the training and validation datasets
    conjunto_entrenamiento = datasets.ImageFolder(train_directory, transform=train_transforms)
    conjunto_validacion = datasets.ImageFolder(val_directory, transform=val_transforms)

    # Make the class-to-index mapping explicit. ImageFolder already assigned the targets from the
    # sorted folder names when it was created, so this only matches because '0'..'4' sort in that order
    conjunto_entrenamiento.class_to_idx = {class_name: i for i, class_name in enumerate(class_names)}
    conjunto_validacion.class_to_idx = {class_name: i for i, class_name in enumerate(class_names)}

    # Create the training and validation dataloaders
    train_dataloader = torch.utils.data.DataLoader(conjunto_entrenamiento, batch_size=batch_size, shuffle=True, num_workers=num_workers)
    val_dataloader = torch.utils.data.DataLoader(conjunto_validacion, batch_size=batch_size, shuffle=False, num_workers=num_workers)

    return train_dataloader, val_dataloader

# Create the DataLoaders
train_dataloader, val_dataloader = create_dataloaders(new_train_directory , new_val_directory , BATCH_SIZE, CPU_CORES)
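As far as I understand torchvision's documented behaviour, `ImageFolder` derives the class indices from the sorted names of the class subfolders. The sketch below imitates that rule with the standard library only; `folder_class_to_idx` is a hypothetical helper and the temporary directory stands in for `train_images_classes`.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Map each subfolder name to its position in the sorted list of subfolders."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    # Create the class folders in a scrambled order on purpose
    for name in ["3", "0", "4", "1", "2"]:
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_to_idx(root)

print(mapping)  # {'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}
```

The names are sorted as strings, so this only lines up with the numeric labels while there are fewer than ten classes: a folder called '10' would sort before '2'.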

In [9]:
# Build a list of the PetIDs that have an image, in the order they appear in the data loader
test_sample_ids = [i[0].split('/')[-1].split('-')[0] for i in val_dataloader.dataset.samples]

**Training**

In [None]:
def train_val(model, criterion, optimizer, dataloaders, datasets, device, num_epochs=20, lr=0.001, momentum = 0.9 ,trial=None):

    # Stochastic Gradient Descent (SGD): the learning rate sets the step size of each weight update, and the momentum adds inertia to the descent direction so it does not swing around in local minima
    optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum) # Replaces the optimizer argument so that lr and momentum take effect

    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    best_kappa =  -999

    train_losses = []
    val_losses = []

    try:
        previous_best = study.best_value
    except Exception: # no global study defined yet, or no completed trial in it
        previous_best = -999


    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        kappa_labels_true = []
        kappa_labels_predicted = []
        output_scores = []

        # Each epoch has a training and validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in tqdm(dataloaders[phase]):
                inputs = inputs.to(device)
                labels = labels.to(device)

                # Zero the parameter gradients
                optimizer.zero_grad()

                # Forward
                # Track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)



                    # Backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                    elif phase == 'val':
                        kappa_labels_true.extend(labels.cpu().numpy().tolist())
                        kappa_labels_predicted.extend(preds.cpu().numpy().tolist())
                        outputs_np = outputs.cpu().numpy()
                        output_scores.extend([outputs_np[i,:] for i in range(outputs_np.shape[0])])

                # Statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

                #END OF BATCH

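            # Note: datasets[phase] is the DataFrame split, which can hold more rows than there are images
            # in the folder (pets without a copied image), so these averages can be understated;
            # len(dataloaders[phase].dataset) would count only the images actually seen
            epoch_loss = running_loss / len(datasets[phase])
            epoch_acc = running_corrects.double() / len(datasets[phase])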

            if phase == 'train':
                train_losses.append(epoch_loss)
                kappa_score = np.nan
            else:
                val_losses.append(epoch_loss)
                kappa_score = cohen_kappa_score(kappa_labels_true,
                                  kappa_labels_predicted,
                                  weights = 'quadratic')



            print(f'{phase.title()} Loss: {epoch_loss:.4f} Acc: {epoch_acc*100:.2f}% Kappa: {kappa_score:.3f}')

            # If this is the best Epoch so far -> Deep copy the model
            if phase == 'val' and kappa_score > best_kappa:
                best_acc = epoch_acc
                best_kappa = kappa_score
                best_model_wts = copy.deepcopy(model.state_dict())


                #Best Epoch within a trial and better than previous trials
                if trial is not None and best_kappa > previous_best:

                    #Save test dataset with predictions
                    predicted_filename = os.path.join(PATH_TO_TEMP_FILES,f'test_{trial.study.study_name}_{trial.number}.joblib')
                    predicted_df = pd.DataFrame({'PetID':test_sample_ids,
                                'pred':output_scores}).merge(val_data, on='PetID')
                    dump(predicted_df, predicted_filename)

                    #Generate and save CM
                    cm_filename = os.path.join(PATH_TO_TEMP_FILES,f'cm_{trial.study.study_name}_{trial.number}.jpg')
                    utils.plot_confusion_matrix(kappa_labels_true,kappa_labels_predicted).write_image(cm_filename)

            #END OF PHASE

        #END OF EPOCH

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.2f}%'.format(best_acc * 100))

    # Load best model weights
    model.load_state_dict(best_model_wts)

    # Save in optuna trial the best test dataset, cm and model weights
    if trial is not None and best_kappa > previous_best:
        upload_artifact(trial, predicted_filename, artifact_store)

        upload_artifact(trial, cm_filename, artifact_store)

        file_name = f'{MODEL_NAME}_{MODEL_VERSION}_{trial.number}.pth'
        model_path = os.path.join(PATH_TO_TEMP_FILES, file_name)
        torch.save(model, model_path) # We could save only the weights instead: model.state_dict()
        upload_artifact(trial, model_path, artifact_store)

    return model,best_kappa

best_model,_ = train_val(resnet50, criterion, optimizer,
                       dataloaders={'train': train_dataloader,
                                    'val': val_dataloader},
                       datasets={'train': train_data, 'val': val_data},
                       device=device,
                       num_epochs=4)
# Save the model
run_id = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f'{MODEL_NAME}_{MODEL_VERSION}_{run_id}.pth'
model_path = os.path.join(PATH_TO_TEMP_FILES, file_name)
torch.save(best_model, model_path) # We could save only the weights instead: best_model.state_dict()
print(f'Modelo guardado en {model_path}')

Epoch 0/3
----------


100%|██████████| 128/128 [03:43<00:00,  1.74s/it]


Train Loss: 1.2534 Acc: 25.75% Kappa: nan


100%|██████████| 33/33 [00:33<00:00,  1.03s/it]


Val Loss: 1.2327 Acc: 28.64% Kappa: 0.252
Epoch 1/3
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 1.2062 Acc: 29.53% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.18it/s]


Val Loss: 1.2200 Acc: 29.01% Kappa: 0.263
Epoch 2/3
----------


100%|██████████| 128/128 [03:19<00:00,  1.56s/it]


Train Loss: 1.1862 Acc: 31.12% Kappa: nan


100%|██████████| 33/33 [00:23<00:00,  1.41it/s]


Val Loss: 1.2129 Acc: 30.04% Kappa: 0.270
Epoch 3/3
----------


100%|██████████| 128/128 [03:21<00:00,  1.57s/it]


Train Loss: 1.1701 Acc: 33.04% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.14it/s]


Val Loss: 1.2087 Acc: 29.11% Kappa: 0.269
Training complete in 15m 35s
Best val Acc: 30.04%
Modelo guardado en /content/drive/MyDrive/labo2/UA_MDM_LDI_II/work/optuna_temp_artifacts/06 ResNet_1.0.0_20240629_135125.pth
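The Kappa column in these logs is the quadratic weighted kappa returned by `cohen_kappa_score(..., weights='quadratic')`. Below is a minimal pure-Python sketch of that metric, assuming integer labels in `0..n_classes-1` and two lists of equal length; `quadratic_weighted_kappa` is a hypothetical helper written for illustration, not the scikit-learn implementation.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for integer labels in range(n_classes)."""
    # Observed confusion matrix: rows are true labels, columns are predictions
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1.0
    total = float(len(y_true))
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]

    num = den = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # The penalty grows with the squared distance between the two labels
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            num += weight * observed[i][j]
            # Expected count if true labels and predictions were independent
            den += weight * row[i] * col[j] / total
    return 1.0 - num / den


print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 4], 5))  # 1.0
print(quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2))        # 0.0
```

Because the penalty is quadratic, predicting class 3 for a true class 4 costs far less than predicting class 0. That is why Kappa and accuracy can move in different directions between epochs.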


In [11]:
artifact_store = FileSystemArtifactStore(base_path=PATH_TO_OPTUNA_ARTIFACTS)


def optuna_train(trial):

    epochs = trial.suggest_int('epochs', 5, 5)

    lr = trial.suggest_float('lr', 0.00001, 0.1, log=True)

    momentum = trial.suggest_float('momentum', 0.0, 0.95)

    _,best_score = train_val(resnet50, criterion, optimizer,
                       dataloaders={'train': train_dataloader,
                                    'val': val_dataloader},
                       datasets={'train': train_data, 'val': val_data},
                       device=device,
                       num_epochs=epochs,
                       lr=lr,
                       momentum = momentum,
                       trial=trial)


    return best_score
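Note that `optuna_train` passes the same `resnet50` object to every trial, and `train_val` leaves the best weights loaded in it. Each trial therefore appears to start from the previous one's weights, which makes the hyperparameter comparison less clean. One possible way around this, sketched under assumptions, is to train a fresh copy per trial. `make_objective`, `StubTrial` and `stub_train` are hypothetical names used only for this demo; in the notebook, `train_fn` would be a thin wrapper around `train_val`.

```python
import copy


def make_objective(base_model, train_fn):
    """Build an Optuna-style objective that trains a fresh copy of base_model per trial."""
    def objective(trial):
        lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
        momentum = trial.suggest_float("momentum", 0.0, 0.95)
        # Deep copy, so weights learned in one trial never leak into the next
        model = copy.deepcopy(base_model)
        _, score = train_fn(model, lr=lr, momentum=momentum)
        return score
    return objective


class StubTrial:
    """Hypothetical stand-in for an Optuna trial, only for this demo."""
    def suggest_float(self, name, low, high, log=False):
        return low


def stub_train(model, lr, momentum):
    # Pretend that training mutates the model, as a real training loop would
    model["steps"] += 1
    return model, float(model["steps"])


base = {"steps": 0}
objective = make_objective(base, stub_train)
scores = [objective(StubTrial()) for _ in range(3)]
print(scores, base)  # [1.0, 1.0, 1.0] {'steps': 0}
```

Every stub trial reports the same score and `base` is left untouched, which is the independence between trials we are after. With a real model the deep copy costs extra memory and time for each trial.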


In [17]:
study = optuna.create_study(direction='maximize',
                            #storage="sqlite:////content/drive/MyDrive/labo2/ultima.sqlite3",  # Specify the storage URL here.
                            storage="sqlite:////resnet.sqlite3",  # Specify the storage URL here.
                            #study_name=f'{MODEL_NAME}_{MODEL_VERSION}',
                            study_name='07 ResNet_1.0.0',
                            load_if_exists = True)

[I 2024-06-29 23:49:39,837] Using an existing study with name '07 ResNet_1.0.0' instead of creating a new one.


In [18]:
study.optimize(optuna_train, n_trials=7)

Epoch 0/4
----------


100%|██████████| 128/128 [13:26<00:00,  6.30s/it]


Train Loss: 1.2519 Acc: 25.30% Kappa: nan


100%|██████████| 33/33 [07:47<00:00, 14.17s/it]


Val Loss: 1.2661 Acc: 25.54% Kappa: 0.118
Epoch 1/4
----------


100%|██████████| 128/128 [03:15<00:00,  1.53s/it]


Train Loss: 1.2489 Acc: 25.64% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.32it/s]


Val Loss: 1.2626 Acc: 25.84% Kappa: 0.129
Epoch 2/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.2453 Acc: 26.05% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.32it/s]


Val Loss: 1.2601 Acc: 26.61% Kappa: 0.152
Epoch 3/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.2418 Acc: 26.37% Kappa: nan


100%|██████████| 33/33 [00:29<00:00,  1.13it/s]


Val Loss: 1.2577 Acc: 27.08% Kappa: 0.164
Epoch 4/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.2401 Acc: 26.35% Kappa: nan


100%|██████████| 33/33 [00:26<00:00,  1.26it/s]

upload_artifact is experimental (supported from v3.3.0). The interface can change in the future.





Val Loss: 1.2554 Acc: 27.01% Kappa: 0.161
Training complete in 36m 8s
Best val Acc: 27.08%




[I 2024-06-30 00:25:59,720] Trial 2 finished with value: 0.1636998392565402 and parameters: {'epochs': 5, 'lr': 0.0003482078138431598, 'momentum': 0.04017738547592761}. Best is trial 2 with value: 0.1636998392565402.


Epoch 0/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 1.2057 Acc: 29.47% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.35it/s]


Val Loss: 1.2063 Acc: 30.78% Kappa: 0.272
Epoch 1/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 1.1302 Acc: 35.50% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.36it/s]


Val Loss: 1.2191 Acc: 30.38% Kappa: 0.260
Epoch 2/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.0411 Acc: 41.05% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.18it/s]


Val Loss: 1.2687 Acc: 30.04% Kappa: 0.264
Epoch 3/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8954 Acc: 48.97% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.15it/s]


Val Loss: 1.3656 Acc: 28.11% Kappa: 0.255
Epoch 4/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.53s/it]


Train Loss: 0.7023 Acc: 58.28% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.37it/s]






Val Loss: 1.5806 Acc: 27.54% Kappa: 0.179
Training complete in 18m 35s
Best val Acc: 30.78%




[I 2024-06-30 00:44:36,141] Trial 3 finished with value: 0.2715868091742597 and parameters: {'epochs': 5, 'lr': 0.08170446974511318, 'momentum': 0.09431884006792743}. Best is trial 3 with value: 0.2715868091742597.


Epoch 0/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 1.1269 Acc: 35.81% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.28it/s]


Val Loss: 1.2066 Acc: 31.28% Kappa: 0.268
Epoch 1/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.1271 Acc: 35.59% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.30it/s]


Val Loss: 1.2064 Acc: 31.54% Kappa: 0.275
Epoch 2/4
----------


100%|██████████| 128/128 [03:18<00:00,  1.55s/it]


Train Loss: 1.1259 Acc: 35.84% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.31it/s]


Val Loss: 1.2049 Acc: 31.58% Kappa: 0.273
Epoch 3/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Train Loss: 1.1245 Acc: 35.85% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.17it/s]


Val Loss: 1.2048 Acc: 31.81% Kappa: 0.282
Epoch 4/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.1246 Acc: 36.21% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.17it/s]






Val Loss: 1.2050 Acc: 31.54% Kappa: 0.276
Training complete in 18m 40s
Best val Acc: 31.81%




[I 2024-06-30 01:03:17,102] Trial 4 finished with value: 0.2815695347611612 and parameters: {'epochs': 5, 'lr': 5.060718643732913e-05, 'momentum': 0.7547590984443805}. Best is trial 4 with value: 0.2815695347611612.


Epoch 0/4
----------


100%|██████████| 128/128 [03:20<00:00,  1.57s/it]


Train Loss: 1.1193 Acc: 36.20% Kappa: nan


100%|██████████| 33/33 [00:30<00:00,  1.09it/s]


Val Loss: 1.2057 Acc: 31.58% Kappa: 0.306
Epoch 1/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Train Loss: 1.0969 Acc: 37.84% Kappa: nan


100%|██████████| 33/33 [00:27<00:00,  1.22it/s]


Val Loss: 1.2078 Acc: 31.38% Kappa: 0.307
Epoch 2/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Train Loss: 1.0727 Acc: 39.29% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.32it/s]


Val Loss: 1.2120 Acc: 31.48% Kappa: 0.302
Epoch 3/4
----------


100%|██████████| 128/128 [03:18<00:00,  1.55s/it]


Train Loss: 1.0457 Acc: 41.14% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.36it/s]


Val Loss: 1.2186 Acc: 30.58% Kappa: 0.295
Epoch 4/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 1.0134 Acc: 42.93% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.34it/s]

upload_artifact is experimental (supported from v3.3.0). The interface can change in the future.



Val Loss: 1.2279 Acc: 30.11% Kappa: 0.300
Training complete in 18m 44s
Best val Acc: 31.38%




[I 2024-06-30 01:22:02,069] Trial 5 finished with value: 0.30655287795570496 and parameters: {'epochs': 5, 'lr': 0.0024465107264376234, 'momentum': 0.8272799002353258}. Best is trial 5 with value: 0.30655287795570496.


Epoch 0/4
----------


100%|██████████| 128/128 [03:15<00:00,  1.53s/it]


Train Loss: 1.0761 Acc: 39.09% Kappa: nan


100%|██████████| 33/33 [00:31<00:00,  1.04it/s]


Val Loss: 1.2165 Acc: 30.78% Kappa: 0.289
Epoch 1/4
----------


100%|██████████| 128/128 [03:20<00:00,  1.57s/it]


Train Loss: 1.0298 Acc: 41.90% Kappa: nan


100%|██████████| 33/33 [00:30<00:00,  1.10it/s]


Val Loss: 1.2302 Acc: 30.44% Kappa: 0.304
Epoch 2/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.9777 Acc: 44.96% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.35it/s]


Val Loss: 1.2478 Acc: 30.84% Kappa: 0.293
Epoch 3/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.55s/it]


Train Loss: 0.8995 Acc: 49.60% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.35it/s]


Val Loss: 1.2918 Acc: 29.31% Kappa: 0.279
Epoch 4/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8039 Acc: 54.94% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.34it/s]


Val Loss: 1.3543 Acc: 29.68% Kappa: 0.276
Training complete in 18m 45s
Best val Acc: 30.44%

[I 2024-06-30 01:40:47,168] Trial 6 finished with value: 0.30391165815413945 and parameters: {'epochs': 5, 'lr': 0.0050507930941607986, 'momentum': 0.79158925937734}. Best is trial 5 with value: 0.30655287795570496.


Epoch 0/4
----------


100%|██████████| 128/128 [03:15<00:00,  1.53s/it]


Train Loss: 0.9708 Acc: 45.39% Kappa: nan


100%|██████████| 33/33 [00:29<00:00,  1.12it/s]


Val Loss: 1.2447 Acc: 30.11% Kappa: 0.288
Epoch 1/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.9331 Acc: 47.68% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.33it/s]


Val Loss: 1.2584 Acc: 30.38% Kappa: 0.293
Epoch 2/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8777 Acc: 51.11% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.35it/s]


Val Loss: 1.2789 Acc: 29.31% Kappa: 0.281
Epoch 3/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8265 Acc: 53.89% Kappa: nan


100%|██████████| 33/33 [00:27<00:00,  1.19it/s]


Val Loss: 1.3415 Acc: 30.54% Kappa: 0.287
Epoch 4/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.7593 Acc: 57.60% Kappa: nan


100%|██████████| 33/33 [00:27<00:00,  1.18it/s]


Val Loss: 1.3541 Acc: 29.31% Kappa: 0.273
Training complete in 18m 39s
Best val Acc: 30.38%

[I 2024-06-30 01:59:26,444] Trial 7 finished with value: 0.29266152154817027 and parameters: {'epochs': 5, 'lr': 0.01049828611487456, 'momentum': 0.1946377274637287}. Best is trial 5 with value: 0.30655287795570496.


Epoch 0/4
----------


100%|██████████| 128/128 [03:16<00:00,  1.54s/it]


Train Loss: 0.8821 Acc: 51.87% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.34it/s]


Val Loss: 1.2613 Acc: 30.21% Kappa: 0.286
Epoch 1/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8716 Acc: 52.01% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.33it/s]


Val Loss: 1.2678 Acc: 30.31% Kappa: 0.293
Epoch 2/4
----------


100%|██████████| 128/128 [03:17<00:00,  1.54s/it]


Train Loss: 0.8644 Acc: 52.22% Kappa: nan


100%|██████████| 33/33 [00:28<00:00,  1.15it/s]


Val Loss: 1.2730 Acc: 30.44% Kappa: 0.295
Epoch 3/4
----------


100%|██████████| 128/128 [03:15<00:00,  1.53s/it]


Train Loss: 0.8541 Acc: 52.68% Kappa: nan


100%|██████████| 33/33 [00:25<00:00,  1.31it/s]


Val Loss: 1.2761 Acc: 30.08% Kappa: 0.291
Epoch 4/4
----------


100%|██████████| 128/128 [03:19<00:00,  1.56s/it]


Train Loss: 0.8465 Acc: 53.13% Kappa: nan


100%|██████████| 33/33 [00:24<00:00,  1.32it/s]


Val Loss: 1.2844 Acc: 29.81% Kappa: 0.283
Training complete in 18m 35s
Best val Acc: 30.44%

[I 2024-06-30 02:18:01,983] Trial 8 finished with value: 0.2953952824759234 and parameters: {'epochs': 5, 'lr': 0.001326106388194096, 'momentum': 0.2207424819465973}. Best is trial 5 with value: 0.30655287795570496.
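
In the output above, the value Optuna reports for each trial is close to the highest validation kappa printed during that trial (trial 5 reports 0.3066 and its best epoch prints `Kappa: 0.307`). This suggests the objective returns the best validation kappa rather than the accuracy shown as "Best val Acc", although the objective code is not visible here. The following is a minimal sketch for checking that from the captured text. The names `LOG`, `VAL_RE`, `TRIAL_RE` and `summarize_trials` are hypothetical helpers, not part of the notebook, and the excerpt contains only the validation and trial lines of trials 5 and 6.

```python
import re

# Excerpt copied from the output above (trials 5 and 6, validation lines only).
# In trial 6 the Optuna line appears before the last "Val Loss" line,
# as it does in the captured output.
LOG = """\
Epoch 0/4
Val Loss: 1.2057 Acc: 31.58% Kappa: 0.306
Epoch 1/4
Val Loss: 1.2078 Acc: 31.38% Kappa: 0.307
Epoch 2/4
Val Loss: 1.2120 Acc: 31.48% Kappa: 0.302
Epoch 3/4
Val Loss: 1.2186 Acc: 30.58% Kappa: 0.295
Epoch 4/4
Val Loss: 1.2279 Acc: 30.11% Kappa: 0.300
[I 2024-06-30 01:22:02,069] Trial 5 finished with value: 0.30655287795570496 and parameters: {'epochs': 5, 'lr': 0.0024465107264376234, 'momentum': 0.8272799002353258}. Best is trial 5 with value: 0.30655287795570496.
Epoch 0/4
Val Loss: 1.2165 Acc: 30.78% Kappa: 0.289
Epoch 1/4
Val Loss: 1.2302 Acc: 30.44% Kappa: 0.304
Epoch 2/4
Val Loss: 1.2478 Acc: 30.84% Kappa: 0.293
Epoch 3/4
Val Loss: 1.2918 Acc: 29.31% Kappa: 0.279
Epoch 4/4
[I 2024-06-30 01:40:47,168] Trial 6 finished with value: 0.30391165815413945 and parameters: {'epochs': 5, 'lr': 0.0050507930941607986, 'momentum': 0.79158925937734}. Best is trial 5 with value: 0.30655287795570496.
Val Loss: 1.3543 Acc: 29.68% Kappa: 0.276
"""

# Captures the kappa printed on each validation line.
VAL_RE = re.compile(r"^Val Loss: \S+ Acc: \S+ Kappa: (\S+)", re.MULTILINE)
# Captures the trial number and the value reported by Optuna.
TRIAL_RE = re.compile(r"Trial (\d+) finished with value: ([\d.eE+-]+)")


def summarize_trials(log_text):
    """Return {trial_number: (reported_value, best_val_kappa)}."""
    # Assumption: every trial starts with an "Epoch 0/N" line, so splitting
    # there keeps each trial's lines together even when stdout and the
    # Optuna log line are interleaved.
    blocks = re.split(r"(?m)^Epoch 0/\d+\s*$", log_text)
    summary = {}
    for block in blocks:
        kappas = [float(k) for k in VAL_RE.findall(block)]
        trial = TRIAL_RE.search(block)
        if trial and kappas:
            summary[int(trial.group(1))] = (float(trial.group(2)), max(kappas))
    return summary


for number, (value, best_kappa) in sorted(summarize_trials(LOG).items()):
    print(f"trial {number}: reported={value:.4f} best_val_kappa={best_kappa:.3f}")
```

The printed kappas are rounded to three decimals, so a difference below 0.001 between the two columns is consistent with the reported value being the best validation kappa of the trial.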


In [19]:
a = 2*50
print(a)

100
