# Hypothesis 3: Class Weight
## ResNet18 | Light Augmentation | With Class Weight

This experiment evaluates whether weighting the loss function by class improves
sensitivity (recall) for the Pneumonia class, given the class imbalance in the
dataset.

Kept unchanged:
- ResNet18 architecture
- Light augmentation (baseline)
- Same reproducible split
- Same seed

Changed:
- Class weights added to CrossEntropyLoss

Generated files:
- models/resnet18_light_CW.pt
- outputs/metrics/resnet18_light_CW.npz

In [1]:
import os

NOTEBOOK_DIR = os.getcwd()
PROJECT_ROOT = os.path.abspath(os.path.join(NOTEBOOK_DIR, ".."))

print("Project root:", PROJECT_ROOT)

Project root: c:\projects\xray-project


In [2]:
import sys
import torch
import torch.nn as nn
import pandas as pd
import numpy as np

SRC_PATH = os.path.join(PROJECT_ROOT, "src")

if SRC_PATH not in sys.path:
    sys.path.append(SRC_PATH)

from dataset import XRayDataset
from transforms import get_transforms
from model import get_model
from train_utils import train_model
from utils import set_seed, create_directories, save_metrics
from torch.utils.data import DataLoader

## Reproducibility Control

We fix the same seed used in the previous experiments.

In [3]:
set_seed(42)
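`set_seed` lives in `src/utils.py`, which is not shown here. A typical implementation seeds Python's `random`, NumPy and PyTorch. The sketch below is a hypothetical stand-in (`set_seed_sketch` is not the project's function) showing what "the same seed" usually has to cover. Seeding PyTorch's global generator also fixes the order produced by `DataLoader(shuffle=True)` when no explicit `generator` is passed.

```python
import random

import numpy as np

try:
    import torch
except ImportError:  # lets the sketch run where PyTorch is not installed
    torch = None


def set_seed_sketch(seed=42):
    """Hypothetical stand-in for utils.set_seed: seed every RNG the notebook touches."""
    random.seed(seed)
    np.random.seed(seed)
    if torch is not None:
        torch.manual_seed(seed)           # global CPU RNG; also drives DataLoader shuffling
        torch.cuda.manual_seed_all(seed)  # silently ignored when CUDA is unavailable
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False


# Re-seeding reproduces the same random draws.
set_seed_sketch(42)
first = (random.random(), np.random.rand())
set_seed_sketch(42)
second = (random.random(), np.random.rand())
print(first == second)  # True
```

Full bitwise reproducibility on GPU may need more settings than these; the sketch only covers the usual minimum.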

## Directory Setup

We make sure the following directories exist:
- models/
- outputs/metrics/

In [4]:
models_dir, metrics_dir, _ = create_directories(PROJECT_ROOT)
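`create_directories` is also defined in `src/utils.py`, and its third return value is discarded here. A minimal sketch of what such a helper might do, assuming it only creates folders under the project root; the `outputs/figures` path for the third value is a guess and is labelled as such in the code. `exist_ok=True` is what makes the cell safe to re-run.

```python
import os
import tempfile


def create_directories_sketch(project_root):
    """Hypothetical stand-in for utils.create_directories."""
    models_dir = os.path.join(project_root, "models")
    metrics_dir = os.path.join(project_root, "outputs", "metrics")
    extra_dir = os.path.join(project_root, "outputs", "figures")  # assumed third output
    for path in (models_dir, metrics_dir, extra_dir):
        os.makedirs(path, exist_ok=True)  # idempotent: no error if the folder exists
    return models_dir, metrics_dir, extra_dir


with tempfile.TemporaryDirectory() as root:
    first = create_directories_sketch(root)
    second = create_directories_sketch(root)  # a second call must not raise
    all_exist = all(os.path.isdir(p) for p in first)
    same_paths = first == second

print(all_exist, same_paths)  # True True
```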

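## Loading the Splits

We use exactly the same files as in the previous experiments:
- train_split.csv
- val_split.csv

This keeps the experiment controlled: the data partition is identical to the previous hypotheses, so differences in results can be attributed to the change under test rather than to a different split.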

In [5]:
metadata_dir = os.path.join(PROJECT_ROOT, "data", "metadata")

train_df = pd.read_csv(os.path.join(metadata_dir, "train_split.csv"))
val_df = pd.read_csv(os.path.join(metadata_dir, "val_split.csv"))

print("Train:", len(train_df))
print("Validation:", len(val_df))

Train: 4093
Validation: 1139
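One way to confirm that this notebook really reads the same split files as the earlier hypotheses is to record a checksum of each CSV and compare it across notebooks. A minimal sketch using `hashlib`; `file_sha256` is a hypothetical helper, not part of the project's `utils`.

```python
import hashlib
import os
import tempfile


def file_sha256(path, chunk_size=1 << 16):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# In the notebook (metadata_dir from In [5]), print and record the digests:
# for name in ("train_split.csv", "val_split.csv"):
#     print(name, file_sha256(os.path.join(metadata_dir, name)))

# Self-contained demo: identical bytes give identical digests.
with tempfile.TemporaryDirectory() as tmp:
    paths = [os.path.join(tmp, n) for n in ("a.csv", "b.csv", "c.csv")]
    for p, text in zip(paths, ("label\n0\n1\n", "label\n0\n1\n", "label\n1\n0\n")):
        with open(p, "w", encoding="utf-8", newline="") as f:
            f.write(text)
    digests = [file_sha256(p) for p in paths]

print(digests[0] == digests[1], digests[0] == digests[2])  # True False
```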


## Configuration

Model: ResNet18  
Augmentation: light  
Class Weight: enabled  
Epochs: 10  
Batch Size: 16  
Learning Rate: 1e-4

In [6]:
model_name = "resnet18"
augmentation = "light"
use_class_weight = True

batch_size = 16
epochs = 10
learning_rate = 1e-4

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Device:", device)

Device: cpu


In [7]:
train_transform, val_transform = get_transforms(
    img_size=224,
    augmentation=augmentation
)

train_dataset = XRayDataset(train_df, transform=train_transform)
val_dataset = XRayDataset(val_df, transform=val_transform)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)
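With `batch_size=16` and the `DataLoader` default `drop_last=False`, the number of iterations per epoch is the ceiling of the dataset size divided by the batch size. The arithmetic for the split sizes printed in In [5]:

```python
import math


def num_batches(n_samples, batch_size, drop_last=False):
    """Iterations per epoch for a DataLoader with the given settings."""
    if drop_last:
        return n_samples // batch_size
    return math.ceil(n_samples / batch_size)


train_batches = num_batches(4093, 16)                 # 256 iterations per epoch
val_batches = num_batches(1139, 16)                   # 72 iterations per validation pass
last_train_batch = 4093 - (train_batches - 1) * 16    # size of the final, partial batch
print(train_batches, val_batches, last_train_batch)  # 256 72 13
```

The final training batch therefore holds only 13 images, which is harmless here because the loss is averaged per batch.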

In [8]:
class_counts = train_df["label"].value_counts().sort_index()

num_normal = class_counts[0]
num_pneumonia = class_counts[1]

print("Normal:", num_normal)
print("Pneumonia:", num_pneumonia)

total = num_normal + num_pneumonia

weight_normal = total / (2 * num_normal)
weight_pneumonia = total / (2 * num_pneumonia)

class_weights = torch.tensor(
    [weight_normal, weight_pneumonia],
    dtype=torch.float
).to(device)

print("Class Weights:", class_weights)

Normal: 1082
Pneumonia: 3011
Class Weights: tensor([1.8914, 0.6797])
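The weights in In [8] follow the "balanced" heuristic w_c = N / (K · n_c), with N = 4093 training images and K = 2 classes; scikit-learn's `compute_class_weight("balanced", ...)` uses the same formula. In this split the minority class is Normal (1082 images), so the weighting raises the cost of errors on Normal images and lowers it for Pneumonia images. A NumPy sketch that reproduces the printed tensor from the label counts alone:

```python
import numpy as np


def balanced_class_weights(labels, num_classes):
    """w_c = N / (K * n_c): rarer classes get proportionally larger weights."""
    counts = np.bincount(labels, minlength=num_classes)
    return len(labels) / (num_classes * counts)


# Same class counts as the training split (0 = Normal, 1 = Pneumonia).
labels = np.array([0] * 1082 + [1] * 3011)
weights = balanced_class_weights(labels, 2)
print(weights.round(4))  # [1.8914 0.6797]
```

A useful property of this choice: n_c · w_c = N / K for every class, so after weighting both classes contribute the same total weight to the loss over an epoch.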


In [9]:
model, target_layer = get_model(model_name=model_name)
model = model.to(device)



In [10]:
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
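With `weight=class_weights` and the default `reduction="mean"`, `nn.CrossEntropyLoss` computes the negative log-likelihood ℓ_i of each sample's true class, multiplies it by that class's weight, and divides by the sum of those weights in the batch rather than by the batch size: loss = Σ_i w_{y_i} ℓ_i / Σ_i w_{y_i}. The NumPy function below is an illustrative re-implementation of that formula, not PyTorch's own code; it makes it easy to check by hand how a single Normal or Pneumonia error moves the loss.

```python
import numpy as np


def weighted_cross_entropy(logits, targets, class_weights):
    """Mean-reduced, class-weighted cross-entropy (the formula in the PyTorch docs)."""
    # Numerically stable log-softmax over the class axis.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Negative log-likelihood of the true class for each sample.
    nll = -log_probs[np.arange(len(targets)), targets]
    # Weight each sample by its true class, then normalise by the total weight.
    w = class_weights[targets]
    return float((w * nll).sum() / w.sum())


weights = np.array([1.8914, 0.6797])          # [Normal, Pneumonia], as printed in In [8]
logits = np.array([[2.0, 0.0], [2.0, 0.0]])   # both images scored as "Normal"
targets = np.array([0, 1])                    # first is Normal, second is Pneumonia
weighted = weighted_cross_entropy(logits, targets, weights)
unweighted = weighted_cross_entropy(logits, targets, np.ones(2))
print(round(weighted, 4), round(unweighted, 4))
```

Because Pneumonia is the majority class in this split, its errors are down-weighted: in the example the misclassified Pneumonia image contributes less to the weighted loss than to the unweighted one.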

In [11]:
model_filename = f"{model_name}_{augmentation}_CW.pt"
metrics_filename = f"{model_name}_{augmentation}_CW.npz"

model_save_path = os.path.join(models_dir, model_filename)
metrics_save_path = os.path.join(metrics_dir, metrics_filename)
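The flag `use_class_weight` from In [6] is not read by any later cell: `class_weights` is always passed to the loss, and the `_CW` suffix is hardcoded in the filenames. A hypothetical way to keep the flag, the loss and the output names in sync is to derive the run name from a single config object; `ExperimentConfig` and the suffix-free name for an unweighted run are illustrative assumptions, not the project's convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical grouping of the hyperparameters from In [6]."""
    model_name: str = "resnet18"
    augmentation: str = "light"
    use_class_weight: bool = True
    batch_size: int = 16
    epochs: int = 10
    learning_rate: float = 1e-4

    @property
    def run_name(self) -> str:
        # The suffix follows the flag instead of being hardcoded.
        suffix = "_CW" if self.use_class_weight else ""
        return f"{self.model_name}_{self.augmentation}{suffix}"


cfg = ExperimentConfig()
print(cfg.run_name + ".pt", cfg.run_name + ".npz")  # resnet18_light_CW.pt resnet18_light_CW.npz
```

The same flag could then select the loss, e.g. `nn.CrossEntropyLoss(weight=class_weights if cfg.use_class_weight else None)`, since `weight=None` gives the unweighted loss.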

In [12]:
history, best_auc = train_model(
    model,
    train_loader,
    val_loader,
    criterion,
    optimizer,
    device,
    epochs,
    model_save_path
)

Epoch 1/10
Train Loss: 0.1510
Val Loss:   0.0981
Val AUC:    0.9977
Val F1:     0.9742
Val Recall: 0.9530
Val Prec.:  0.9964
→ New best model saved.

Epoch 2/10
Train Loss: 0.0801
Val Loss:   0.1152
Val AUC:    0.9980
Val F1:     0.9706
Val Recall: 0.9450
Val Prec.:  0.9976
→ New best model saved.

Epoch 3/10
Train Loss: 0.0632
Val Loss:   0.1584
Val AUC:    0.9979
Val F1:     0.9662
Val Recall: 0.9346
Val Prec.:  1.0000

Epoch 4/10
Train Loss: 0.0574
Val Loss:   0.0433
Val AUC:    0.9985
Val F1:     0.9891
Val Recall: 0.9874
Val Prec.:  0.9908
→ New best model saved.

Epoch 5/10
Train Loss: 0.0370
Val Loss:   0.0491
Val AUC:    0.9990
Val F1:     0.9879
Val Recall: 0.9805
Val Prec.:  0.9953
→ New best model saved.

Epoch 6/10
Train Loss: 0.0396
Val Loss:   0.0399
Val AUC:    0.9985
Val F1:     0.9908
Val Recall: 0.9931
Val Prec.:  0.9886

Epoch 7/10
Train Loss: 0.0381
Val Loss:   0.2093
Val AUC:    0.9983
Val F1:     0.9576
Val Recall: 0.9186
Val Prec.:  1.0000

Epoch 8/10
Train Loss: 0.0285
Val Loss:   0.0669
Val AUC:    0.9982
Val F1:     0.9855
Val Recall: 0.9759
Val Prec.:  0.9953

Epoch 9/10
Train Loss: 0.0263
Val Loss:   0.0797
Val AUC:    0.9983
Val F1:     0.9831
Val Recall: 0.9690
Val Prec.:  0.9976

Epoch 10/10
Train Loss: 0.0136
Val Loss:   0.0438
Val AUC:    0.9987
Val F1:     0.9897
Val Recall: 0.9920
Val Prec.:  0.9874

Best AUC obtained: 0.999003539154039
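Judging from the log, `train_model` saves a checkpoint whenever validation AUC improves on the best value seen so far (the lines marked with → in the log above). The sketch below replays that rule, assumed to be a strict `>` comparison, over the AUC values transcribed from the log and recovers the four saved epochs. It also shows a consequence relevant to this hypothesis: the kept checkpoint (epoch 5, recall 0.9805) is not the epoch with the highest validation recall (epoch 6, 0.9931), assuming `Val Recall` is the recall of the Pneumonia class.

```python
# Values transcribed from the training log above (rounded to 4 decimals).
val_auc = [0.9977, 0.9980, 0.9979, 0.9985, 0.9990,
           0.9985, 0.9983, 0.9982, 0.9983, 0.9987]
val_recall = [0.9530, 0.9450, 0.9346, 0.9874, 0.9805,
              0.9931, 0.9186, 0.9759, 0.9690, 0.9920]

best_auc_seen = float("-inf")
saved_epochs = []
for epoch, auc in enumerate(val_auc, start=1):
    if auc > best_auc_seen:  # assumed rule: strict improvement in validation AUC
        best_auc_seen = auc
        saved_epochs.append(epoch)

# Epoch (1-based) with the highest validation recall.
best_recall_epoch = max(range(1, len(val_recall) + 1), key=lambda e: val_recall[e - 1])

print(saved_epochs)       # [1, 2, 4, 5]
print(best_recall_epoch)  # 6
```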




In [21]:
# Shallow copy so that adding best_auc does not modify the training history itself
metrics_dict = dict(history)
metrics_dict["best_auc"] = best_auc

save_metrics(metrics_dict, metrics_save_path)

print("Modelo salvo em:", model_save_path)
print("Métricas salvas em:", metrics_save_path)
print("Melhor AUC:", best_auc)

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (10,) + inhomogeneous part.
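The traceback says NumPy was asked to build an array whose first dimension has length 10 but whose items do not share a shape. That 10 matches `epochs = 10`, but it could also be the number of keys in `metrics_dict`. Without the source of `save_metrics` and the exact keys of `history`, the cause cannot be pinned down; two plausible readings are (a) some `history` entry is a per-epoch list of lists or arrays of different lengths, or (b) `save_metrics` stacks all dictionary values into one array, which breaks once the scalar `best_auc` sits next to the per-epoch lists. The hypothetical `save_metrics_sketch` below sidesteps both by giving each key its own array in the `.npz` file and storing ragged entries as object arrays.

```python
import os
import tempfile

import numpy as np


def save_metrics_sketch(metrics, path):
    """Hypothetical replacement for save_metrics: one .npz entry per key.

    Returns the keys that had to be stored as object (pickled) arrays.
    """
    arrays, ragged_keys = {}, []
    for key, value in metrics.items():  # iterating does not mutate `metrics`
        try:
            arr = np.asarray(value)
        except ValueError:
            # NumPy >= 1.24 refuses ragged nested sequences; keep each item as an object.
            arr = np.empty(len(value), dtype=object)
            for i, item in enumerate(value):
                arr[i] = item
        if arr.dtype == object:  # also catches older NumPy, which returns object arrays
            ragged_keys.append(key)
        arrays[key] = arr
    np.savez(path, **arrays)
    return ragged_keys


# Toy history with one ragged entry (illustrative values only).
history = {
    "val_auc": [0.9977, 0.9980, 0.9979],
    "val_extra": [[0.1, 0.9], [0.2], [0.3, 0.4, 0.5]],
}
metrics = {**history, "best_auc": 0.9980}  # new dict; history is left untouched

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.npz")
    ragged = save_metrics_sketch(metrics, path)
    with np.load(path, allow_pickle=True) as data:  # object arrays need allow_pickle
        loaded_auc = data["val_auc"]
        loaded_best = float(data["best_auc"])
        loaded_extra = list(data["val_extra"])

print(ragged)  # ['val_extra']

# In the notebook, the equivalent call would be:
# ragged = save_metrics_sketch({**history, "best_auc": best_auc}, metrics_save_path)
```

Object arrays are pickled inside the `.npz`, so they must be read back with `np.load(..., allow_pickle=True)`. If the ragged entries are not needed for later analysis, dropping them or saving them to a separate file keeps the metrics file pickle-free.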