# Hypothesis 3: Class Weight
## ResNet18 | Light Augmentation | With Class Weight

This experiment evaluates whether applying class weights in the loss
function improves sensitivity for the Pneumonia class, given the class
imbalance in the dataset.

Kept fixed:
- ResNet18 architecture
- Light augmentation (baseline)
- Same reproducible split
- Same seed

Changed:
- Class weights added to CrossEntropyLoss

Generated files:
- models/resnet18_light_CW.pt
- outputs/metrics/resnet18_light_CW.pkl

In [2]:
import os

NOTEBOOK_DIR = os.getcwd()
PROJECT_ROOT = os.path.abspath(os.path.join(NOTEBOOK_DIR, ".."))

print("Project root:", PROJECT_ROOT)

Project root: c:\projects\xray-project


In [3]:
import sys
import torch
import torch.nn as nn
import pandas as pd
import numpy as np

SRC_PATH = os.path.join(PROJECT_ROOT, "src")

if SRC_PATH not in sys.path:
    sys.path.append(SRC_PATH)

from dataset import XRayDataset
from transforms import get_transforms
from model import get_model
from train_utils import train_model
from utils import set_seed, create_directories, save_metrics
from torch.utils.data import DataLoader

## Reproducibility Control

We fix the same seed used in the previous experiments.

In [4]:
set_seed(42)

## Directory Setup

We make sure the following directories exist:
- models/
- outputs/metrics/

In [5]:
models_dir, metrics_dir, _ = create_directories(PROJECT_ROOT)

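## Loading the Splits

We use exactly the same files as before:
- train_split.csv
- val_split.csv

Reusing the same splits keeps the experiment controlled: any change in the metrics can be attributed to the class weights rather than to different data.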

In [6]:
metadata_dir = os.path.join(PROJECT_ROOT, "data", "metadata")

train_df = pd.read_csv(os.path.join(metadata_dir, "train_split.csv"))
val_df = pd.read_csv(os.path.join(metadata_dir, "val_split.csv"))

print("Treino:", len(train_df))
print("Validação:", len(val_df))

Treino: 4093
Validação: 1139


## Configuration

Model: ResNet18  
Augmentation: light  
Class Weight: enabled  
Epochs: 10  
Batch Size: 16  
Learning Rate: 1e-4

In [7]:
model_name = "resnet18"
augmentation = "light"
use_class_weight = True

batch_size = 16
epochs = 10
learning_rate = 1e-4

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Dispositivo:", device)

Dispositivo: cpu


In [8]:
train_transform, val_transform = get_transforms(
    img_size=224,
    augmentation=augmentation
)

train_dataset = XRayDataset(train_df, transform=train_transform)
val_dataset = XRayDataset(val_df, transform=val_transform)

train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)

In [9]:
class_counts = train_df["label"].value_counts().sort_index()

num_normal = class_counts[0]
num_pneumonia = class_counts[1]

print("Normal:", num_normal)
print("Pneumonia:", num_pneumonia)

total = num_normal + num_pneumonia

# Balanced weights: total / (n_classes * class_count), so the rarer class gets the larger weight
weight_normal = total / (2 * num_normal)
weight_pneumonia = total / (2 * num_pneumonia)

class_weights = torch.tensor(
    [weight_normal, weight_pneumonia],
    dtype=torch.float
).to(device)

print("Class Weights:", class_weights)

Normal: 1082
Pneumonia: 3011
Class Weights: tensor([1.8914, 0.6797])
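
The weights printed above follow the balanced heuristic `total / (n_classes * class_count)`. As a minimal sketch, the same arithmetic can be checked in plain Python. The helper name `balanced_class_weights` is an assumption made for this illustration and is not part of the project code.

```python
def balanced_class_weights(counts):
    """Return one weight per class: total / (n_classes * class_count)."""
    total = sum(counts)
    n_classes = len(counts)
    return [total / (n_classes * count) for count in counts]


# Class counts taken from the training split: Normal = 1082, Pneumonia = 3011
weights = balanced_class_weights([1082, 3011])
print([round(w, 4) for w in weights])  # [1.8914, 0.6797]
```

Because Pneumonia is the majority class in this split (3011 vs. 1082 images), it receives the smaller weight, so errors on Normal images are penalized more heavily.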


## Model

The architecture remains ResNet18.

In [10]:
model, target_layer = get_model(model_name=model_name)
model = model.to(device)



## Loss Function and Optimizer

This is the only change from the baseline: CrossEntropyLoss now receives the class weights computed from the training split.

In [11]:
criterion = nn.CrossEntropyLoss(weight=class_weights)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
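
To make the effect of `weight` concrete, here is a rough pure-Python sketch of what a weighted cross-entropy with mean reduction computes: each sample's negative log-likelihood is scaled by the weight of its true class, and the sum is divided by the sum of those weights. The function name `weighted_cross_entropy` and the toy logits are assumptions for illustration only; the notebook itself relies on `nn.CrossEntropyLoss`.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-sample negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the class logits
        row_max = max(row)
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        nll = log_sum_exp - row[target]  # -log softmax(row)[target]
        weight = class_weights[target]
        total_loss += weight * nll
        total_weight += weight
    return total_loss / total_weight


# Toy batch: both samples are scored as class 0 (Normal), one of them wrongly
toy_logits = [[2.0, 0.0], [2.0, 0.0]]
toy_targets = [0, 1]

unweighted = weighted_cross_entropy(toy_logits, toy_targets, [1.0, 1.0])
weighted = weighted_cross_entropy(toy_logits, toy_targets, [1.8914, 0.6797])
print(unweighted, weighted)
```

In this toy batch the misclassified sample belongs to class 1, which carries the smaller weight, so the weighted loss comes out lower than the unweighted one.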

## Naming Pattern

model_augmentation_classweight (for this run: resnet18_light_CW)

In [12]:
model_filename = f"{model_name}_{augmentation}_{'CW' if use_class_weight else 'noCW'}.pt"
metrics_filename = f"{model_name}_{augmentation}_{'CW' if use_class_weight else 'noCW'}.pkl"

model_save_path = os.path.join(models_dir, model_filename)
metrics_save_path = os.path.join(metrics_dir, metrics_filename)
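
The two f-strings above repeat the same pattern. One possible way to keep the model and metrics filenames in sync is a small helper; `build_run_name` is a hypothetical name introduced here, not a function from `src/utils.py`.

```python
def build_run_name(model_name: str, augmentation: str, use_class_weight: bool) -> str:
    """Compose the shared stem: model_augmentation_classweight."""
    suffix = "CW" if use_class_weight else "noCW"
    return f"{model_name}_{augmentation}_{suffix}"


run_name = build_run_name("resnet18", "light", True)
print(run_name + ".pt")   # resnet18_light_CW.pt
print(run_name + ".pkl")  # resnet18_light_CW.pkl
```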


## Training

The training loop is implemented in src/train_utils.py.

Each epoch reports:
- Training loss
- Validation loss
- Validation ROC-AUC
- Validation F1
- Validation recall
- Validation precision

The model with the highest validation AUC is saved automatically.
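
The selection rule can be sketched as follows, assuming `train_model` compares each epoch's validation AUC with the best value seen so far and saves only on a strict improvement. The real implementation lives in src/train_utils.py and is not shown in this notebook, so `checkpoint_epochs` is a hypothetical stand-in. The input values are the rounded validation AUCs reported by this run.

```python
def checkpoint_epochs(val_aucs):
    """Return the 1-based epochs that would trigger a save, plus the best AUC."""
    best_auc = float("-inf")
    saved_epochs = []
    for epoch, auc in enumerate(val_aucs, start=1):
        if auc > best_auc:  # strict improvement only
            best_auc = auc
            saved_epochs.append(epoch)
    return saved_epochs, best_auc


# Rounded validation AUC per epoch, copied from this run's training log
logged_aucs = [0.9975, 0.9981, 0.9979, 0.9988, 0.9991,
               0.9992, 0.9987, 0.9989, 0.9985, 0.9988]
print(checkpoint_epochs(logged_aucs))  # ([1, 2, 4, 5, 6], 0.9992)
```

Under this assumption the saved checkpoint corresponds to epoch 6, which matches the epochs where the log reports a new best model.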


In [13]:
history, best_auc = train_model(
    model,
    train_loader,
    val_loader,
    criterion,
    optimizer,
    device,
    epochs,
    model_save_path
)

                                                                                      


Epoch 1/10
Train Loss: 0.1469
Val Loss:   0.1032
Val AUC:    0.9975
Val F1:     0.9766
Val Recall: 0.9576
Val Prec.:  0.9964
→ Novo melhor modelo salvo.


                                                                                      


Epoch 2/10
Train Loss: 0.0845
Val Loss:   0.1376
Val AUC:    0.9981
Val F1:     0.9693
Val Recall: 0.9427
Val Prec.:  0.9976
→ Novo melhor modelo salvo.


                                                                                      


Epoch 3/10
Train Loss: 0.0552
Val Loss:   0.1211
Val AUC:    0.9979
Val F1:     0.9729
Val Recall: 0.9472
Val Prec.:  1.0000


                                                                                      


Epoch 4/10
Train Loss: 0.0551
Val Loss:   0.0375
Val AUC:    0.9988
Val F1:     0.9885
Val Recall: 0.9885
Val Prec.:  0.9885
→ Novo melhor modelo salvo.


                                                                                      


Epoch 5/10
Train Loss: 0.0383
Val Loss:   0.0486
Val AUC:    0.9991
Val F1:     0.9879
Val Recall: 0.9794
Val Prec.:  0.9965
→ Novo melhor modelo salvo.


                                                                                      


Epoch 6/10
Train Loss: 0.0451
Val Loss:   0.0313
Val AUC:    0.9992
Val F1:     0.9931
Val Recall: 0.9943
Val Prec.:  0.9920
→ Novo melhor modelo salvo.


                                                                                      


Epoch 7/10
Train Loss: 0.0343
Val Loss:   0.0690
Val AUC:    0.9987
Val F1:     0.9820
Val Recall: 0.9679
Val Prec.:  0.9965


                                                                                      


Epoch 8/10
Train Loss: 0.0261
Val Loss:   0.0488
Val AUC:    0.9989
Val F1:     0.9873
Val Recall: 0.9805
Val Prec.:  0.9942


                                                                                      


Epoch 9/10
Train Loss: 0.0268
Val Loss:   0.0472
Val AUC:    0.9985
Val F1:     0.9879
Val Recall: 0.9851
Val Prec.:  0.9908


                                                                                       


Epoch 10/10
Train Loss: 0.0245
Val Loss:   0.0408
Val AUC:    0.9988
Val F1:     0.9902
Val Recall: 0.9874
Val Prec.:  0.9931

Melhor AUC obtida: 0.9991538672989039
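
As a sanity check on the log, F1 is the harmonic mean of precision and recall, so each logged F1 can be recomputed from the other two columns. A small sketch using the rounded values above; the helper name `f1_from` is hypothetical.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, logged F1) for epochs 1, 6 and 10
logged_rows = [
    (0.9964, 0.9576, 0.9766),
    (0.9920, 0.9943, 0.9931),
    (0.9931, 0.9874, 0.9902),
]
for precision, recall, logged_f1 in logged_rows:
    print(round(f1_from(precision, recall), 4), logged_f1)
```

Small differences in the last digit are possible because the inputs are already rounded to four decimals.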




In [14]:
metrics_dict = {
    "model_name": model_name,
    "augmentation": augmentation,
    "class_weight": use_class_weight,
    "epochs": epochs,
    "history": history,
    "best_auc": best_auc
}

save_metrics(metrics_dict, metrics_save_path)

print("Modelo salvo em:", model_save_path)
print("Métricas salvas em:", metrics_save_path)
print("Melhor AUC:", best_auc)

Modelo salvo em: c:\projects\xray-project\models\resnet18_light_CW.pt
Métricas salvas em: c:\projects\xray-project\outputs\metrics\resnet18_light_CW.pkl
Melhor AUC: 0.9991538672989039
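
To compare this run with the other hypotheses later, the saved metrics can be read back. The sketch below assumes `save_metrics` writes the dictionary with Python's `pickle` module, which the `.pkl` extension suggests but src/utils.py does not confirm here. `load_metrics` is a hypothetical helper, and the demo uses a stand-in dictionary in a temporary file rather than the real output.

```python
import os
import pickle
import tempfile


def load_metrics(path):
    """Read a pickled metrics dictionary from disk (only open trusted files)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Round-trip demo with a stand-in dictionary, not the real metrics file
example = {"model_name": "resnet18", "class_weight": True, "best_auc": 0.9992}
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_metrics.pkl")
    with open(demo_path, "wb") as f:
        pickle.dump(example, f)
    loaded = load_metrics(demo_path)

print(loaded["model_name"], loaded["best_auc"])
```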
