# 🧠 Fine-tuning: BERTimbau-Medical + Focal Loss → BI-RADS Classification

**Full pipeline:**
```
BERTimbau-Medical (from Google Drive, pretrained with medical-domain MLM)
    ↓ Fine-tuning with Focal Loss
BI-RADS classifier (categories 0-6)
    ↓ Predictions on the test set
submission.csv (ready to submit to Kaggle)
```

**Requirements:**
- Google Colab with a GPU (T4/L4)
- The BERTimbau-Medical model saved to Google Drive
- The SPR 2026 competition dataset


## Step 1: Setup

```python
# ===== STEP 1: SETUP =====
!pip install -q transformers datasets accelerate scikit-learn

import os
import torch
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, classification_report

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    gpu_name = torch.cuda.get_device_name(0)
    gpu_mem = torch.cuda.get_device_properties(0).total_memory / 1e9
    print(f'✅ GPU: {gpu_name} ({gpu_mem:.1f} GB)')
else:
    print('⚠️ No GPU! Go to Runtime > Change runtime type > T4/L4 GPU')
print(f'Device: {device}')
```

```
✅ GPU: NVIDIA L4 (23.7 GB)
Device: cuda
```


## Step 2: Mount Google Drive and locate the model

```python
# ===== STEP 2: MOUNT DRIVE =====
from google.colab import drive
drive.mount('/content/drive')

# Path to the model pretrained with medical-domain MLM
MODEL_PATH = '/content/drive/MyDrive/spr_2026/models/bertimbau-medical'

# Check that the model folder exists
if os.path.exists(MODEL_PATH):
    arquivos = os.listdir(MODEL_PATH)
    print(f'✅ Model found at: {MODEL_PATH}')
    print(f'   Files: {arquivos}')
else:
    print(f'❌ Model NOT found at: {MODEL_PATH}')
    print('   Check the path in your Google Drive!')
```

```
Mounted at /content/drive
✅ Model found at: /content/drive/MyDrive/spr_2026/models/bertimbau-medical
   Files: ['config.json', 'model.safetensors', 'tokenizer_config.json', 'tokenizer.json', 'training_args.bin']
```


## Step 3: Load the dataset
If the Kaggle API doesn't work, upload the competition zip manually and extract `train.csv` and `test.csv` into `/content`.
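A minimal sketch for the manual route, assuming the zip was uploaded to Colab and that `train.csv` and `test.csv` sit at the zip's top level. `extract_competition_zip` is a hypothetical helper, not part of any library:

```python
import zipfile
from pathlib import Path


def extract_competition_zip(zip_path, dest_dir='/content'):
    """Extract an uploaded competition zip and return the CSV files found under dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    # Relative paths reveal whether the CSVs landed in a subfolder
    return sorted(str(p.relative_to(dest)) for p in dest.rglob('*.csv'))
```

For example, `extract_competition_zip('/content/spr-2026.zip')` (hypothetical file name) should list `test.csv` and `train.csv`. If they appear under a subfolder, point `DATA_DIR` there instead.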

```python
# ===== STEP 3: DATASET =====

DATA_DIR = '/content'

train_df = pd.read_csv(f'{DATA_DIR}/train.csv')
test_df = pd.read_csv(f'{DATA_DIR}/test.csv')

# Copy 'target' into 'label' (the column name Hugging Face expects)
train_df['label'] = train_df['target']

print(f'Train: {train_df.shape}')
print(f'Test: {test_df.shape}')
print(f'Train columns: {train_df.columns.tolist()}')
print('\nClass distribution:')
print(train_df['label'].value_counts().sort_index())
```

```
Train: (18272, 4)
Test: (4, 2)
Train columns: ['ID', 'report', 'target', 'label']

Class distribution:
label
0      610
1      693
2    15968
3      713
4      214
5       29
6       45
Name: count, dtype: int64
```


## Step 4: Load the BERTimbau-Medical tokenizer and tokenize

```python
# ===== STEP 4: TOKENIZE =====
from transformers import AutoTokenizer
from datasets import Dataset

# Load the tokenizer saved alongside the pretrained model
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
print(f'✅ Tokenizer loaded from: {MODEL_PATH}')

MAX_LEN = 256
NUM_LABELS = train_df['label'].nunique()  # 7, as long as every BI-RADS category 0-6 appears in train
print(f'Classes: {NUM_LABELS} (BI-RADS 0-{NUM_LABELS-1})')

# Tokenization function (applied to both train and test)
def tokenize(examples):
    return tokenizer(
        examples['text'],
        truncation=True,
        max_length=MAX_LEN,
        padding='max_length',
    )

# Build Hugging Face datasets
train_data = Dataset.from_dict({
    'text': train_df['report'].tolist(),
    'label': train_df['label'].tolist(),
})

test_data = Dataset.from_dict({
    'text': test_df['report'].tolist(),
})

train_tokenized = train_data.map(tokenize, batched=True, remove_columns=['text'])
test_tokenized = test_data.map(tokenize, batched=True, remove_columns=['text'])

train_tokenized.set_format('torch')
test_tokenized.set_format('torch')

print('\n✅ Tokenized!')
print(f'   Train: {len(train_tokenized)} examples')
print(f'   Test: {len(test_tokenized)} examples')
```

```
✅ Tokenizer loaded from: /content/drive/MyDrive/spr_2026/models/bertimbau-medical
Classes: 7 (BI-RADS 0-6)

✅ Tokenized!
   Train: 18272 examples
   Test: 4 examples
```
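Whether `MAX_LEN = 256` is long enough depends on how many reports exceed 256 tokens. A minimal sketch for checking this, assuming the token lengths are computed with the same tokenizer but without truncation; `truncation_report` is a hypothetical helper:

```python
def truncation_report(lengths, max_len=256):
    """Count how many token sequences are longer than max_len (and would be truncated)."""
    if not lengths:
        raise ValueError('lengths must not be empty')
    truncated = sum(1 for n in lengths if n > max_len)
    return {
        'n_reports': len(lengths),
        'n_truncated': truncated,
        'pct_truncated': 100.0 * truncated / len(lengths),
        'longest': max(lengths),
    }
```

In the notebook, the lengths could come from `lengths = [len(ids) for ids in tokenizer(train_df['report'].tolist())['input_ids']]`, followed by `truncation_report(lengths, MAX_LEN)`. These counts include the special tokens, which is also what the model sees.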


## Step 5: Train/validation split

```python
# ===== STEP 5: SPLIT =====
from datasets import ClassLabel

# Cast label to ClassLabel (required by stratify_by_column)
train_tokenized = train_tokenized.cast_column('label', ClassLabel(num_classes=NUM_LABELS))

# Stratified 90/10 split so both parts keep the original class proportions
split = train_tokenized.train_test_split(test_size=0.1, seed=42, stratify_by_column='label')
train_split = split['train']
val_split = split['test']

print(f'Train: {len(train_split)}, Val: {len(val_split)}')
```

```
Train: 16444, Val: 1828
```
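Stratification preserves proportions, but with `test_size=0.1` the rarest classes end up with only a handful of validation examples. A back-of-the-envelope check using the class counts printed in Step 3 (the split's actual per-class counts may differ by one):

```python
# Class counts from the Step 3 output (BI-RADS 0-6)
train_class_counts = {0: 610, 1: 693, 2: 15968, 3: 713, 4: 214, 5: 29, 6: 45}
VAL_FRACTION = 0.1  # same as test_size in the split above

# Expected number of validation reports per class under a stratified split
expected_val = {label: n * VAL_FRACTION for label, n in train_class_counts.items()}
for label, n in expected_val.items():
    print(f'BI-RADS {label}: ~{n:.1f} validation reports')
```

With roughly 3 validation reports for BI-RADS 5 and 4 or 5 for BI-RADS 6, a single changed prediction moves those classes' F1 sharply. This fits the class-5 F1 swinging between 0.0 and 0.857 during training in Step 8.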


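## Step 6: Focal Loss and class weights
Focal Loss down-weights easy, already well-classified examples, so training concentrates on the hard ones. Combined with per-class weights, this helps with the strong imbalance across BI-RADS classes: class 2 alone accounts for about 87% of the training reports.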
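For an example whose true class receives probability `p_t`, focal loss multiplies the cross-entropy `-log(p_t)` by the modulating factor `(1 - p_t) ** gamma`, and optionally by a class weight `alpha_t`. The pure-Python sketch below shows that scaling for a single example. It is separate from the `FocalLoss` class in the next cell:

```python
import math


def focal_loss_single(p_t, gamma=2.0, alpha_t=1.0):
    """Focal loss for one example, given the probability p_t of its true class."""
    ce = -math.log(p_t)  # ordinary cross-entropy for this example
    return alpha_t * (1.0 - p_t) ** gamma * ce


for p_t in (0.9, 0.5, 0.1):
    ce = -math.log(p_t)
    fl = focal_loss_single(p_t)
    # fl / ce equals the modulating factor (1 - p_t) ** gamma
    print(f'p_t={p_t:.1f}  CE={ce:.4f}  focal={fl:.4f}  focal/CE={fl / ce:.2f}')
```

With `gamma = 2`, a confidently correct example (`p_t = 0.9`) keeps only 1% of its cross-entropy, whereas a badly misclassified one (`p_t = 0.1`) keeps 81%. With `gamma = 0`, the expression reduces to ordinary (optionally weighted) cross-entropy.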

```python
# ===== STEP 6: FOCAL LOSS + CLASS WEIGHTS =====
import torch.nn as nn
import torch.nn.functional as F

class FocalLoss(nn.Module):
    """
    Focal Loss: focuses training on hard examples.
    - gamma > 0: shrinks the loss of easy (high-confidence) examples
    - alpha: optional per-class weight
    """
    def __init__(self, gamma=2.0, alpha=None, num_classes=7):
        super().__init__()
        self.gamma = gamma
        if alpha is not None:
            self.alpha = torch.tensor(alpha, dtype=torch.float32)
        else:
            self.alpha = None
        self.num_classes = num_classes

    def forward(self, logits, targets):
        # Per-example cross-entropy, computed in float32 for stability under fp16
        ce_loss = F.cross_entropy(logits.float(), targets, reduction='none')
        # p_t = probability of the true class (exp(-CE) equals softmax(logits)[target])
        pt = torch.exp(-ce_loss)
        focal_weight = (1 - pt) ** self.gamma
        loss = focal_weight * ce_loss

        if self.alpha is not None:
            alpha = self.alpha.to(logits.device)
            loss = alpha[targets] * loss

        return loss.mean()

# Per-class weights, inversely proportional to class frequency and
# rescaled so they sum to the number of classes
class_counts = train_df['label'].value_counts().sort_index().values
class_weights = 1.0 / class_counts
class_weights = class_weights / class_weights.sum() * len(class_weights)
print(f'Class weights: {dict(enumerate(class_weights.round(3).tolist()))}')

focal_loss = FocalLoss(gamma=2.0, alpha=class_weights.tolist(), num_classes=NUM_LABELS)
print('✅ Focal Loss configured (gamma=2.0, with per-class weights)')
```

```
Class weights: {0: 0.174, 1: 0.153, 2: 0.007, 3: 0.149, 4: 0.496, 5: 3.661, 6: 2.36}
✅ Focal Loss configured (gamma=2.0, with per-class weights)
```
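The printed weights follow from the Step 3 class counts. Each class gets `1 / count`, and the weights are then rescaled to sum to the number of classes (7). The pure-Python sketch below reproduces that arithmetic. The `power` parameter is a hypothetical knob, not used in this notebook, for trying a softer `1 / sqrt(count)` weighting:

```python
# Class counts from the Step 3 output (BI-RADS 0-6)
counts = [610, 693, 15968, 713, 214, 29, 45]


def normalized_weights(counts, power=1.0):
    """Weights proportional to 1 / count**power, rescaled to sum to len(counts)."""
    raw = [1.0 / (c ** power) for c in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]


inverse = normalized_weights(counts)            # same scheme as the cell above
softer = normalized_weights(counts, power=0.5)  # hypothetical alternative: 1 / sqrt(count)
print([round(w, 3) for w in inverse])
print([round(w, 3) for w in softer])
```

Plain inverse frequency makes BI-RADS 5 weigh about 550 times more than BI-RADS 2 (15968 / 29). The square-root variant narrows that gap to about 23 times, which may be gentler when combined with focal loss's own re-weighting.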


## Step 7: Configure training

```python
# ===== STEP 7: CUSTOM TRAINER =====
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load the model for classification. The encoder weights come from BERTimbau-Medical (MLM);
# the MLM head (cls.predictions.*) is discarded, and the classifier and the pooler, which the
# MLM checkpoint does not contain, are newly initialized (see the load report below).
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_PATH,
    num_labels=NUM_LABELS,
    ignore_mismatched_sizes=True,  # only affects same-named weights whose shapes differ
)
model.to(device)

total_params = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f'✅ Model loaded: {total_params:,} params ({trainable:,} trainable)')

# Custom Trainer that replaces the default cross-entropy with Focal Loss
class FocalLossTrainer(Trainer):
    def __init__(self, focal_loss_fn, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.focal_loss_fn = focal_loss_fn

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # The data collator renames the dataset's 'label' column to 'labels'
        labels = inputs.pop('labels')
        outputs = model(**inputs)
        loss = self.focal_loss_fn(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

# Metrics: macro F1 (used to pick the best checkpoint) plus F1 per class
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    all_labels = list(range(NUM_LABELS))  # always report all 7 classes, even if never predicted
    f1_macro = f1_score(labels, preds, labels=all_labels, average='macro', zero_division=0)
    f1_per_class = f1_score(labels, preds, labels=all_labels, average=None, zero_division=0)
    metrics = {'f1_macro': f1_macro}
    for i, f1 in enumerate(f1_per_class):
        metrics[f'f1_class_{i}'] = f1
    return metrics

# Save checkpoints to Drive so they survive a Colab disconnect
CHECKPOINT_DIR = '/content/drive/MyDrive/spr_2026/checkpoints/finetune-focal'

training_args = TrainingArguments(
    output_dir=CHECKPOINT_DIR,
    num_train_epochs=5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=32,
    learning_rate=2e-5,
    weight_decay=0.01,
    warmup_ratio=0.1,
    logging_steps=50,
    eval_strategy='steps',
    eval_steps=500,
    save_strategy='steps',
    save_steps=500,
    save_total_limit=2,
    load_best_model_at_end=True,
    metric_for_best_model='f1_macro',
    greater_is_better=True,
    fp16=True,
    dataloader_num_workers=2,
    report_to='none',
    seed=42,
)

trainer = FocalLossTrainer(
    focal_loss_fn=focal_loss,
    model=model,
    args=training_args,
    train_dataset=train_split,
    eval_dataset=val_split,
    compute_metrics=compute_metrics,
)

print('✅ Trainer configured')
print(f'   Epochs: {training_args.num_train_epochs}')
print(f'   LR: {training_args.learning_rate}')
print(f'   Batch: {training_args.per_device_train_batch_size}')
print(f'   Checkpoints at: {CHECKPOINT_DIR}')
```

```
BertForSequenceClassification LOAD REPORT from: /content/drive/MyDrive/spr_2026/models/bertimbau-medical
Key                                        | Status     | 
-------------------------------------------+------------+-
cls.predictions.decoder.weight             | UNEXPECTED | 
cls.predictions.transform.LayerNorm.bias   | UNEXPECTED | 
cls.predictions.transform.dense.weight     | UNEXPECTED | 
cls.predictions.transform.LayerNorm.weight | UNEXPECTED | 
cls.predictions.bias                       | UNEXPECTED | 
cls.predictions.transform.dense.bias       | UNEXPECTED | 
classifier.bias                            | MISSING    | 
bert.pooler.dense.bias                     | MISSING    | 
bert.pooler.dense.weight                   | MISSING    | 
classifier.weight                          | MISSING    | 

Notes:
- UNEXPECTED	:can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING	:those params were newly initialized because missing fro

✅ Model loaded: 334,403,591 params (334,403,591 trainable)
✅ Trainer configured
   Epochs: 5
   LR: 2e-05
   Batch: 16
   Checkpoints at: /content/drive/MyDrive/spr_2026/checkpoints/finetune-focal
```


## Step 8: Train!
If Colab disconnects, re-run Steps 1 to 7 and then call `trainer.train(resume_from_checkpoint=True)`. Training resumes from the latest checkpoint saved in `CHECKPOINT_DIR`.
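`resume_from_checkpoint=True` picks the most recent `checkpoint-<step>` folder inside `output_dir`. The sketch below lets you see which folder that will be before resuming. `find_latest_checkpoint` is a hypothetical helper; `transformers.trainer_utils.get_last_checkpoint` offers similar functionality:

```python
import re
from pathlib import Path


def find_latest_checkpoint(output_dir):
    """Return the path of the checkpoint-<step> folder with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    pattern = re.compile(r'^checkpoint-(\d+)$')
    steps = []
    for path in root.iterdir():
        match = pattern.match(path.name)
        if match and path.is_dir():
            steps.append((int(match.group(1)), path))
    if not steps:
        return None
    # Compare by step number, not alphabetically (checkpoint-1000 beats checkpoint-500)
    return str(max(steps, key=lambda item: item[0])[1])
```

For example, `print(find_latest_checkpoint(CHECKPOINT_DIR))` before resuming. Note that with `save_total_limit=2` and `load_best_model_at_end=True`, the folder may also hold the best checkpoint, which is not necessarily the latest.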

```python
# ===== STEP 8: TRAIN =====
print('🚀 Starting fine-tuning with Focal Loss...')
print()

# First run:
train_result = trainer.train()

# If Colab disconnected and you want to resume:
# train_result = trainer.train(resume_from_checkpoint=True)

print('\n' + '='*50)
print('✅ FINE-TUNING DONE!')
print('='*50)
# training_loss is the average loss over all training steps, not the last step's loss
print(f'Mean training loss: {train_result.training_loss:.4f}')

# Evaluate on the validation set (load_best_model_at_end restores the best checkpoint first)
eval_results = trainer.evaluate()
print('\n📊 Validation results:')
print(f'   Macro F1: {eval_results["eval_f1_macro"]:.5f}')
for i in range(NUM_LABELS):
    key = f'eval_f1_class_{i}'
    if key in eval_results:
        print(f'   F1 class {i}: {eval_results[key]:.4f}')
```

```
🚀 Starting fine-tuning with Focal Loss...
```

| Step | Training Loss | Validation Loss | F1 Macro | F1 Class 0 | F1 Class 1 | F1 Class 2 | F1 Class 3 | F1 Class 4 | F1 Class 5 | F1 Class 6 |
|---|---|---|---|---|---|---|---|---|---|---|
| 500 | 0.04708 | 0.038883 | 0.384372 | 0.4 | 0.902778 | 0.925553 | 0.268398 | 0.193878 | 0.0 | 0.0 |
| 1000 | 0.006426 | 0.029831 | 0.534287 | 0.620321 | 0.887417 | 0.915565 | 0.352941 | 0.463768 | 0.0 | 0.5 |
| 1500 | 0.016687 | 0.024448 | 0.609441 | 0.666667 | 0.884615 | 0.933289 | 0.422819 | 0.608696 | 0.0 | 0.75 |
| 2000 | 0.029729 | 0.017394 | 0.648989 | 0.774648 | 0.896104 | 0.955954 | 0.54955 | 0.566667 | 0.0 | 0.8 |
| 2500 | 0.003174 | 0.013475 | 0.682498 | 0.741259 | 0.884615 | 0.944554 | 0.475806 | 0.53125 | 0.4 | 0.8 |
| 3000 | 0.003772 | 0.007086 | 0.756405 | 0.765101 | 0.867925 | 0.94211 | 0.462121 | 0.681818 | 0.666667 | 0.909091 |
| 3500 | 0.005296 | 0.013728 | 0.664187 | 0.784615 | 0.890323 | 0.957967 | 0.510823 | 0.596491 | 0.0 | 0.909091 |
| 4000 | 0.003384 | 0.009402 | 0.739671 | 0.80303 | 0.890323 | 0.957287 | 0.510823 | 0.607143 | 0.5 | 0.909091 |
| 4500 | 0.001695 | 0.007372 | 0.782505 | 0.737589 | 0.883117 | 0.960364 | 0.530233 | 0.6 | 0.857143 | 0.909091 |
| 5000 | 0.001735 | 0.007328 | 0.78665 | 0.820896 | 0.890323 | 0.965383 | 0.578431 | 0.618182 | 0.8 | 0.833333 |

```
There were missing keys in the checkpoint model loaded: ['bert.embeddings.LayerNorm.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.4.attention.output.La

✅ FINE-TUNING DONE!
Mean training loss: 0.0146

📊 Validation results:
   Macro F1: 0.78665
   F1 class 0: 0.8209
   F1 class 1: 0.8903
   F1 class 2: 0.9654
   F1 class 3: 0.5784
   F1 class 4: 0.6182
   F1 class 5: 0.8000
   F1 class 6: 0.8333
```
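The macro F1 used to select the best checkpoint is the unweighted mean of the seven per-class F1 scores, so a rare class counts as much as BI-RADS 2. The snippet below recomputes it from the step-500 and step-5000 rows of the training log:

```python
# Per-class F1 (BI-RADS 0-6) copied from the training log
f1_step_500 = [0.4, 0.902778, 0.925553, 0.268398, 0.193878, 0.0, 0.0]
f1_step_5000 = [0.820896, 0.890323, 0.965383, 0.578431, 0.618182, 0.8, 0.833333]


def macro_f1(per_class):
    """Unweighted mean of per-class F1 scores (what average='macro' computes)."""
    return sum(per_class) / len(per_class)


print(f'step 500:  {macro_f1(f1_step_500):.6f}')
print(f'step 5000: {macro_f1(f1_step_5000):.5f}')
# A class stuck at F1 = 0 can cost up to 1/7 of the macro score
print(f'max cost of one zero-F1 class: {1 / len(f1_step_500):.3f}')
```

Both results match the logged F1 Macro values. This explains why most of the gain between steps 500 and 5000 comes from BI-RADS 5 and 6 moving off zero, even though BI-RADS 2 barely changes.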


## Step 9: Generate predictions for the submission

```python
# ===== STEP 9: PREDICTION =====
print('🔮 Generating predictions on the test set...')

predictions = trainer.predict(test_tokenized)
preds = np.argmax(predictions.predictions, axis=-1)

print(f'Predictions generated: {len(preds)}')
print('Prediction distribution:')
unique, counts = np.unique(preds, return_counts=True)
for u, c in zip(unique, counts):
    print(f'   Class {u}: {c} ({c/len(preds)*100:.1f}%)')
```

## Step 10: Create the submission file
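A minimal sketch, assuming the submission uses the same `ID` and `target` column names as `train.csv`. Check the competition's sample submission for the exact format. `build_submission` is a hypothetical helper:

```python
import numpy as np
import pandas as pd


def build_submission(ids, preds, path='submission.csv', id_col='ID', target_col='target'):
    """Write a Kaggle-style CSV with one predicted BI-RADS class per report ID."""
    ids = list(ids)
    if len(ids) != len(preds):
        raise ValueError(f'{len(ids)} IDs but {len(preds)} predictions')
    submission = pd.DataFrame({
        id_col: ids,
        target_col: np.asarray(preds, dtype=int),  # class indices 0-6
    })
    submission.to_csv(path, index=False)
    return submission
```

In the notebook this would be called as `build_submission(test_df['ID'], preds)`, relying on `trainer.predict` keeping the row order of `test_df`.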

## Step 11: Summary