# SPR 2026 - Sentence Transformers

**Notebook for offline submission on Kaggle.**

---
**OFFLINE SETUP:**
1. On Kaggle, go to Settings → Internet → **OFF**
2. Add the model: **sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2**
   - Go to "Add Data" → "Models" → search for the model
3. The model will be loaded from: `/kaggle/input/paraphrase-multilingual-minilm-l12-v2`
---
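
Kaggle may mount an added model under nested subdirectories, in which case the path from step 3 would not point directly at the model files. The following is a minimal sketch for locating the model folder. `find_model_dir` is a hypothetical helper, and the `modules.json` marker file is an assumption about how Sentence Transformers models are saved on disk.

```python
import os
from typing import Optional


def find_model_dir(root: str, marker: str = "modules.json") -> Optional[str]:
    """Return the first directory under `root` that contains `marker`, or None.

    `marker` defaults to 'modules.json' (an assumption, adjust if needed).
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        if marker in filenames:
            return dirpath
    return None
```

If the helper returns a path, that path could be assigned to `MODEL_PATH` before calling `SentenceTransformer(MODEL_PATH)`. A return value of `None` suggests the model was not attached to the notebook.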

In [None]:
import os
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
import lightgbm as lgb
from sentence_transformers import SentenceTransformer
import warnings
warnings.filterwarnings('ignore')

SEED = 42
DATA_DIR = '/kaggle/input/spr-2026-mammography-report-classification'

# Offline model
MODEL_PATH = '/kaggle/input/paraphrase-multilingual-minilm-l12-v2'

np.random.seed(SEED)
print('Libraries loaded!')

In [None]:
# Load data
train = pd.read_csv(f'{DATA_DIR}/train.csv')
test = pd.read_csv(f'{DATA_DIR}/test.csv')

print(f'Train: {train.shape}')
print(f'Test: {test.shape}')

In [None]:
# Load the embedding model
model = SentenceTransformer(MODEL_PATH)
print('Model loaded!')

In [None]:
# Generate embeddings
print('Generating train embeddings...')
X_train = model.encode(train['report'].tolist(), show_progress_bar=True, batch_size=32)
y_train = train['target'].values

print('Generating test embeddings...')
X_test = model.encode(test['report'].tolist(), show_progress_bar=True, batch_size=32)

print(f'X_train shape: {X_train.shape}')
print(f'X_test shape: {X_test.shape}')
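
`model.encode` expects a list of strings, so a missing report (read by pandas as `NaN`) could cause the encoding step to fail. Below is a minimal sketch of a guard, assuming an empty string is an acceptable stand-in for a missing report. `clean_texts` is a hypothetical helper and is not part of the original notebook.

```python
import math
from typing import Any, Iterable, List


def clean_texts(values: Iterable[Any]) -> List[str]:
    """Convert values to stripped strings, mapping None and NaN to ''."""
    cleaned = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            cleaned.append("")  # assumption: empty string for missing text
        else:
            cleaned.append(str(value).strip())
    return cleaned
```

It could be applied as `model.encode(clean_texts(train['report']), ...)` in place of `train['report'].tolist()`.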

In [None]:
# Train LightGBM
clf = lgb.LGBMClassifier(
    n_estimators=200,
    max_depth=10,
    learning_rate=0.05,
    class_weight='balanced',
    random_state=SEED,
    verbose=-1
)

clf.fit(X_train, y_train)
print('Model trained!')
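
The `class_weight='balanced'` option weights each class inversely to its frequency, using the scikit-learn convention `n_samples / (n_classes * count)`. The sketch below reproduces that arithmetic in plain Python for illustration only. `balanced_class_weights` is a hypothetical helper, and the labels are made up rather than taken from the competition data.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Compute n_samples / (n_classes * class_count) for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Made-up labels: three samples of class 0 and one of class 1
weights = balanced_class_weights([0, 0, 0, 1])
print(weights)
```

With these labels the rare class 1 gets weight 2.0 and the frequent class 0 gets about 0.67, so errors on the rare class count roughly three times as much during training.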

In [None]:
# Submission
predictions = clf.predict(X_test)

submission = pd.DataFrame({
    'ID': test['ID'],
    'target': predictions
})

submission.to_csv('submission.csv', index=False)

print('submission.csv created!')
print(submission['target'].value_counts().sort_index())
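
Before submitting, it can help to confirm that the file has the expected columns and exactly one row per test ID. The following is a minimal sketch using only the standard library. `check_submission` is a hypothetical helper, and the `ID` and `target` column names are taken from the cell above.

```python
import csv
from typing import Iterable, List


def check_submission(path: str, expected_ids: Iterable) -> List[str]:
    """Return a list of problems found in the submission CSV (empty if none)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != ["ID", "target"]:
            return [f"unexpected columns: {reader.fieldnames}"]
        rows = list(reader)
    ids = [row["ID"] for row in rows]
    expected = [str(i) for i in expected_ids]  # CSV values are read as strings
    if len(ids) != len(set(ids)):
        problems.append("duplicate IDs")
    if set(ids) != set(expected):
        problems.append("IDs do not match the test set")
    if any(not row["target"] for row in rows):
        problems.append("empty target values")
    return problems
```

A call such as `check_submission('submission.csv', test['ID'])` should return an empty list when the file is well formed.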