## Experiment 10: Comparing Models (NB, SVM, KNN)
**Objective:** compare simple models on a classic dataset.

**Task:**  
1. Use the **Iris** dataset. Compare **GaussianNB**, a linear **SVM**, and **KNN** (k=5).  
2. Report the **accuracy** of each model.  
3. **Extra task:** compare different **test-set proportions** (20%, 30%, 40%) and observe the impact on the metrics.
> Tip: for SVM and KNN, chain `StandardScaler` and the model in a `Pipeline`, so the scaler is fitted on the training data only and no *data leakage* occurs.
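
To illustrate the tip, here is a minimal sketch (not part of the original notebook) that checks which rows the scaler inside a `Pipeline` actually sees. The split parameters mirror the ones used later in this experiment; the variable names `pipe`, `scaler_mean`, and `uses_train_only` are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# The pipeline fits the scaler and the classifier on the training rows only.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
pipe.fit(X_train, y_train)

# The per-feature means learned by the scaler should match the training-set
# means, which shows that no test-set statistics were used for scaling.
scaler_mean = pipe.named_steps["scaler"].mean_
uses_train_only = np.allclose(scaler_mean, X_train.mean(axis=0))
print("Scaler fitted on training rows only:", uses_train_only)
```

Calling `pipe.predict(X_test)` afterwards reuses those training statistics to transform the test rows. Fitting `StandardScaler` on the full `X` before splitting would not give this guarantee.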

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/flavioluizseixas/aprendizado-de-maquina-na-saude/blob/main/0-Nivelamento/Experimento_10.ipynb)

In [5]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# sklearn (for the ML experiments)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Option for inline plots (in Jupyter/Colab)
# %matplotlib inline  # Uncomment in classic Jupyter if needed

RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

In [6]:
# Baseline code (simple comparison)
iris = load_iris()
X = iris.data
y = iris.target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE, stratify=y
)

models = {
    "NaiveBayes": GaussianNB(),
    "SVM_linear": Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="linear", random_state=RANDOM_STATE))]),
    "KNN_k5": Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=5))])
}

for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name}: acc = {acc:.3f}")


NaiveBayes: acc = 0.967
SVM_linear: acc = 1.000
KNN_k5: acc = 0.933
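
When reading the accuracies printed above, it helps to convert them back into counts. With `test_size=0.2` the test set holds 30 of the 150 Iris samples, so each misclassified flower moves the accuracy by about 0.033. The sketch below does that arithmetic. The `reported` dictionary simply copies the printed values, and the names `n_test` and `errors` are illustrative.

```python
# Accuracies copied from the printed output of the baseline cell.
reported = {"NaiveBayes": 0.967, "SVM_linear": 1.000, "KNN_k5": 0.933}

n_test = 30          # 20% of the 150 Iris samples
step = 1 / n_test    # accuracy change caused by a single error

errors = {}
for name, acc in reported.items():
    # Round to the nearest whole number of misclassified samples.
    errors[name] = round((1 - acc) * n_test)
    print(f"{name}: {errors[name]} error(s) out of {n_test}")

print(f"One error changes accuracy by {step:.3f}")
```

The gap between the best and worst model on this split therefore corresponds to only two test samples, which is worth keeping in mind before ranking the models.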


### Adaptation (Extra)
- Vary `test_size` over `[0.2, 0.3, 0.4]` and repeat the comparison.
- Use `stratify=y` to keep the class proportions the same in the training and test sets.
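
As a quick check of what `stratify=y` does, the following sketch counts the test samples per class for each proportion. It is an illustration rather than part of the original notebook; `test_counts` is a hypothetical helper variable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

test_counts = {}
for ts in [0.2, 0.3, 0.4]:
    # Only the test labels are needed to inspect the class balance.
    _, _, _, y_test = train_test_split(
        X, y, test_size=ts, random_state=42, stratify=y
    )
    test_counts[ts] = np.bincount(y_test).tolist()
    print(f"test_size={ts:.1f}: test samples per class = {test_counts[ts]}")
```

Because Iris has 50 samples per class, a stratified split gives each class the same number of test samples (10, 15, and 20 respectively). Without `stratify`, the per-class counts can differ from split to split.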

In [7]:
# Final code (with Extra: varying test_size)
iris = load_iris()
X = iris.data
y = iris.target

def evaluate_split(test_size):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=RANDOM_STATE, stratify=y
    )
    models = {
        "NaiveBayes": GaussianNB(),
        "SVM_linear": Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="linear", random_state=RANDOM_STATE))]),
        "KNN_k5": Pipeline([("scaler", StandardScaler()), ("knn", KNeighborsClassifier(n_neighbors=5))])
    }
    print(f"\n=== test_size = {test_size:.1f} ===")
    for name, model in models.items():
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        acc = accuracy_score(y_test, preds)
        print(f"{name}: acc = {acc:.3f}")

for ts in [0.2, 0.3, 0.4]:
    evaluate_split(ts)



=== test_size = 0.2 ===
NaiveBayes: acc = 0.967
SVM_linear: acc = 1.000
KNN_k5: acc = 0.933

=== test_size = 0.3 ===
NaiveBayes: acc = 0.911
SVM_linear: acc = 0.911
KNN_k5: acc = 0.911

=== test_size = 0.4 ===
NaiveBayes: acc = 0.933
SVM_linear: acc = 0.950
KNN_k5: acc = 0.917
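
The results above come from a single random split per proportion, so part of the variation between proportions may be due to which samples happened to land in the test set. One way to probe this, sketched here as an optional extension and not as the notebook's own method, is to repeat each split several times with `StratifiedShuffleSplit` and report the mean and standard deviation of the accuracy. The choice of 20 repetitions and the names `make_models` and `summary` are assumptions made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)


def make_models():
    """Return fresh, unfitted copies of the three models being compared."""
    return {
        "NaiveBayes": GaussianNB(),
        "SVM_linear": Pipeline([
            ("scaler", StandardScaler()),
            ("svc", SVC(kernel="linear")),
        ]),
        "KNN_k5": Pipeline([
            ("scaler", StandardScaler()),
            ("knn", KNeighborsClassifier(n_neighbors=5)),
        ]),
    }


summary = {}
for ts in [0.2, 0.3, 0.4]:
    # 20 stratified random splits per proportion (an arbitrary choice).
    cv = StratifiedShuffleSplit(n_splits=20, test_size=ts, random_state=42)
    print(f"\n=== test_size = {ts:.1f} ===")
    for name, model in make_models().items():
        # cross_val_score refits a clone of the model on every split.
        scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
        summary[(ts, name)] = (scores.mean(), scores.std())
        print(f"{name}: acc = {scores.mean():.3f} +/- {scores.std():.3f}")
```

If the standard deviations turn out to be comparable to the differences between models, the single-split ranking should be read with caution.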
