# Comparing outlier detectors on the Wine dataset
This notebook loads the UCI Wine dataset, applies several PyOD outlier detection methods to it, and visualizes the results with PCA.

In [None]:
# Install dependencies (uncomment if needed)
# !pip install pyod scikit-learn pandas matplotlib


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

from pyod.models.knn import KNN
from pyod.models.lof import LOF
from pyod.models.iforest import IForest
from pyod.models.ocsvm import OCSVM
from pyod.models.ecod import ECOD

## 1) Load the Wine dataset

In [None]:
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data"
columns = [
    "Class label", "Alcohol", "Malic acid", "Ash", "Alcalinity of ash",
    "Magnesium", "Total phenols", "Flavanoids", "Nonflavanoid phenols",
    "Proanthocyanins", "Color intensity", "Hue", "OD280/OD315", "Proline"
]

df = pd.read_csv(url, header=None, names=columns)
df.head()
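If the UCI URL is unreachable, scikit-learn bundles a copy of the same data (178 samples, 13 features). A minimal sketch, assuming scikit-learn 0.23 or newer for `as_frame=True`. Its column names differ from `columns` above, its class labels are 0, 1, 2 rather than 1, 2, 3, and `df_offline` is a hypothetical name chosen so it does not overwrite `df`:

```python
from sklearn.datasets import load_wine

# Load the bundled copy of the UCI Wine data (no network access needed)
wine = load_wine(as_frame=True)
df_offline = wine.frame  # 13 feature columns plus a "target" column

# Rename "target" and shift 0/1/2 to 1/2/3 to match the UCI class labels
df_offline = df_offline.rename(columns={"target": "Class label"})
df_offline["Class label"] = df_offline["Class label"] + 1

print(df_offline.shape)  # (178, 14)
print(df_offline["Class label"].value_counts().sort_index())
```

Note that the label column ends up last here, whereas it is the first column in the UCI file.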

## 2) Prepare the data

In [None]:
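from sklearn.preprocessing import StandardScaler
X = df.drop("Class label", axis=1).values  # features only; the class label is not used by the detectors
X = StandardScaler().fit_transform(X)  # z-score each column so Proline does not dominate distances and PCA
print(f"Data shape: {X.shape}")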
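The Wine features live on very different scales: Proline is in the hundreds to thousands, while Hue is around 1. Distance-based detectors such as KNN, LOF and OCSVM, as well as PCA, are therefore dominated by the largest-variance column unless the data is standardized. A minimal sketch that checks this, using scikit-learn's bundled copy of the dataset (an assumption made only to keep the example self-contained):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler

wine = load_wine()
X_raw = wine.data
names = wine.feature_names

# Per-feature standard deviation before scaling
std_raw = X_raw.std(axis=0)
largest = names[int(np.argmax(std_raw))]
print(f"Largest std: {largest} ({std_raw.max():.1f})")
print(f"Smallest std: {names[int(np.argmin(std_raw))]} ({std_raw.min():.3f})")

# After z-scoring, every column has mean 0 and std 1
X_std = StandardScaler().fit_transform(X_raw)
print(np.round(X_std.std(axis=0), 6))
```

The ratio between the largest and smallest standard deviation shows how strongly one column would otherwise dominate a Euclidean distance.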

## 3) Define the PyOD detectors

In [None]:
contamination = 0.05  # estimated fraction of outliers in the data

detectors = {
    "KNN": KNN(contamination=contamination),
    "LOF": LOF(contamination=contamination),
    "IsolationForest": IForest(contamination=contamination, random_state=42),  # fixed seed for reproducibility
    "OneClassSVM": OCSVM(contamination=contamination),
    "ECOD": ECOD(contamination=contamination)
}
results = {}
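In PyOD, `contamination` does not change how the scores are computed. To our understanding, the base detector sets `threshold_` at the `100 * (1 - contamination)` percentile of the training scores and labels as outliers (1) the points strictly above it. The sketch below reimplements that rule with NumPy on hypothetical random scores, to show why every detector ends up flagging roughly the same number of the 178 Wine samples (9, assuming no tied scores). `threshold_labels` is a hypothetical helper, not a PyOD function:

```python
import numpy as np

def threshold_labels(scores, contamination):
    """Label the top `contamination` fraction of scores as outliers (1).

    Higher score = more anomalous, as in PyOD's decision_scores_.
    """
    scores = np.asarray(scores, dtype=float)
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return labels, threshold

# Hypothetical scores for 178 samples (same size as the Wine data)
rng = np.random.default_rng(0)
scores = rng.random(178)
labels, thr = threshold_labels(scores, contamination=0.05)
print(labels.sum())  # 9
```

With 178 distinct scores, the 95th percentile falls between the 169th and 170th smallest values, so the 9 largest scores lie strictly above it.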

## 4) Fit all detectors and collect the results

In [None]:
for name, clf in detectors.items():
    clf.fit(X)
    labels = clf.labels_
    scores = clf.decision_scores_
    results[name] = (labels, scores)
    print(f"{name}: {labels.sum()} outliers detected")
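Each detector stores its training scores in `decision_scores_`, where higher means more anomalous. For `KNN`, PyOD's documented defaults are `n_neighbors=5` and `method="largest"`, i.e. a point is scored by the distance to its 5th nearest neighbor. The following is a simplified NumPy sketch of that score, not PyOD's implementation (which relies on scikit-learn's neighbor search); it computes all pairwise distances, so it only suits small data, and the toy data is hypothetical:

```python
import numpy as np

def knn_largest_score(X, k=5):
    """Distance from each row of X to its k-th nearest *other* row."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances, shape (n, n)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exclude each point from its own neighborhood
    np.fill_diagonal(dist, np.inf)
    # Sort each row; column k-1 holds the k-th nearest neighbor distance
    return np.sort(dist, axis=1)[:, k - 1]

# Hypothetical toy data: a tight cluster plus one far-away point
rng = np.random.default_rng(0)
X_toy = np.vstack([rng.normal(0, 1, size=(50, 2)), [[8.0, 8.0]]])
scores = knn_largest_score(X_toy, k=5)
print(int(np.argmax(scores)))  # 50, the isolated point
```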

## 5) Compare outlier counts

In [None]:
for name, (labels, _) in results.items():
    n_outliers = labels.sum()
    pct = n_outliers / len(labels) * 100
    print(f"{name}: {n_outliers} outliers detected ({pct:.1f}%)")
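Because `contamination` fixes the expected fraction of outliers, the counts above are nearly identical by construction; what actually differs is *which* samples each detector flags. One way to compare that is the Jaccard similarity between the flagged sets of every pair of detectors. A minimal sketch, assuming `results` has the `{name: (labels, scores)}` layout built above; the helper names and the toy dictionary are hypothetical, so the block runs on its own:

```python
from itertools import combinations

import numpy as np

def jaccard(a, b):
    """Jaccard similarity between two binary label vectors (1 = outlier)."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # neither detector flagged anything
    return np.logical_and(a, b).sum() / union

def pairwise_agreement(results):
    """Return {(name1, name2): Jaccard} for every pair of detectors."""
    return {
        (n1, n2): jaccard(results[n1][0], results[n2][0])
        for n1, n2 in combinations(results, 2)
    }

# Hypothetical labels for 6 samples, in the same layout as `results`
toy_results = {
    "A": (np.array([1, 1, 0, 0, 0, 0]), None),
    "B": (np.array([1, 0, 1, 0, 0, 0]), None),
    "C": (np.array([1, 1, 0, 0, 0, 0]), None),
}
agreement = pairwise_agreement(toy_results)
for (n1, n2), j in agreement.items():
    print(f"{n1} vs {n2}: {j:.2f}")
```

In the notebook, `pairwise_agreement(results)` can be called directly after the fitting loop; values near 1 mean two detectors flag the same samples.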

## 6) Visualizing the outliers with PCA (2D)

In [None]:
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)

plt.figure(figsize=(14, 10))

for i, (name, (labels, _)) in enumerate(results.items(), start=1):
    plt.subplot(2, 3, i)
    # Color by label: 0 = inlier (blue end of the colormap), 1 = outlier (red end)
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap="coolwarm", edgecolor="k")
    plt.title(name)
    plt.xlabel("PCA 1")
    plt.ylabel("PCA 2")

plt.tight_layout()
plt.show()
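A 2D PCA projection keeps only part of the variance, so a point flagged in the full 13-dimensional space may not look isolated in these scatter plots. Checking `explained_variance_ratio_` indicates how faithful the projection is. The sketch below compares raw and standardized features on scikit-learn's bundled Wine data (an assumption made to keep it self-contained):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X_raw = load_wine().data

# Raw features: the first component is almost entirely Proline
ratio_raw = PCA(n_components=2).fit(X_raw).explained_variance_ratio_

# Standardized features: variance is spread over more components
X_std = StandardScaler().fit_transform(X_raw)
ratio_std = PCA(n_components=2).fit(X_std).explained_variance_ratio_

print("raw:", ratio_raw.round(3), "total", ratio_raw.sum().round(3))
print("standardized:", ratio_std.round(3), "total", ratio_std.sum().round(3))
```

For the data actually plotted, `pca.explained_variance_ratio_` on the fitted `pca` object gives the same information.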

## 🧠 Conclusion
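This notebook compared several outlier detectors on the same dataset. Because `contamination` fixes the fraction of training points labeled as outliers, all detectors flag roughly the same number of samples; the differences lie in which samples each one flags, reflecting how each approach (nearest-neighbor distance, local density, random isolation, kernel boundary, empirical tail probabilities) defines an anomaly.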