# AI vs Human Content Detection

This notebook implements several ML models to detect whether a text was generated by AI or written by a human.

## Strategy:
1. **Quick baseline:** Logistic Regression
2. **Main model:** XGBoost with GridSearch
3. **Comparison:** Random Forest
4. **Deep learning:** PyTorch neural network
5. **Ensemble:** Combination of the best models

## 1. Imports and Configuration

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score, roc_curve
)
import xgboost as xgb

# PyTorch imports
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, TensorDataset

import warnings
warnings.filterwarnings('ignore')

# Configuration
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette('husl')
RANDOM_STATE = 42

# Detect whether a GPU is available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'🎮 PyTorch using: {device}')
if torch.cuda.is_available():
    print(f'   GPU: {torch.cuda.get_device_name(0)}')
    print(f'   Total memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB')

# Decide whether XGBoost should use the GPU
# (assumes a CUDA-enabled XGBoost build whenever PyTorch sees a GPU)
use_gpu_xgb = torch.cuda.is_available()
if use_gpu_xgb:
    print("🚀 XGBoost will use the GPU (tree_method='hist', device='cuda')")
else:
    print("💻 XGBoost will use the CPU (tree_method='hist')")

### 🎮 GPU Acceleration Summary

This notebook uses the GPU when one is available:

| Model | GPU Support | Acceleration |
|-------|-------------|--------------|
| **Logistic Regression** | ❌ No | CPU only (scikit-learn) |
| **XGBoost** | ✅ Yes | `tree_method='hist'` with `device='cuda'` |
| **Random Forest** | ❌ No | CPU only (scikit-learn) |
| **PyTorch NN** | ✅ Yes | Model and tensors moved with `.to(device)` |
| **Ensemble** | ⚠️ Partial | Uses the GPU only for XGBoost |

**GPU benefits (rough figures that depend heavily on dataset and model size):**
- XGBoost: roughly 5-10x faster on GPU
- PyTorch: roughly 10-50x faster on GPU (depends on model size)

**If you have no GPU:** the notebook runs on CPU without changes. On a small tabular dataset the GPU speedup may be negligible.
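
The device choice in the table can be isolated in a small helper. This is a minimal sketch, assuming XGBoost 2.0 or later, where the `device` parameter selects the hardware and `tree_method='hist'` works on both CPU and GPU. The helper name `xgb_device_params` is hypothetical and not part of the notebook.

```python
def xgb_device_params(use_gpu: bool) -> dict:
    """Return keyword arguments that select CPU or GPU training.

    Hypothetical helper; assumes XGBoost >= 2.0, where 'hist' runs on
    either device and `device` picks the hardware.
    """
    return {
        "tree_method": "hist",
        "device": "cuda" if use_gpu else "cpu",
    }


# Intended usage (requires xgboost):
#   xgb.XGBClassifier(**xgb_device_params(use_gpu_xgb))
cpu_params = xgb_device_params(False)
gpu_params = xgb_device_params(True)
print(cpu_params)
print(gpu_params)
```

Keeping the choice in one place avoids duplicating the full constructor call in separate CPU and GPU branches.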

## 2. Data Loading and Exploration

In [None]:
# Load the dataset
df = pd.read_csv('ai_human_content_detection_dataset.csv')

print(f"Dataset shape: {df.shape}")
print(f"\nColumns: {df.columns.tolist()}")
print("\nFirst rows:")
df.head()

In [None]:
# Dataset information
print("Dataset information:")
df.info()  # prints its report directly and returns None
print("\nDescriptive statistics:")
df.describe()

In [None]:
# Label distribution
print("Label distribution:")
print(df['label'].value_counts())
print(f"\nAI percentage (1): {df['label'].mean()*100:.2f}%")
print(f"Human percentage (0): {(1-df['label'].mean())*100:.2f}%")

# Visualization
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

# sort_index() puts label 0 first so the tick labels below line up with the bars
df['label'].value_counts().sort_index().plot(kind='bar', ax=axes[0])
axes[0].set_title('Label Distribution')
axes[0].set_xlabel('Label (0=Human, 1=AI)')
axes[0].set_ylabel('Frequency')
axes[0].set_xticklabels(['Human', 'AI'], rotation=0)

df['content_type'].value_counts().plot(kind='barh', ax=axes[1])
axes[1].set_title('Content Type Distribution')
axes[1].set_xlabel('Frequency')

plt.tight_layout()
plt.show()

In [None]:
# Missing values
print("Missing values per column:")
missing = df.isnull().sum()
missing_pct = (missing / len(df)) * 100
missing_df = pd.DataFrame({'Missing': missing, 'Percentage': missing_pct})
print(missing_df[missing_df['Missing'] > 0])

## 3. Data Preprocessing

In [None]:
# Select numeric features only (exclude text_content and content_type)
feature_columns = [
    'word_count', 'character_count', 'sentence_count', 'lexical_diversity',
    'avg_sentence_length', 'avg_word_length', 'punctuation_ratio',
    'flesch_reading_ease', 'gunning_fog_index', 'grammar_errors',
    'passive_voice_ratio', 'predictability_score', 'burstiness', 'sentiment_score'
]

# Check which columns actually exist
available_features = [col for col in feature_columns if col in df.columns]
print(f"Available features: {len(available_features)}/{len(feature_columns)}")
print(available_features)

# Prepare X and y
X = df[available_features].copy()
y = df['label'].copy()

# Handle missing values (fill with the column median).
# Note: the medians are computed on the full dataset, before the train/test split.
X = X.fillna(X.median())

print(f"\nShape of X: {X.shape}")
print(f"Shape of y: {y.shape}")
print(f"\nRemaining missing values in X: {X.isnull().sum().sum()}")

In [None]:
# Split train/test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=RANDOM_STATE, stratify=y
)

print(f"Training set: {X_train.shape}")
print(f"Test set: {X_test.shape}")
print(f"\nTrain distribution: {y_train.value_counts().to_dict()}")
print(f"Test distribution: {y_test.value_counts().to_dict()}")
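
The preprocessing step fills missing values with medians computed on the full dataset, which lets test-set statistics influence training. A minimal sketch of the alternative, assuming NumPy arrays and toy values that are not taken from the dataset: compute the medians on the training rows only and reuse them for the test rows. The helper names `fit_medians` and `fill_missing` are hypothetical.

```python
import numpy as np


def fit_medians(X_train: np.ndarray) -> np.ndarray:
    """Column medians of the training data, ignoring NaNs."""
    return np.nanmedian(X_train, axis=0)


def fill_missing(X: np.ndarray, medians: np.ndarray) -> np.ndarray:
    """Return a copy of X with each NaN replaced by its column median."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X


# Toy data (illustrative values only)
X_train_toy = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
X_test_toy = np.array([[np.nan, 40.0]])

medians = fit_medians(X_train_toy)            # learned from train only
X_train_filled = fill_missing(X_train_toy, medians)
X_test_filled = fill_missing(X_test_toy, medians)
print(medians)
print(X_test_filled)
```

With the notebook's DataFrames the same idea would be `X_train.fillna(X_train.median())` and `X_test.fillna(X_train.median())`, applied after the split.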

In [None]:
# Scale features (important for Logistic Regression and PyTorch)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Features scaled")
print(f"Train mean (should be ~0): {X_train_scaled.mean(axis=0)[:3]}")
print(f"Train std (should be ~1): {X_train_scaled.std(axis=0)[:3]}")
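
Standardization subtracts each feature's training mean and divides by its training standard deviation; the test set reuses those same statistics. A minimal NumPy sketch of that computation, on toy values rather than the notebook's data:

```python
import numpy as np

# Toy feature matrices (illustrative values only)
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
test = np.array([[2.0, 400.0]])

# Statistics come from the training set only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0), which StandardScaler also uses

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # same mean/std, so test need not be centered

print(train_scaled.mean(axis=0))  # approximately zero per column
print(train_scaled.std(axis=0))   # approximately one per column
print(test_scaled)
```

This is why the check printed above only expects mean 0 and std 1 on the training set: the test set is transformed with training statistics and will generally deviate from both.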

## 4. Baseline: Logistic Regression

In [None]:
print("=" * 50)
print("MODEL 1: LOGISTIC REGRESSION (BASELINE)")
print("=" * 50)

# Train the model
lr_model = LogisticRegression(random_state=RANDOM_STATE, max_iter=1000)
lr_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_lr = lr_model.predict(X_test_scaled)
y_pred_proba_lr = lr_model.predict_proba(X_test_scaled)[:, 1]

# Metrics
print(f"\nAccuracy: {accuracy_score(y_test, y_pred_lr):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_lr):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_lr):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_lr):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_lr):.4f}")

print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_lr, target_names=['Human', 'AI']))
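
The metrics printed above all derive from the four confusion-matrix counts. A minimal pure-Python sketch, with AI as the positive class (label 1) and made-up labels for illustration; the function name `binary_metrics` is hypothetical:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # AI caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # human flagged as AI
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # AI missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # human passed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (illustrative only)
y_true_toy = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_toy = [1, 1, 0, 0, 1, 0, 1, 0]
metrics_toy = binary_metrics(y_true_toy, y_pred_toy)
print(metrics_toy)
```

Precision answers "of the texts flagged as AI, how many were AI?", while recall answers "of the AI texts, how many were flagged?".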

In [None]:
# Logistic Regression feature importance (coefficients on standardized features)
lr_importance = pd.DataFrame({
    'feature': available_features,
    'coefficient': lr_model.coef_[0]
}).sort_values('coefficient', key=abs, ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(lr_importance['feature'], lr_importance['coefficient'])
plt.xlabel('Coefficient')
plt.title('Logistic Regression - Feature Importance')
plt.tight_layout()
plt.show()

print("\nTop 5 most important features:")
print(lr_importance.head())

## 5. XGBoost with GridSearch (Main Model)

In [None]:
print("=" * 50)
print("MODEL 2: XGBOOST WITH GRIDSEARCH (GPU WHEN AVAILABLE)")
print("=" * 50)

# Define the hyperparameter grid
param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.3],
    'n_estimators': [100, 200],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0]
}

# Base model, on GPU when available (XGBoost 2.0+ uses 'device' instead of 'gpu_id')
if use_gpu_xgb:
    xgb_base = xgb.XGBClassifier(
        random_state=RANDOM_STATE,
        eval_metric='logloss',
        tree_method='hist',  # 'hist' works on both CPU and GPU
        device='cuda:0'  # Select the first GPU (XGBoost 2.0+)
    )
    print("✅ XGBoost configured to use the GPU (device='cuda:0')")
else:
    xgb_base = xgb.XGBClassifier(
        random_state=RANDOM_STATE,
        eval_metric='logloss',
        tree_method='hist',  # Histogram-based training on CPU
        device='cpu'
    )
    print("ℹ️ XGBoost using the CPU (GPU not available)")

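# GridSearch
print("\nStarting GridSearch (this may take several minutes)...")
grid_search = GridSearchCV(
    xgb_base,
    param_grid,
    cv=3,
    scoring='f1',
    # On GPU, fit candidates one at a time so parallel workers do not compete for the same device
    n_jobs=1 if use_gpu_xgb else -1,
    verbose=1
)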

grid_search.fit(X_train, y_train)

print("\n✅ GridSearch complete!")
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best CV F1-score: {grid_search.best_score_:.4f}")
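
The cost of the search follows directly from the grid: every combination of values is trained once per fold. A small sketch that enumerates the grid defined above with `itertools.product` (standard library only; the final refit on the full training set is not counted):

```python
from itertools import product

# Same grid as in the notebook cell above
param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.1, 0.3],
    'n_estimators': [100, 200],
    'subsample': [0.8, 1.0],
    'colsample_bytree': [0.8, 1.0],
}
cv_folds = 3

# Every combination of one value per hyperparameter
combinations = list(product(*param_grid.values()))
n_candidates = len(combinations)   # 3 * 3 * 2 * 2 * 2
n_fits = n_candidates * cv_folds   # one fit per candidate per fold

print(f"Candidates: {n_candidates}, cross-validation fits: {n_fits}")
```

Adding one more value to any list multiplies the work, so it usually pays to widen the grid one parameter at a time.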

In [None]:
# Best XGBoost model
xgb_model = grid_search.best_estimator_

# Predictions
y_pred_xgb = xgb_model.predict(X_test)
y_pred_proba_xgb = xgb_model.predict_proba(X_test)[:, 1]

# Metrics
print(f"\nXGBoost Test Metrics:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_xgb):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_xgb):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_xgb):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_xgb):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_xgb):.4f}")

print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_xgb, target_names=['Human', 'AI']))

In [None]:
# XGBoost feature importance
xgb_importance = pd.DataFrame({
    'feature': available_features,
    'importance': xgb_model.feature_importances_
}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(xgb_importance['feature'], xgb_importance['importance'])
plt.xlabel('Importance')
plt.title('XGBoost - Feature Importance')
plt.tight_layout()
plt.show()

print("\nTop 5 most important features:")
print(xgb_importance.head())

## 6. Random Forest (Comparison)

In [None]:
print("=" * 50)
print("MODELO 3: RANDOM FOREST")
print("=" * 50)

# Train the Random Forest
rf_model = RandomForestClassifier(
    n_estimators=200,
    max_depth=10,
    min_samples_split=5,
    min_samples_leaf=2,
    random_state=RANDOM_STATE,
    n_jobs=-1
)

rf_model.fit(X_train, y_train)

# Predictions
y_pred_rf = rf_model.predict(X_test)
y_pred_proba_rf = rf_model.predict_proba(X_test)[:, 1]

# Metrics
print(f"\nRandom Forest Test Metrics:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_rf):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_rf):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_rf):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_rf):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_rf):.4f}")

print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_rf, target_names=['Human', 'AI']))

In [None]:
# Random Forest feature importance
rf_importance = pd.DataFrame({
    'feature': available_features,
    'importance': rf_model.feature_importances_
}).sort_values('importance', ascending=False)

plt.figure(figsize=(10, 6))
plt.barh(rf_importance['feature'], rf_importance['importance'])
plt.xlabel('Importance')
plt.title('Random Forest - Feature Importance')
plt.tight_layout()
plt.show()

print("\nTop 5 most important features:")
print(rf_importance.head())

## 7. PyTorch Neural Network

In [None]:
# Define the neural network architecture
class AIDetectorNN(nn.Module):
    def __init__(self, input_dim):
        super(AIDetectorNN, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.bn1 = nn.BatchNorm1d(128)
        self.dropout1 = nn.Dropout(0.3)
        
        self.fc2 = nn.Linear(128, 64)
        self.bn2 = nn.BatchNorm1d(64)
        self.dropout2 = nn.Dropout(0.3)
        
        self.fc3 = nn.Linear(64, 32)
        self.bn3 = nn.BatchNorm1d(32)
        self.dropout3 = nn.Dropout(0.2)
        
        self.fc4 = nn.Linear(32, 1)
        
        self.relu = nn.ReLU()
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        x = self.fc1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.dropout1(x)
        
        x = self.fc2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.dropout2(x)
        
        x = self.fc3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.dropout3(x)
        
        x = self.fc4(x)
        x = self.sigmoid(x)
        
        return x

# Initialize the model
input_dim = X_train_scaled.shape[1]
pytorch_model = AIDetectorNN(input_dim).to(device)

print("PyTorch model architecture:")
print(pytorch_model)
print(f"\nNumber of parameters: {sum(p.numel() for p in pytorch_model.parameters())}")
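
The parameter count printed above can be checked by hand: each `nn.Linear(i, o)` holds `i * o` weights plus `o` biases, and each `nn.BatchNorm1d(n)` holds `n` scale and `n` shift parameters (its running statistics are buffers, not parameters). A sketch of that arithmetic, assuming all 14 listed features are present so that `input_dim = 14`; the helper names are hypothetical:

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def batchnorm_params(n_features: int) -> int:
    """Learnable scale (gamma) and shift (beta) per feature."""
    return 2 * n_features


input_dim = 14  # assumption: every column in feature_columns exists

layer_counts = {
    "fc1": linear_params(input_dim, 128),
    "bn1": batchnorm_params(128),
    "fc2": linear_params(128, 64),
    "bn2": batchnorm_params(64),
    "fc3": linear_params(64, 32),
    "bn3": batchnorm_params(32),
    "fc4": linear_params(32, 1),
}
total_params = sum(layer_counts.values())

for name, count in layer_counts.items():
    print(f"{name}: {count}")
print(f"Total: {total_params}")
```

If fewer features are available, only the `fc1` term changes, by 128 parameters per missing feature.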

In [None]:
print("=" * 50)
print("MODEL 4: PYTORCH NEURAL NETWORK (GPU WHEN AVAILABLE)")
print("=" * 50)

# Prepare data for PyTorch
X_train_tensor = torch.FloatTensor(X_train_scaled).to(device)
y_train_tensor = torch.FloatTensor(y_train.values).reshape(-1, 1).to(device)
X_test_tensor = torch.FloatTensor(X_test_scaled).to(device)
y_test_tensor = torch.FloatTensor(y_test.values).reshape(-1, 1).to(device)

print(f"✅ Data moved to {device}")
print(f"   Train tensor shape: {X_train_tensor.shape}")
print(f"   Test tensor shape: {X_test_tensor.shape}")

# Create the DataLoader (no worker processes or pinned memory: the tensors already live on `device`)
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(
    train_dataset,
    batch_size=128,  # Larger batch size for GPU
    shuffle=True,
    num_workers=0,  # 0 because the data is already on the target device
    pin_memory=False  # Pinning only applies to CPU tensors
)

# Define loss and optimizer
criterion = nn.BCELoss()
optimizer = optim.Adam(pytorch_model.parameters(), lr=0.001, weight_decay=1e-5)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.5, patience=5)

# Training
num_epochs = 50
train_losses = []
val_losses = []

print("\n🚀 Training PyTorch model...")
import time
start_time = time.time()

for epoch in range(num_epochs):
    # Training
    pytorch_model.train()
    epoch_loss = 0
    for batch_X, batch_y in train_loader:
        optimizer.zero_grad()
        outputs = pytorch_model(batch_X)
        loss = criterion(outputs, batch_y)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    
    avg_train_loss = epoch_loss / len(train_loader)
    train_losses.append(avg_train_loss)
    
    # Validation
    # Note: the test set doubles as the validation set here, so it also drives
    # the LR scheduler; the test metrics reported later may be optimistic.
    pytorch_model.eval()
    with torch.no_grad():
        val_outputs = pytorch_model(X_test_tensor)
        val_loss = criterion(val_outputs, y_test_tensor).item()
        val_losses.append(val_loss)
    
    scheduler.step(val_loss)
    
    if (epoch + 1) % 10 == 0:
        elapsed = time.time() - start_time
        print(f"Epoch [{epoch+1}/{num_epochs}] - Train Loss: {avg_train_loss:.4f}, Val Loss: {val_loss:.4f} - Time: {elapsed:.1f}s")

total_time = time.time() - start_time
print(f"\n✅ Training completed in {total_time:.2f} seconds!")
print(f"   Average time per epoch: {total_time/num_epochs:.2f}s")
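
The network ends in a sigmoid and is trained with `nn.BCELoss`, so the quantity being minimized is the mean binary cross-entropy. The NumPy sketch below, on made-up logits and labels, computes it in two ways: from probabilities, as the notebook does, and directly from logits with a numerically stable formula. PyTorch offers the second form as `nn.BCEWithLogitsLoss`; switching to it would require removing the final sigmoid from the model, which the notebook does not do. Function names here are hypothetical.

```python
import numpy as np


def bce_from_probs(p, y, eps=1e-12):
    """Mean binary cross-entropy from probabilities (clipped for safety)."""
    p = np.clip(p, eps, 1 - eps)
    return float(np.mean(-(y * np.log(p) + (1 - y) * np.log(1 - p))))


def bce_from_logits(z, y):
    """Same loss from raw logits: max(z, 0) - z*y + log(1 + exp(-|z|))."""
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))


# Toy values (illustrative only)
logits = np.array([2.0, -1.0, 0.5, -3.0])
labels = np.array([1.0, 0.0, 0.0, 1.0])
probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid

loss_probs = bce_from_probs(probs, labels)
loss_logits = bce_from_logits(logits, labels)
print(loss_probs, loss_logits)

# The logit form stays finite even for extreme inputs
extreme_loss = bce_from_logits(np.array([1000.0]), np.array([1.0]))
print(extreme_loss)
```

Both forms agree on ordinary inputs; the logit form avoids taking the log of a probability that has rounded to exactly 0 or 1.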

In [None]:
# Loss curves during training
plt.figure(figsize=(10, 5))
plt.plot(train_losses, label='Train Loss')
plt.plot(val_losses, label='Validation Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('PyTorch Model - Training Progress')
plt.legend()
plt.grid(True)
plt.show()

In [None]:
# Evaluate the PyTorch model
pytorch_model.eval()
with torch.no_grad():
    # Flatten to 1-D so the scores have the same shape as the other models' probabilities
    y_pred_proba_pytorch = pytorch_model(X_test_tensor).cpu().numpy().flatten()
    y_pred_pytorch = (y_pred_proba_pytorch > 0.5).astype(int)

# Metrics
print(f"\nPyTorch Neural Network Test Metrics:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_pytorch):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_pytorch):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_pytorch):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_pytorch):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_pytorch):.4f}")

print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_pytorch, target_names=['Human', 'AI']))

## 8. Ensemble Model

In [None]:
print("=" * 50)
print("MODEL 5: ENSEMBLE (VOTING CLASSIFIER)")
print("=" * 50)

# Build the ensemble from the scikit-learn-compatible models (PyTorch excluded)
ensemble_model = VotingClassifier(
    estimators=[
        ('lr', lr_model),
        ('xgb', xgb_model),
        ('rf', rf_model)
    ],
    voting='soft'  # Average the predicted probabilities
)

# fit() clones and retrains all three estimators on the scaled training data.
# Scaling matters for Logistic Regression; the tree-based models are unaffected by it.
ensemble_model.fit(X_train_scaled, y_train)

# Predictions
y_pred_ensemble = ensemble_model.predict(X_test_scaled)
y_pred_proba_ensemble = ensemble_model.predict_proba(X_test_scaled)[:, 1]

# Metrics
print(f"\nEnsemble Test Metrics:")
print(f"Accuracy: {accuracy_score(y_test, y_pred_ensemble):.4f}")
print(f"Precision: {precision_score(y_test, y_pred_ensemble):.4f}")
print(f"Recall: {recall_score(y_test, y_pred_ensemble):.4f}")
print(f"F1-Score: {f1_score(y_test, y_pred_ensemble):.4f}")
print(f"ROC-AUC: {roc_auc_score(y_test, y_pred_proba_ensemble):.4f}")

print(f"\nClassification Report:")
print(classification_report(y_test, y_pred_ensemble, target_names=['Human', 'AI']))
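
Soft voting averages the class probabilities of the individual models and predicts the class with the highest mean. A minimal NumPy sketch with made-up AI-class probabilities for three models on four samples (equal weights, as in the `VotingClassifier` above), with hard majority voting shown for contrast:

```python
import numpy as np

# Made-up P(AI) per model for four samples (illustrative only)
proba_lr = np.array([0.90, 0.40, 0.20, 0.95])
proba_xgb = np.array([0.80, 0.60, 0.10, 0.45])
proba_rf = np.array([0.70, 0.70, 0.30, 0.45])
all_probas = np.array([proba_lr, proba_xgb, proba_rf])

# Soft voting: average the probabilities, then threshold
mean_proba = all_probas.mean(axis=0)
soft_pred = (mean_proba > 0.5).astype(int)

# Hard voting: each model votes with its own thresholded prediction
votes = (all_probas > 0.5).sum(axis=0)
hard_pred = (votes >= 2).astype(int)

print("mean P(AI):", mean_proba.round(3))
print("soft vote: ", soft_pred)
print("hard vote: ", hard_pred)
```

The last sample shows the difference: one very confident model outweighs two mildly dissenting ones under soft voting, but loses the majority count under hard voting.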

## 9. Model Comparison

In [None]:
# Build the comparison table
results = pd.DataFrame({
    'Model': ['Logistic Regression', 'XGBoost', 'Random Forest', 'PyTorch NN', 'Ensemble'],
    'Accuracy': [
        accuracy_score(y_test, y_pred_lr),
        accuracy_score(y_test, y_pred_xgb),
        accuracy_score(y_test, y_pred_rf),
        accuracy_score(y_test, y_pred_pytorch),
        accuracy_score(y_test, y_pred_ensemble)
    ],
    'Precision': [
        precision_score(y_test, y_pred_lr),
        precision_score(y_test, y_pred_xgb),
        precision_score(y_test, y_pred_rf),
        precision_score(y_test, y_pred_pytorch),
        precision_score(y_test, y_pred_ensemble)
    ],
    'Recall': [
        recall_score(y_test, y_pred_lr),
        recall_score(y_test, y_pred_xgb),
        recall_score(y_test, y_pred_rf),
        recall_score(y_test, y_pred_pytorch),
        recall_score(y_test, y_pred_ensemble)
    ],
    'F1-Score': [
        f1_score(y_test, y_pred_lr),
        f1_score(y_test, y_pred_xgb),
        f1_score(y_test, y_pred_rf),
        f1_score(y_test, y_pred_pytorch),
        f1_score(y_test, y_pred_ensemble)
    ],
    'ROC-AUC': [
        roc_auc_score(y_test, y_pred_proba_lr),
        roc_auc_score(y_test, y_pred_proba_xgb),
        roc_auc_score(y_test, y_pred_proba_rf),
        roc_auc_score(y_test, y_pred_proba_pytorch),
        roc_auc_score(y_test, y_pred_proba_ensemble)
    ]
})

# Sort by F1-Score
results = results.sort_values('F1-Score', ascending=False)

print("\n" + "=" * 80)
print("COMPARISON OF ALL MODELS")
print("=" * 80)
print(results.to_string(index=False))

# Identify the best model
best_model_name = results.iloc[0]['Model']
best_f1 = results.iloc[0]['F1-Score']
print(f"\n🏆 BEST MODEL: {best_model_name} with an F1-Score of {best_f1:.4f}")

In [None]:
# Comparative visualization
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Plot 1: Metrics per model
metrics_to_plot = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
results_melted = results.melt(id_vars='Model', value_vars=metrics_to_plot, 
                               var_name='Metric', value_name='Score')

ax1 = axes[0, 0]
for model in results['Model']:
    model_data = results_melted[results_melted['Model'] == model]
    ax1.plot(model_data['Metric'], model_data['Score'], marker='o', label=model)
ax1.set_ylabel('Score')
ax1.set_title('Metric Comparison by Model')
ax1.legend()
ax1.grid(True, alpha=0.3)
min_metric_score = max(0.0, results_melted['Score'].min() - 0.05)
ax1.set_ylim([min_metric_score, 1.0])

# Plot 2: F1-Score comparison
ax2 = axes[0, 1]
colors = ['#4ECDC4' if model == best_model_name else '#FF6B6B' for model in results['Model']]
ax2.barh(results['Model'], results['F1-Score'], color=colors)
ax2.set_xlabel('F1-Score')
ax2.set_title('F1-Score by Model')
min_f1_score = max(0.0, results['F1-Score'].min() - 0.05)
ax2.set_xlim([min_f1_score, 1.0])

# Plot 3: ROC curves
ax3 = axes[1, 0]

# ROC for each model
models_roc = [
    ('Logistic Regression', y_pred_proba_lr),
    ('XGBoost', y_pred_proba_xgb),
    ('Random Forest', y_pred_proba_rf),
    ('PyTorch NN', y_pred_proba_pytorch.flatten()),
    ('Ensemble', y_pred_proba_ensemble)
]

for name, y_proba in models_roc:
    fpr, tpr, _ = roc_curve(y_test, y_proba)
    auc = roc_auc_score(y_test, y_proba)
    ax3.plot(fpr, tpr, label=f'{name} (AUC={auc:.3f})')

ax3.plot([0, 1], [0, 1], 'k--', label='Random')
ax3.set_xlabel('False Positive Rate')
ax3.set_ylabel('True Positive Rate')
ax3.set_title('ROC Curves - All Models')
ax3.legend(loc='lower right')
ax3.grid(True, alpha=0.3)

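# Plot 4: Confusion matrix of the best model
ax4 = axes[1, 1]
# Look up the predictions of the best model by F1-Score so the plot matches its title
preds_by_model = {
    'Logistic Regression': y_pred_lr, 'XGBoost': y_pred_xgb, 'Random Forest': y_pred_rf,
    'PyTorch NN': y_pred_pytorch, 'Ensemble': y_pred_ensemble
}
cm = confusion_matrix(y_test, preds_by_model[best_model_name])
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=ax4,
            xticklabels=['Human', 'AI'], yticklabels=['Human', 'AI'])
ax4.set_title(f'Confusion Matrix - {best_model_name}')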
ax4.set_ylabel('True Label')
ax4.set_xlabel('Predicted Label')

plt.tight_layout()
plt.show()
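
ROC-AUC has a direct ranking interpretation: it is the probability that a randomly chosen AI sample receives a higher score than a randomly chosen human sample, with ties counted as one half. A minimal pure-Python sketch of that definition on made-up scores (quadratic in the number of samples, so for illustration only; the function name `pairwise_auc` is hypothetical):

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy scores (illustrative only)
y_true_toy = [0, 0, 1, 1]
y_score_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = pairwise_auc(y_true_toy, y_score_toy)
print(auc_toy)
```

Because only the ordering of scores matters, AUC is unaffected by the 0.5 decision threshold that the other metrics depend on.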

## 10. Error Analysis

In [None]:
# Analyze the errors of the main model (XGBoost)
errors_df = X_test.copy()
errors_df['true_label'] = y_test.values
errors_df['predicted_label'] = y_pred_xgb
errors_df['prediction_proba'] = y_pred_proba_xgb
errors_df['correct'] = errors_df['true_label'] == errors_df['predicted_label']

# False positives (predicted AI, actually Human)
false_positives = errors_df[(errors_df['true_label'] == 0) & (errors_df['predicted_label'] == 1)]
print(f"False Positives (predicted AI, was Human): {len(false_positives)}")

# False negatives (predicted Human, actually AI)
false_negatives = errors_df[(errors_df['true_label'] == 1) & (errors_df['predicted_label'] == 0)]
print(f"False Negatives (predicted Human, was AI): {len(false_negatives)}")

print(f"\nTotal errors: {len(false_positives) + len(false_negatives)}")
print(f"Total correct predictions: {errors_df['correct'].sum()}")
print(f"Accuracy: {errors_df['correct'].mean():.4f}")

In [None]:
# Analyze the feature profile of the errors
if len(false_positives) > 0:
    print("\nMean feature values of False Positives (Human classified as AI):")
    print(false_positives[available_features].mean())

if len(false_negatives) > 0:
    print("\nMean feature values of False Negatives (AI classified as Human):")
    print(false_negatives[available_features].mean())

## 11. Conclusions and Recommendations

In [None]:
print("=" * 80)
print("FINAL CONCLUSIONS")
print("=" * 80)

print(f"\n1. Best Model: {best_model_name}")
print(f"   - F1-Score: {best_f1:.4f}")
print(f"   - Accuracy: {results[results['Model'] == best_model_name]['Accuracy'].values[0]:.4f}")

print("\n2. Most important features (XGBoost):")
for idx, row in xgb_importance.head(5).iterrows():
    print(f"   - {row['feature']}: {row['importance']:.4f}")

print("\n3. Comparison of approaches:")
print("   - Traditional models (LR, XGB, RF): fast to train and comparatively easy to interpret")
print("   - PyTorch NN: more complex and slower to train; compare its scores in the table above")
print("   - Ensemble: combines the predictions of several models")

print("\n4. Recommendation:")
if best_model_name == 'XGBoost':
    print("   ✅ Use XGBoost for production: best balance of performance and speed")
elif best_model_name == 'Ensemble':
    print("   ✅ Use the Ensemble if latency is not critical: highest performance")
else:
    print(f"   ✅ Use {best_model_name}, depending on your specific needs")

print("\n5. Next steps:")
print("   - Analyze the specific texts that cause errors")
print("   - Tune the classification threshold to the desired precision/recall balance")
print("   - Consider additional features if available")
print("   - Validate with production data")
print("   - Save the best model for deployment")
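
Tuning the classification threshold, one of the next steps listed above, amounts to re-thresholding the predicted probabilities and recomputing precision and recall. A minimal pure-Python sketch on made-up scores; in practice the threshold should be chosen on a validation split rather than on the test set. The function name `precision_recall_at` is hypothetical.

```python
def precision_recall_at(y_true, y_score, threshold):
    """Precision and recall when predicting AI for scores >= threshold."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0 when nothing is flagged
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels and scores (illustrative only)
y_true_toy = [0, 0, 1, 0, 1, 1]
y_score_toy = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

sweep = {}
for threshold in (0.35, 0.5, 0.65):
    sweep[threshold] = precision_recall_at(y_true_toy, y_score_toy, threshold)
    p, r = sweep[threshold]
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

Raising the threshold flags fewer texts as AI, which tends to raise precision and lower recall; the right trade-off depends on whether false accusations or missed AI texts are more costly.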