# INDEX - SCIKIT-LEARN CHEAT SHEET

## 📚 Table of Contents

### Data Preparation
- [1. Basic Imports](#1-importaciones-basicas)
- [2. Data Splitting - train_test_split](#2-division-de-datos---train_test_split)
- [3. Data Scaling and Normalization](#3-escalado-y-normalizacion-de-datos)
- [4. Encoding Categorical Variables](#4-codificacion-de-variables-categoricas)
  - [4.1. Variance Threshold](#41-variance-threshold)

### Supervised Models
#### Regression
- [5. Regression Models](#5-modelos-de-regresion---predecir-valores-continuos)
  - [5.1. ElasticNetCV](#51-elasticnetcv)

#### Classification
- [6. Logistic Regression](#6-regresion-logistica---clasificacion-binaria-y-multiclase)
  - [6.1. LogisticRegressionCV](#61-logisticregressioncv)
- [7. Decision Trees](#7-arboles-de-decision)
- [8. Random Forest](#8-random-forest---ensemble-de-arboles)
- [9. Gradient Boosting](#9-gradient-boosting---ensemble-secuencial)
- [10. Support Vector Machines (SVM)](#10-support-vector-machines-svm)
- [11. K-Nearest Neighbors (KNN)](#11-k-nearest-neighbors-knn)
- [12. Naive Bayes](#12-naive-bayes)

### Unsupervised Models
- [13. Clustering](#13-clustering---aprendizaje-no-supervisado)
  - [13.1. KMeans - Complete Guide](#131-kmeans---guía-completa)
  - [13.2. Hierarchical Clustering](#132-hierarchical-clustering---guía-completa-dendrogram-y-linkage)

### Evaluation and Metrics
- [14. Saving and Loading Models](#14-guardar-y-cargar-modelos)
- [15. Evaluation Metrics - Classification](#15-metricas-de-evaluacion---clasificacion)
  - [15.1. ROC Curve and PR Curve](#151-roc-curve-y-pr-curve---guía-completa)
- [16. Evaluation Metrics - Regression](#16-metricas-de-evaluacion---regresion)

---

SCIKIT-LEARN (sklearn)

## 1. Basic Imports

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, MinMaxScaler, RobustScaler, LabelEncoder, OneHotEncoder, OrdinalEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

## 2. Data Splitting - train_test_split

In [None]:
# Basic 80/20 split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Stratified split (preserves the class proportions)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Three-way split: train, validation, test (70% / 15% / 15%)
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)  # halves the remaining 30%

# Key parameters:
# test_size: fraction of the data assigned to the test set (0.2 = 20%)
# random_state: seed for reproducibility
# stratify: preserves the class distribution in train and test
# shuffle: shuffles the data before splitting (True by default)
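
The following is a quick sanity check of the three-way split above. It assumes a hypothetical synthetic dataset of 1,000 rows built with NumPy, which is not part of the original. Applying `test_size=0.3` and then `test_size=0.5` to the remainder gives a 70% / 15% / 15% split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: 1000 samples, 3 features, binary labels
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = rng.integers(0, 2, size=1000)

# First split: 70% train, 30% temporary
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42)
# Second split: the temporary 30% is halved into validation and test
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```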

## 3. Data Scaling and Normalization

In [None]:
# StandardScaler: mean 0 and standard deviation 1
# Formula: (x - mean) / std
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit and transform train
X_test_scaled = scaler.transform(X_test)        # only transform test

# MinMaxScaler: scales to the range [0, 1]
# Formula: (x - min) / (max - min)
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# MinMaxScaler with a custom range
scaler = MinMaxScaler(feature_range=(0, 10))  # scales to [0, 10]
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# RobustScaler: robust to outliers; uses the median and the interquartile range (IQR)
# Formula: (x - median) / IQR
scaler = RobustScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# IMPORTANT: always fit on train only, then transform both train and test
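
This sketch makes the "fit on train only" rule concrete. It uses hypothetical synthetic data in which the test set is deliberately shifted. It checks that `StandardScaler` applies `(x - mean) / std` using the mean and population standard deviation (`ddof=0`) learned from the training set. It also checks that the test set is transformed with those same training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Hypothetical data: the test set has a shifted distribution
rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(200, 2))
X_test = rng.normal(loc=6.0, scale=2.0, size=(50, 2))

scaler = StandardScaler().fit(X_train)                 # statistics learned from train only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)  # np.std uses ddof=0 by default

# The test set is transformed with the TRAIN mean and std, not its own
manual_test = (X_test - mu) / sigma
print(np.allclose(scaler.transform(X_test), manual_test))  # True

# MinMaxScaler: train values land in [0, 1]; test values may fall outside that range
mm = MinMaxScaler().fit(X_train)
X_train_mm = mm.transform(X_train)
print(round(X_train_mm.min(), 6), round(X_train_mm.max(), 6))  # 0.0 1.0
```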

## 4. Encoding Categorical Variables - Complete Guide

Most ML algorithms cannot process categorical variables directly, so they must first be converted to a numeric format.
The right method depends on the type of variable:

### Types of Categorical Variables:
- **Nominal**: no order (color: red, blue, green)
- **Ordinal**: ordered (size: S, M, L, XL)
- **Binary**: only two values (yes/no, True/False)

### Encoding Methods:
1. **LabelEncoder**: for the target (y) only; it assigns codes in alphabetical order, so it does not preserve a meaningful order
2. **OneHotEncoder**: for nominal variables (features)
3. **OrdinalEncoder**: for ordinal variables with an explicit order
4. **get_dummies**: a simple pandas alternative
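
The sketch below shows why LabelEncoder is a poor fit for ordinal features. It encodes the size example (S, M, L, XL) both ways, using hypothetical data. LabelEncoder sorts the labels alphabetically, whereas OrdinalEncoder with an explicit `categories` list keeps the real order.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

sizes = np.array(['S', 'M', 'L', 'XL', 'M'])

# LabelEncoder: codes follow alphabetical order L < M < S < XL
le_codes = LabelEncoder().fit_transform(sizes)
print(le_codes)  # [2 1 0 3 1]

# OrdinalEncoder: codes follow the declared order S < M < L < XL
oe = OrdinalEncoder(categories=[['S', 'M', 'L', 'XL']])
oe_codes = oe.fit_transform(sizes.reshape(-1, 1)).ravel()  # expects a 2-D input
print(oe_codes)  # [0. 1. 2. 3. 1.]
```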

In [None]:
# ============================================
# 1. LABELENCODER - For the target variable (y)
# ============================================
# Converts categories to integers (0, 1, 2, ...) assigned in alphabetical order
# ⚠️ Do NOT use it for features (X): it implies an order that does not exist

from sklearn.preprocessing import LabelEncoder

# Example: target variable with three classes
y = np.array(['gato', 'perro', 'gato', 'pajaro', 'perro', 'pajaro'])

le = LabelEncoder()
y_encoded = le.fit_transform(y)
print(f'Original: {y}')
print(f'Encoded:  {y_encoded}')  # [0 2 0 1 2 1]
print(f'Classes:  {le.classes_}')  # ['gato' 'pajaro' 'perro']

# Back to the original categories
y_decoded = le.inverse_transform(y_encoded)
print(f'Decoded:  {y_decoded}')

# Show the full mapping (.tolist() converts NumPy scalars to plain Python values)
mapeo = dict(zip(le.classes_.tolist(), le.transform(le.classes_).tolist()))
print(f'Mapping: {mapeo}')  # {'gato': 0, 'pajaro': 1, 'perro': 2}

# ============================================
# 2. ONEHOTENCODER - For nominal variables (features)
# ============================================
# Creates one binary (0/1) column per category
# ✅ Use for variables without order (color, country, category)
# Note: sparse_output requires scikit-learn >= 1.2 (older versions use sparse)

from sklearn.preprocessing import OneHotEncoder

# Example data
df = pd.DataFrame({
    'color': ['rojo', 'azul', 'verde', 'rojo', 'azul'],
    'tamaño': ['S', 'M', 'L', 'M', 'S'],
    'precio': [10, 20, 30, 15, 25]
})

# ============================================
# 2.1. BASIC OneHotEncoder
# ============================================
encoder = OneHotEncoder(sparse_output=False)  # sparse_output=False returns a dense array
X_encoded = encoder.fit_transform(df[['color']])

print(f'\nOriginal:\n{df["color"].values}')
print(f'\nEncoded:\n{X_encoded}')
print(f'Feature names: {encoder.get_feature_names_out()}')
# Result: ['color_azul', 'color_rojo', 'color_verde']

# ============================================
# 2.2. OneHotEncoder with drop='first'
# ============================================
# drop='first' removes the first category (alphabetically) to avoid perfect multicollinearity
# With n categories, n-1 columns are enough
encoder = OneHotEncoder(sparse_output=False, drop='first')
X_encoded = encoder.fit_transform(df[['color']])

print(f'\nEncoded (drop=first):\n{X_encoded}')
print(f'Feature names: {encoder.get_feature_names_out()}')
# Result: ['color_rojo', 'color_verde'] (azul dropped)

# ============================================
# 2.3. OneHotEncoder on MULTIPLE COLUMNS
# ============================================
encoder = OneHotEncoder(sparse_output=False, drop='first')
X_encoded = encoder.fit_transform(df[['color', 'tamaño']])

print(f'\nMultiple columns encoded:\n{X_encoded}')
print(f'Feature names: {encoder.get_feature_names_out()}')
# Result: ['color_rojo', 'color_verde', 'tamaño_M', 'tamaño_S'] (azul and L dropped)

# ============================================
# 2.4. OneHotEncoder KEEPING A DATAFRAME
# ============================================
# Keep the DataFrame structure for readability
encoder = OneHotEncoder(sparse_output=False, drop='first')
X_encoded = encoder.fit_transform(df[['color', 'tamaño']])

# Build a DataFrame with column names
df_encoded = pd.DataFrame(
    X_encoded,
    columns=encoder.get_feature_names_out(),
    index=df.index
)

print(f'\nDataFrame encoded:\n{df_encoded}')

# Combine with the original numeric columns
df_final = pd.concat([df[['precio']], df_encoded], axis=1)
print(f'\nFinal DataFrame:\n{df_final}')

# ============================================
# 2.5. OneHotEncoder with handle_unknown='ignore'
# ============================================
# Handles new categories that appear in the test data
encoder = OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore')
encoder.fit(df[['color']])

# Test data with a new category 'amarillo'
df_test = pd.DataFrame({'color': ['rojo', 'amarillo', 'verde']})
X_test_encoded = encoder.transform(df_test[['color']])

print(f'\nTest with a new category:\n{X_test_encoded}')
# 'amarillo' is unknown → all columns are 0
# ⚠️ With drop='first' this is the same encoding as the dropped category ('azul')

# ============================================
# 2.6. OneHotEncoder - Converting SPARSE to DENSE
# ============================================
# By default, OneHotEncoder returns a sparse matrix (memory-efficient)
encoder = OneHotEncoder()  # sparse_output=True by default
X_sparse = encoder.fit_transform(df[['color']])

print(f'\nSparse type: {type(X_sparse)}')  # <class 'scipy.sparse._csr.csr_matrix'>
print(f'Shape: {X_sparse.shape}')

# Convert to a dense array when needed
X_dense = X_sparse.toarray()
print(f'Dense type: {type(X_dense)}')  # <class 'numpy.ndarray'>

# ⚠️ IMPORTANT: StandardScaler with the default with_mean=True raises an error on sparse input
# → use .toarray() or StandardScaler(with_mean=False)

# ============================================
# 3. ORDINALENCODER - For ordinal variables
# ============================================
# Converts categories to numbers RESPECTING THE ORDER
# ✅ Use when there is a logical order: low < medium < high

from sklearn.preprocessing import OrdinalEncoder

# Ordered data
df_ordinal = pd.DataFrame({
    'educacion': ['Secundaria', 'Universidad', 'Primaria', 'Postgrado', 'Universidad'],
    'satisfaccion': ['Bajo', 'Medio', 'Alto', 'Medio', 'Bajo']
})

# ============================================
# 3.1. BASIC OrdinalEncoder
# ============================================
# Define the category order explicitly
encoder = OrdinalEncoder(
    categories=[
        ['Primaria', 'Secundaria', 'Universidad', 'Postgrado'],
        ['Bajo', 'Medio', 'Alto']
    ]
)
X_encoded = encoder.fit_transform(df_ordinal)

print(f'\nOriginal:\n{df_ordinal}')
print(f'\nOrdinal encoded:\n{X_encoded}')
# Primaria=0, Secundaria=1, Universidad=2, Postgrado=3
# Bajo=0, Medio=1, Alto=2

# Show the mapping
for i, col in enumerate(df_ordinal.columns):
    print(f'{col}: {encoder.categories_[i]}')

# ============================================
# 3.2. OrdinalEncoder without an explicit order
# ============================================
# Without categories, alphabetical order is used (⚠️ may not be what you want)
encoder_auto = OrdinalEncoder()
X_auto = encoder_auto.fit_transform(df_ordinal)

print(f'\nAutomatic ordinal:\n{X_auto}')
print(f'Detected categories: {encoder_auto.categories_}')

# ============================================
# 3.3. OrdinalEncoder with handle_unknown='use_encoded_value'
# ============================================
# Assigns a specific value to unknown categories
encoder = OrdinalEncoder(
    categories=[['Primaria', 'Secundaria', 'Universidad', 'Postgrado']],
    handle_unknown='use_encoded_value',
    unknown_value=-1  # value for unknown categories
)
encoder.fit(df_ordinal[['educacion']])

# Test with a new category
df_test = pd.DataFrame({'educacion': ['Universidad', 'Doctorado', 'Primaria']})
X_test_ord = encoder.transform(df_test)
print(f'\nTest with an unknown category:\n{X_test_ord}')  # Doctorado → -1

# ============================================
# 4. pd.get_dummies() - pandas alternative
# ============================================
# Quick way to one-hot encode without sklearn
df = pd.DataFrame({
    'color': ['rojo', 'azul', 'verde', 'rojo'],
    'tamaño': ['S', 'M', 'L', 'M'],
    'precio': [10, 20, 30, 15]
})

# ============================================
# 4.1. BASIC get_dummies
# ============================================
df_dummies = pd.get_dummies(df, columns=['color'])
print(f'\nBasic get_dummies:\n{df_dummies}')

# ============================================
# 4.2. get_dummies with drop_first=True
# ============================================
df_dummies = pd.get_dummies(df, columns=['color', 'tamaño'], drop_first=True)
print(f'\nget_dummies (drop_first):\n{df_dummies}')

# ============================================
# 4.3. get_dummies with a custom prefix
# ============================================
df_dummies = pd.get_dummies(df, columns=['color'], prefix='col')
print(f'\nget_dummies with prefix:\n{df_dummies}')

# ============================================
# 4.4. get_dummies on categorical columns ONLY (automatic)
# ============================================
df_dummies = pd.get_dummies(df, drop_first=True)
# Automatically detects object, string and category columns
print(f'\nAutomatic get_dummies:\n{df_dummies}')

# ============================================
# 5. COMPARISON: sklearn vs pandas
# ============================================
# sklearn (OneHotEncoder):
# ✅ Keeps the fitted encoder to apply it to test
# ✅ Handles unknown categories
# ✅ Controls sparse/dense output
# ❌ More verbose

# pandas (get_dummies):
# ✅ Simpler and quicker to write
# ✅ Integrates well with DataFrames
# ❌ Does not keep an encoder for test
# ❌ Hard to handle unknown categories

# ============================================
# 6. COMPLETE WORKFLOW - Train and Test
# ============================================
print("\n" + "="*50)
print("COMPLETE WORKFLOW")
print("="*50)

# Example data
df_train = pd.DataFrame({
    'color': ['rojo', 'azul', 'verde', 'rojo', 'azul'],
    'tamaño': ['S', 'M', 'L', 'M', 'S'],
    'precio': [10, 20, 30, 15, 25],
    'comprado': [1, 0, 1, 1, 0]
})

df_test = pd.DataFrame({
    'color': ['verde', 'rojo', 'azul'],
    'tamaño': ['L', 'S', 'M'],
    'precio': [28, 12, 22]
})

# 6.1. Separate features and target
X_train = df_train[['color', 'tamaño', 'precio']]
y_train = df_train['comprado']
X_test = df_test[['color', 'tamaño', 'precio']]

# 6.2. Apply OneHotEncoder (fit on train only)
encoder = OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore')
X_train_cat = encoder.fit_transform(X_train[['color', 'tamaño']])
X_test_cat = encoder.transform(X_test[['color', 'tamaño']])

# 6.3. Build DataFrames
df_train_encoded = pd.DataFrame(
    X_train_cat,
    columns=encoder.get_feature_names_out(),
    index=X_train.index
)
df_test_encoded = pd.DataFrame(
    X_test_cat,
    columns=encoder.get_feature_names_out(),
    index=X_test.index
)

# 6.4. Combine with the numeric variables
X_train_final = pd.concat([X_train[['precio']], df_train_encoded], axis=1)
X_test_final = pd.concat([X_test[['precio']], df_test_encoded], axis=1)

print(f'\nX_train_final:\n{X_train_final}')
print(f'\nX_test_final:\n{X_test_final}')

# ============================================
# 7. TIPS AND BEST PRACTICES
# ============================================
# 1. LabelEncoder: ONLY for the target (y), NOT for features (X)
# 2. OneHotEncoder: for nominal variables without order
# 3. OrdinalEncoder: for variables with a logical order
# 4. Use drop='first' with unregularized linear models to avoid perfect multicollinearity
#    (tree-based and regularized models usually do not need it)
# 5. Use handle_unknown='ignore' in production
# 6. Fit on train, transform train and test
# 7. If you use pd.get_dummies, make sure the test columns match the train columns
# 8. OneHotEncoder's sparse output (sparse_output=True, the default) saves memory on large datasets
# 9. Convert sparse to dense (.toarray()) before StandardScaler, or use with_mean=False
# 10. For many categories (>50), consider Target Encoding or Frequency Encoding

# ============================================
# 8. HANDLING HIGH CARDINALITY
# ============================================
# When a variable has many categories (>50), one-hot encoding can create too many columns

# Option 1: group infrequent categories
def agrupar_categorias_raras(df, columna, threshold=0.05):
    """Groups categories with relative frequency < threshold into 'Otros' (modifies df in place)"""
    freq = df[columna].value_counts(normalize=True)
    categorias_raras = freq[freq < threshold].index
    df[columna] = df[columna].replace(categorias_raras, 'Otros')
    return df

# Option 2: Target Encoding (mean encoding)
# Replaces each category with the mean of the target for that category
# ⚠️ Computed on the whole train set this leaks the target; in practice use out-of-fold means
#    (e.g. sklearn.preprocessing.TargetEncoder, scikit-learn >= 1.3)
def target_encoding(df_train, df_test, columna, target):
    """Simple target encoding"""
    means = df_train.groupby(columna)[target].mean()
    df_train[f'{columna}_encoded'] = df_train[columna].map(means)
    # Categories unseen in train get the global mean
    global_mean = df_train[target].mean()
    df_test[f'{columna}_encoded'] = df_test[columna].map(means).fillna(global_mean)
    return df_train, df_test

# Option 3: Frequency Encoding
# Replaces each category with its relative frequency in train
def frequency_encoding(df_train, df_test, columna):
    """Frequency encoding"""
    freq = df_train[columna].value_counts(normalize=True)
    df_train[f'{columna}_freq'] = df_train[columna].map(freq)
    df_test[f'{columna}_freq'] = df_test[columna].map(freq).fillna(0)  # unseen categories → 0
    return df_train, df_test

### 4.1. Variance Threshold

In [None]:
from sklearn.feature_selection import VarianceThreshold

# VarianceThreshold: removes features with variance <= threshold
# threshold=0.0 removes constant columns (zero variance).
# Example: apply it to X_train/X_test (assumes they exist and are already numeric, i.e. encoded)
selector = VarianceThreshold(threshold=0.0)
try:
    X_train_vt = selector.fit_transform(X_train)
    X_test_vt = selector.transform(X_test)
    mask = selector.get_support()

    # Keep column names if X_train is a DataFrame
    if hasattr(X_train, "columns"):
        cols_sel = X_train.columns[mask]
        X_train_vt = pd.DataFrame(X_train_vt, columns=cols_sel, index=X_train.index)
        X_test_vt = pd.DataFrame(X_test_vt, columns=cols_sel, index=X_test.index)
        print('Retained columns:', list(cols_sel))

    print(f'Original features: {X_train.shape[1]}')
    print(f'Features retained after VarianceThreshold: {mask.sum()}')
except (NameError, ValueError):
    # X_train/X_test undefined (NameError) or non-numeric, e.g. raw strings (ValueError)
    print("X_train/X_test not defined or not numeric. Minimal example with a random array:")
    X = np.random.rand(100, 5)
    X[:, 0] = 1.0  # constant column
    sel = VarianceThreshold(threshold=0.0).fit(X)
    print("Variances:", sel.variances_)
    print("Support:", sel.get_support())

# Practical use: raise the threshold to remove low-variability features
# Apply it BEFORE StandardScaler: after standardization every non-constant feature has variance 1,
# so a threshold such as 0.01 would only remove constant columns
# selector = VarianceThreshold(threshold=0.01)  # e.g. remove features with var < 0.01
# X_train_vt = selector.fit_transform(X_train)
# X_test_vt = selector.transform(X_test)
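
VarianceThreshold compares raw variances, so its effect depends on when it is applied. This sketch uses synthetic columns chosen for illustration: a constant column, a near-constant column and a high-variance column. They are filtered with `threshold=0.01` twice, once on the raw data and once on `StandardScaler` output. On raw data the near-constant column is removed. After standardization every non-constant column has variance 1, so only the constant column is removed.

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 50
X = np.column_stack([
    np.ones(n),                    # constant column (variance 0)
    rng.normal(0.0, 0.01, n),      # near-constant column (variance ~1e-4)
    rng.normal(0.0, 5.0, n),       # high-variance column (variance ~25)
])

# On raw data: both low-variance columns fall below the threshold
raw_support = VarianceThreshold(threshold=0.01).fit(X).get_support()
print(raw_support)  # [False False  True]

# After StandardScaler: non-constant columns all have variance 1,
# so only the constant column (scaled to all zeros) is removed
X_std = StandardScaler().fit_transform(X)
std_support = VarianceThreshold(threshold=0.01).fit(X_std).get_support()
print(std_support)  # [False  True  True]
```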

## 5. Regression Models - Predicting Continuous Values

In [None]:
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet

# Plain linear regression (ordinary least squares)
modelo = LinearRegression()
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
print(f'Coefficients: {modelo.coef_}')
print(f'Intercept: {modelo.intercept_}')
print(f'R2 Score: {modelo.score(X_test, y_test)}')

# Note: the penalties of Ridge, Lasso and ElasticNet depend on feature scale, so use scaled X

# Ridge Regression: L2 regularization, penalizes large coefficients
# Useful when there is multicollinearity
modelo = Ridge(alpha=1.0)  # alpha controls the regularization strength
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# Lasso Regression: L1 regularization, can shrink coefficients to exactly 0
# Useful for feature selection
modelo = Lasso(alpha=1.0)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# ElasticNet: combines L1 and L2
modelo = ElasticNet(alpha=1.0, l1_ratio=0.5)  # l1_ratio controls the L1/L2 mix
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
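
This sketch illustrates that Lasso can set coefficients exactly to zero, while Ridge only shrinks them. It uses synthetic data from `make_regression` in which only 3 of 10 features are informative. The dataset and the `alpha` values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical data: only 3 of the 10 features carry signal
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)
X = StandardScaler().fit_transform(X)  # regularized models need features on a common scale

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

# Ridge keeps every coefficient non-zero; Lasso zeroes out the uninformative ones
print('Zero coefficients (Ridge):', int(np.sum(ridge.coef_ == 0)))
print('Zero coefficients (Lasso):', int(np.sum(lasso.coef_ == 0)))
```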

### 5.1. ElasticNetCV - Regression with Automatic Cross-Validation

ElasticNetCV is the version of ElasticNet with **built-in cross-validation**. It selects `alpha` automatically and, if you pass a list of `l1_ratio` values, it selects `l1_ratio` as well.

**Why use ElasticNetCV?**
- ✅ Automatically selects the best `alpha` (regularization strength)
- ✅ Selects the best `l1_ratio` (L1/L2 mix) when given a list of candidates; with the default (`l1_ratio=0.5`) the mix stays fixed
- ✅ Chooses hyperparameters on held-out folds rather than on the training fit, which limits overfitting
- ✅ Combines the advantages of Ridge (L2) and Lasso (L1)
- ✅ Well suited to datasets with multicollinearity and many features

**When to use ElasticNetCV:**
- Regression problems with many features
- When there is multicollinearity among the variables
- When you want automatic feature selection (L1) plus stability (L2)
- In exams: when you are asked to "fix optimal parameters" with CV
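
This is a quick way to confirm which hyperparameters ElasticNetCV actually tunes, using synthetic `make_regression` data as an assumption for illustration. The default estimator keeps `l1_ratio_` at 0.5, while passing a list lets cross-validation pick one of the candidates. `alpha_` is selected in both cases.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNetCV
from sklearn.preprocessing import StandardScaler

# Hypothetical data
X, y = make_regression(n_samples=150, n_features=8, n_informative=4,
                       noise=10.0, random_state=1)
X = StandardScaler().fit_transform(X)

# Default: l1_ratio=0.5 is fixed, only alpha is cross-validated
default_model = ElasticNetCV(cv=5, max_iter=10000).fit(X, y)

# List of candidates: alpha and l1_ratio are cross-validated jointly
candidates = [0.1, 0.5, 0.9, 1.0]
grid_model = ElasticNetCV(l1_ratio=candidates, cv=5, max_iter=10000).fit(X, y)

print('Default -> alpha_:', default_model.alpha_, 'l1_ratio_:', default_model.l1_ratio_)
print('Grid    -> alpha_:', grid_model.alpha_, 'l1_ratio_:', grid_model.l1_ratio_)
```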

In [None]:
from sklearn.linear_model import ElasticNetCV, ElasticNet
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ============================================
# 1. ELASTICNETCV - BASIC USAGE
# ============================================
# ElasticNetCV selects alpha automatically; with the default l1_ratio=0.5 only alpha is tuned

# Basic configuration
modelo = ElasticNetCV(
    cv=5,                      # number of cross-validation folds
    random_state=42,
    n_jobs=-1                  # use all cores
)
modelo.fit(X_train_scaled, y_train)
y_pred = modelo.predict(X_test_scaled)

# Show the best parameters found
print(f'Best alpha: {modelo.alpha_}')
print(f'Best l1_ratio: {modelo.l1_ratio_}')  # 0.5 here, since only one value was given
print(f'R² Score: {modelo.score(X_test_scaled, y_test):.4f}')

# ============================================
# 2. ELASTICNETCV - FULL CONFIGURATION (EXAM STYLE)
# ============================================
# Typical configuration for exams such as the 2024 one

modelo = ElasticNetCV(
    l1_ratio=[0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0],  # L1/L2 mixes to try
    alphas=None,               # None = automatic grid (100 values by default)
    cv=5,                      # 5-fold cross-validation (as in the exam)
    max_iter=10000,            # maximum number of iterations
    tol=1e-4,                  # convergence tolerance
    n_jobs=-1,                 # parallelization
    random_state=42,           # only used when selection='random'
    selection='cyclic'         # 'cyclic' or 'random' coefficient update order
)

# IMPORTANT: always scale the data before ElasticNet
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train the model
modelo.fit(X_train_scaled, y_train)

# Predictions
y_pred_train = modelo.predict(X_train_scaled)
y_pred_test = modelo.predict(X_test_scaled)

# ============================================
# 3. HYPERPARAMETERS - Detailed Explanation
# ============================================
print("\n" + "="*60)
print("ELASTICNETCV HYPERPARAMETERS")
print("="*60)

# l1_ratio: mix between L1 (Lasso) and L2 (Ridge)
# - l1_ratio = 0.0: pure Ridge (L2) - does not remove features
# - l1_ratio = 0.5: L1 and L2 penalties weighted equally
# - l1_ratio = 1.0: pure Lasso (L1) - can remove features
print(f'\nBest l1_ratio: {modelo.l1_ratio_:.4f}')
print('  → 0.0 = pure Ridge (L2)')
print('  → 0.5 = L1/L2 balance')
print('  → 1.0 = pure Lasso (L1)')

# alpha: regularization strength
# - low alpha (e.g. 0.001): little regularization, risk of overfitting
# - high alpha (e.g. 10): strong regularization, risk of underfitting
print(f'\nBest alpha: {modelo.alpha_:.6f}')
print('  → Controls the strength of the penalty')
print('  → Lower alpha = more flexible (higher risk of overfitting)')
print('  → Higher alpha = more regularization (simpler model)')

# ============================================
# 4. EVALUATION METRICS (EXAM STYLE)
# ============================================
print("\n" + "="*60)
print("EVALUATION METRICS")
print("="*60)

# Explained Variance Score (requested in the 2024 exam)
# Equals R² when the mean residual is 0; it ignores a constant bias in the predictions
explained_var_train = explained_variance_score(y_train, y_pred_train)
explained_var_test = explained_variance_score(y_test, y_pred_test)
print(f'\nExplained Variance (train): {explained_var_train:.4f}')
print(f'Explained Variance (test):  {explained_var_test:.4f}')

# R² Score
r2_train = r2_score(y_train, y_pred_train)
r2_test = r2_score(y_test, y_pred_test)
print(f'\nR² Score (train): {r2_train:.4f}')
print(f'R² Score (test):  {r2_test:.4f}')

# RMSE
rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
rmse_test = np.sqrt(mean_squared_error(y_test, y_pred_test))
print(f'\nRMSE (train): {rmse_train:.4f}')
print(f'RMSE (test):  {rmse_test:.4f}')

# ============================================
# 5. MOST RELEVANT FEATURES (REQUESTED IN THE EXAM)
# ============================================
print("\n" + "="*60)
print("MOST RELEVANT FEATURES")
print("="*60)

# Get the coefficients (fitted on standardized features, so magnitudes are comparable)
coeficientes = modelo.coef_

# Build a DataFrame with features and coefficients
if hasattr(X_train, 'columns'):
    feature_names = X_train.columns
else:
    feature_names = [f'Feature_{i}' for i in range(X_train.shape[1])]

df_coef = pd.DataFrame({
    'Feature': feature_names,
    'Coeficiente': coeficientes,
    'Abs_Coef': np.abs(coeficientes)
}).sort_values('Abs_Coef', ascending=False)

print(f'\nTotal features: {len(coeficientes)}')
print(f'Non-zero features: {np.sum(coeficientes != 0)}')
print(f'Removed features (coef=0): {np.sum(coeficientes == 0)}')

print('\nTop 10 most important features:')
print(df_coef.head(10).to_string(index=False))

# Plot the coefficients
plt.figure(figsize=(12, 6))
top_n = min(20, len(df_coef))
df_top = df_coef.head(top_n)
colors = ['red' if c < 0 else 'green' for c in df_top['Coeficiente']]
plt.barh(range(top_n), df_top['Coeficiente'].values, color=colors, alpha=0.7)
plt.yticks(range(top_n), df_top['Feature'].values)
plt.xlabel('Coefficient')
plt.title(f'Top {top_n} Most Important Features (ElasticNetCV)')
plt.axvline(x=0, color='black', linestyle='--', linewidth=0.8)
plt.tight_layout()
plt.show()

# ============================================
# 6. SCATTERPLOT: PREDICTED vs ACTUAL (REQUESTED IN THE EXAM)
# ============================================
print("\n" + "="*60)
print("VISUALIZATION: PREDICTED vs ACTUAL")
print("="*60)

# Scatterplot of actual test values vs predictions
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred_test, alpha=0.6, edgecolors='k', linewidths=0.5)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         'r--', lw=2, label='Perfect prediction')
plt.xlabel('Actual value (test)', fontsize=12)
plt.ylabel('Predicted value', fontsize=12)
plt.title('ElasticNetCV: Predicted vs Actual', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# Scatterplot colored by point density
from scipy.stats import gaussian_kde
xy = np.vstack([y_test, y_pred_test])
z = gaussian_kde(xy)(xy)

plt.figure(figsize=(10, 6))
scatter = plt.scatter(y_test, y_pred_test, c=z, s=50, cmap='viridis', alpha=0.7)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         'r--', lw=2, label='Perfect prediction')
plt.colorbar(scatter, label='Density')
plt.xlabel('Actual value (test)', fontsize=12)
plt.ylabel('Predicted value', fontsize=12)
plt.title('ElasticNetCV: Predicted vs Actual (with density)', fontsize=14)
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# ============================================
# 7. SPEARMAN CORRELATION (REQUESTED IN THE EXAM)
# ============================================
from scipy.stats import spearmanr, pearsonr

# Spearman correlation (rank-based, non-parametric, robust to outliers)
corr_spearman, p_value_spearman = spearmanr(y_test, y_pred_test)
print(f'\nSpearman correlation: {corr_spearman:.4f}')
print(f'P-value: {p_value_spearman:.4e}')

# Also compute Pearson for comparison
corr_pearson, p_value_pearson = pearsonr(y_test, y_pred_test)
print(f'\nPearson correlation: {corr_pearson:.4f}')
print(f'P-value: {p_value_pearson:.4e}')

# Interpretation (common rule of thumb for the absolute value)
print('\nInterpreting the correlation coefficient (absolute value):')
print('  0.9-1.0: very strong correlation')
print('  0.7-0.9: strong correlation')
print('  0.5-0.7: moderate correlation')
print('  0.3-0.5: weak correlation')
print('  0.0-0.3: very weak correlation')

# ============================================
# 8. RESIDUAL ANALYSIS
# ============================================
print("\n" + "="*60)
print("RESIDUAL ANALYSIS")
print("="*60)

residuos = y_test - y_pred_test

# Residual statistics
print(f'\nMean of residuals: {np.mean(residuos):.4f} (should be close to 0)')
print(f'Standard deviation: {np.std(residuos):.4f}')
print(f'Maximum absolute residual: {np.max(np.abs(residuos)):.4f}')

# Residual plots
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# 1. Residuals vs predictions
axes[0, 0].scatter(y_pred_test, residuos, alpha=0.6, edgecolors='k', linewidths=0.5)
axes[0, 0].axhline(y=0, color='r', linestyle='--', linewidth=2)
axes[0, 0].set_xlabel('Predictions')
axes[0, 0].set_ylabel('Residuals')
axes[0, 0].set_title('Residuals vs Predictions')
axes[0, 0].grid(True, alpha=0.3)

# 2. Histogram of residuals
axes[0, 1].hist(residuos, bins=30, edgecolor='black', alpha=0.7)
axes[0, 1].axvline(x=0, color='r', linestyle='--', linewidth=2)
axes[0, 1].set_xlabel('Residuals')
axes[0, 1].set_ylabel('Frequency')
axes[0, 1].set_title('Distribution of Residuals')
axes[0, 1].grid(True, alpha=0.3, axis='y')

# 3. Q-Q plot (normality of residuals)
from scipy import stats
stats.probplot(residuos, dist="norm", plot=axes[1, 0])
axes[1, 0].set_title('Q-Q Plot (Normality of Residuals)')
axes[1, 0].grid(True, alpha=0.3)

# 4. Absolute residuals vs predictions (a funnel shape suggests heteroscedasticity)
axes[1, 1].scatter(y_pred_test, np.abs(residuos), alpha=0.6, edgecolors='k', linewidths=0.5)
axes[1, 1].set_xlabel('Predictions')
axes[1, 1].set_ylabel('|Residuals|')
axes[1, 1].set_title('Absolute Residuals vs Predictions')
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# ============================================
# 9. COMPARISON WITH OTHER MODELS
# ============================================
print("\n" + "="*60)
print("COMPARISON WITH OTHER MODELS")
print("="*60)

from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, LassoCV

modelos = {
    'Linear Regression': LinearRegression(),
    'Ridge': Ridge(alpha=1.0),
    'Lasso': Lasso(alpha=1.0),
    'RidgeCV': RidgeCV(cv=5),
    'LassoCV': LassoCV(cv=5, random_state=42),
    'ElasticNetCV': modelo  # already trained
}

resultados = []
for nombre, mod in modelos.items():
    if nombre != 'ElasticNetCV':
        mod.fit(X_train_scaled, y_train)

    y_pred = mod.predict(X_test_scaled)

    resultados.append({
        'Model': nombre,
        'R²': r2_score(y_test, y_pred),
        'RMSE': np.sqrt(mean_squared_error(y_test, y_pred)),
        'Explained Variance': explained_variance_score(y_test, y_pred),
        'Active Features': np.sum(mod.coef_ != 0) if hasattr(mod, 'coef_') else 'N/A'
    })

df_resultados = pd.DataFrame(resultados).sort_values('R²', ascending=False)
print('\nModel comparison:')
print(df_resultados.to_string(index=False))

# Plot the comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# R² scores
axes[0].barh(df_resultados['Model'], df_resultados['R²'], color='skyblue', edgecolor='black')
axes[0].set_xlabel('R² Score')
axes[0].set_title('R² Score Comparison')
axes[0].grid(True, alpha=0.3, axis='x')

# RMSE
axes[1].barh(df_resultados['Model'], df_resultados['RMSE'], color='salmon', edgecolor='black')
axes[1].set_xlabel('RMSE')
axes[1].set_title('RMSE Comparison (lower is better)')
axes[1].grid(True, alpha=0.3, axis='x')

plt.tight_layout()
plt.show()

# ============================================
# 10. REGULARIZATION PATH (PATH PLOT)
# ============================================
print("\n" + "="*60)
print("REGULARIZATION PATH")
print("="*60)

# Show how the coefficients change for different alphas
from sklearn.linear_model import enet_path

# Compute the path for the l1_ratio chosen by ElasticNetCV
alphas, coefs, _ = enet_path(X_train_scaled, y_train, 
                             l1_ratio=modelo.l1_ratio_, 
                             eps=1e-6, n_alphas=100)

# Visualize: one line per coefficient
plt.figure(figsize=(12, 6))
for i in range(coefs.shape[0]):
    plt.plot(alphas, coefs[i, :], linewidth=2)
plt.axvline(modelo.alpha_, color='r', linestyle='--', linewidth=2, 
           label=f'Optimal alpha ({modelo.alpha_:.4f})')
plt.xscale('log')
plt.xlabel('Alpha (log scale)')
plt.ylabel('Coefficients')
plt.title(f'ElasticNet Regularization Path (l1_ratio={modelo.l1_ratio_:.2f})')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

# ============================================
# 11. COMPLETE WORKFLOW, 2024 EXAM STYLE
# ============================================
print("\n" + "="*60)
print("COMPLETE WORKFLOW - EXAM STYLE")
print("="*60)

"""
FULL STEPS FOR A PROBLEM LIKE THE 2024 EXAM:

1. Prepare the data (cleaning, encoding)
2. Separate X and y
3. Encode categorical variables (OneHotEncoder)
4. Convert sparse output to dense (.toarray()), or use sparse_output=False
5. Split train/test (test_size=0.8 gives 1/5 train, 4/5 test)
6. Scale the data (StandardScaler, fitted on train only)
7. Train ElasticNetCV with cv=5
8. Evaluate with explained_variance_score
9. Identify the most relevant features
10. Plot a predicted-vs-actual scatterplot
11. Compute the Spearman correlation
"""

# Complete code example
def workflow_examen_elasticnet(df, target_col, cat_cols=None, test_size=0.8):
    """
    Complete exam-style ElasticNetCV workflow
    
    Parameters:
    -----------
    df : DataFrame
        Full dataset
    target_col : str
        Name of the target column
    cat_cols : list, optional
        List of categorical columns to encode
    test_size : float
        Test proportion (0.8 = 4/5)
    """
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.preprocessing import OneHotEncoder, StandardScaler
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNetCV
    from sklearn.metrics import explained_variance_score
    from scipy.stats import spearmanr
    
    # 1. Separate X and y
    y = df[target_col].values
    X = df.drop(columns=target_col)
    
    # 2. Encode categorical variables, if any
    #    (fitted before the split for simplicity, so every category is already known)
    if cat_cols:
        encoder = OneHotEncoder(sparse_output=False, drop='first', handle_unknown='ignore')
        X_encoded = encoder.fit_transform(X[cat_cols])
        X_numeric = X.drop(columns=cat_cols).values
        X = np.hstack([X_numeric, X_encoded])
    else:
        X = X.values
    
    # 3. Split train/test
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=42
    )
    
    # 4. Scale (fit on the training set only)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # 5. ElasticNetCV with 5-fold CV over alpha and l1_ratio
    modelo = ElasticNetCV(
        l1_ratio=[0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0],
        cv=5,
        n_jobs=-1,
        random_state=42,
        max_iter=10000
    )
    modelo.fit(X_train_scaled, y_train)
    
    # 6. Predictions
    y_pred_test = modelo.predict(X_test_scaled)
    
    # 7. Metrics
    explained_var = explained_variance_score(y_test, y_pred_test)
    corr_spearman, _ = spearmanr(y_test, y_pred_test)
    
    # 8. Important features (non-zero coefficients)
    n_features_activas = np.sum(modelo.coef_ != 0)
    
    # 9. Visualization
    plt.figure(figsize=(10, 6))
    plt.scatter(y_test, y_pred_test, alpha=0.6, edgecolors='k', linewidths=0.5)
    plt.plot([y_test.min(), y_test.max()], 
            [y_test.min(), y_test.max()], 
            'r--', lw=2, label='Perfect prediction')
    plt.xlabel('Actual value (test)')
    plt.ylabel('Predicted value')
    plt.title(f'ElasticNetCV: R²={modelo.score(X_test_scaled, y_test):.4f}')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()
    
    # 10. Results
    print(f"\n{'='*60}")
    print("FINAL RESULTS")
    print(f"{'='*60}")
    print(f"Best alpha: {modelo.alpha_:.6f}")
    print(f"Best l1_ratio: {modelo.l1_ratio_:.4f}")
    print(f"Explained Variance: {explained_var:.4f}")
    print(f"R² Score: {modelo.score(X_test_scaled, y_test):.4f}")
    print(f"Spearman correlation: {corr_spearman:.4f}")
    print(f"Active features: {n_features_activas}/{len(modelo.coef_)}")
    
    return modelo, scaler, X_train_scaled, X_test_scaled, y_train, y_test, y_pred_test

# Usage example:
# modelo, scaler, X_train_scaled, X_test_scaled, y_train, y_test, y_pred = \
#     workflow_examen_elasticnet(df_clean, 'price_eur')

# ============================================
# 12. TIPS AND COMMON MISTAKES
# ============================================
print("\n" + "="*60)
print("⚠️ TIPS AND COMMON MISTAKES")
print("="*60)

print("""
✅ DO:
1. ALWAYS scale the data before ElasticNet
2. Convert sparse matrices to dense before scaling (.toarray()), or use StandardScaler(with_mean=False)
3. Use cv=5 when the exam asks for it
4. Keep the fitted scaler to transform the test set
5. Check that train and test have the same columns after encoding

❌ AVOID:
1. Not scaling the data → the penalty treats features unevenly and coefficients are not comparable
2. Scaling a sparse matrix with the default StandardScaler (with_mean=True) without densifying it → ValueError
3. Fitting the scaler on the full X instead of only X_train → data leakage
4. Using test_size=0.2 when the task asks for 4/5 test → it must be 0.8
5. Omitting handle_unknown='ignore' in a OneHotEncoder fitted on train only → error on unseen test categories

🎯 TYPICAL EXAM METRICS:
- Explained Variance Score: explained_variance_score()
- R² Score: r2_score() or modelo.score()
- Spearman correlation: spearmanr()
- RMSE: np.sqrt(mean_squared_error())

📊 TYPICAL VISUALIZATIONS:
- Scatterplot: predicted vs actual
- Coefficient plot: top features
- Residuals: to check the model assumptions

🔧 KEY PARAMETERS:
- cv=5: number of folds (typical in exams)
- l1_ratio: [0.1, 0.5, 0.7, 0.9, 0.95, 0.99, 1.0]
- max_iter=10000: reduces non-convergence warnings
- random_state=42: reproducibility
""")

## 6. Logistic Regression - Binary and Multiclass Classification

In [None]:
from sklearn.linear_model import LogisticRegression

# Binary classification
modelo = LogisticRegression(random_state=42)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)  # probability of each class

# Multiclass classification
# solver: optimization algorithm
# 'lbfgs': the default, a good general-purpose choice
# 'saga': good for large datasets; supports 'l1', 'l2' and 'elasticnet'
# 'newton-cg': accurate but can be slower
# With 'lbfgs', multiclass targets are fitted as a multinomial model by default;
# the multi_class argument is deprecated since scikit-learn 1.5
modelo = LogisticRegression(solver='lbfgs', random_state=42)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# With regularization
modelo = LogisticRegression(penalty='l2', C=1.0, random_state=42)  # C is the inverse of alpha
modelo.fit(X_train, y_train)

# Important parameters:
# C: inverse of the regularization strength (smaller C = stronger regularization)
# penalty: 'l1', 'l2', 'elasticnet', None (the string 'none' was removed in scikit-learn 1.4)
# max_iter: maximum number of iterations (increase it if the solver does not converge)
### 6.1 LogisticRegressionCV

In [None]:
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score, accuracy_score

# LogisticRegressionCV: logistic regression with built-in cross-validation
# Automatically searches for the best value of C (inverse of the regularization strength)

# Basic usage: by default tries Cs=10 values of C on a log scale between 1e-4 and 1e4
modelo = LogisticRegressionCV(cv=5, random_state=42)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)

# Best C found (array with one value per class)
print(f'Best C: {modelo.C_}')
# scores_: dict {class: array of shape (n_folds, n_Cs)} with the score of each C in each fold
print(f'Scores per fold: {modelo.scores_}')

# Specify the range of C values to try
Cs = [0.001, 0.01, 0.1, 1, 10, 100]
modelo = LogisticRegressionCV(Cs=Cs, cv=5, random_state=42)
modelo.fit(X_train, y_train)

# ============================================
# RIDGE (L2) - L2 regularization
# ============================================
# Penalizes large coefficients; useful against multicollinearity
# penalty='l2' is the default
modelo_ridge = LogisticRegressionCV(
    Cs=10,                    # number of C values to try
    cv=5,
    penalty='l2',
    solver='lbfgs',           # 'lbfgs', 'newton-cg', 'sag', 'saga'
    scoring='accuracy',
    max_iter=1000,
    random_state=42
)
modelo_ridge.fit(X_train, y_train)
print(f'Ridge - Best C: {modelo_ridge.C_}')
print(f'Ridge - Coefficients: {modelo_ridge.coef_}')

# ============================================
# LASSO (L1) - L1 regularization
# ============================================
# Can shrink coefficients exactly to 0; useful for feature selection
# IMPORTANT: only supported with solver='liblinear' or 'saga'
modelo_lasso = LogisticRegressionCV(
    Cs=10,
    cv=5,
    penalty='l1',
    solver='liblinear',       # 'liblinear' or 'saga' for L1
    scoring='accuracy',
    max_iter=1000,
    random_state=42
)
modelo_lasso.fit(X_train, y_train)
print(f'Lasso - Best C: {modelo_lasso.C_}')
print(f'Lasso - Coefficients: {modelo_lasso.coef_}')

# Count coefficients equal to 0 (features removed by Lasso; summed over classes if multiclass)
coefs_zero = np.sum(modelo_lasso.coef_ == 0)
print(f'Features removed by Lasso: {coefs_zero}')

# ============================================
# ELASTIC NET - L1 + L2 combination
# ============================================
# Combines the advantages of Ridge and Lasso
# l1_ratio controls the mix: 0 = Ridge, 1 = Lasso, 0.5 = 50% of each
# IMPORTANT: only supported with solver='saga'
modelo_elastic = LogisticRegressionCV(
    Cs=10,
    cv=5,
    penalty='elasticnet',
    solver='saga',            # REQUIRED for elasticnet
    l1_ratios=[0.1, 0.3, 0.5, 0.7, 0.9],  # proportion of L1
    scoring='accuracy',
    max_iter=1000,            # saga converges slowly on unscaled data (see the scaling note below)
    random_state=42
)
modelo_elastic.fit(X_train, y_train)
print(f'ElasticNet - Best C: {modelo_elastic.C_}')
print(f'ElasticNet - Best l1_ratio: {modelo_elastic.l1_ratio_}')
print(f'ElasticNet - Coefficients: {modelo_elastic.coef_}')

# ============================================
# Compare the three models
# ============================================
modelos = {
    'Ridge (L2)': modelo_ridge,
    'Lasso (L1)': modelo_lasso,
    'ElasticNet': modelo_elastic
}

for nombre, modelo in modelos.items():
    y_pred = modelo.predict(X_test)
    y_pred_proba = modelo.predict_proba(X_test)
    
    accuracy = accuracy_score(y_test, y_pred)
    
    # ROC AUC for binary classification (probability of the positive class)
    if len(np.unique(y_test)) == 2:
        roc_auc = roc_auc_score(y_test, y_pred_proba[:, 1])
        print(f'{nombre} - Accuracy: {accuracy:.4f}, ROC AUC: {roc_auc:.4f}')
    else:
        # ROC AUC for multiclass (one-vs-rest, weighted by class support)
        roc_auc = roc_auc_score(y_test, y_pred_proba, multi_class='ovr', average='weighted')
        print(f'{nombre} - Accuracy: {accuracy:.4f}, ROC AUC: {roc_auc:.4f}')
    
    # Count non-zero coefficients
    non_zero = np.sum(modelo.coef_ != 0)
    print(f'{nombre} - Active features: {non_zero}')
    print()

# ============================================
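# For multiclass classification
# ============================================
# With solver='lbfgs', multiclass targets are fitted as a multinomial model by default
# (the multi_class argument, 'ovr' or 'multinomial', is deprecated since scikit-learn 1.5)
modelo_multi = LogisticRegressionCV(
    cv=5,
    penalty='l2',
    solver='lbfgs',
    random_state=42
)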
modelo_multi.fit(X_train, y_train)

# ============================================
# Tips for choosing the penalty
# ============================================
# Ridge (L2): when most features are expected to contribute
# Lasso (L1): when you want automatic feature selection
# ElasticNet: when you have many correlated features
# 
# C values:
# - High C (e.g. 100): little regularization, risk of overfitting
# - Low C (e.g. 0.01): strong regularization, risk of underfitting
# - LogisticRegressionCV finds the best C automatically

# IMPORTANT: always scale the data before using regularization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit on the training set only
X_test_scaled = scaler.transform(X_test)

modelo = LogisticRegressionCV(cv=5, penalty='elasticnet', solver='saga', 
                               l1_ratios=[0.5], max_iter=1000, random_state=42)
modelo.fit(X_train_scaled, y_train)
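One caveat with the cell above: `LogisticRegressionCV` fitted on an already scaled `X_train_scaled` lets each inner validation fold influence the scaler's mean and standard deviation. A minimal sketch of a stricter alternative, swapping in `GridSearchCV` over a `Pipeline` (a different tool from `LogisticRegressionCV`), so the `StandardScaler` is refitted inside every CV split. The data and variable names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data
X_demo, y_demo = make_classification(n_samples=600, n_features=15, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo
)

# The scaler is a pipeline step, so GridSearchCV refits it on the training part of every split
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
param_grid = {'logisticregression__C': [0.001, 0.01, 0.1, 1, 10, 100]}
search = GridSearchCV(pipe, param_grid, cv=5, scoring='accuracy')
search.fit(X_tr, y_tr)

best_C = search.best_params_['logisticregression__C']
test_acc = search.score(X_te, y_te)  # the test set is used only once, at the end
print(f"Best C: {best_C}, test accuracy: {test_acc:.3f}")
```

`make_pipeline` names each step after its lowercased class, which is why the grid key is `logisticregression__C`.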

In [None]:
# Self-contained scikit-learn demo: Confusion Matrix, ROC, and Precision-Recall
# - Generates an imbalanced binary classification dataset
# - Trains Logistic Regression
# - Shows confusion matrix (default threshold 0.5)
# - Plots ROC curve with AUC
# - Plots Precision-Recall curve with Average Precision
# - Picks the threshold that maximizes F1 (on the test set here, so the F1 is optimistic) and shows the new confusion matrix

import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix,
    ConfusionMatrixDisplay,
    roc_curve,
    roc_auc_score,
    precision_recall_curve,
    average_precision_score,
    classification_report,
    f1_score,
)

# 1) Data: Imbalanced to highlight PR behavior
X, y = make_classification(
    n_samples=6000,
    n_features=20,
    n_informative=6,
    n_redundant=4,
    n_repeated=0,
    n_clusters_per_class=2,
    weights=[0.90, 0.10],  # 10% positive class
    flip_y=0.01,
    class_sep=1.2,
    random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# 2) Model
clf = LogisticRegression(max_iter=2000, n_jobs=None)
clf.fit(X_train, y_train)

# 3) Predictions (probabilities + default threshold 0.5)
y_proba = clf.predict_proba(X_test)[:, 1]
y_pred_default = (y_proba >= 0.5).astype(int)

# 4) Confusion matrix at threshold 0.5
cm_default = confusion_matrix(y_test, y_pred_default, labels=[1, 0])  # rows: actual 1,0
disp_default = ConfusionMatrixDisplay(
    confusion_matrix=cm_default, display_labels=[1, 0]
)
fig, ax = plt.subplots()
disp_default.plot(ax=ax)
plt.title("Confusion Matrix (threshold = 0.5)")
plt.show()

print("Classification report (threshold = 0.5):\n")
print(classification_report(y_test, y_pred_default, digits=3))

# 5) ROC curve + AUC
fpr, tpr, roc_thresholds = roc_curve(y_test, y_proba)
auc_roc = roc_auc_score(y_test, y_proba)

plt.figure()
plt.plot(fpr, tpr, linewidth=2)
plt.plot([0, 1], [0, 1], linestyle="--")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate (Recall)")
plt.title(f"ROC Curve (AUC = {auc_roc:.3f})")
plt.grid(True, linestyle=":")
plt.show()

# 6) Precision-Recall + Average Precision
prec, rec, pr_thresholds = precision_recall_curve(y_test, y_proba)
ap = average_precision_score(y_test, y_proba)

plt.figure()
plt.plot(rec, prec, linewidth=2)
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title(f"Precision-Recall Curve (AP = {ap:.3f})")
plt.grid(True, linestyle=":")
plt.show()

# 7) Pick a better threshold by maximizing F1 on a grid of thresholds
# NOTE: for brevity the threshold is tuned on the test set, so best_f1 is optimistic;
# in practice, choose it on a separate validation split and report on the test set
threshold_grid = np.linspace(0.01, 0.99, 99)
f1_scores = []
for thr in threshold_grid:
    y_pred_thr = (y_proba >= thr).astype(int)
    f1_scores.append(f1_score(y_test, y_pred_thr))

best_idx = int(np.argmax(f1_scores))
best_thr = float(threshold_grid[best_idx])
best_f1 = float(f1_scores[best_idx])

print(f"\nBest threshold by F1 on test set: {best_thr:.3f} (F1 = {best_f1:.3f})")

y_pred_best = (y_proba >= best_thr).astype(int)
cm_best = confusion_matrix(y_test, y_pred_best, labels=[1, 0])
disp_best = ConfusionMatrixDisplay(confusion_matrix=cm_best, display_labels=[1, 0])
fig, ax = plt.subplots()
disp_best.plot(ax=ax)
plt.title(f"Confusion Matrix (best F1 threshold = {best_thr:.3f})")
plt.show()

print("\nClassification report (best F1 threshold):\n")
print(classification_report(y_test, y_pred_best, digits=3))
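The demo above tunes the threshold on the same test set it reports on. A minimal sketch of the stricter procedure, assuming the same kind of synthetic imbalanced data: split a validation set off the training data, pick the F1-maximizing threshold there, and apply it once to the held-out set. The variable names are hypothetical and chosen so they do not overwrite the demo's `X_train`/`X_test`. Recent scikit-learn releases (1.5 and later) also provide `TunedThresholdClassifierCV`, which automates a cross-validated version of this search.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Same kind of imbalanced synthetic data as the demo above (about 10% positives)
X_all, y_all = make_classification(n_samples=6000, n_features=20, n_informative=6, n_redundant=4,
                                   weights=[0.90, 0.10], flip_y=0.01, class_sep=1.2, random_state=42)

# Three-way split: fit the model, choose the threshold, then report once
X_tmp, X_hold, y_tmp, y_hold = train_test_split(
    X_all, y_all, test_size=0.3, random_state=42, stratify=y_all)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=42, stratify=y_tmp)

clf_thr = LogisticRegression(max_iter=2000).fit(X_fit, y_fit)

# Choose the threshold on the validation split only
thresholds = np.linspace(0.01, 0.99, 99)
val_proba = clf_thr.predict_proba(X_val)[:, 1]
val_f1 = [f1_score(y_val, (val_proba >= t).astype(int), zero_division=0) for t in thresholds]
best_thr = float(thresholds[int(np.argmax(val_f1))])

# Apply that fixed threshold to the untouched held-out set
hold_proba = clf_thr.predict_proba(X_hold)[:, 1]
hold_f1 = f1_score(y_hold, (hold_proba >= best_thr).astype(int), zero_division=0)
print(f"Threshold chosen on validation: {best_thr:.2f}")
print(f"Held-out F1 at that threshold: {hold_f1:.3f}")
```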


## 7. Decision Trees

In [None]:
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification
modelo = DecisionTreeClassifier(
    max_depth=5,              # maximum depth of the tree
    min_samples_split=20,     # minimum number of samples required to split a node
    min_samples_leaf=10,      # minimum number of samples in a leaf
    random_state=42
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# Feature importance: impurity-based importance of each variable (normalized to sum to 1)
importancias = modelo.feature_importances_
for i, imp in enumerate(importancias):
    print(f'Feature {i}: {imp}')

# Regression
modelo = DecisionTreeRegressor(max_depth=5, random_state=42)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

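# Visualize the tree (here the last fitted model, i.e. the regressor)
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(20,10))
# feature_names needs one name per column: use the DataFrame columns when available
feature_names = list(X_train.columns) if hasattr(X_train, 'columns') else None
plot_tree(modelo, filled=True, feature_names=feature_names)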
plt.show()
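The `max_depth` parameter above is the main guard against overfitting. As a minimal sketch on hypothetical synthetic data, the following compares training accuracy with 5-fold cross-validated accuracy for several depths: a fully grown tree (`max_depth=None`) typically fits the training set perfectly, while its cross-validated accuracy is lower.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic data with a little label noise (flip_y defaults to 0.01)
X_demo, y_demo = make_classification(n_samples=500, n_features=10, n_informative=4, random_state=0)

results = {}
for depth in [1, 3, 5, None]:  # None = grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    train_acc = tree.fit(X_demo, y_demo).score(X_demo, y_demo)
    cv_acc = cross_val_score(tree, X_demo, y_demo, cv=5).mean()
    results[depth] = (train_acc, cv_acc)
    print(f"max_depth={depth}: train={train_acc:.3f}, cv={cv_acc:.3f}")
```

The gap between the two columns is a quick overfitting signal; pick the depth with the best cross-validated score.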

## 8. Random Forest - Tree Ensemble

In [None]:
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification
modelo = RandomForestClassifier(
    n_estimators=100,         # number of trees
    max_depth=10,             # maximum depth of each tree
    min_samples_split=20,
    min_samples_leaf=10,
    max_features='sqrt',      # number of features sampled at each split: 'sqrt', 'log2', int, float
    random_state=42,
    n_jobs=-1                 # use all CPU cores
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)

# Feature importance (assumes X_train is a DataFrame, to use .columns)
importancias = pd.DataFrame({
    'feature': X_train.columns,
    'importance': modelo.feature_importances_
}).sort_values('importance', ascending=False)

# Regression
modelo = RandomForestRegressor(
    n_estimators=100,
    max_depth=10,
    random_state=42,
    n_jobs=-1
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

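# Out-of-bag score: accuracy estimated on the samples each tree did not see in its bootstrap
# sample (requires bootstrap=True, the default), with no separate validation set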
modelo = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
modelo.fit(X_train, y_train)
print(f'OOB Score: {modelo.oob_score_}')
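To see how far the out-of-bag estimate can be trusted, a minimal sketch on hypothetical synthetic data compares `oob_score_` with the accuracy on a separate test split. The two usually land close to each other, which is why OOB is handy when data are too scarce to hold out a validation set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data
X_demo, y_demo = make_classification(n_samples=1000, n_features=12, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.3, random_state=0)

rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=42, n_jobs=-1)
rf.fit(X_tr, y_tr)

oob_acc = rf.oob_score_          # accuracy on out-of-bag samples (uses training data only)
test_acc = rf.score(X_te, y_te)  # accuracy on a real held-out set, for comparison
print(f"OOB accuracy: {oob_acc:.3f}, test accuracy: {test_acc:.3f}")
```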

## 9. Gradient Boosting - Sequential Ensemble

In [None]:
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

# Classification
modelo = GradientBoostingClassifier(
    n_estimators=100,         # number of boosting stages (trees)
    learning_rate=0.1,        # shrinks each tree's contribution (lower = more conservative, needs more trees)
    max_depth=3,              # maximum depth of each tree
    min_samples_split=20,
    min_samples_leaf=10,
    subsample=0.8,            # fraction of samples used to fit each tree (< 1.0 = stochastic gradient boosting)
    random_state=42
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)

# Regression
modelo = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    random_state=42
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# Feature importance
importancias = modelo.feature_importances_
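Because boosting adds trees one at a time, `staged_predict` returns the predictions after each stage, which makes it easy to see how many trees are actually useful. A minimal sketch on hypothetical synthetic regression data: it tracks validation MSE per stage and reports the best number of trees. For built-in early stopping, `GradientBoostingRegressor` also accepts `n_iter_no_change` together with `validation_fraction`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic regression data
X_demo, y_demo = make_regression(n_samples=800, n_features=10, noise=10.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

gbr = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1, max_depth=3, random_state=42)
gbr.fit(X_tr, y_tr)

# staged_predict yields the predictions after 1, 2, ..., n_estimators stages
val_mse = [mean_squared_error(y_val, y_hat) for y_hat in gbr.staged_predict(X_val)]
best_n = int(np.argmin(val_mse)) + 1
print(f"Best number of trees on validation: {best_n} (MSE={min(val_mse):.1f})")
```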

## 10. Support Vector Machines (SVM)

In [None]:
from sklearn.svm import SVC, SVR

# Classification
modelo = SVC(
    kernel='rbf',             # 'linear', 'poly', 'rbf', 'sigmoid'
    C=1.0,                    # regularization parameter (smaller C = stronger regularization)
    gamma='scale',            # kernel coefficient: 'scale', 'auto', float
    random_state=42
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

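# SVC with probabilities: uses internal 5-fold cross-validation (Platt scaling),
# so it is slower, and predict_proba may occasionally disagree with predict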
modelo = SVC(kernel='rbf', probability=True, random_state=42)
modelo.fit(X_train, y_train)
y_pred_proba = modelo.predict_proba(X_test)

# Linear kernel (suitable when the data are roughly linearly separable; for large datasets, LinearSVC is usually faster)
modelo = SVC(kernel='linear', C=1.0, random_state=42)
modelo.fit(X_train, y_train)

# Polynomial kernel
modelo = SVC(kernel='poly', degree=3, C=1.0, random_state=42)
modelo.fit(X_train, y_train)

# Regression
modelo = SVR(kernel='rbf', C=1.0, epsilon=0.1)  # epsilon: width of the tube within which errors are not penalized
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# IMPORTANT: SVMs need scaled data to work well (kernel distances depend on feature scale)
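The scaling note above can be illustrated with a minimal sketch on hypothetical synthetic data: one uninformative column is given a standard deviation of 1000, so it dominates the RBF kernel's distances. Cross-validated accuracy is compared with and without a `StandardScaler` step in a pipeline (placing the scaler in the pipeline also keeps each validation fold out of the scaling statistics).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_demo, y_demo = make_classification(n_samples=400, n_features=4, n_informative=2,
                                     n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
# Hypothetical badly scaled, uninformative column: noise with standard deviation 1000
X_demo = np.column_stack([X_demo, rng.normal(scale=1000.0, size=X_demo.shape[0])])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
raw_acc = cross_val_score(SVC(kernel='rbf'), X_demo, y_demo, cv=cv).mean()
scaled_acc = cross_val_score(make_pipeline(StandardScaler(), SVC(kernel='rbf')),
                             X_demo, y_demo, cv=cv).mean()
print(f"RBF SVC without scaling: {raw_acc:.3f}")
print(f"RBF SVC with StandardScaler: {scaled_acc:.3f}")
```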

## 11. K-Nearest Neighbors (KNN)

In [None]:
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Classification
modelo = KNeighborsClassifier(
    n_neighbors=5,            # number of neighbors
    weights='uniform',        # 'uniform' or 'distance' (closer neighbors weigh more)
    metric='minkowski',       # distance metric: 'euclidean', 'manhattan', 'minkowski'
    p=2                       # Minkowski power: p=1 Manhattan, p=2 Euclidean
)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)

# KNN weighted by distance
modelo = KNeighborsClassifier(n_neighbors=5, weights='distance')
modelo.fit(X_train, y_train)

# Regression
modelo = KNeighborsRegressor(n_neighbors=5, weights='uniform')
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# IMPORTANT: KNN needs scaled data to work well (distances depend on feature scale)
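To show what `KNeighborsClassifier` does with `weights='uniform'` and the Euclidean metric (`p=2`), here is a minimal from-scratch sketch on hypothetical synthetic data: compute the distances from each query point to every training point, take the k closest, and use a majority vote. It assumes binary 0/1 labels and an odd k, so no vote can tie; it is meant for understanding, not as a replacement for the library class.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical synthetic data: first 150 rows to "fit", last 50 rows to query
X_demo, y_demo = make_classification(n_samples=200, n_features=4, random_state=0)
X_fit, y_fit = X_demo[:150], y_demo[:150]
X_query = X_demo[150:]

k = 5
# Euclidean distances from each query point to every training point, shape (50, 150)
dists = np.linalg.norm(X_query[:, None, :] - X_fit[None, :, :], axis=2)
nearest = np.argsort(dists, axis=1)[:, :k]             # indices of the k nearest neighbors
votes = y_fit[nearest]                                  # their labels, shape (50, k)
manual_pred = (votes.sum(axis=1) > k / 2).astype(int)   # majority vote for 0/1 labels

sk_pred = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit).predict(X_query)
print("Agreement with scikit-learn:", np.mean(manual_pred == sk_pred))
```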

## 12. Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

# GaussianNB: for continuous features; assumes each feature is normally distributed within each class
modelo = GaussianNB()
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)

# MultinomialNB: for count features (text, frequencies)
# Requires non-negative values
modelo = MultinomialNB(alpha=1.0)  # alpha: additive (Laplace/Lidstone) smoothing parameter
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# BernoulliNB: for binary (0/1) features (by default binarize=0.0 maps values > 0 to 1)
modelo = BernoulliNB(alpha=1.0)
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
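As a minimal sketch of `MultinomialNB` on count features, the following uses a tiny hypothetical corpus (not real data) turned into word counts with `CountVectorizer`. With `alpha=1.0`, a word never seen in a class still gets a small non-zero probability for that class, so new documents can always be scored.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus (illustrative only)
docs = ["cheap pills buy now", "buy cheap watches now", "meeting agenda for monday",
        "project meeting notes", "cheap offer buy today", "monday project review"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X_counts = vec.fit_transform(docs)  # sparse word-count matrix (non-negative, as required)
nb = MultinomialNB(alpha=1.0).fit(X_counts, labels)

new_docs = ["buy cheap pills", "monday meeting"]
pred = nb.predict(vec.transform(new_docs))  # the vectorizer fitted on docs is reused
print(pred)  # expected: [1 0]
```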

## 13. Clustering - Unsupervised Learning

In [None]:
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.metrics import silhouette_score

# KMeans: partitions the data into K clusters
modelo = KMeans(
    n_clusters=3,             # number of clusters
    init='k-means++',         # initialization method
    n_init=10,                # number of runs with different initial centroids (the best run is kept)
    max_iter=300,
    random_state=42
)
clusters = modelo.fit_predict(X)
centroides = modelo.cluster_centers_

# Evaluate clustering quality
inercia = modelo.inertia_  # sum of squared distances of the samples to their closest centroid
silhouette = silhouette_score(X, clusters)  # between -1 and 1, higher is better

# Elbow method to find a good K
inertias = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42)  # explicit n_init (its default changed in scikit-learn 1.4)
    km.fit(X)
    inertias.append(km.inertia_)

# DBSCAN: density-based clustering, does not require specifying K
modelo = DBSCAN(
    eps=0.5,                  # maximum distance for two points to count as neighbors
    min_samples=5             # minimum neighbors (including the point itself) for a core point
)
clusters = modelo.fit_predict(X)
# label -1 marks noise points (outliers)

# Hierarchical Clustering
modelo = AgglomerativeClustering(
    n_clusters=3,             # number of clusters
    linkage='ward'            # 'ward', 'complete', 'average', 'single'
)
clusters = modelo.fit_predict(X)
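The elbow method above only looks at inertia, which generally keeps falling as K grows. A minimal sketch on hypothetical, well-separated synthetic blobs compares inertia with the silhouette score for several K; the silhouette peaks at the number of blobs, which gives a less subjective choice than locating the elbow by eye.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical data: four well-separated blobs at known centers
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X_blobs, _ = make_blobs(n_samples=400, centers=centers, cluster_std=0.8, random_state=0)

inertias, silhouettes = {}, {}
for k in range(2, 8):  # the silhouette needs at least 2 clusters
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_blobs)
    inertias[k] = km.inertia_                               # generally decreases as k grows
    silhouettes[k] = silhouette_score(X_blobs, km.labels_)  # peaks at a good k

best_k = max(silhouettes, key=silhouettes.get)
for k in inertias:
    print(f"k={k}: inertia={inertias[k]:.1f}, silhouette={silhouettes[k]:.3f}")
print("Best K by silhouette:", best_k)
```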

### 13.2. Hierarchical Clustering - Complete Guide (dendrogram and linkage)

In [None]:
from scipy.cluster.hierarchy import dendrogram, linkage, fcluster, cophenet
from scipy.spatial.distance import pdist
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ============================================
# BASIC HIERARCHICAL CLUSTERING with scipy
# ============================================
# linkage() builds the hierarchical tree (linkage matrix Z)
# dendrogram() visualizes the tree

# Basic method: 'ward' linkage with Euclidean distance
Z = linkage(X, method='ward', metric='euclidean')

# Visualize the dendrogram
plt.figure(figsize=(12, 6))
dendrogram(Z)
plt.title('Dendrogram - Hierarchical Clustering (Ward)')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

# ============================================
# LINKAGE METHODS (method)
# ============================================
# 'ward': minimizes the within-cluster variance (Euclidean distance only)
# 'complete': maximum distance between pairs of points
# 'average': average distance between pairs of points (UPGMA)
# 'single': minimum distance between pairs of points (prone to chaining, sensitive to outliers)
# 'centroid': distance between cluster centroids (Euclidean distance only)
# 'median': like 'centroid', but the merged centroid is the mean of the two old ones (WPGMC, Euclidean only)
# 'weighted': weighted average distance (WPGMA)

# Complete linkage with Euclidean distance
Z_complete = linkage(X, method='complete', metric='euclidean')
plt.figure(figsize=(12, 6))
dendrogram(Z_complete)
plt.title('Dendrogram - Complete Linkage (Euclidean)')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

# Complete linkage with Manhattan distance (cityblock)
Z_manhattan = linkage(X, method='complete', metric='cityblock')
plt.figure(figsize=(12, 6))
dendrogram(Z_manhattan)
plt.title('Dendrogram - Complete Linkage (Manhattan/Cityblock)')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

# Single linkage with Euclidean distance
Z_single = linkage(X, method='single', metric='euclidean')
plt.figure(figsize=(12, 6))
dendrogram(Z_single)
plt.title('Dendrogram - Single Linkage (Euclidean)')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()

# ============================================
# AVAILABLE DISTANCE METRICS (scipy names)
# ============================================
# 'euclidean': Euclidean distance (L2)
# 'cityblock': Manhattan distance (L1); scikit-learn calls it 'manhattan'
# 'cosine': cosine distance (1 - cosine similarity)
# 'correlation': correlation distance (1 - Pearson correlation)
# 'hamming': for binary data
# 'jaccard': for binary data
# 'chebyshev': Chebyshev distance (L-infinity)
# 'minkowski': generalized Minkowski distance (order set with p)

# Example with cosine distance
Z_cosine = linkage(X, method='average', metric='cosine')
plt.figure(figsize=(12, 6))
dendrogram(Z_cosine)
plt.title('Dendrogram - Average Linkage (Cosine)')
plt.show()

# ============================================
# CUTTING THE DENDROGRAM - Getting the clusters
# ============================================
# Option 1: specify the number of clusters
n_clusters = 3
clusters = fcluster(Z, n_clusters, criterion='maxclust')
print(f'Clusters (n={n_clusters}): {clusters}')
print(f'Count per cluster: {np.bincount(clusters)[1:]}')  # fcluster labels start at 1

# Option 2: specify the cut height (distance); a sensible value depends on the data scale
altura_corte = 10
clusters_altura = fcluster(Z, altura_corte, criterion='distance')
print(f'Clusters (height={altura_corte}): {clusters_altura}')

# Visualize the dendrogram with the cut line
plt.figure(figsize=(12, 6))
dendrogram(Z)
plt.axhline(y=altura_corte, c='red', linestyle='--', label=f'Cut at height={altura_corte}')
plt.title('Dendrogram with Cut Line')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.legend()
plt.show()

# ============================================
# CUSTOMIZING THE DENDROGRAM
# ============================================
plt.figure(figsize=(12, 6))
dendrogram(Z,
          truncate_mode='lastp',  # show only the last p merged clusters as leaves
          p=12,                   # number of leaves to show
          leaf_rotation=90,       # rotation of the leaf labels
          leaf_font_size=10,      # font size of the leaf labels
          show_contracted=True,   # mark the heights of the merges hidden inside contracted leaves
          color_threshold=15)     # links below this height are colored by cluster
plt.title('Customized Dendrogram (Truncated)')
plt.xlabel('Sample index or (cluster size)')
plt.ylabel('Distance')
plt.show()

# Horizontal dendrogram
plt.figure(figsize=(8, 10))
dendrogram(Z, orientation='left')
plt.title('Horizontal Dendrogram')
plt.xlabel('Distance')
plt.ylabel('Sample index')
plt.show()

# ============================================
# VISUALIZING THE CLUSTERS IN 2D
# ============================================
def plot_hierarchical_clusters(X, Z, n_clusters):
    """Plots hierarchical clustering labels on the first two columns of X (a NumPy array)"""
    clusters = fcluster(Z, n_clusters, criterion='maxclust')
    
    plt.figure(figsize=(10, 6))
    scatter = plt.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=50, alpha=0.6)
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title(f'Hierarchical Clustering (K={n_clusters})')
    plt.colorbar(scatter, label='Cluster')
    plt.grid(True, alpha=0.3)
    plt.show()

# Visualize with 3 clusters
plot_hierarchical_clusters(X, Z, n_clusters=3)

# ============================================
# USING sklearn.cluster.AgglomerativeClustering
# ============================================
# Alternative to scipy, with the scikit-learn API

# Ward linkage
agg_ward = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels_ward = agg_ward.fit_predict(X)

# Complete linkage
agg_complete = AgglomerativeClustering(n_clusters=3, linkage='complete')
labels_complete = agg_complete.fit_predict(X)

# Average linkage
agg_average = AgglomerativeClustering(n_clusters=3, linkage='average')
labels_average = agg_average.fit_predict(X)

# Single linkage
agg_single = AgglomerativeClustering(n_clusters=3, linkage='single')
labels_single = agg_single.fit_predict(X)

# With a specific distance (only for linkage != 'ward')
agg_manhattan = AgglomerativeClustering(
    n_clusters=3, 
    linkage='complete', 
    metric='manhattan'  # also: 'euclidean', 'cosine', 'l1', 'l2' (named affinity before scikit-learn 1.2)
)
labels_manhattan = agg_manhattan.fit_predict(X)

print(f'Number of clusters found: {agg_ward.n_clusters_}')
print(f'Number of leaves in the tree: {agg_ward.n_leaves_}')

# ============================================
# SILHOUETTE METHOD to choose the optimal K
# ============================================
def evaluar_hierarchical_silhouette(X, method='ward', metric='euclidean', max_k=10):
    """Evaluates hierarchical clustering for several K using the silhouette score"""
    Z = linkage(X, method=method, metric=metric)
    silhouette_scores = []
    K_range = range(2, max_k + 1)
    
    for k in K_range:
        clusters = fcluster(Z, k, criterion='maxclust')
        # Score with the same distance metric used to build the tree
        score = silhouette_score(X, clusters, metric=metric)
        silhouette_scores.append(score)
        print(f'K={k}: Silhouette Score = {score:.4f}')
    
    # Plot
    plt.figure(figsize=(10, 6))
    plt.plot(K_range, silhouette_scores, 'ro-')
    plt.xlabel('Number of clusters (K)')
    plt.ylabel('Silhouette Score')
    plt.title(f'Silhouette Method - Hierarchical ({method.capitalize()} Linkage)')
    plt.xticks(K_range)
    plt.grid(True)
    plt.show()
    
    best_k = K_range[np.argmax(silhouette_scores)]
    print(f'\nBest K by silhouette: {best_k}')
    return best_k, silhouette_scores

# Evaluate with complete linkage and Euclidean distance
best_k_complete, scores_complete = evaluar_hierarchical_silhouette(
    X, method='complete', metric='euclidean', max_k=10
)

# Evaluate with single linkage and Euclidean distance
best_k_single, scores_single = evaluar_hierarchical_silhouette(
    X, method='single', metric='euclidean', max_k=10
)

# ============================================
# COMPARING SEVERAL LINKAGE METHODS
# ============================================
def comparar_linkages(X, n_clusters=3):
    """Compares different linkage methods (all with Euclidean distance)"""
    metodos = ['ward', 'complete', 'average', 'single']
    
    fig, axes = plt.subplots(2, 2, figsize=(14, 12))
    axes = axes.ravel()
    
    for idx, metodo in enumerate(metodos):
        # Build the linkage matrix (Euclidean for every method; 'ward' requires it)
        Z = linkage(X, method=metodo, metric='euclidean')
        
        # Get the clusters
        clusters = fcluster(Z, n_clusters, criterion='maxclust')
        
        # Compute the silhouette score
        sil_score = silhouette_score(X, clusters)
        
        # Plot
        axes[idx].scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=50, alpha=0.6)
        axes[idx].set_title(f'{metodo.capitalize()} Linkage\nSilhouette: {sil_score:.4f}')
        axes[idx].set_xlabel('Feature 1')
        axes[idx].set_ylabel('Feature 2')
        axes[idx].grid(True, alpha=0.3)
    
    plt.tight_layout()
    plt.show()

comparar_linkages(X, n_clusters=3)

# ============================================
# EVALUATING CLUSTERING QUALITY
# ============================================
# Cophenetic correlation coefficient: how faithfully the dendrogram preserves
# the original pairwise distances
from scipy.cluster.hierarchy import cophenet
from scipy.spatial.distance import pdist

Z = linkage(X, method='ward')
c, coph_dists = cophenet(Z, pdist(X))
print(f'Cophenetic correlation coefficient: {c:.4f}')
# Values close to 1 = distances are well preserved
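
The cophenetic coefficient is the Pearson correlation between the original pairwise distances and the cophenetic distances (the height at which two samples first end up in the same branch of the dendrogram). A minimal check, assuming synthetic `make_blobs` data; the `_demo` names are illustrative only:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist
from sklearn.datasets import make_blobs

# Synthetic 2D data (assumption: any numeric feature matrix works the same way)
X_demo, _ = make_blobs(n_samples=60, centers=3, random_state=0)

Z_demo = linkage(X_demo, method='average')
d_orig = pdist(X_demo)                      # condensed original pairwise distances
c_scipy, d_coph = cophenet(Z_demo, d_orig)  # d_coph: condensed cophenetic distances

# Cophenetic correlation = Pearson correlation between the two distance vectors
c_manual = np.corrcoef(d_orig, d_coph)[0, 1]
print(f'scipy: {c_scipy:.4f}  manual: {c_manual:.4f}')
```

If both numbers agree, the coefficient can be read as an ordinary correlation between two distance vectors.

In [None]: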

# ============================================
# DENDROGRAM WITH CUSTOM COLORS
# ============================================
from scipy.cluster.hierarchy import set_link_color_palette

plt.figure(figsize=(12, 6))
set_link_color_palette(['red', 'blue', 'green', 'orange', 'purple'])
dendrogram(Z, 
          color_threshold=20,            # links below this height get per-cluster colors (data-dependent)
          above_threshold_color='gray')  # links above the threshold are drawn in gray
plt.title('Dendrogram with Custom Colors')
plt.xlabel('Sample index')
plt.ylabel('Distance')
plt.show()
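
The `color_threshold` argument of `dendrogram` is a horizontal cut through the tree; `fcluster(..., criterion='distance')` turns the same kind of cut into flat labels. A sketch on synthetic data (an assumption) that picks a cut height between two consecutive merge heights in `Z[:, 2]`:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score

X_demo, _ = make_blobs(n_samples=90, centers=3, cluster_std=0.6, random_state=1)
Z_demo = linkage(X_demo, method='ward')

# Column 2 of the linkage matrix holds the merge heights (non-decreasing for Ward)
heights = Z_demo[:, 2]

# Cutting between the 3rd-to-last and 2nd-to-last merges leaves exactly 3 clusters
t = (heights[-3] + heights[-2]) / 2
labels_cut = fcluster(Z_demo, t, criterion='distance')
print(f'cut height t={t:.2f} -> {len(np.unique(labels_cut))} clusters')

# The same partition, requested by number of clusters instead of by height
labels_k = fcluster(Z_demo, 3, criterion='maxclust')
ari = adjusted_rand_score(labels_cut, labels_k)
print(f'Agreement with maxclust (ARI): {ari:.3f}')
```

Because Ward merge heights never decrease, any cut inside that gap yields the same 3 clusters as `criterion='maxclust'`.

In [None]: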

# ============================================
# FULL EXAMPLE: typical workflow
# ============================================
def workflow_hierarchical(X, visualizar=True):
    """End-to-end hierarchical clustering workflow"""
    
    # 1. Try different linkage methods
    metodos = {
        'Ward': linkage(X, method='ward', metric='euclidean'),
        'Complete-Euclidean': linkage(X, method='complete', metric='euclidean'),
        'Complete-Manhattan': linkage(X, method='complete', metric='cityblock'),
        'Single-Euclidean': linkage(X, method='single', metric='euclidean'),
        'Average': linkage(X, method='average', metric='euclidean')
    }
    
    # 2. Compute the cophenetic coefficient for each method
    print("Cophenetic correlation coefficients:")
    for nombre, Z in metodos.items():
        c, _ = cophenet(Z, pdist(X))
        print(f'{nombre}: {c:.4f}')
    
    # 3. Choose a method (Ward here, as an example; the highest cophenetic
    #    coefficient is one reasonable criterion)
    Z_best = metodos['Ward']
    
    # 4. Use the silhouette score to choose K
    print("\nSearching for the best K...")
    best_k, scores = evaluar_hierarchical_silhouette(X, method='ward', max_k=10)
    
    # 5. Build the final clusters
    clusters = fcluster(Z_best, best_k, criterion='maxclust')
    
    # 6. Visualize the results (only for 2D data)
    if visualizar and X.shape[1] == 2:
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))
        
        # Dendrogram
        dendrogram(Z_best, ax=ax1)
        ax1.set_title('Dendrogram (Ward Linkage)')
        ax1.set_xlabel('Sample index')
        ax1.set_ylabel('Distance')
        
        # Clusters
        scatter = ax2.scatter(X[:, 0], X[:, 1], c=clusters, cmap='viridis', s=50, alpha=0.6)
        ax2.set_title(f'Final clusters (K={best_k})')
        ax2.set_xlabel('Feature 1')
        ax2.set_ylabel('Feature 2')
        plt.colorbar(scatter, ax=ax2, label='Cluster')
        
        plt.tight_layout()
        plt.show()
    
    return clusters, best_k

# Run the workflow
clusters_final, k_optimo = workflow_hierarchical(X)

# ============================================
# TIPS AND BEST PRACTICES
# ============================================
# 1. Ward: works best for compact, roughly spherical clusters of similar size
#    (requires Euclidean distance)
# 2. Complete: avoids the "chaining" of single linkage; tends to give compact clusters
# 3. Single: sensitive to noise and outliers, can create long "chains"
# 4. Average: a middle ground between single and complete
# 5. ALWAYS scale the data before clustering (distances depend on feature scale)
# 6. Use the cophenetic coefficient to assess how well distances are preserved
# 7. Combine the dendrogram and the silhouette score to choose K
# 8. For large datasets consider KMeans: hierarchical clustering needs all
#    pairwise distances (O(n^2) memory)
# 9. Hierarchical clustering is deterministic (no random_state needed)
# 10. Useful when you want to visualize the cluster hierarchy
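
The same Ward clustering is also available through scikit-learn's estimator API as `AgglomerativeClustering`, which fits into pipelines but offers no `predict` for new samples. A sketch comparing both on synthetic blobs (assumed data); the partitions are compared with the Adjusted Rand Index because the label numbering differs:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X_demo, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.7, random_state=42)
X_demo = StandardScaler().fit_transform(X_demo)   # tip 5: scale first

# SciPy: build the full tree, then cut it into 3 flat clusters (labels start at 1)
labels_scipy = fcluster(linkage(X_demo, method='ward'), 3, criterion='maxclust')

# scikit-learn: estimator API with the same Ward criterion (labels start at 0)
agg = AgglomerativeClustering(n_clusters=3, linkage='ward')
labels_sklearn = agg.fit_predict(X_demo)

# Compare partitions rather than raw label values
ari = adjusted_rand_score(labels_scipy, labels_sklearn)
print(f'Adjusted Rand Index between both partitions: {ari:.3f}')
```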

### 13.1. KMeans - Complete Guide

In [None]:
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, silhouette_samples, davies_bouldin_score, calinski_harabasz_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ============================================
# BASIC KMEANS
# ============================================
# KMeans groups the data into K clusters by minimizing the within-cluster sum of
# squared distances to each cluster's centroid (the inertia)

# Basic usage
kmeans = KMeans(n_clusters=3, random_state=42)
clusters = kmeans.fit_predict(X)

# Key attributes after fitting
centroids = kmeans.cluster_centers_     # centroid coordinates
labels = kmeans.labels_                 # cluster label for each point
inertia = kmeans.inertia_               # sum of squared distances to the closest centroid
n_iter = kmeans.n_iter_                 # number of iterations run (best initialization)

print(f'Inertia: {inertia}')
print(f'Iterations: {n_iter}')
print(f'Centroids:\n{centroids}')
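
To make the definition of `inertia_` concrete, it can be recomputed from `cluster_centers_` and `labels_`. A minimal check on synthetic data (the `_demo` names and the dataset are assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X_demo, _ = make_blobs(n_samples=200, centers=3, random_state=42)
km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_demo)

# Centroid assigned to each sample, then the squared Euclidean distance to it
assigned_centroids = km.cluster_centers_[km.labels_]
inertia_manual = np.sum((X_demo - assigned_centroids) ** 2)
print(f'inertia_: {km.inertia_:.4f}  manual: {inertia_manual:.4f}')
```

In [None]: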

# ============================================
# IMPORTANT PARAMETERS
# ============================================
kmeans = KMeans(
    n_clusters=3,              # number of clusters (you must choose it)
    init='k-means++',          # initialization: 'k-means++', 'random', an array, or a callable
    n_init=10,                 # runs with different initializations (the lowest inertia is kept)
    max_iter=300,              # maximum number of iterations per run
    tol=1e-4,                  # convergence tolerance
    random_state=42,           # seed for reproducibility
    algorithm='lloyd'          # 'lloyd' (sklearn >= 1.1) or 'elkan' (can be faster on well-separated clusters, uses more memory)
)
kmeans.fit(X)

# ============================================
# ELBOW METHOD - choosing K
# ============================================
# Try several values of K and plot the inertia

inertias = []
K_range = range(1, 11)

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    km.fit(X)
    inertias.append(km.inertia_)

# Plot the elbow curve
plt.figure(figsize=(10, 6))
plt.plot(K_range, inertias, 'bo-')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia (Within-Cluster Sum of Squares)')
plt.title('Elbow Method for Choosing K')
plt.xticks(K_range)
plt.grid(True)
plt.show()

# The "elbow" (the K after which inertia stops dropping sharply) suggests a good K.
# Inertia always decreases as K grows, so its minimum is not the answer.
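
Reading the elbow by eye is subjective. One common heuristic, sketched here as a hypothetical helper `find_elbow` (not part of scikit-learn), picks the K farthest from the straight line joining the first and last points of the normalized curve. It assumes inertia decreases as K grows; the inertia values below are made up for illustration:

```python
import numpy as np

def find_elbow(k_values, inertias):
    """Hypothetical helper: K farthest from the chord joining the first and last
    points of the normalized elbow curve (assumes decreasing inertia)."""
    k = np.asarray(k_values, dtype=float)
    y = np.asarray(inertias, dtype=float)
    # Normalize both axes to [0, 1] so K and inertia are on comparable scales
    k_n = (k - k.min()) / (k.max() - k.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # For a decreasing curve the chord runs from (0, 1) to (1, 0): x + y - 1 = 0
    distances = np.abs(k_n + y_n - 1) / np.sqrt(2)
    return int(k[np.argmax(distances)])

# Illustrative (made-up) inertia values with a visible bend at K=3
k_vals = [1, 2, 3, 4, 5, 6]
inertia_vals = [1000, 400, 100, 90, 85, 80]
print(find_elbow(k_vals, inertia_vals))  # 3
```

With real data, `find_elbow(K_range, inertias)` can be applied to the lists built above; treat its answer as a suggestion to cross-check with the silhouette score.

In [None]: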

# ============================================
# SILHOUETTE METHOD - choosing K
# ============================================
# Silhouette score: compares each point's mean distance to its own cluster
# with its mean distance to the nearest other cluster
# Range [-1, 1]; closer to 1 = better separated clusters

silhouette_scores = []
K_range = range(2, 11)  # the silhouette needs at least 2 clusters

for k in K_range:
    km = KMeans(n_clusters=k, random_state=42, n_init=10)
    labels = km.fit_predict(X)
    score = silhouette_score(X, labels)
    silhouette_scores.append(score)
    print(f'K={k}: Silhouette Score = {score:.4f}')

# Plot silhouette scores
plt.figure(figsize=(10, 6))
plt.plot(K_range, silhouette_scores, 'ro-')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Silhouette Score')
plt.title('Silhouette Method for Choosing K')
plt.xticks(K_range)
plt.grid(True)
plt.show()

# The K with the highest silhouette score is a strong candidate
best_k = K_range[np.argmax(silhouette_scores)]
print(f'\nBest K by silhouette: {best_k}')
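
The silhouette of a sample i is s(i) = (b - a) / max(a, b), where a is its mean distance to the other members of its cluster and b is the smallest mean distance to any other cluster. A small sketch, assuming a toy dataset and a hypothetical helper `silhouette_point`, reproduces `silhouette_samples`:

```python
import numpy as np
from sklearn.metrics import silhouette_samples

X_demo = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 0], [10, 1]], dtype=float)
labels_demo = np.array([0, 0, 1, 1, 2, 2])

def silhouette_point(X, labels, i):
    """Hypothetical helper: s(i) = (b - a) / max(a, b); assumes clusters of size >= 2."""
    d = np.linalg.norm(X - X[i], axis=1)         # distances from sample i
    same = labels == labels[i]
    a = d[same].sum() / (same.sum() - 1)         # mean distance to own cluster (d[i] = 0)
    b = min(d[labels == c].mean()                # nearest other cluster, by mean distance
            for c in np.unique(labels) if c != labels[i])
    return (b - a) / max(a, b)

s0 = silhouette_point(X_demo, labels_demo, 0)
print(f'manual: {s0:.4f}  sklearn: {silhouette_samples(X_demo, labels_demo)[0]:.4f}')
```

In [None]: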

# ============================================
# OTHER EVALUATION METRICS
# ============================================
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)

# Davies-Bouldin Index: lower is better (minimum 0); averages, over clusters, the
# ratio of within-cluster spread to the separation from the most similar cluster
db_score = davies_bouldin_score(X, labels)
print(f'Davies-Bouldin Index: {db_score:.4f}')

# Calinski-Harabasz Index: higher is better (ratio of between-cluster to
# within-cluster dispersion)
ch_score = calinski_harabasz_score(X, labels)
print(f'Calinski-Harabasz Index: {ch_score:.4f}')
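
The Calinski-Harabasz index can be reproduced from its definition, CH = [tr(B) / (k - 1)] / [tr(W) / (n - k)], where tr(B) is the between-cluster dispersion and tr(W) the within-cluster dispersion. A sketch on synthetic blobs (assumed data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X_demo, _ = make_blobs(n_samples=200, centers=4, random_state=0)
labels_demo = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_demo)

n, k = X_demo.shape[0], len(np.unique(labels_demo))
overall_mean = X_demo.mean(axis=0)
between, within = 0.0, 0.0
for c in np.unique(labels_demo):
    Xc = X_demo[labels_demo == c]
    centroid = Xc.mean(axis=0)
    between += len(Xc) * np.sum((centroid - overall_mean) ** 2)  # contributes to tr(B)
    within += np.sum((Xc - centroid) ** 2)                        # contributes to tr(W)

ch_manual = (between / (k - 1)) / (within / (n - k))
print(f'manual: {ch_manual:.4f}  sklearn: {calinski_harabasz_score(X_demo, labels_demo):.4f}')
```

In [None]: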

# ============================================
# CLUSTER VISUALIZATION (2D)
# ============================================
kmeans = KMeans(n_clusters=3, random_state=42)
labels = kmeans.fit_predict(X)
centroids = kmeans.cluster_centers_

plt.figure(figsize=(10, 6))
# Points colored by cluster
scatter = plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50, alpha=0.6)
# Centroids
plt.scatter(centroids[:, 0], centroids[:, 1], 
           c='red', marker='X', s=200, edgecolors='black', linewidths=2,
           label='Centroids')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('KMeans Clustering')
plt.colorbar(scatter, label='Cluster')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# DETAILED SILHOUETTE PLOT
# ============================================
from matplotlib import cm

def plot_silhouette(X, n_clusters):
    """Plot the per-sample silhouette analysis for KMeans"""
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    labels = kmeans.fit_predict(X)
    
    silhouette_avg = silhouette_score(X, labels)
    sample_silhouette_values = silhouette_samples(X, labels)
    
    fig, ax = plt.subplots(figsize=(10, 6))
    y_lower = 10
    
    for i in range(n_clusters):
        # Silhouette values of cluster i (sorted for plotting)
        ith_cluster_silhouette_values = sample_silhouette_values[labels == i]
        ith_cluster_silhouette_values.sort()
        
        size_cluster_i = ith_cluster_silhouette_values.shape[0]
        y_upper = y_lower + size_cluster_i
        
        # plt.cm keeps working even if a later cell reuses the name `cm` for a confusion matrix
        color = plt.cm.nipy_spectral(float(i) / n_clusters)
        ax.fill_betweenx(np.arange(y_lower, y_upper),
                        0, ith_cluster_silhouette_values,
                        facecolor=color, edgecolor=color, alpha=0.7)
        
        ax.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
        y_lower = y_upper + 10
    
    ax.set_xlabel('Silhouette coefficient')
    ax.set_ylabel('Cluster')
    ax.set_title(f'Silhouette Plot (K={n_clusters})')
    ax.axvline(x=silhouette_avg, color="red", linestyle="--", 
              label=f'Mean: {silhouette_avg:.3f}')
    ax.legend()
    plt.show()

# Try different values of K
plot_silhouette(X, n_clusters=3)

# ============================================
# PREDICT THE CLUSTER OF NEW DATA
# ============================================
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

# New points (same features, and same scaling, as X)
nuevos_datos = np.array([[5, 5], [20, 20], [15, 15]])
clusters_nuevos = kmeans.predict(nuevos_datos)
print(f'Clusters assigned to new data: {clusters_nuevos}')

# Distance to each centroid
distancias = kmeans.transform(nuevos_datos)
print(f'Distances to centroids:\n{distancias}')
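
`transform` returns the Euclidean distance from each sample to every centroid, so `predict` amounts to picking the closest one. A quick check on synthetic data (the points and `_demo` names are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X_demo, _ = make_blobs(n_samples=150, centers=3, random_state=7)
km = KMeans(n_clusters=3, n_init=10, random_state=7).fit(X_demo)

new_points = np.array([[0.0, 0.0], [5.0, -5.0], [-8.0, 8.0]])
dist = km.transform(new_points)        # shape (n_points, n_clusters)
print(km.predict(new_points), dist.argmin(axis=1))
```

In [None]: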

# ============================================
# CUSTOM INITIALIZATION
# ============================================
# Initial centroids can be given explicitly (with an array, use n_init=1)
centroides_iniciales = np.array([[5, 5], [15, 15], [25, 25]])
kmeans = KMeans(n_clusters=3, init=centroides_iniciales, n_init=1, random_state=42)
kmeans.fit(X)

# ============================================
# WORKING WITH PANDAS
# ============================================
# KMeans also accepts a DataFrame (here one is built from X)
df = pd.DataFrame(X, columns=['Feature1', 'Feature2'])

kmeans = KMeans(n_clusters=3, random_state=42)
df['Cluster'] = kmeans.fit_predict(df[['Feature1', 'Feature2']])

# Per-cluster statistics (means and sizes)
print(df.groupby('Cluster').mean())
print(df.groupby('Cluster').size())

# ============================================
# COMPARING SEVERAL VALUES OF K
# ============================================
def evaluar_kmeans(X, max_k=10):
    """Evaluate KMeans for several K and report the metrics"""
    resultados = []
    
    for k in range(2, max_k + 1):
        kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
        labels = kmeans.fit_predict(X)
        
        resultados.append({
            'K': k,
            'Inertia': kmeans.inertia_,
            'Silhouette': silhouette_score(X, labels),
            'Davies-Bouldin': davies_bouldin_score(X, labels),
            'Calinski-Harabasz': calinski_harabasz_score(X, labels)
        })
    
    df_resultados = pd.DataFrame(resultados)
    print(df_resultados.to_string(index=False))
    
    return df_resultados

# Evaluate and compare
df_metricas = evaluar_kmeans(X, max_k=10)

# ============================================
# TIPS AND BEST PRACTICES
# ============================================
# 1. ALWAYS scale the data before KMeans (it relies on Euclidean distances)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)

# 2. Use more than one method to choose K (elbow + silhouette)

# 3. n_init=10 or more reduces the risk of a poor local minimum

# 4. KMeans assumes roughly spherical clusters of similar size
#    It does not work well with irregularly shaped clusters

# 5. Sensitive to outliers - consider removing them first

# 6. For large datasets, MiniBatchKMeans is much faster (at a small cost in inertia)
from sklearn.cluster import MiniBatchKMeans
kmeans_mb = MiniBatchKMeans(n_clusters=3, batch_size=100, random_state=42)
kmeans_mb.fit(X_scaled)

# 7. Try other algorithms if KMeans does not fit the data well
#    (e.g. DBSCAN for arbitrarily shaped clusters)
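
One way to apply tip 1 without forgetting to scale new data is to chain the scaler and KMeans in a pipeline. A sketch assuming synthetic data with one feature deliberately inflated; `standardscaler` and `kmeans` are the step names that `make_pipeline` generates automatically:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, _ = make_blobs(n_samples=200, centers=3, random_state=3)
X_demo[:, 1] *= 100          # exaggerate the scale of one feature

pipe = make_pipeline(StandardScaler(), KMeans(n_clusters=3, n_init=10, random_state=3))
train_labels = pipe.fit_predict(X_demo)

# New data goes through the same fitted scaler before reaching KMeans
new_points = X_demo[:5] + 0.01
new_labels = pipe.predict(new_points)

# Centroids live in the scaled space; map them back to original units if needed
centers_original = pipe.named_steps['standardscaler'].inverse_transform(
    pipe.named_steps['kmeans'].cluster_centers_)
print(new_labels, centers_original.shape)
```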

## 14. Saving and Loading Models

In [None]:
import joblib
import pickle
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Save a model with joblib (recommended for scikit-learn objects)
joblib.dump(modelo, 'modelo.joblib')

# Load a model with joblib
modelo_cargado = joblib.load('modelo.joblib')
y_pred = modelo_cargado.predict(X_test)

# Save with pickle
with open('modelo.pkl', 'wb') as file:
    pickle.dump(modelo, file)

# Load with pickle
# (only load files from trusted sources: unpickling can execute arbitrary code;
#  use the same scikit-learn version that saved the model)
with open('modelo.pkl', 'rb') as file:
    modelo_cargado = pickle.load(file)

# Save a complete pipeline (preprocessing + model)
pipeline = make_pipeline(StandardScaler(), RandomForestClassifier())
pipeline.fit(X_train, y_train)
joblib.dump(pipeline, 'pipeline_completo.joblib')

# Load the pipeline
pipeline_cargado = joblib.load('pipeline_completo.joblib')
y_pred = pipeline_cargado.predict(X_test)
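
A round-trip check helps confirm that a saved model behaves identically after loading. A sketch that writes to a temporary directory (so no files are left behind), assuming a synthetic dataset:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = make_classification(n_samples=200, random_state=0)
pipe = make_pipeline(StandardScaler(), RandomForestClassifier(n_estimators=50, random_state=0))
pipe.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'pipeline.joblib')
    joblib.dump(pipe, path)          # serialize scaler + forest together
    restored = joblib.load(path)
    same = np.array_equal(pipe.predict(X_demo), restored.predict(X_demo))
print(f'Identical predictions after reload: {same}')
```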

## 15. Evaluation Metrics - Classification

In [None]:
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.metrics import confusion_matrix, classification_report, roc_auc_score, roc_curve
import matplotlib.pyplot as plt

# Basic example with logistic regression (binary target assumed)
modelo = LogisticRegression()
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)
y_pred_proba = modelo.predict_proba(X_test)[:, 1]   # probability of the positive class

# Confusion matrix (rows = true class, columns = predicted class)
conf_mat = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(conf_mat)

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# ROC AUC
roc_auc = roc_auc_score(y_test, y_pred_proba)
print(f"ROC AUC: {roc_auc:.4f}")

# Curva ROC
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)
plt.figure()
plt.plot(fpr, tpr, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True)
plt.show()
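
Precision, recall, F1 and accuracy can all be derived from the four cells of the binary confusion matrix. A toy example with hand-written labels (assumed data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score, precision_score, recall_score

y_true_demo = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred_demo = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

# For binary labels [0, 1], ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1]).ravel()

precision_manual = tp / (tp + fp)   # of the predicted positives, how many are correct
recall_manual = tp / (tp + fn)      # of the actual positives, how many were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)
accuracy_manual = (tp + tn) / (tp + tn + fp + fn)
print(tn, fp, fn, tp)  # 4 2 1 3
```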

### 15.1. ROC Curve and PR Curve - Complete Guide

In [None]:
from sklearn.metrics import roc_curve, roc_auc_score, auc
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.metrics import RocCurveDisplay, PrecisionRecallDisplay
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.preprocessing import label_binarize
from itertools import cycle

# ============================================
# ROC CURVE - Binary Classification
# ============================================
# ROC (Receiver Operating Characteristic) shows the trade-off between TPR and FPR
# across all decision thresholds
# TPR (True Positive Rate) = Recall = Sensitivity
# FPR (False Positive Rate) = 1 - Specificity

# Predicted probabilities (binary classification)
# Assumes `modelo` is a fitted binary classifier that exposes predict_proba
y_pred_proba = modelo.predict_proba(X_test)[:, 1]  # probabilities of the positive class

# Compute the ROC curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

# Compute the AUC (Area Under the Curve)
roc_auc = roc_auc_score(y_test, y_pred_proba)
# Equivalent: roc_auc = auc(fpr, tpr)
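
ROC AUC also has a ranking interpretation: it is the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counting as one half. A sketch on random synthetic scores (an assumption) that compares this pairwise count with `roc_auc_score`:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true_demo = rng.integers(0, 2, size=300)
# Scores loosely correlated with the label, rounded to create some ties
scores_demo = np.round(y_true_demo * 0.5 + rng.normal(size=300), 1)

pos = scores_demo[y_true_demo == 1]
neg = scores_demo[y_true_demo == 0]
# Compare every positive with every negative; ties count as half a "win"
diff = pos[:, None] - neg[None, :]
auc_manual = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size
print(f'manual: {auc_manual:.4f}  sklearn: {roc_auc_score(y_true_demo, scores_demo):.4f}')
```

In [None]: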

# Plot the ROC curve
plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random Classifier')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()

# Interpreting the AUC (common rules of thumb; acceptable values depend on the domain):
# AUC = 1.0: perfect classifier
# AUC = 0.9-1.0: excellent
# AUC = 0.8-0.9: very good
# AUC = 0.7-0.8: good
# AUC = 0.6-0.7: mediocre
# AUC = 0.5: random (no better than guessing)
# AUC < 0.5: worse than random (the scores are likely inverted)

print(f'AUC Score: {roc_auc:.4f}')

# ============================================
# ROC CURVE - Using RocCurveDisplay (from_predictions/from_estimator need sklearn >= 1.0)
# ============================================
from sklearn.metrics import RocCurveDisplay

# Method 1: from predicted scores
display = RocCurveDisplay.from_predictions(y_test, y_pred_proba, name='Modelo')
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Method 2: from a fitted estimator
display = RocCurveDisplay.from_estimator(modelo, X_test, y_test, name='Modelo')
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# FINDING THE BEST THRESHOLD
# ============================================
# Threshold that maximizes TPR - FPR (Youden's J statistic)
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold = thresholds[optimal_idx]
print(f'Optimal Threshold: {optimal_threshold:.4f}')
print(f'TPR at optimal: {tpr[optimal_idx]:.4f}')
print(f'FPR at optimal: {fpr[optimal_idx]:.4f}')

# Mark the optimal threshold on the ROC curve
plt.figure(figsize=(10, 6))
plt.plot(fpr, tpr, 'b-', lw=2, label=f'ROC (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], 'k--', label='Random')
plt.scatter(fpr[optimal_idx], tpr[optimal_idx], marker='o', color='red', s=100, 
           label=f'Optimal (threshold={optimal_threshold:.2f})')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve with Optimal Threshold')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Use the optimal threshold to make predictions
y_pred_optimal = (y_pred_proba >= optimal_threshold).astype(int)
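
Youden's J weighs false positives and false negatives equally. When their costs differ, a threshold can instead minimize the total cost. A sketch with a hypothetical helper `threshold_min_cost`, assumed costs, and a four-sample toy example; it only considers the observed scores as candidate thresholds:

```python
import numpy as np

def threshold_min_cost(y_true, y_proba, cost_fp=1.0, cost_fn=1.0):
    """Hypothetical helper: threshold (among observed scores) that minimizes
    cost_fp * FP + cost_fn * FN."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba)
    best_t, best_cost = None, np.inf
    for t in np.unique(y_proba):
        y_hat = (y_proba >= t).astype(int)
        fp = np.sum((y_hat == 1) & (y_true == 0))
        fn = np.sum((y_hat == 0) & (y_true == 1))
        cost = cost_fp * fp + cost_fn * fn
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

# Toy example: a false negative is assumed to cost 5x a false positive
y_toy = [0, 0, 1, 1]
p_toy = [0.1, 0.4, 0.35, 0.8]
print(threshold_min_cost(y_toy, p_toy, cost_fp=1, cost_fn=5))
```

Here the cheapest cut is 0.35 (one false positive, no false negatives), lower than the one Youden's J would favor when misses are expensive.

In [None]: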

# ============================================
# PRECISION-RECALL CURVE - Binary Classification
# ============================================
# Useful with class imbalance (many more negatives than positives)
# Focuses on the positive (usually minority) class

# Compute the precision-recall curve
precision, recall, pr_thresholds = precision_recall_curve(y_test, y_pred_proba)

# Average Precision (AP) - a one-number summary of the PR curve
ap_score = average_precision_score(y_test, y_pred_proba)

# Plot the PR curve
plt.figure(figsize=(10, 6))
plt.plot(recall, precision, color='blue', lw=2, label=f'PR curve (AP = {ap_score:.2f})')
plt.axhline(y=np.mean(y_test), color='red', linestyle='--', 
           label=f'Baseline (prevalence = {np.mean(y_test):.2f})')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('Recall (Sensitivity)')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc="lower left")
plt.grid(True, alpha=0.3)
plt.show()

# Interpretation:
# AP close to 1: excellent
# AP close to the positive-class prevalence (np.mean(y_test) for 0/1 labels): no better than baseline
print(f'Average Precision Score: {ap_score:.4f}')
print(f'Baseline (prevalence): {np.mean(y_test):.4f}')
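
`average_precision_score` is not a trapezoidal area under the PR curve: it is the step-wise sum AP = sum over n of (R_n - R_(n-1)) * P_n. A sketch on synthetic imbalanced data (an assumption) that rebuilds it from the output of `precision_recall_curve`:

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_curve

rng = np.random.default_rng(1)
y_true_demo = (rng.random(500) < 0.1).astype(int)   # roughly 10% positives
scores_demo = y_true_demo + rng.normal(scale=0.8, size=500)

precision_d, recall_d, _ = precision_recall_curve(y_true_demo, scores_demo)
# recall is returned in decreasing order, hence the minus sign
ap_manual = -np.sum(np.diff(recall_d) * precision_d[:-1])
print(f'manual: {ap_manual:.4f}  sklearn: {average_precision_score(y_true_demo, scores_demo):.4f}')
```

In [None]: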

# ============================================
# PR CURVE - Usando PrecisionRecallDisplay
# ============================================
from sklearn.metrics import PrecisionRecallDisplay

# Method 1: from predicted scores
display = PrecisionRecallDisplay.from_predictions(y_test, y_pred_proba, name='Modelo')
plt.axhline(y=np.mean(y_test), color='red', linestyle='--', label='Baseline')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# Method 2: from a fitted estimator
display = PrecisionRecallDisplay.from_estimator(modelo, X_test, y_test, name='Modelo')
plt.axhline(y=np.mean(y_test), color='red', linestyle='--', label='Baseline')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# COMPARING SEVERAL MODELS - ROC
# ============================================
# Compare the ROC curves of different models
# Assumes modelo_lr, modelo_rf and modelo_svm are already fitted; an SVC needs
# probability=True to provide predict_proba

modelos_comparar = {
    'Logistic Regression': modelo_lr,
    'Random Forest': modelo_rf,
    'SVM': modelo_svm
}

plt.figure(figsize=(10, 6))
plt.plot([0, 1], [0, 1], 'k--', lw=2, label='Random')

# Separate loop names so `modelo`, fpr, tpr and roc_auc defined above are not overwritten
for nombre, clf in modelos_comparar.items():
    y_proba = clf.predict_proba(X_test)[:, 1]
    fpr_m, tpr_m, _ = roc_curve(y_test, y_proba)
    auc_m = auc(fpr_m, tpr_m)
    plt.plot(fpr_m, tpr_m, lw=2, label=f'{nombre} (AUC = {auc_m:.2f})')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves - Model Comparison')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# COMPARING SEVERAL MODELS - PR
# ============================================
plt.figure(figsize=(10, 6))
baseline = np.mean(y_test)
plt.axhline(y=baseline, color='red', linestyle='--', lw=2, label=f'Baseline ({baseline:.2f})')

# Separate loop names so `modelo`, precision and recall defined above are not overwritten
for nombre, clf in modelos_comparar.items():
    y_proba = clf.predict_proba(X_test)[:, 1]
    prec_m, rec_m, _ = precision_recall_curve(y_test, y_proba)
    ap = average_precision_score(y_test, y_proba)
    plt.plot(rec_m, prec_m, lw=2, label=f'{nombre} (AP = {ap:.2f})')

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curves - Model Comparison')
plt.legend(loc="lower left")
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# ROC CURVE - Multiclass Classification (One-vs-Rest)
# ============================================
# Binarize the labels for multiclass (needs >= 3 classes and a multiclass `modelo`);
# using modelo.classes_ keeps the columns aligned with predict_proba
y_test_bin = label_binarize(y_test, classes=modelo.classes_)
n_classes = y_test_bin.shape[1]

# Probabilities for every class
y_pred_proba_multi = modelo.predict_proba(X_test)

# ROC curve and AUC for each class
fpr_multi = dict()
tpr_multi = dict()
roc_auc_multi = dict()

for i in range(n_classes):
    fpr_multi[i], tpr_multi[i], _ = roc_curve(y_test_bin[:, i], y_pred_proba_multi[:, i])
    roc_auc_multi[i] = auc(fpr_multi[i], tpr_multi[i])

# Micro-average ROC curve (pools every (sample, class) pair)
fpr_micro, tpr_micro, _ = roc_curve(y_test_bin.ravel(), y_pred_proba_multi.ravel())
roc_auc_micro = auc(fpr_micro, tpr_micro)

# Plot the ROC curve of each class
plt.figure(figsize=(10, 6))
colors = cycle(['blue', 'red', 'green', 'orange', 'purple'])

for i, color in zip(range(n_classes), colors):
    plt.plot(fpr_multi[i], tpr_multi[i], color=color, lw=2,
            label=f'Class {i} (AUC = {roc_auc_multi[i]:.2f})')

plt.plot(fpr_micro, tpr_micro, color='deeppink', linestyle=':', lw=3,
        label=f'Micro-average (AUC = {roc_auc_micro:.2f})')
plt.plot([0, 1], [0, 1], 'k--', lw=2, label='Random')

plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curves - Multiclass (One-vs-Rest)')
plt.legend(loc="lower right")
plt.grid(True, alpha=0.3)
plt.show()

# Multiclass AUC with sklearn (average weighted by class support)
roc_auc_weighted = roc_auc_score(y_test, y_pred_proba_multi, 
                                 multi_class='ovr', average='weighted')
print(f'ROC AUC (weighted): {roc_auc_weighted:.4f}')
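
With `multi_class='ovr'`, `average='macro'` is the plain mean of the per-class One-vs-Rest AUCs, while `average='weighted'` weights them by class support. A sketch on a synthetic 3-class problem (assumed data and model):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X_demo, y_demo = make_classification(n_samples=600, n_features=8, n_informative=5,
                                     n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)                  # columns follow clf.classes_

y_te_bin = label_binarize(y_te, classes=clf.classes_)
per_class_auc = [roc_auc_score(y_te_bin[:, i], proba[:, i]) for i in range(len(clf.classes_))]

macro_manual = np.mean(per_class_auc)            # every class weighs the same
macro_sklearn = roc_auc_score(y_te, proba, multi_class='ovr', average='macro')
print(np.round(per_class_auc, 3), f'{macro_manual:.4f}', f'{macro_sklearn:.4f}')
```

In [None]: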

# ============================================
# PR CURVE - Multiclass Classification
# ============================================
precision_multi = dict()
recall_multi = dict()
ap_multi = dict()

for i in range(n_classes):
    precision_multi[i], recall_multi[i], _ = precision_recall_curve(
        y_test_bin[:, i], y_pred_proba_multi[:, i])
    ap_multi[i] = average_precision_score(y_test_bin[:, i], y_pred_proba_multi[:, i])

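# Plot the PR curve of each class
plt.figure(figsize=(10, 6))

# A fresh color cycle, so class i gets the same color as in the ROC plot
for i, color in zip(range(n_classes), cycle(['blue', 'red', 'green', 'orange', 'purple'])):
    plt.plot(recall_multi[i], precision_multi[i], color=color, lw=2,
            label=f'Class {i} (AP = {ap_multi[i]:.2f})')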

plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curves - Multiclass')
plt.legend(loc="best")
plt.grid(True, alpha=0.3)
plt.show()

# ============================================
# COMBINED ROC + PR VISUALIZATION
# ============================================
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 6))

# ROC Curve
ax1.plot(fpr, tpr, 'b-', lw=2, label=f'ROC (AUC = {roc_auc:.2f})')
ax1.plot([0, 1], [0, 1], 'k--', lw=2, label='Random')
ax1.set_xlabel('False Positive Rate')
ax1.set_ylabel('True Positive Rate')
ax1.set_title('ROC Curve')
ax1.legend(loc="lower right")
ax1.grid(True, alpha=0.3)

# PR Curve
ax2.plot(recall, precision, 'r-', lw=2, label=f'PR (AP = {ap_score:.2f})')
ax2.axhline(y=np.mean(y_test), color='k', linestyle='--', lw=2, label='Baseline')
ax2.set_xlabel('Recall')
ax2.set_ylabel('Precision')
ax2.set_title('Precision-Recall Curve')
ax2.legend(loc="best")
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# ============================================
# THRESHOLD ANALYSIS
# ============================================
def analyze_thresholds(y_true, y_proba, thresholds_to_test=None):
    """Compute classification metrics for several decision thresholds"""
    if thresholds_to_test is None:
        thresholds_to_test = np.linspace(0, 1, 21)
    
    results = []
    for threshold in thresholds_to_test:
        y_pred = (y_proba >= threshold).astype(int)
        
        # Confusion-matrix counts (labels fixed so the matrix is always 2x2)
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
        
        results.append({
            'Threshold': threshold,
            'Accuracy': accuracy_score(y_true, y_pred),
            'Precision': precision_score(y_true, y_pred, zero_division=0),
            'Recall': recall_score(y_true, y_pred, zero_division=0),
            'F1': f1_score(y_true, y_pred, zero_division=0),
            'TPR': tp / (tp + fn) if (tp + fn) > 0 else 0,
            'FPR': fp / (fp + tn) if (fp + tn) > 0 else 0
        })
    
    df_results = pd.DataFrame(results)
    return df_results

# Analyze thresholds
df_thresholds = analyze_thresholds(y_test, y_pred_proba)
print(df_thresholds)

# Plot the metrics against the threshold
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

axes[0, 0].plot(df_thresholds['Threshold'], df_thresholds['Accuracy'], 'b-', lw=2)
axes[0, 0].set_title('Accuracy vs Threshold')
axes[0, 0].set_xlabel('Threshold')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].grid(True, alpha=0.3)

axes[0, 1].plot(df_thresholds['Threshold'], df_thresholds['Precision'], 'r-', lw=2, label='Precision')
axes[0, 1].plot(df_thresholds['Threshold'], df_thresholds['Recall'], 'g-', lw=2, label='Recall')
axes[0, 1].set_title('Precision & Recall vs Threshold')
axes[0, 1].set_xlabel('Threshold')
axes[0, 1].set_ylabel('Score')
axes[0, 1].legend()
axes[0, 1].grid(True, alpha=0.3)

axes[1, 0].plot(df_thresholds['Threshold'], df_thresholds['F1'], 'm-', lw=2)
axes[1, 0].set_title('F1 Score vs Threshold')
axes[1, 0].set_xlabel('Threshold')
axes[1, 0].set_ylabel('F1 Score')
axes[1, 0].grid(True, alpha=0.3)

axes[1, 1].plot(df_thresholds['Threshold'], df_thresholds['TPR'], 'b-', lw=2, label='TPR')
axes[1, 1].plot(df_thresholds['Threshold'], df_thresholds['FPR'], 'r-', lw=2, label='FPR')
axes[1, 1].set_title('TPR & FPR vs Threshold')
axes[1, 1].set_xlabel('Threshold')
axes[1, 1].set_ylabel('Rate')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# ============================================
# TIPS AND BEST PRACTICES
# ============================================
# 1. ROC curve: most informative when the classes are roughly balanced
# 2. PR curve: preferred under class imbalance (focuses on the positive class)
# 3. ROC AUC = 0.5 means the model is no better than random
# 4. On a PR curve, the baseline is the prevalence of the positive class
# 5. The optimal threshold depends on the context (cost of FP vs FN)
# 6. For multiclass: use One-vs-Rest or One-vs-One
# 7. Plot both curves to get a fuller picture of the model
# 8. Average Precision summarizes the PR curve in one number, handy for comparing models
# 9. Micro-average: pools all classes, so larger classes carry more weight
# 10. Macro-average: treats every class equally

## 16. Evaluation Metrics - Regression

In [None]:
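from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score
import numpy as np
import matplotlib.pyplot as plt

# Basic example with linear regression
modelo = LinearRegression()
modelo.fit(X_train, y_train)
y_pred = modelo.predict(X_test)

# Compute RMSE, MAE, R2
# (np.sqrt works in every version; `squared=False` was deprecated in 1.4 and removed in 1.6)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
mae = mean_absolute_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)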

print(f"RMSE: {rmse:.4f}")
print(f"MAE: {mae:.4f}")
print(f"R2: {r2:.4f}")

# Plot the residuals (errors)
errores = y_test - y_pred
plt.figure(figsize=(10, 6))
plt.hist(errores, bins=30, alpha=0.7, color='blue', edgecolor='black')
plt.xlabel('Error (y_test - y_pred)')
plt.ylabel('Frequency')
plt.title('Histogram of Errors')
plt.grid(axis='y', alpha=0.75)
plt.show()
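
The three metrics follow directly from the residuals e = y_true - y_pred: MAE = mean(|e|), RMSE = sqrt(mean(e^2)) and R2 = 1 - SS_res / SS_tot. A toy example with hand-picked values (assumed data):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true_demo = np.array([3.0, -0.5, 2.0, 7.0])
y_pred_demo = np.array([2.5, 0.0, 2.0, 8.0])

residuals = y_true_demo - y_pred_demo
mae_manual = np.mean(np.abs(residuals))
rmse_manual = np.sqrt(np.mean(residuals ** 2))
ss_res = np.sum(residuals ** 2)                              # unexplained variation
ss_tot = np.sum((y_true_demo - y_true_demo.mean()) ** 2)     # total variation around the mean
r2_manual = 1 - ss_res / ss_tot
print(f'MAE={mae_manual:.3f} RMSE={rmse_manual:.3f} R2={r2_manual:.3f}')  # MAE=0.500 RMSE=0.612 R2=0.949
```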