# 📚 Linear Models with Multiple Parameters
## Linear Regression and Logistic Classification

**Course:** IFCD093PO - Machine Learning

**Objective:** Master linear models with multiple features and their interpretation

---

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.datasets import fetch_california_housing, load_diabetes
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LinearRegression, Ridge, Lasso, LogisticRegression
from sklearn.metrics import (r2_score, mean_squared_error, accuracy_score,
                             precision_score, recall_score, f1_score,
                             confusion_matrix, roc_curve, auc, classification_report)

import warnings
warnings.filterwarnings('ignore')

print('✅ All libraries imported successfully')

---
# 🎯 THEORY: LINEAR REGRESSION vs LOGISTIC CLASSIFICATION

## Fundamental Concepts

### 📈 Linear Regression
- **Goal:** Predict **continuous** values (real numbers)
- **Examples:** House price, temperature, salary
- **Formula:** $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n$
- **Metrics:** R², RMSE
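
As a minimal sketch of the formula and metrics above, the snippet below computes predictions as an intercept plus a dot product, then derives RMSE and R² by hand. The coefficients and the three-row data set are made up for illustration and do not come from any dataset used in this notebook.

```python
import numpy as np

# Hypothetical coefficients and data, chosen only for illustration
beta_0 = 1.0                      # intercept
beta = np.array([2.0, -0.5])      # one coefficient per feature
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 4.0]])
y_true = np.array([2.5, 4.5, 5.0])

# y_hat = beta_0 + beta_1 * x_1 + beta_2 * x_2, evaluated for every row at once
y_hat = beta_0 + X @ beta

# RMSE: square root of the mean squared residual
residuals = y_true - y_hat
rmse = np.sqrt(np.mean(residuals ** 2))

# R²: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f'Predictions: {y_hat}')
print(f'RMSE: {rmse:.4f}  R²: {r2:.4f}')
```

`sklearn.metrics.r2_score` and `mean_squared_error`, imported at the top of the notebook, compute the same quantities.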

### 🎯 Logistic Regression (Classification)
- **Goal:** Predict **binary categories** (Yes/No, 0/1)
- **Examples:** Spam/Not spam, Sick/Healthy, Pass/Fail
- **Sigmoid function:** $P = \frac{1}{1 + e^{-z}}$ where $z = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n$
- **Output:** Probability between 0 and 1
- **Metrics:** Accuracy, Precision, Recall, F1-Score, AUC-ROC
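
The sigmoid step can be sketched as follows: compute the linear score $z$, squash it into a probability, and apply a 0.5 threshold to get a class. The intercept, the coefficient, and the inputs are hypothetical values picked so the arithmetic is easy to follow.

```python
import numpy as np

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical one-feature model, for illustration only
beta_0 = -1.0
beta = np.array([2.0])
x = np.array([[-1.0], [0.5], [2.0]])

z = beta_0 + x @ beta            # linear scores: -3, 0, 3
p = sigmoid(z)                   # probabilities of the positive class
labels = (p >= 0.5).astype(int)  # threshold at 0.5 to obtain categories

for score, prob, label in zip(z, p, labels):
    print(f'z={score:+.1f}  P={prob:.4f}  class={label}')
```

A score of zero maps to a probability of exactly 0.5, which is why the decision boundary of a logistic model is the set of points where $z = 0$.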


In [None]:
print('üéØ TEOR√çA: REGRESI√ìN LINEAL vs CLASIFICACI√ìN LOG√çSTICA\\n')

fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# REGRESI√ìN LINEAL
np.random.seed(42)
x_reg = np.linspace(0, 10, 100)
y_reg = 2*x_reg + 1 + np.random.normal(0, 1, 100)
axes[0].scatter(x_reg, y_reg, alpha=0.6, s=50)
z_reg = np.polyfit(x_reg, y_reg, 1)
p_reg = np.poly1d(z_reg)
axes[0].plot(x_reg, p_reg(x_reg), 'r--', linewidth=2.5)
axes[0].set_xlabel('Caracter√≠stica X', fontweight='bold')
axes[0].set_ylabel('Target Y (continuo)', fontweight='bold')
axes[0].set_title('REGRESI√ìN LINEAL\\nPredice valores continuos', fontweight='bold', fontsize=12)
axes[0].grid(True, alpha=0.3)

# CLASIFICACI√ìN LOG√çSTICA
x_clas = np.linspace(-3, 3, 100)
y_prob = 1 / (1 + np.exp(-(2*x_clas)))
y_clas = (np.random.random(len(x_clas)) < y_prob).astype(int)
axes[1].scatter(x_clas, y_clas, alpha=0.6, s=50, c=y_clas, cmap='coolwarm')
axes[1].plot(x_clas, y_prob, 'g-', linewidth=2.5, label='Sigmoide')
axes[1].axhline(y=0.5, color='black', linestyle='--', alpha=0.5)
axes[1].set_xlabel('Caracter√≠stica X', fontweight='bold')
axes[1].set_ylabel('Probabilidad', fontweight='bold')
axes[1].set_title('REGRESI√ìN LOG√çSTICA\\nPredice probabilidades', fontweight='bold', fontsize=12)
axes[1].set_ylim(-0.1, 1.1)
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print('\\nüí° DIFERENCIAS CLAVE:')
print('   ‚Ä¢ Regresi√≥n: Predice valores continuos')
print('   ‚Ä¢ Clasificaci√≥n: Predice probabilidades ‚Üí Categor√≠as')
print('   ‚Ä¢ Ambos: Modelos LINEALES e INTERPRETABLES')

---
# 🏠 EXERCISE 1: CALIFORNIA HOUSING - MULTIPLE REGRESSION

**Objective:** Predict median house values from multiple features

**Dataset:** 20,640 California census block groups with 8 features; the target `MedHouseVal` is the median house value in units of 100,000 USD

In [None]:
print('üè† EJERCICIO 1: CALIFORNIA HOUSING - REGRESI√ìN M√öLTIPLE\\n')
print('='*70)

california = fetch_california_housing()
df_california = pd.DataFrame(california.data, columns=california.feature_names)
df_california['MedHouseVal'] = california.target

print('\\nüìä INFORMACI√ìN DEL DATASET:\\n')
print(f'   Forma: {df_california.shape}')
print(f'   Caracter√≠sticas: {list(df_california.columns)}')
print(f'\\nüìà ESTAD√çSTICAS DESCRIPTIVAS:')
print(df_california.describe())

In [None]:
plt.figure(figsize=(10, 8))
corr_matrix = df_california.corr()
sns.heatmap(corr_matrix[['MedHouseVal']].sort_values('MedHouseVal', ascending=False),
            annot=True, cmap='coolwarm', center=0, fmt='.3f', linewidths=1)
plt.title('Correlation with Median House Value', fontweight='bold', fontsize=12)
plt.tight_layout()
plt.show()

print('\\nüîç CARACTER√çSTICAS M√ÅS CORRELACIONADAS:')
corr_target = corr_matrix['MedHouseVal'].drop('MedHouseVal').sort_values(ascending=False)
for feat, corr in corr_target.items():
    print(f'   ‚Ä¢ {feat:20s}: {corr:+.4f}')

## 1.2 Simple vs Multiple Regression

We will compare a model with **1 feature** against a model with **all features**
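
When two models use different numbers of features, training R² cannot decrease as ordinary least squares gains predictors, so adjusted R² is a common complement that penalizes the feature count. The sketch below is an optional aside and not part of the original exercise; the helper name `adjusted_r2` and the R² values are hypothetical.

```python
def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Hypothetical scores for a 1-feature and an 8-feature model on 101 samples
adj_simple = adjusted_r2(0.50, n_samples=101, n_features=1)
adj_multiple = adjusted_r2(0.60, n_samples=101, n_features=8)

print(f'Adjusted R² (1 feature):  {adj_simple:.4f}')
print(f'Adjusted R² (8 features): {adj_multiple:.4f}')
```

With a dataset as large as California Housing the correction is tiny, so plain R² on a held-out test set, as used in this exercise, is already a fair comparison.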

In [None]:
print('\n' + '='*70)
print('COMPARING: SIMPLE vs MULTIPLE REGRESSION')
print('='*70)

# SIMPLE REGRESSION: a single predictor (median income)
print('\n1️⃣ SIMPLE REGRESSION (MedInc only)\n')
X_simple = df_california[['MedInc']]
y = df_california['MedHouseVal']
X_train, X_test, y_train, y_test = train_test_split(X_simple, y, test_size=0.2, random_state=42)

model_simple = LinearRegression()
model_simple.fit(X_train, y_train)
y_pred_simple = model_simple.predict(X_test)
r2_simple = r2_score(y_test, y_pred_simple)
rmse_simple = np.sqrt(mean_squared_error(y_test, y_pred_simple))

print(f'   R² Score: {r2_simple:.4f}')
print(f'   RMSE: {rmse_simple:.4f}')

# MULTIPLE REGRESSION: all 8 predictors; the same random_state selects the same train/test rows
print('\n2️⃣ MULTIPLE REGRESSION (8 features)\n')
X_multiple = df_california.drop('MedHouseVal', axis=1)
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(X_multiple, y, test_size=0.2, random_state=42)

model_multiple = LinearRegression()
model_multiple.fit(X_train_m, y_train_m)
y_pred_multiple = model_multiple.predict(X_test_m)
r2_multiple = r2_score(y_test_m, y_pred_multiple)
rmse_multiple = np.sqrt(mean_squared_error(y_test_m, y_pred_multiple))

print(f'   R² Score: {r2_multiple:.4f}')
print(f'   RMSE: {rmse_multiple:.4f}')

print(f'\n📊 COMPARISON:')
print(f'   R² Simple: {r2_simple:.4f} vs R² Multiple: {r2_multiple:.4f}')
print(f'   Relative improvement in R²: {((r2_multiple - r2_simple) / r2_simple) * 100:+.1f}%')

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

coef_df = pd.DataFrame({
    'Feature': X_multiple.columns,
    'Coefficient': model_multiple.coef_
}).sort_values('Coefficient', ascending=True)

# Coefficients are in raw units (features were not standardized), so bar lengths are not directly comparable
colors = ['green' if x > 0 else 'red' for x in coef_df['Coefficient']]
axes[0].barh(coef_df['Feature'], coef_df['Coefficient'], color=colors, alpha=0.7)
axes[0].set_xlabel('Coefficient', fontweight='bold')
axes[0].set_title('Coefficients\n(Green=Increases, Red=Decreases)', fontweight='bold')
axes[0].axvline(x=0, color='black', linestyle='-', linewidth=1)
axes[0].grid(True, alpha=0.3, axis='x')

models = ['Simple (1)', 'Multiple (8)']
bars = axes[1].bar(models, [r2_simple, r2_multiple], color=['lightblue', 'lightgreen'], alpha=0.8, edgecolor='black', linewidth=2)
axes[1].set_ylabel('R² Score', fontweight='bold')
axes[1].set_title('Performance Comparison', fontweight='bold')
axes[1].set_ylim(0, 0.7)
axes[1].grid(True, alpha=0.3, axis='y')

# Annotate each bar with its R² value
for bar, score in zip(bars, [r2_simple, r2_multiple]):
    height = bar.get_height()
    axes[1].text(bar.get_x() + bar.get_width()/2., height + 0.02,
                f'{score:.4f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print(f'\n💡 CONCLUSION: Using all 8 features changes test R² from {r2_simple:.4f} to {r2_multiple:.4f}')

---
# 🩺 EXERCISE 2: DIABETES - CLASSIFICATION WITH MULTIPLE FEATURES

**Objective:** Predict whether a patient's disease progression is above the median (labeled here as "advanced diabetes")

**Dataset:** 442 patients with 10 clinical features

In [None]:
print('\\nü©∫ EJERCICIO 2: DIABETES - CLASIFICACI√ìN M√öLTIPLE\\n')
print('='*70)

diabetes = load_diabetes()
df_diabetes = pd.DataFrame(diabetes.data, columns=diabetes.feature_names)
df_diabetes['progression'] = diabetes.target
df_diabetes['diabetes_avanzada'] = (df_diabetes['progression'] > df_diabetes['progression'].median()).astype(int)

print('\\nüìä INFORMACI√ìN DEL DATASET:')
print(f'   Forma: {df_diabetes.shape}')
print(f'   Proporci√≥n diabetes avanzada: {df_diabetes["diabetes_avanzada"].mean()*100:.1f}%')

In [None]:
X_diab = df_diabetes.drop(['progression', 'diabetes_avanzada'], axis=1)
y_diab = df_diabetes['diabetes_avanzada']

X_diab_train, X_diab_test, y_diab_train, y_diab_test = train_test_split(
    X_diab, y_diab, test_size=0.3, random_state=42, stratify=y_diab
)

scaler_diab = StandardScaler()
X_diab_train_scaled = scaler_diab.fit_transform(X_diab_train)
X_diab_test_scaled = scaler_diab.transform(X_diab_test)

log_reg = LogisticRegression(random_state=42, max_iter=1000)
log_reg.fit(X_diab_train_scaled, y_diab_train)

y_pred_log = log_reg.predict(X_diab_test_scaled)
accuracy = accuracy_score(y_diab_test, y_pred_log)
f1 = f1_score(y_diab_test, y_pred_log)

print(f'\\nüìä RESULTADOS:')
print(f'   Exactitud: {accuracy:.4f} ({accuracy*100:.1f}%)')
print(f'   F1-Score: {f1:.4f}')

In [None]:
# Features were standardized, so each odds ratio refers to a one-standard-deviation increase
coef_diabetes = pd.DataFrame({
    'Feature': X_diab.columns,
    'Coefficient': log_reg.coef_[0],
    'Odds Ratio': np.exp(log_reg.coef_[0]),
    'Impact': np.abs(log_reg.coef_[0])
}).sort_values('Impact', ascending=False)

print(f'\n🎯 ODDS RATIOS (Top 5 by absolute coefficient):')
print(coef_diabetes[['Feature', 'Odds Ratio']].head().to_string(index=False))

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(16, 5))

colors = ['green' if x > 0 else 'red' for x in coef_diabetes['Coefficient']]
axes[0].barh(coef_diabetes['Feature'], coef_diabetes['Coefficient'], color=colors, alpha=0.7)
axes[0].set_xlabel('Coefficient', fontweight='bold')
axes[0].set_title('Logistic Coefficients', fontweight='bold')
axes[0].axvline(x=0, color='black', linestyle='-', linewidth=1)
axes[0].grid(True, alpha=0.3, axis='x')

cm = confusion_matrix(y_diab_test, y_pred_log)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', ax=axes[1], cbar=False,
            xticklabels=['Not Advanced', 'Advanced'],
            yticklabels=['Not Advanced', 'Advanced'],
            annot_kws={'fontsize': 12, 'fontweight': 'bold'})
axes[1].set_title('Confusion Matrix', fontweight='bold')
axes[1].set_ylabel('Actual', fontweight='bold')
axes[1].set_xlabel('Predicted', fontweight='bold')

plt.tight_layout()
plt.show()

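## 2.5 Regularization in Classification

We will try L2 (Ridge-style) and L1 (Lasso-style) penalties to improve generalization. In scikit-learn, `C` is the inverse of the regularization strength, so a smaller `C` means a stronger penalty.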
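
To illustrate how the two penalties differ, the sketch below fits both on a synthetic dataset and counts coefficients that are exactly zero. The data comes from `make_classification` and the value `C=0.05` is an arbitrary choice for illustration; neither is taken from the diabetes exercise.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, of which only 3 carry signal (illustrative assumption)
X, y = make_classification(n_samples=300, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
X = StandardScaler().fit_transform(X)

zero_counts = {}
for penalty in ('l1', 'l2'):
    # Small C = strong regularization; liblinear supports both penalties
    model = LogisticRegression(penalty=penalty, C=0.05, solver='liblinear', random_state=0)
    model.fit(X, y)
    zero_counts[penalty] = int(np.sum(model.coef_[0] == 0))
    print(f'{penalty}: {zero_counts[penalty]} of {model.coef_.shape[1]} coefficients are exactly zero')
```

L1 tends to drive some coefficients to exactly zero, which acts as feature selection, while L2 shrinks all coefficients toward zero without usually eliminating any.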

In [None]:
param_grid = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

log_reg_cv = LogisticRegression(random_state=42, max_iter=1000)
grid_search = GridSearchCV(log_reg_cv, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_diab_train_scaled, y_diab_train)

best_log_reg = grid_search.best_estimator_
y_pred_best = best_log_reg.predict(X_diab_test_scaled)
accuracy_best = accuracy_score(y_diab_test, y_pred_best)

print(f'✅ Best parameters: {grid_search.best_params_}')
print(f'📊 Baseline accuracy: {accuracy:.4f}')
print(f'📊 Tuned accuracy: {accuracy_best:.4f}')
print(f'🎯 Change: {(accuracy_best - accuracy)*100:+.1f} percentage points')

---
# 🚢 EXERCISE 3: TITANIC - CLASSIFICATION WITH FEATURE ENGINEERING

**Objective:** Predict survival on the Titanic using feature engineering

**Challenge:** Create interpretable features from raw data
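
As a small sketch of what "interpretable features from raw data" can look like, the snippet below extracts a title from passenger names with a regular expression and derives family-size features. The names follow the "Surname, Title. Given names" format; the `sibsp` and `parch` counts are hypothetical values for illustration.

```python
import pandas as pd

# Sample name strings in "Surname, Title. Given names" format
names = pd.Series([
    'Braund, Mr. Owen Harris',
    'Cumings, Mrs. John Bradley (Florence Briggs Thayer)',
    'Heikkinen, Miss. Laina',
    'Rice, Master. Eugene',
])

# Capture the run of letters that follows a space and precedes the first period
titles = names.str.extract(r' ([A-Za-z]+)\.', expand=False)

# Hypothetical sibling/spouse and parent/child counts
family = pd.DataFrame({'sibsp': [1, 1, 0, 4], 'parch': [0, 0, 0, 1]})
family['family_size'] = family['sibsp'] + family['parch'] + 1  # +1 counts the passenger
family['is_alone'] = (family['family_size'] == 1).astype(int)

print(titles.tolist())
print(family)
```

The title extraction only applies when the data has a name column, which the CSV version of the dataset provides and the seaborn version does not.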

In [None]:
print('\\nüö¢ EJERCICIO 3: TITANIC - CLASIFICACI√ìN CON FEATURE ENGINEERING\\n')
print('='*70)

try:
    df_titanic = sns.load_dataset('titanic')
except:
    url = 'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv'
    df_titanic = pd.read_csv(url)

print('\\nüìä INFORMACI√ìN INICIAL:')
print(f'   Forma: {df_titanic.shape}')
print(f'   Columnas: {list(df_titanic.columns)}')

# Feature Engineering
print('\\nüîß REALIZANDO FEATURE ENGINEERING...')
df_titanic_clean = df_titanic.copy()

df_titanic_clean['age'].fillna(df_titanic_clean['age'].median(), inplace=True)
df_titanic_clean['embarked'].fillna(df_titanic_clean['embarked'].mode()[0], inplace=True)
df_titanic_clean.drop(columns=['deck'], inplace=True, errors='ignore')

df_titanic_clean['family_size'] = df_titanic_clean['sibsp'] + df_titanic_clean['parch'] + 1
df_titanic_clean['is_alone'] = (df_titanic_clean['family_size'] == 1).astype(int)
df_titanic_clean['title'] = df_titanic_clean['name'].str.extract(' ([A-Za-z]+)\\.', expand=False)

title_mapping = {
    'Mr': 'Mr', 'Miss': 'Miss', 'Mrs': 'Mrs', 'Master': 'Master',
    'Dr': 'Rare', 'Rev': 'Rare', 'Col': 'Rare', 'Major': 'Rare',
    'Mlle': 'Miss', 'Countess': 'Rare', 'Ms': 'Miss', 'Lady': 'Rare',
    'Jonkheer': 'Rare', 'Don': 'Rare', 'Dona': 'Rare', 'Mme': 'Mrs',
    'Capt': 'Rare', 'Sir': 'Rare'
}
df_titanic_clean['title'] = df_titanic_clean['title'].map(title_mapping)

# Codificar categ√≥ricas
categorical_cols = ['sex', 'embarked', 'title', 'class', 'who', 'adult_male', 'embark_town', 'alive', 'alone']
for col in categorical_cols:
    if col in df_titanic_clean.columns:
        le = LabelEncoder()
        df_titanic_clean[col] = le.fit_transform(df_titanic_clean[col].astype(str))

features_titanic = ['pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked',
                    'family_size', 'is_alone', 'title', 'who', 'adult_male']
X_titanic = df_titanic_clean[features_titanic]
y_titanic = df_titanic_clean['survived']

print(f'‚úÖ Feature Engineering completado')
print(f'   Caracter√≠sticas: {len(features_titanic)}')
print(f'   Tama√±o final: {X_titanic.shape}')

In [None]:
X_titanic_train, X_titanic_test, y_titanic_train, y_titanic_test = train_test_split(
    X_titanic, y_titanic, test_size=0.3, random_state=42, stratify=y_titanic
)

scaler_titanic = StandardScaler()
X_titanic_train_scaled = scaler_titanic.fit_transform(X_titanic_train)
X_titanic_test_scaled = scaler_titanic.transform(X_titanic_test)

log_reg_titanic = LogisticRegression(random_state=42, max_iter=1000)
log_reg_titanic.fit(X_titanic_train_scaled, y_titanic_train)

y_pred_titanic = log_reg_titanic.predict(X_titanic_test_scaled)
y_pred_proba_titanic = log_reg_titanic.predict_proba(X_titanic_test_scaled)[:, 1]

accuracy_titanic = accuracy_score(y_titanic_test, y_pred_titanic)
f1_titanic = f1_score(y_titanic_test, y_pred_titanic)

print(f'\\nüìä RESULTADOS - TITANIC:')
print(f'   Exactitud: {accuracy_titanic:.4f} ({accuracy_titanic*100:.1f}%)')
print(f'   F1-Score: {f1_titanic:.4f}')

coef_titanic = pd.DataFrame({
    'Caracter√≠stica': features_titanic,
    'Coeficiente': log_reg_titanic.coef_[0],
    'Odds Ratio': np.exp(log_reg_titanic.coef_[0]),
    'Impacto': np.abs(log_reg_titanic.coef_[0])
}).sort_values('Impacto', ascending=False)

print(f'\\nüéØ TOP 5 CARACTER√çSTICAS M√ÅS IMPORTANTES:')
print(coef_titanic[['Caracter√≠stica', 'Odds Ratio']].head().to_string(index=False))

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(16, 12))

# Coeficientes
colors = ['green' if x > 0 else 'red' for x in coef_titanic['Coeficiente']]
axes[0, 0].barh(coef_titanic['Caracter√≠stica'], coef_titanic['Coeficiente'], color=colors, alpha=0.7)
axes[0, 0].set_xlabel('Coeficiente', fontweight='bold')
axes[0, 0].set_title('Coeficientes\\n(Verde=Aumenta supervivencia)', fontweight='bold')
axes[0, 0].axvline(x=0, color='black', linestyle='-', linewidth=1)
axes[0, 0].grid(True, alpha=0.3, axis='x')

# Odds Ratios
colors = ['green' if x > 1 else 'red' for x in coef_titanic['Odds Ratio']]
axes[0, 1].barh(coef_titanic['Caracter√≠stica'], coef_titanic['Odds Ratio'], color=colors, alpha=0.7)
axes[0, 1].set_xlabel('Odds Ratio', fontweight='bold')
axes[0, 1].set_title('Odds Ratios\\n(>1=Mayor supervivencia)', fontweight='bold')
axes[0, 1].axvline(x=1, color='black', linestyle='-', linewidth=1)
axes[0, 1].grid(True, alpha=0.3, axis='x')

# Matriz de confusi√≥n
cm_titanic = confusion_matrix(y_titanic_test, y_pred_titanic)
sns.heatmap(cm_titanic, annot=True, fmt='d', cmap='Blues', ax=axes[1, 0], cbar=False,
            xticklabels=['Pred: No', 'Pred: S√≠'],
            yticklabels=['Real: No', 'Real: S√≠'],
            annot_kws={'fontsize': 12, 'fontweight': 'bold'})
axes[1, 0].set_title('Matriz de Confusi√≥n', fontweight='bold')

# Curva ROC
fpr_titanic, tpr_titanic, _ = roc_curve(y_titanic_test, y_pred_proba_titanic)
roc_auc_titanic = auc(fpr_titanic, tpr_titanic)
axes[1, 1].plot(fpr_titanic, tpr_titanic, color='darkorange', lw=2.5, label=f'AUC={roc_auc_titanic:.3f}')
axes[1, 1].plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--', label='Random')
axes[1, 1].set_xlabel('Tasa de Falsos Positivos', fontweight='bold')
axes[1, 1].set_ylabel('Tasa de Verdaderos Positivos', fontweight='bold')
axes[1, 1].set_title('Curva ROC', fontweight='bold')
axes[1, 1].legend()
axes[1, 1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

In [None]:
param_grid_titanic = {
    'C': [0.001, 0.01, 0.1, 1, 10, 100],
    'penalty': ['l1', 'l2'],
    'solver': ['liblinear']
}

log_reg_titanic_cv = LogisticRegression(random_state=42, max_iter=1000)
grid_search_titanic = GridSearchCV(log_reg_titanic_cv, param_grid_titanic, cv=5, scoring='accuracy', n_jobs=-1)
grid_search_titanic.fit(X_titanic_train_scaled, y_titanic_train)

best_log_reg_titanic = grid_search_titanic.best_estimator_
y_pred_best_titanic = best_log_reg_titanic.predict(X_titanic_test_scaled)
accuracy_best_titanic = accuracy_score(y_titanic_test, y_pred_best_titanic)

print(f'✅ Best parameters: {grid_search_titanic.best_params_}')
print(f'📊 Baseline accuracy: {accuracy_titanic:.4f}')
print(f'📊 Tuned accuracy: {accuracy_best_titanic:.4f}')
print(f'🎯 Change: {(accuracy_best_titanic - accuracy_titanic)*100:+.1f} percentage points')

cv_scores = cross_val_score(best_log_reg_titanic, X_titanic_train_scaled, y_titanic_train, cv=5, scoring='accuracy')
print(f'\n🔍 Cross-validation (5-fold): {cv_scores.mean():.4f} +/- {cv_scores.std():.4f}')

---
# 📊 FINAL COMPARATIVE SUMMARY

Comparing the results of the 3 exercises

In [None]:
print('\n' + '='*80)
print('📊 FINAL COMPARATIVE SUMMARY')
print('='*80)

# R² and accuracy are different metrics, so scores are only comparable within a dataset
resumen = pd.DataFrame({
    'Dataset': ['California', 'California', 'Diabetes', 'Diabetes', 'Titanic', 'Titanic'],
    'Type': ['Regression', 'Regression', 'Classification', 'Classification', 'Classification', 'Classification'],
    'Model': ['Simple', 'Multiple', 'Baseline', 'Tuned', 'Baseline', 'Tuned'],
    'Metric': ['R²', 'R²', 'Accuracy', 'Accuracy', 'Accuracy', 'Accuracy'],
    'Score': [r2_simple, r2_multiple, accuracy, accuracy_best, accuracy_titanic, accuracy_best_titanic]
})

print(resumen.to_string(index=False))

fig, axes = plt.subplots(1, 3, figsize=(18, 6))

axes[0].bar(['Simple', 'Multiple'], [r2_simple, r2_multiple], color=['lightblue', 'lightgreen'])
axes[0].set_title('California Housing\n(R²)', fontweight='bold')
axes[0].grid(True, alpha=0.3, axis='y')

axes[1].bar(['Baseline', 'Tuned'], [accuracy, accuracy_best], color=['lightblue', 'lightgreen'])
axes[1].set_title('Diabetes\n(Accuracy)', fontweight='bold')
axes[1].grid(True, alpha=0.3, axis='y')

axes[2].bar(['Baseline', 'Tuned'], [accuracy_titanic, accuracy_best_titanic], color=['lightblue', 'lightgreen'])
axes[2].set_title('Titanic\n(Accuracy)', fontweight='bold')
axes[2].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print(f'''
🎯 KEY CONCLUSIONS:

1. Multiple regression outperforms simple regression
   Relative improvement in R²: {((r2_multiple - r2_simple) / r2_simple) * 100:+.1f}%

2. Hyperparameter tuning changes accuracy only slightly
   Diabetes: {(accuracy_best - accuracy)*100:+.1f} percentage points
   Titanic: {(accuracy_best_titanic - accuracy_titanic)*100:+.1f} percentage points

3. Feature engineering shapes what a linear model can learn
   Titanic with engineered features: {accuracy_best_titanic*100:.0f}% accuracy

4. Linear models are INTERPRETABLE
   Each coefficient shows the direction and size of a feature's effect

5. Always validate on independent data
   Titanic 5-fold cross-validation: {cv_scores.mean():.4f} ± {cv_scores.std():.4f}
''')

---
# 🧪 ADDITIONAL PRACTICE EXERCISES

In [None]:
print('\n' + '='*80)
print('🧪 ADDITIONAL EXERCISES')
print('='*80)

ejercicios = '''
🎯 EXERCISE 4: In-Depth Interpretation

1. Pick one house/patient/passenger
2. Explain how each feature contributes to the prediction
3. What change would be needed to flip the prediction?

🔍 EXERCISE 5: Error Analysis

1. Where does the model fail?
2. Are there patterns in the failures?
3. How could you improve the feature engineering?

📈 EXERCISE 6: Experimentation

1. Try different regularizations (L1 vs L2)
2. Create interactions between variables
3. Apply non-linear transformations

🚀 EXERCISE 7: Real-World Application

1. Find a similar dataset
2. Apply the same complete pipeline
3. Compare with our results
'''

print(ejercicios)

print('''
🎉 CONGRATULATIONS!

You have learned:
✅ Multiple Linear Regression
✅ Logistic Regression for Classification
✅ Practical Feature Engineering
✅ Regularization (L2/Ridge and L1/Lasso penalties)
✅ Hyperparameter Optimization
✅ Comprehensive Evaluation
✅ Model Interpretation

Linear models are the FOUNDATION of ML!
Master them before moving on to more complex techniques. 🚀
''')