<a href="https://colab.research.google.com/github/rubuntu/Taller_Introduccion_a_Ciencia_de_Datos_IA_e_Ingenieria_de_Datos/blob/main/sesion_05_regresion_lineal_regresion_logistica.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 📘 Comparison: Linear Regression vs Logistic Regression

## 1. Linear Regression

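Linear regression fits a straight line that describes the relationship between a predictor $X$ and a continuous response $Y$, where $\beta_0$ is the intercept, $\beta_1$ the slope, and $\epsilon$ the error term: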

$$
Y = \beta_0 + \beta_1 X + \epsilon
$$
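
For a single predictor, the least-squares estimates of $\beta_0$ and $\beta_1$ have a closed form. The following is a minimal sketch, assuming ordinary least squares as the fitting criterion (which is what scikit-learn's `LinearRegression` uses). The helper `fit_simple_ols` and the synthetic data are hypothetical and serve only as an illustration.

```python
import numpy as np


def fit_simple_ols(x, y):
    """Return (beta_0, beta_1) minimizing the sum of squared residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sample covariance of x and y divided by the sample variance of x.
    beta_1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    beta_0 = y_mean - beta_1 * x_mean
    return beta_0, beta_1


# Synthetic data with known coefficients (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.5, size=200)

beta_0, beta_1 = fit_simple_ols(x, y)
print(f"beta_0 = {beta_0:.2f}, beta_1 = {beta_1:.2f}")
```

The estimates should land close to the values used to generate the data (2.0 and 1.5), with the gap shrinking as the noise decreases or the sample grows.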

### Python Code



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic 1-D regression data with Gaussian noise
X, y = make_regression(n_samples=300, n_features=1, noise=15.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit on the training split, predict on the held-out test split
lin_model = LinearRegression().fit(X_train, y_train)
y_pred = lin_model.predict(X_test)

# Test-set metrics: RMSE (in the units of y) and R²
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)

# Test points and fitted line (sorted by X so the line is drawn left to right)
plt.scatter(X_test, y_test, alpha=0.5)
order = np.argsort(X_test[:, 0])
plt.plot(X_test[order], y_pred[order], color="red")
plt.title(f"Linear Regression: RMSE={rmse:.2f}, R²={r2:.3f}")
plt.xlabel("X")
plt.ylabel("y")
plt.show()


In [None]:
# --- Predicted vs. actual plot ---
plt.scatter(y_test, y_pred, alpha=0.5)
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()],
         'r--', lw=2)  # ideal line (y_pred == y_test)
plt.xlabel("Actual value (y_test)")
plt.ylabel("Prediction (y_pred)")
plt.title("Predicted vs. Actual")
plt.show()

## 2. Logistic Regression

Logistic regression passes the linear predictor $\beta_0 + \beta_1 X$ through the **sigmoid** function, which maps any real value to a probability between 0 and 1:

$$
P(Y=1|X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1X)}}
$$
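
The formula above can be evaluated directly with NumPy. This is a minimal sketch: the coefficients `beta_0` and `beta_1` are hypothetical values chosen for illustration, not fitted ones.

```python
import numpy as np


def sigmoid(z):
    """Map real values to the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients, chosen only for illustration.
beta_0, beta_1 = -1.0, 2.0

x = np.array([-3.0, 0.0, 0.5, 3.0])
p = sigmoid(beta_0 + beta_1 * x)  # P(Y=1 | X=x) for each x
print(np.round(p, 3))
```

With these coefficients the probability is exactly 0.5 at $x = -\beta_0 / \beta_1 = 0.5$, which is the decision boundary when the classification threshold is 0.5. For very large negative inputs this naive form can raise overflow warnings; `scipy.special.expit` is a numerically stable alternative.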

### Python Code


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score, RocCurveDisplay

# 1-D binary dataset (valid: n_classes * n_clusters_per_class <= 2**n_informative)
Xc, yc = make_classification(
    n_samples=10000,
    n_features=1,
    n_redundant=0,
    n_informative=1,
    n_classes=2,
    n_clusters_per_class=1,
    class_sep=0.8,      # lower values bring the classes closer together (default 1.0)
    flip_y=0.2,         # 20% of samples get a randomly assigned label (label noise)
    random_state=42
)

Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    Xc, yc, test_size=0.25, stratify=yc, random_state=42
)

# Model
log_model = LogisticRegression().fit(Xc_train, yc_train)

y_pred = log_model.predict(Xc_test)
y_proba = log_model.predict_proba(Xc_test)[:, 1]

print("Accuracy:", accuracy_score(yc_test, y_pred))
print("F1:", f1_score(yc_test, y_pred))
print("AUC:", roc_auc_score(yc_test, y_proba))

# Sigmoid curve: predicted P(Y=1|X) over the range of the test data
xs = np.linspace(Xc_test.min(), Xc_test.max(), 200).reshape(-1, 1)
sig = log_model.predict_proba(xs)[:, 1]

plt.scatter(Xc_test, yc_test, alpha=0.5)
plt.plot(xs, sig)
plt.title("Logistic Regression: Probability P(Y=1|X)")
plt.xlabel("X")
plt.ylabel("Probability")
plt.show()

# ROC curve
RocCurveDisplay.from_predictions(yc_test, y_proba)
plt.title("ROC Curve")
plt.show()


In [None]:
from sklearn.metrics import (
    classification_report, confusion_matrix, ConfusionMatrixDisplay,
    PrecisionRecallDisplay, average_precision_score
)

# --- Classification report ---
print("\n=== Classification Report ===")
print(classification_report(yc_test, y_pred, digits=3))

# --- Confusion matrix (text + plot) ---
cm = confusion_matrix(yc_test, y_pred)
print("\n=== Confusion Matrix ===\n", cm)

disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot()
plt.title("Confusion Matrix")
plt.show()

# --- Precision-Recall curve + average precision ---
ap = average_precision_score(yc_test, y_proba)
PrecisionRecallDisplay.from_predictions(yc_test, y_proba)
plt.title(f"Precision-Recall Curve (AP = {ap:.3f})")
plt.show()


## 3. Summary Comparison

| Aspect           | Linear Regression   | Logistic Regression               |
| ---------------- | ------------------- | --------------------------------- |
| Output type      | Continuous variable | Probability (0 to 1)              |
| Function         | Straight line       | Sigmoid (S-shaped curve)          |
| Typical problem  | Numeric prediction  | Binary classification             |
| Example          | House prices        | Will the customer churn? (Yes/No) |
| Common metrics   | RMSE, R²            | AUC, F1, Accuracy                 |
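
The difference in output type can be seen by fitting both models to the same binary labels. The sketch below uses synthetic data invented for illustration (it is not the dataset from the cells above). It fits a line to 0/1 labels only to show what goes wrong, not as a recommended practice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Synthetic 1-D binary data: the label is more likely to be 1 as x grows.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1))
y = (X[:, 0] + rng.normal(scale=0.5, size=500) > 0).astype(int)

lin = LinearRegression().fit(X, y)
log = LogisticRegression().fit(X, y)

# Evaluate both models at points well outside the bulk of the data.
X_new = np.array([[-4.0], [0.0], [4.0]])
lin_out = lin.predict(X_new)               # unbounded real values
log_out = log.predict_proba(X_new)[:, 1]   # probabilities in (0, 1)

print("linear:  ", np.round(lin_out, 2))
print("logistic:", np.round(log_out, 2))
```

At the extreme inputs the linear model returns values below 0 and above 1, which cannot be read as probabilities, while the logistic model stays inside the unit interval.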


