# 🍷 Random Forest for Regression and Classification

In this notebook we apply **Random Forest** to:

- Predict wine quality (a **regression** problem).
- Classify wine quality into classes (a **classification** problem).

Dataset: `wine-quality.csv`

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import mean_squared_error, r2_score, classification_report

## 🔹 1. Load the data

In [None]:
df = pd.read_csv("../data/wine-quality.csv")
df.head()

In [None]:
# Count missing values per column
df.isnull().sum()

## 🔹 2. Quick data exploration

In [None]:
sns.countplot(x="quality", data=df)
plt.title("Distribution of the target variable (quality)")
plt.show()

## 🔹 3. Random Forest - Regression

In [None]:
# Predictor variables and target
X = df.drop("quality", axis=1)
y_reg = df["quality"]

# Train/test split
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X, y_reg, test_size=0.2, random_state=42
)

# Train the model
regressor = RandomForestRegressor(n_estimators=100, random_state=42)
regressor.fit(X_train_reg, y_train_reg)

# Predictions
y_pred_reg = regressor.predict(X_test_reg)

# Metrics
mse = mean_squared_error(y_test_reg, y_pred_reg)
r2 = r2_score(y_test_reg, y_pred_reg)

print(f"MSE: {mse:.3f}")
print(f"R²: {r2:.3f}")
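
To make the two metrics above concrete, here is a minimal sketch that computes MSE and R² by hand. The values are toy numbers made up for illustration, not results from the wine dataset, and the `*_manual` names are hypothetical so they do not overwrite `mse` and `r2`.

```python
# Toy targets and predictions, made up for illustration only
y_true = [5, 6, 7, 5, 6]
y_hat = [5.2, 5.8, 6.5, 5.4, 6.1]

n = len(y_true)
mean_true = sum(y_true) / n

# Residual sum of squares: squared gaps between targets and predictions
ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_hat))

# Total sum of squares: squared gaps between targets and their mean
ss_tot = sum((t - mean_true) ** 2 for t in y_true)

# MSE is the average squared residual
mse_manual = ss_res / n

# R² is the share of target variance explained by the predictions
r2_manual = 1 - ss_res / ss_tot

# RMSE is expressed in the same units as the quality score
rmse_manual = mse_manual ** 0.5

print(f"MSE: {mse_manual:.3f}")
print(f"RMSE: {rmse_manual:.3f}")
print(f"R²: {r2_manual:.3f}")
```

An R² near 1 means the predictions track the targets closely. An R² near 0 means the model does no better than always predicting the mean.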

## 🔹 4. Random Forest - Classification

In [None]:
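# Target: the integer quality score, treated as a class label
y_clf = df["quality"]

# Train/test split, stratified so both sets keep the same class proportions
X_train_clf, X_test_clf, y_train_clf, y_test_clf = train_test_split(
    X, y_clf, test_size=0.2, random_state=42, stratify=y_clf
)

# Train the model
classifier = RandomForestClassifier(n_estimators=100, random_state=42)
classifier.fit(X_train_clf, y_train_clf)

# Predictions
y_pred_clf = classifier.predict(X_test_clf)

# Per-class precision, recall, and F1
print(classification_report(y_test_clf, y_pred_clf))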
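
The cell above uses each integer quality score as its own class. If coarser classes are preferred, one option is to bin the scores first. The sketch below assumes three bands with hypothetical cut points (up to 5 is "low", 6 is "medium", 7 and above is "high"). The notebook does not prescribe these thresholds, and the scores are toy values.

```python
import pandas as pd

# Toy quality scores, made up for illustration only
quality_demo = pd.Series([3, 4, 5, 5, 6, 6, 7, 8, 9])

# Hypothetical band edges: (2, 5] -> low, (5, 6] -> medium, (6, 10] -> high
quality_band = pd.cut(
    quality_demo,
    bins=[2, 5, 6, 10],
    labels=["low", "medium", "high"],
)

# Class counts per band, in band order
print(quality_band.value_counts().sort_index())
```

A series binned this way could replace `y_clf` as the classification target. Fewer and larger classes usually make the per-class rows of the report easier to read.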

## 🔹 5. Feature importance

In [None]:
# Impurity-based importances from the classifier trained in section 4
importances = classifier.feature_importances_
features = X.columns

feat_importances = pd.Series(importances, index=features)
feat_importances.nlargest(10).plot(kind="barh")
plt.title("Top 10 most important features")
plt.show()
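
As a rough illustration of what `feature_importances_` contains, the sketch below fits a small forest on synthetic data from `make_classification` rather than on the wine dataset. The `demo_*` names and the `feature_i` column labels are hypothetical. The importances are normalized to sum to 1, so each value reads as a relative share.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: 6 features, of which 3 carry signal (illustration only)
X_demo, y_demo = make_classification(
    n_samples=300,
    n_features=6,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
demo_features = [f"feature_{i}" for i in range(X_demo.shape[1])]

demo_clf = RandomForestClassifier(n_estimators=50, random_state=0)
demo_clf.fit(X_demo, y_demo)

# One importance per feature, sorted from most to least important
demo_importances = pd.Series(
    demo_clf.feature_importances_, index=demo_features
).sort_values(ascending=False)

print(demo_importances)
print(f"Sum of importances: {demo_importances.sum():.3f}")
```

Because the values are relative shares, they show how the forest ranks features against each other, not how large each feature's effect on quality is in absolute terms.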