# 🧠 XGBoost Model Training

This notebook trains an **XGBoost Regressor** model on the project's processed dataset.

It covers:

- Loading the dataset from paths built relative to the project root
- Train/test split
- Model training
- Evaluation (MAE, RMSE, R²)
- Saving the trained model and the preprocessor

---


In [None]:
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
import numpy as np
import joblib
import pandas as pd
import os


## 📂 1. Loading the processed dataset

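All paths are built from the project root directory, which is taken to be the parent of the current working directory. The notebook must therefore be run from a folder one level below the root.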


In [None]:
# Detect the project root (assumes the working directory is one level below it)
current_dir = os.getcwd()
repo_root = os.path.dirname(current_dir)

# Paths
processed_dir = os.path.join(repo_root, "data", "processed")
model_dir = os.path.join(repo_root, "models")

data_path = os.path.join(processed_dir, "car_sales_processed.csv")
preprocessor_path = os.path.join(processed_dir, "preprocessor.pkl")

# Load the dataset
df = pd.read_csv(data_path)

TARGET = "Price"
X = df.drop(columns=[TARGET])
y = df[TARGET]

df.head()
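
Taking the parent of the working directory only works when the notebook is launched from a folder directly below the project root. As a minimal sketch of a less fragile alternative, the helper below walks up from a starting directory until it finds a folder containing a `data` subdirectory. The function name `find_repo_root` and the choice of `data` as the marker are assumptions made for illustration; they are not part of the original notebook.

```python
from pathlib import Path


def find_repo_root(start, marker="data"):
    """Return the closest ancestor of `start` (or `start` itself) that
    contains a directory named `marker`.

    Hypothetical helper: `marker="data"` is an assumption based on the
    data/processed layout used in this notebook.
    """
    start = Path(start).resolve()
    # Check the starting directory first, then each parent in turn.
    for candidate in [start, *start.parents]:
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(
        f"No ancestor of {start} contains a '{marker}' directory"
    )
```

In the notebook this could replace the `os.path.dirname` call with `repo_root = str(find_repo_root(os.getcwd()))`, so the cell keeps working when the notebook sits in a nested folder.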


## ✂️ 2. Train/test split


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

X_train.shape, X_test.shape
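
To sanity-check the shapes printed above, the expected row counts can be computed by hand. The sketch below assumes that, for a fractional `test_size`, scikit-learn rounds the test set size up and gives the remaining rows to the training set; the function name `expected_split_sizes` is hypothetical.

```python
import math


def expected_split_sizes(n_samples, test_size=0.2):
    """Estimate (n_train, n_test) for a fractional test_size.

    Assumption: the test set size is ceil(test_size * n_samples) and the
    training set receives the remaining rows.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# Example with a made-up dataset of 1003 rows.
sizes = expected_split_sizes(1003, test_size=0.2)
```

In the notebook, `expected_split_sizes(len(X))` should match `(X_train.shape[0], X_test.shape[0])` if the assumption holds.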


## ⚙️ 3. XGBoost model definition


In [None]:
model = XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    max_depth=6,
    subsample=0.8,
    colsample_bytree=0.8,
    random_state=42,
)


## 🏋️ 4. Model training


In [None]:
model.fit(X_train, y_train)


## 📊 5. Model evaluation

The main regression metrics are computed on the test set: MAE, RMSE, and R².


In [None]:
y_pred = model.predict(X_test)

mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred)

{
    "MAE": round(mae, 2),
    "RMSE": round(rmse, 2),
    "R2": round(r2, 4),
}
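
For reference, the three metrics can be reproduced without scikit-learn. The following is a minimal pure-Python sketch of the standard definitions, applied to a small made-up example; the function name `regression_metrics` and the sample values are illustrative only and do not come from the project data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MAE, RMSE and R² from two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    # MAE: mean of the absolute errors.
    mae = sum(abs(e) for e in errors) / n

    # RMSE: square root of the mean squared error.
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # R²: 1 minus residual sum of squares over total sum of squares.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"MAE": mae, "RMSE": rmse, "R2": r2}


# Made-up values, for illustration only.
example = regression_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

MAE and RMSE are expressed in the same units as the target (`Price`), and RMSE penalizes large errors more heavily than MAE. R² is unitless and compares the model against always predicting the mean of `y_test`.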


## 💾 6. Saving the model and preprocessor


In [None]:
model_output_path = os.path.join(model_dir, "xgb_model.pkl")
preprocessor_output_path = os.path.join(model_dir, "preprocessor.pkl")

# Create the output directory if it does not exist yet
os.makedirs(model_dir, exist_ok=True)

joblib.dump(model, model_output_path)

# Copy the original preprocessor from the processed-data directory
preprocessor = joblib.load(preprocessor_path)
joblib.dump(preprocessor, preprocessor_output_path)

model_output_path, preprocessor_output_path
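
The cell above copies the preprocessor by loading it and dumping it again, which requires every class it was pickled with to be importable in this notebook. One possible alternative, sketched here under the assumption that a byte-for-byte copy is acceptable, is to copy the file directly. The helper name `copy_artifact` is hypothetical.

```python
import os
import shutil


def copy_artifact(src_path, dst_dir):
    """Copy a file into dst_dir, creating the directory if needed.

    Hypothetical helper: copies the bytes without unpickling, so the
    destination keeps the source file name and contents unchanged.
    """
    os.makedirs(dst_dir, exist_ok=True)
    dst_path = os.path.join(dst_dir, os.path.basename(src_path))
    shutil.copy2(src_path, dst_path)  # copy2 also preserves timestamps
    return dst_path
```

In the notebook this would be called as `copy_artifact(preprocessor_path, model_dir)`. The load-and-dump approach has one advantage in return: it fails early if the preprocessor file cannot be unpickled.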
