Objective: Using the UCI_Credit_Card.csv dataset, this notebook adds XGBoost models to the study and evaluates their performance metrics.

In [1]:
# Import the required libraries
import pandas as pd
import os
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [2]:
# Build the path to the data file
ruta = os.path.dirname(os.path.abspath('Ensamble'))
ruta_datos = os.path.join(ruta, "datasets/UCI_Credit_Card.csv")

# Read the file into a DataFrame
credit = pd.read_csv(ruta_datos)

In [3]:
# Drop every column except PAY_0 and the target
credit.drop(columns=['ID', 'LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE',
                     'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6',
                     'BILL_AMT1', 'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6',
                     'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6'],
            inplace=True)

In [4]:
# Show summary information about the dataset
credit.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 2 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   PAY_0                       30000 non-null  int64
 1   default.payment.next.month  30000 non-null  int64
dtypes: int64(2)
memory usage: 468.9 KB


In [5]:
# Show the first records
credit.head()

|   | PAY_0 | default.payment.next.month |
|---|-------|----------------------------|
| 0 | 2     | 1                          |
| 1 | -1    | 1                          |
| 2 | 0     | 0                          |
| 3 | 0     | 0                          |
| 4 | -1    | 0                          |


In [6]:
# Show descriptive statistics
credit.describe()

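|       | PAY_0    | default.payment.next.month |
|-------|----------|----------------------------|
| count | 30000.0  | 30000.0                    |
| mean  | -0.0167  | 0.2212                     |
| std   | 1.123802 | 0.415062                   |
| min   | -2.0     | 0.0                        |
| 25%   | -1.0     | 0.0                        |
| 50%   | 0.0      | 0.0                        |
| 75%   | 0.0      | 0.0                        |
| max   | 8.0      | 1.0                        |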


Observations: Based on the earlier analysis and feature engineering, 'PAY_0' was identified as the relevant feature.

In [7]:
# Import the classes needed to apply predictive models
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_absolute_error, root_mean_squared_error

In [8]:
# Separate the predictors from the target variable
X = credit.loc[:, ['PAY_0']].values
y = credit.loc[:, "default.payment.next.month"].values
# Stratified 75/25 split so both sets keep the same proportion of defaults
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

In [9]:
# Standardize the features: fit the scaler on the training set only,
# then apply the same transformation to the test set
sc_X = StandardScaler()

X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
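
The standardization step above can be illustrated by hand. The following is a minimal sketch, not the notebook's actual computation: it takes the five PAY_0 values shown by `credit.head()` as a toy training sample, and `test_values` is a hypothetical pair of unseen values. It assumes only that `StandardScaler` subtracts the training mean and divides by the population standard deviation (dividing by n), and that the test set reuses the training statistics.

```python
import math

# First five PAY_0 values shown by credit.head(); used here as a toy training sample
train_values = [2, -1, 0, 0, -1]
# Hypothetical unseen values standing in for the test set (assumption, not from the data)
test_values = [1, 0]

# Statistics are computed on the training sample only
mean = sum(train_values) / len(train_values)
# Population standard deviation (divide by n), as StandardScaler does
std = math.sqrt(sum((v - mean) ** 2 for v in train_values) / len(train_values))

# The same mean and std are applied to both samples
train_scaled = [(v - mean) / std for v in train_values]
test_scaled = [(v - mean) / std for v in test_values]

print("mean:", mean, "std:", round(std, 4))
print("train scaled:", [round(v, 4) for v in train_scaled])
print("test scaled:", [round(v, 4) for v in test_scaled])
```

Tree ensembles split on thresholds, so this step is generally not required for XGBoost, although it does no harm here.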

In [10]:
# Import the required modules
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error

In [11]:
# Initialize and train the XGBoost model
model = XGBRegressor()
model.fit(X_train, y_train)

In [12]:
# Make predictions on the training and test sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

In [13]:
# Compute metrics for the training set
mae_train = mean_absolute_error(y_train, y_train_pred)
mse_train = mean_squared_error(y_train, y_train_pred)
rmse_train = np.sqrt(mse_train)

# Compute metrics for the test set
mae_test = mean_absolute_error(y_test, y_test_pred)
mse_test = mean_squared_error(y_test, y_test_pred)
rmse_test = np.sqrt(mse_test)

In [14]:
# Show the metrics
print(f"Training set - MAE: {mae_train}, MSE: {mse_train}, RMSE: {rmse_train}")
print(f"Test set - MAE: {mae_test}, MSE: {mse_test}, RMSE: {rmse_test}")

Training set - MAE: 0.2828633263806502, MSE: 0.14143026632363537, RMSE: 0.3760721557409367
Test set - MAE: 0.2822741461177667, MSE: 0.1415778439317064, RMSE: 0.3762683137492531


| Metric  | Train | Test |
|---------|-------|------|
| MAE     | 0.28  | 0.28 |
| MSE     | 0.14  | 0.14 |
| RMSE    | 0.38  | 0.38 |
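
To put these values in context, they can be compared with a trivial baseline. The sketch below is an illustration rather than part of the original analysis: it assumes a hypothetical constant predictor that always outputs the overall default rate `p = 0.2212` (the mean reported by `credit.describe()` for the full dataset, which the stratified split should roughly preserve in the test set). For a 0/1 target, such a predictor has MSE `p(1 - p)` and MAE `2p(1 - p)`.

```python
import math

# Mean of default.payment.next.month reported by credit.describe() (full dataset)
p = 0.2212

# Errors of a hypothetical constant predictor that always outputs p on a 0/1 target
baseline_mse = p * (1 - p)
baseline_rmse = math.sqrt(baseline_mse)
baseline_mae = 2 * p * (1 - p)

# Test-set values printed by the notebook
model_mae = 0.2822741461177667
model_mse = 0.1415778439317064
model_rmse = math.sqrt(model_mse)  # RMSE is the square root of MSE

print(f"Baseline - MAE: {baseline_mae:.4f}, MSE: {baseline_mse:.4f}, RMSE: {baseline_rmse:.4f}")
print(f"Model    - MAE: {model_mae:.4f}, MSE: {model_mse:.4f}, RMSE: {model_rmse:.4f}")
```

Under these assumptions the baseline RMSE is close to the standard deviation of the target shown by `describe()` (about 0.415), so the model's RMSE of about 0.38 is a modest improvement over always predicting the mean.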


Applying the same ensemble method, this time showing the metrics after each boosting iteration (that is, after each member is added to the ensemble):

In [15]:
# Initialize the XGBoost model, tracking MAE and RMSE
model = XGBRegressor(objective='reg:squarederror', n_estimators=100, eval_metric=["mae", "rmse"])

# Datasets on which the metrics are evaluated at each iteration
eval_set = [(X_train, y_train), (X_test, y_test)]

# Train the model, evaluating at each iteration
model.fit(X_train, y_train, eval_set=eval_set, verbose=True)

# Extract the metrics recorded at each iteration
results = model.evals_result()

# Show the metrics for each iteration
# (validation_0 is the training set, validation_1 is the test set)
iterations = len(results['validation_0']['mae'])
for i in range(iterations):
    train_mae = results['validation_0']['mae'][i]
    test_mae = results['validation_1']['mae'][i]
    train_rmse = results['validation_0']['rmse'][i]
    test_rmse = results['validation_1']['rmse'][i]
    # MSE is not tracked directly, so derive it from RMSE
    train_mse = train_rmse ** 2
    test_mse = test_rmse ** 2

    print(f"Iteration {i + 1}")
    print(f"Training set - MAE: {train_mae}, MSE: {train_mse}, RMSE: {train_rmse}")
    print(f"Test set - MAE: {test_mae}, MSE: {test_mse}, RMSE: {test_rmse}")
    print("=" * 50)

[0]	validation_0-mae:0.32606	validation_0-rmse:0.39568	validation_1-mae:0.32588	validation_1-rmse:0.39597
[1]	validation_0-mae:0.31311	validation_0-rmse:0.38582	validation_1-mae:0.31281	validation_1-rmse:0.38621
[2]	validation_0-mae:0.30404	validation_0-rmse:0.38089	validation_1-mae:0.30365	validation_1-rmse:0.38128
[3]	validation_0-mae:0.29769	validation_0-rmse:0.37844	validation_1-mae:0.29724	validation_1-rmse:0.37881
[4]	validation_0-mae:0.29325	validation_0-rmse:0.37724	validation_1-mae:0.29275	validation_1-rmse:0.37757
[5]	validation_0-mae:0.29013	validation_0-rmse:0.37665	validation_1-mae:0.28961	validation_1-rmse:0.37694
[6]	validation_0-mae:0.28795	validation_0-rmse:0.37635	validation_1-mae:0.28741	validation_1-rmse:0.37663
[7]	validation_0-mae:0.28643	validation_0-rmse:0.37621	validation_1-mae:0.28587	validation_1-rmse:0.37646
[8]	validation_0-mae:0.28536	validation_0-rmse:0.37614	validation_1-mae:0.28479	validation_1-rmse:0.37638
[9]	validation_0-mae:0.28461	validation_0-rmse

Observations: With each iteration the error metrics converge toward a stable value, which is already approximately reached within the first 4 or 5 iterations.
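
The convergence noted above can be checked directly from the log. The sketch below is one possible way to do it, not a standard procedure: it copies the `validation_1-rmse` values printed for iterations 0 to 8 and looks for the first iteration whose improvement over the previous one falls below a tolerance. The helper `first_flat_iteration` and the tolerance `1e-3` are hypothetical choices made for illustration.

```python
# validation_1-rmse values copied from the training log (iterations 0 to 8)
test_rmse = [0.39597, 0.38621, 0.38128, 0.37881, 0.37757,
             0.37694, 0.37663, 0.37646, 0.37638]


def first_flat_iteration(values, tol=1e-3):
    """Return the 0-based index of the first iteration whose improvement
    over the previous iteration is below tol, or None if there is none."""
    for i in range(1, len(values)):
        if values[i - 1] - values[i] < tol:
            return i
    return None


flat_at = first_flat_iteration(test_rmse)
print(f"Improvement drops below 1e-3 at log index [{flat_at}]")
```

With this tolerance the improvement first becomes negligible at log index [5], in line with the observation that the metrics stabilize after roughly the first 4 or 5 iterations. A looser or tighter tolerance would shift that index.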