# Evaluation of the prediction models

<a id="0"></a> <br>
### Index:
1. [Introduction](#1)  
2. [Evaluation of the polynomial linear regression model](#2)  
3. [Evaluation of the decision tree model](#3)  
4. [Metrics analysis](#4)

<a id="1"></a> <br>
## Introduction

Here we evaluate the prediction errors of each model on this regression problem.  
To do so, we compute the following metrics:

-   MAE (Mean Absolute Error): the mean of the absolute differences between the predictions and the actual values. It is expressed in the same unit as the target variable. The lower the MAE, the more accurate the model.

-   MAPE (Mean Absolute Percentage Error): the mean of the absolute differences between the predictions and the actual values, each divided by the actual value. It is a relative measure of the average error. scikit-learn's `mean_absolute_percentage_error` returns it as a fraction, so 0.05 means 5 %. As with the MAE, the lower the MAPE, the better the model.

-   MSE (Mean Squared Error): the mean of the squared differences between the predictions and the actual values. Squaring gives more weight to large errors, so extreme values have a stronger impact on this metric. The lower the MSE, the more accurate the model.

-   RMSE (Root Mean Squared Error): the square root of the MSE. It expresses the error in the same unit as the target variable, which makes it easier to interpret. As with the MSE, a lower RMSE indicates a more accurate model.

-   R2 score (R-squared): also known as the coefficient of determination, it measures how much of the variance of the actual values is explained by the predictions. A value of 1 indicates a perfect fit and 0 corresponds to always predicting the mean of the target; on test data it can be negative when the model does worse than that. A higher R2 score suggests better predictive ability.
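The definitions above can be written out directly with NumPy. This is a minimal sketch for illustration only: the function names and the toy arrays are assumptions, and the notebook itself relies on the scikit-learn implementations.

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean of the absolute errors
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute error relative to the actual values (a fraction, not a percent).
    # Assumes y_true contains no zeros.
    return np.mean(np.abs((y_true - y_pred) / y_true))

def mse(y_true, y_pred):
    # Mean of the squared errors
    return np.mean((y_true - y_pred) ** 2)

def rmse(y_true, y_pred):
    # Square root of the MSE, back in the unit of the target
    return np.sqrt(mse(y_true, y_pred))

def r2(y_true, y_pred):
    # 1 minus the ratio of residual to total sum of squares
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy data, made up for illustration
y_true = np.array([2.0, 4.0, 5.0, 8.0])
y_pred = np.array([2.5, 3.5, 5.0, 7.0])
y_bad = y_true[::-1]  # predictions in reversed order: worse than predicting the mean

print("MAE:", mae(y_true, y_pred))
print("MAPE:", mape(y_true, y_pred))
print("MSE:", mse(y_true, y_pred))
print("RMSE:", rmse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
print("R2 of the bad predictions:", r2(y_true, y_bad))
```

The last line shows that the R2 score is not bounded below by 0: a model that does worse than always predicting the mean gets a negative score.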

First we create an empty DataFrame to collect the results of every evaluation, so that we can then compare them all and see which model performs best.

In [2]:
# Load libraries
import pickle
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error, mean_squared_error, r2_score
from sklearn.preprocessing import PolynomialFeatures
import yaml


In [3]:
df_conc = pd.DataFrame({"Métricas": ["MAE","MAPE","MSE","RMSE", "R2Score"]}).set_index("Métricas")
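As a sketch of how this collection pattern works: `df_conc` starts with an index of metric names and no columns, and each model later contributes one column whose values follow the index order. The values below are the ones reported for the linear model further down in this notebook.

```python
import pandas as pd

# Index of metric names, no data columns yet
metric_names = ["MAE", "MAPE", "MSE", "RMSE", "R2Score"]
df_conc = pd.DataFrame({"Métricas": metric_names}).set_index("Métricas")
print(df_conc.shape)  # 5 rows, 0 columns

# Adding a model: one list with one value per metric, in index order
df_conc["Lineal Regression"] = [0.3275, 0.0535, 0.2633, 0.5131, 0.6987]
print(df_conc)
```

The list must have exactly one entry per metric, in the same order as the index; otherwise the values end up under the wrong metric name or pandas raises a length error.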

<a id="2"></a> <br>
### Evaluation of the polynomial linear regression model

In [4]:

# Load the trained model and its configuration (the polynomial degree) for the polynomial transformation.

with open('../models/modelo_lineal/model_config_lin.yaml', 'r') as file:
    model_config = yaml.safe_load(file)

model_path = '../models/modelo_lineal/trained_pol_3.pkl'

with open(model_path, 'rb') as f:
    lin_reg = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Polynomial feature transformation

poly_feats = PolynomialFeatures(degree = model_config['degree'])
poly_feats.fit(X_train)
X_test_poly = poly_feats.transform(X_test)

# Make the predictions

predictions = lin_reg.predict(X_test_poly)

# Compute the evaluation metrics

mae_lin = mean_absolute_error(y_test, predictions)
mape_lin = mean_absolute_percentage_error(y_test, predictions)
mse_lin = mean_squared_error(y_test, predictions)
# RMSE is the square root of the MSE; the squared=False argument is deprecated in recent scikit-learn releases
rmse_lin = np.sqrt(mse_lin)
r2_lin = r2_score(y_test, predictions)

# Print the metrics

print("Mean Absolute Error (MAE):", round(mae_lin,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_lin,4))
print("Mean Squared Error (MSE):", round(mse_lin,4))
print("Root Mean Squared Error (RMSE):", round(rmse_lin,4))
print("R-squared (R2) Score:", round(r2_lin,4))

list_lin = [round(mae_lin,4),round(mape_lin,4),round(mse_lin,4),round(rmse_lin,4),round(r2_lin,4)]


Mean Absolute Error (MAE): 0.3275
Mean Absolute Percentage Error (MAPE): 0.0535
Mean Squared Error (MSE): 0.2633
Root Mean Squared Error (RMSE): 0.5131
R-squared (R2) Score: 0.6987
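The polynomial transformation used in the cell above can be illustrated on toy data. This is a minimal sketch assuming two features and degree 2; the arrays are made up, and the notebook reads the real degree from `model_config['degree']`.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy data with two features (hypothetical values)
X_train = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_test = np.array([[2.0, 3.0]])

# Fit on the training data, then reuse the fitted transformer on the test data
poly = PolynomialFeatures(degree=2)
poly.fit(X_train)
X_test_poly = poly.transform(X_test)

# Columns for degree 2: 1, x0, x1, x0^2, x0*x1, x1^2
print(poly.n_output_features_)
print(X_test_poly)
```

The test features must go through the same expansion that was applied at training time, so that the loaded linear model receives the number and order of columns it was fitted on.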


<a id="3"></a> <br>
### Evaluation of the decision tree model

In [5]:
# Load the model
model_path = '../models/arbol_decision/dtr_gs.pkl'

with open(model_path, 'rb') as f:
    dtr_gs = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Predict with the best estimator found by the hyperparameter search

y_pred_dtr = dtr_gs.best_estimator_.predict(X_test)

mae_dtr = mean_absolute_error(y_test, y_pred_dtr)
mape_dtr = mean_absolute_percentage_error(y_test, y_pred_dtr)
mse_dtr = mean_squared_error(y_test, y_pred_dtr)
rmse_dtr = np.sqrt(mse_dtr)
r2_dtr = r2_score(y_test, y_pred_dtr)
print("Mean Absolute Error (MAE):", round(mae_dtr,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_dtr,4))
print("Mean Squared Error (MSE):", round(mse_dtr,4))
print("Root Mean Squared Error (RMSE):", round(rmse_dtr,4))
print("R-squared (R2) Score:", round(r2_dtr,4))

list_dtr = [round(mae_dtr,4),round(mape_dtr,4),round(mse_dtr,4),round(rmse_dtr,4),round(r2_dtr,4)]


Mean Absolute Error (MAE): 0.2205
Mean Absolute Percentage Error (MAPE): 0.0348
Mean Squared Error (MSE): 0.1225
Root Mean Squared Error (RMSE): 0.35
R-squared (R2) Score: 0.8598
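The same five metrics are computed for every model in this notebook, so the repeated lines could be factored into a helper. This is only a sketch: the function name `evaluate_regression` and the toy arrays are hypothetical and not part of the original notebook.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    r2_score,
)

def evaluate_regression(y_true, y_pred, decimals=4):
    """Return [MAE, MAPE, MSE, RMSE, R2], rounded, in the row order of df_conc."""
    mse = mean_squared_error(y_true, y_pred)
    values = [
        mean_absolute_error(y_true, y_pred),
        mean_absolute_percentage_error(y_true, y_pred),
        mse,
        np.sqrt(mse),  # RMSE derived from the MSE
        r2_score(y_true, y_pred),
    ]
    return [round(float(v), decimals) for v in values]

# Toy data, made up for illustration
y_true = np.array([2.0, 4.0, 5.0, 8.0])
y_pred = np.array([2.5, 3.5, 5.0, 7.0])

metrics = evaluate_regression(y_true, y_pred)
print(metrics)
```

With such a helper, each model cell would reduce to loading the model, predicting, and storing the returned list for the comparison table.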


In [6]:
# Random Forest model

# Load the model
model_path = '../models/random_forest/rnd_ft.pkl'

with open(model_path, 'rb') as f:
    loaded_model_rdm_fs = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Predict with the best estimator found by the hyperparameter search

y_pred_rdm_fs = loaded_model_rdm_fs.best_estimator_.predict(X_test)

mae_rdm_fs = mean_absolute_error(y_test, y_pred_rdm_fs)
mape_rdm_fs = mean_absolute_percentage_error(y_test, y_pred_rdm_fs)
mse_rdm_fs = mean_squared_error(y_test, y_pred_rdm_fs)
rmse_rdm_fs = np.sqrt(mse_rdm_fs)
r2_rdm_fs = r2_score(y_test, y_pred_rdm_fs)
print("Random Forest model metrics","\n")
print("Mean Absolute Error (MAE):", round(mae_rdm_fs,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_rdm_fs,4))
print("Mean Squared Error (MSE):", round(mse_rdm_fs,4))
print("Root Mean Squared Error (RMSE):", round(rmse_rdm_fs,4))
print("R-squared (R2) Score:", round(r2_rdm_fs,4),"\n")

list_rdm_fs = [round(mae_rdm_fs,4),round(mape_rdm_fs,4),round(mse_rdm_fs,4),round(rmse_rdm_fs,4),round(r2_rdm_fs,4)]

Random Forest model metrics 

Mean Absolute Error (MAE): 0.1495
Mean Absolute Percentage Error (MAPE): 0.0231
Mean Squared Error (MSE): 0.0651
Root Mean Squared Error (RMSE): 0.2552
R-squared (R2) Score: 0.9254 



In [7]:
# AdaBoost model

# Load the model
model_path = '../models/ada_gs/ada_gs.pkl'

with open(model_path, 'rb') as f:
    loaded_model_ada_gs = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Predict with the best estimator found by the hyperparameter search

y_pred_ada_gs = loaded_model_ada_gs.best_estimator_.predict(X_test)

mae_ada_gs = mean_absolute_error(y_test, y_pred_ada_gs)
mape_ada_gs = mean_absolute_percentage_error(y_test, y_pred_ada_gs)
mse_ada_gs = mean_squared_error(y_test, y_pred_ada_gs)
rmse_ada_gs = np.sqrt(mse_ada_gs)
r2_ada_gs = r2_score(y_test, y_pred_ada_gs)
print("AdaBoost model metrics","\n")
print("Mean Absolute Error (MAE):", round(mae_ada_gs,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_ada_gs,4))
print("Mean Squared Error (MSE):", round(mse_ada_gs,4))
print("Root Mean Squared Error (RMSE):", round(rmse_ada_gs,4))
print("R-squared (R2) Score:", round(r2_ada_gs,4),"\n")

list_ada_gs = [round(mae_ada_gs,4),round(mape_ada_gs,4),round(mse_ada_gs,4),round(rmse_ada_gs,4),round(r2_ada_gs,4)]

AdaBoost model metrics 

Mean Absolute Error (MAE): 0.377
Mean Absolute Percentage Error (MAPE): 0.0597
Mean Squared Error (MSE): 0.2442
Root Mean Squared Error (RMSE): 0.4942
R-squared (R2) Score: 0.7205 



In [8]:
# Gradient Boosting Regressor model

# Load the model
model_path = '../models/gbrt/gbrt.pkl'

with open(model_path, 'rb') as f:
    loaded_model_gbrt = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Predict with the best estimator found by the hyperparameter search

y_pred_gbrt = loaded_model_gbrt.best_estimator_.predict(X_test)

mae_gbrt = mean_absolute_error(y_test, y_pred_gbrt)
mape_gbrt = mean_absolute_percentage_error(y_test, y_pred_gbrt)
mse_gbrt = mean_squared_error(y_test, y_pred_gbrt)
rmse_gbrt = np.sqrt(mse_gbrt)
r2_gbrt = r2_score(y_test, y_pred_gbrt)
print("Gradient Boosting Regressor model metrics","\n")
print("Mean Absolute Error (MAE):", round(mae_gbrt,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_gbrt,4))
print("Mean Squared Error (MSE):", round(mse_gbrt,4))
print("Root Mean Squared Error (RMSE):", round(rmse_gbrt,4))
print("R-squared (R2) Score:", round(r2_gbrt,4),"\n")

list_gbrt = [round(mae_gbrt,4),round(mape_gbrt,4),round(mse_gbrt,4),round(rmse_gbrt,4),round(r2_gbrt,4)]

Gradient Boosting Regressor model metrics 

Mean Absolute Error (MAE): 0.1373
Mean Absolute Percentage Error (MAPE): 0.021
Mean Squared Error (MSE): 0.0608
Root Mean Squared Error (RMSE): 0.2465
R-squared (R2) Score: 0.9304 



In [10]:
# PCA + Random Forest Regressor model

# Load the model

model_path = '../models/pca_rf/pca_rf.pkl'

with open(model_path, 'rb') as f:
    loaded_model_pca_rf = pickle.load(f)

# Load the test and training data

df_test = pd.read_csv('../data/test/test.csv')
df_train = pd.read_csv('../data/train/train.csv')

# Split into features (X) and target (y)

X_test = df_test.drop('Rating Average', axis=1)
y_test = df_test['Rating Average']

X_train = df_train.drop('Rating Average', axis=1)
y_train = df_train['Rating Average']

# Predict with the best estimator found by the hyperparameter search

y_pred_pca_rf = loaded_model_pca_rf.best_estimator_.predict(X_test)

mae_pca_rf = mean_absolute_error(y_test, y_pred_pca_rf)
mape_pca_rf = mean_absolute_percentage_error(y_test, y_pred_pca_rf)
mse_pca_rf = mean_squared_error(y_test, y_pred_pca_rf)
rmse_pca_rf = np.sqrt(mse_pca_rf)
r2_pca_rf = r2_score(y_test, y_pred_pca_rf)
print("PCA + Random Forest model metrics","\n")
print("Mean Absolute Error (MAE):", round(mae_pca_rf,4))
print("Mean Absolute Percentage Error (MAPE):", round(mape_pca_rf,4))
print("Mean Squared Error (MSE):", round(mse_pca_rf,4))
print("Root Mean Squared Error (RMSE):", round(rmse_pca_rf,4))
print("R-squared (R2) Score:", round(r2_pca_rf,4),"\n")

list_pca_rf = [round(mae_pca_rf,4),round(mape_pca_rf,4),round(mse_pca_rf,4),round(rmse_pca_rf,4),round(r2_pca_rf,4)]


PCA + Random Forest model metrics 

Mean Absolute Error (MAE): 0.2486
Mean Absolute Percentage Error (MAPE): 0.0387
Mean Squared Error (MSE): 0.1375
Root Mean Squared Error (RMSE): 0.3707
R-squared (R2) Score: 0.8427 



<a id="4"></a> <br>
### Metrics analysis

All the metrics are gathered in a single table so the models can be compared.

In [11]:
df_conc["Lineal Regression"] = list_lin
df_conc["Decision Tree Regressor"] = list_dtr
df_conc["Random Forest"] = list_rdm_fs
df_conc["Ada Boost Regressor"] = list_ada_gs
df_conc["Gradient Boosting Regressor"] = list_gbrt
df_conc["PCA con Random Forest Regressor"] = list_pca_rf

In [12]:
df_conc

| Métricas | Lineal Regression | Decision Tree Regressor | Random Forest | Ada Boost Regressor | Gradient Boosting Regressor | PCA con Random Forest Regressor |
|---|---|---|---|---|---|---|
| MAE | 0.3275 | 0.2205 | 0.1495 | 0.377 | 0.1373 | 0.2486 |
| MAPE | 0.0535 | 0.0348 | 0.0231 | 0.0597 | 0.021 | 0.0387 |
| MSE | 0.2633 | 0.1225 | 0.0651 | 0.2442 | 0.0608 | 0.1375 |
| RMSE | 0.5131 | 0.35 | 0.2552 | 0.4942 | 0.2465 | 0.3707 |
| R2Score | 0.6987 | 0.8598 | 0.9254 | 0.7205 | 0.9304 | 0.8427 |
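The comparison can be made explicit by asking pandas which column wins on each metric. This is a minimal sketch that rebuilds the table from the values reported above so that it runs on its own; in the notebook, `df_conc` would already be available.

```python
import pandas as pd

# Values copied from the results table in this notebook
df_conc = pd.DataFrame(
    {
        "Lineal Regression": [0.3275, 0.0535, 0.2633, 0.5131, 0.6987],
        "Decision Tree Regressor": [0.2205, 0.0348, 0.1225, 0.35, 0.8598],
        "Random Forest": [0.1495, 0.0231, 0.0651, 0.2552, 0.9254],
        "Ada Boost Regressor": [0.377, 0.0597, 0.2442, 0.4942, 0.7205],
        "Gradient Boosting Regressor": [0.1373, 0.021, 0.0608, 0.2465, 0.9304],
        "PCA con Random Forest Regressor": [0.2486, 0.0387, 0.1375, 0.3707, 0.8427],
    },
    index=pd.Index(["MAE", "MAPE", "MSE", "RMSE", "R2Score"], name="Métricas"),
)

# Error metrics: lower is better, so take the column with the minimum per row
error_rows = ["MAE", "MAPE", "MSE", "RMSE"]
best_by_error = df_conc.loc[error_rows].idxmin(axis=1)

# R2: higher is better
best_by_r2 = df_conc.loc["R2Score"].idxmax()

# Full ranking of the models by RMSE, best first
ranking = df_conc.loc["RMSE"].sort_values()

print(best_by_error)
print("Best R2:", best_by_r2)
print(ranking)
```

With these values, the Gradient Boosting Regressor has the lowest error on all four error metrics and the highest R2 score, with Random Forest close behind.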
