# **Exercise 3: Regression Models**

**Regression models.** Consider the **Wind Speed** dataset. Implement the regression version of each model studied in class to predict the hourly wind speed (**VENTO, VELOCIDADE HORARIA (m/s)**) in the supplied dataset. Build an error table with the usual regression metrics: **MAPE, RMSE, R2** (see **Table 2**). In addition, include independence and normality tests for the residuals: **Ljung-Box p-value** and **Jarque-Bera p-value**. Create training, validation, and test partitions as described in **Figure 2**. These partitions must follow the trend of the wind speed. Produce a figure showing the wind speed and its prediction. Use the **MAPE, RMSE, R2** metrics in the validation phase to select the best prediction, and identify which of these metrics is the most appropriate. Also use Bayesian optimization to select the best hyperparameters of the optimal regression model.

The validation fold in each partition must always sit in the final percentage of that partition, because time ordering is fundamental in these predictions: it makes no sense to predict the known past from the future. Among the daily periods **T = 7, 14, 21, 28**, indicate which one gives the best prediction window for training. Note that **TimeSeriesSplit** does not apply to this problem and must not be used, since its folds do not match the ones requested in this exercise. Define a function to build the folds for this exercise.
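The fold requirement above can be sketched as follows. This is a minimal illustration, not the layout of Figure 2 (which is not reproduced here): the function name `build_folds`, the equal-size partitions, and the default `val_fraction` of 0.2 are all assumptions.

```python
import numpy as np

def build_folds(n_samples, n_partitions, val_fraction=0.2):
    """Hypothetical helper: split a series into consecutive partitions and
    use the final `val_fraction` of each partition as its validation fold."""
    # Partition boundaries, in chronological order
    edges = np.linspace(0, n_samples, n_partitions + 1, dtype=int)
    for start, stop in zip(edges[:-1], edges[1:]):
        n_val = max(1, int(round((stop - start) * val_fraction)))
        split = stop - n_val
        # Training indices always precede validation indices in time
        yield np.arange(start, split), np.arange(split, stop)

# Example with 100 hourly observations and 4 partitions
folds = list(build_folds(n_samples=100, n_partitions=4, val_fraction=0.2))
for train_idx, val_idx in folds:
    print(train_idx[0], train_idx[-1], val_idx[0], val_idx[-1])
```

Within each partition the validation indices come strictly after the training indices, so the model never predicts the past from the future.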

| Model             | MAPE | RMSE | R²  | Ljung-Box p-value | Jarque-Bera p-value |
|-------------------|------|------|-----|-------------------|---------------------|
| K-NN              | ...  | ...  | ... | ...               | ...                 |
| Linear Regression | ...  | ...  | ... | ...               | ...                 |
| Ridge Regression  | ...  | ...  | ... | ...               | ...                 |
| Lasso Regression  | ...  | ...  | ... | ...               | ...                 |

**Table 2:** Regression model for **mean sale price**.


## **Required libraries and modules**

In [233]:
import warnings
import numpy as np
from scipy.stats import jarque_bera
from statsmodels.stats.diagnostic import acorr_ljungbox
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score
from skopt import gp_minimize
from skopt.space import Integer, Categorical, Real
from skopt.utils import use_named_args
import pandas as pd
warnings.filterwarnings("ignore")

# **Regression model: Wind Speed**

## **Data**

After a descriptive analysis of the original database (detailed in [Wind Speed - Exercise 1](https://kmarcela11.github.io/Parcial2_MachineLearning/Ejercicio1.html#wind-speed)), a database ready for modeling was obtained and stored in a CSV file.

### **Differences between the original database and the modeling database**

- The new database has more readable and better organized column names.
- Variables of little significance for the model have been removed, such as:
  - The **hour** variable.
- Highly correlated variables have been excluded, such as:
  - `hum_horaria`
  - `pres_atmo`


In [234]:
data = pd.read_csv('C:\\Users\\kamac\\OneDrive\\Desktop\\MachineLearningUN\\data_modelo.csv')

In [235]:
data

Unnamed: 0,dir_viento,vel_viento,hum_max,hum_min,temp_max,temp_min,precip_total,rafaga_max,pres_max,pres_min
0,0.809017,1.8,69.0,60.0,22.6,20.7,0.0,3.8,888.2,887.7
1,0.965926,2.7,62.0,55.0,24.2,22.5,0.0,4.7,888.4,888.2
2,0.891007,2.0,56.0,50.0,25.5,24.3,0.0,4.9,888.4,888.1
3,0.848048,2.5,52.0,44.0,27.4,25.0,0.0,5.8,888.1,887.4
4,0.224951,2.4,50.0,43.0,27.1,25.5,0.0,5.8,887.4,886.5
...,...,...,...,...,...,...,...,...,...,...
87688,-0.615661,5.6,83.0,78.0,21.8,21.1,0.0,12.3,879.8,879.1
87689,-0.469472,4.9,84.0,79.0,21.7,21.0,0.0,9.9,879.2,878.9
87690,-0.484810,4.5,86.0,82.0,21.2,20.6,0.0,8.9,879.8,879.2
87691,-0.484810,3.2,88.0,85.0,20.6,20.2,0.0,8.0,880.5,879.6


Note that the current database has **87693** observations and **10** columns.

Next, the predictor variables and the target variable are assigned; in our case the target is the wind speed (`vel_viento`).

In [236]:
X = data.drop(columns = ['vel_viento'])
y = data['vel_viento']

Next, we will use a dictionary to store all the models to be trained.

## **Hyperparameter tuning: Bayesian optimization**

### **K-NN regression**

In [237]:
# Step 1: Define the search space
knn_space = [
    Integer(1, 50, name = 'n_neighbors'),  # Number of neighbors between 1 and 50
    Categorical(['uniform', 'distance'], name = 'weights')  # Weighting options
]
# Step 2: Define the objective function for KNN
def knn_objective(params):
    n_neighbors = params[0] 
    weights = params[1]  
    model = KNeighborsRegressor(n_neighbors = n_neighbors, weights = weights)
    # Note: the model is fitted and scored on the same data, so this is a training RMSE
    model.fit(X, y)
    y_pred = model.predict(X)
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    return rmse

# Step 3: Run the Bayesian optimization
knn_result = gp_minimize(
    knn_objective,  # The objective function
    knn_space,  # The search space
    n_calls = 30,  # Number of evaluations
    random_state = 11
)
# Step 4: Show the best hyperparameters found
print("Best hyperparameters:", knn_result.x)
print("Best RMSE value:", knn_result.fun)


Best hyperparameters: [4, 'distance']
Best RMSE value: 0.0
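An RMSE of exactly 0.0 is a training error: with `weights='distance'`, every training point is its own zero-distance neighbor and is reproduced exactly. A minimal sketch of an objective scored on a held-out chronological tail is shown below. The data are synthetic, and the name `knn_objective_holdout` and the 80/20 split are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the wind data (assumption, not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 3))
y_demo = X_demo[:, 0] + rng.normal(scale=0.5, size=300)

# Chronological split: the last 20% of the rows is held out for validation
split = int(len(X_demo) * 0.8)
X_tr, X_val = X_demo[:split], X_demo[split:]
y_tr, y_val = y_demo[:split], y_demo[split:]

def knn_objective_holdout(params):
    """Hypothetical objective: RMSE on the held-out tail, not on the training data."""
    n_neighbors, weights = params
    model = KNeighborsRegressor(n_neighbors=n_neighbors, weights=weights)
    model.fit(X_tr, y_tr)
    return float(np.sqrt(mean_squared_error(y_val, model.predict(X_val))))

# Training RMSE collapses to zero with distance weighting; validation RMSE does not
model = KNeighborsRegressor(n_neighbors=4, weights='distance').fit(X_tr, y_tr)
rmse_train = float(np.sqrt(mean_squared_error(y_tr, model.predict(X_tr))))
rmse_val = knn_objective_holdout([4, 'distance'])
print(rmse_train, rmse_val)
```

A function with this shape takes the same list of parameters as `knn_objective`, so it could be passed to `gp_minimize` in its place.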


### **Linear regression**

### **Ridge regression**

In [238]:
# Step 1: Define the search space
ridge_space = [Real(0.01, 10, name = 'alpha')]

# Step 2: Define the objective function for Ridge
def ridge_objective(params):
    alpha = params[0]  
    model = Ridge(alpha = alpha)
    # Note: the model is fitted and scored on the same data, so this is a training RMSE
    model.fit(X, y)
    y_pred = model.predict(X)
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    return rmse

# Step 3: Run the Bayesian optimization for Ridge
result_ridge = gp_minimize(
    ridge_objective,  # The objective function
    ridge_space,  # The search space
    n_calls = 30,  # Number of evaluations
    random_state = 11
)

# Step 4: Show the best hyperparameters found
print("Best hyperparameters for Ridge:", result_ridge.x)
print("Best RMSE value:", result_ridge.fun)


Best hyperparameters for Ridge: [0.010133543758906174]
Best RMSE value: 0.6082881656906584


### **Lasso regression**

In [239]:
# Step 1: Define the search space
lasso_space = [Real(0.01, 10, name = 'alpha')]

# Step 2: Define the objective function for Lasso
def lasso_objective(params):
    alpha = params[0] 
    model = Lasso(alpha=alpha)
    # Note: the model is fitted and scored on the same data, so this is a training RMSE
    model.fit(X, y)
    y_pred = model.predict(X)
    rmse = np.sqrt(mean_squared_error(y, y_pred))
    return rmse

# Step 3: Run the Bayesian optimization for Lasso
result_lasso = gp_minimize(
    lasso_objective,  # The objective function
    lasso_space,  # The search space
    n_calls = 30,  # Number of evaluations
    random_state=0
)

# Step 4: Show the best hyperparameters found
print("Best hyperparameters for Lasso:", result_lasso.x)
print("Best RMSE value:", result_lasso.fun)


Best hyperparameters for Lasso: [0.01]
Best RMSE value: 0.6091278536982275


An alpha of about 0.0101 in Ridge indicates that the selected model applies very little regularization. This can be a sign that the data are not very noisy and that the features are informative for predicting the target. However, the objective function above computes the RMSE on the same data used to fit the model, and a training RMSE can only stay equal or get worse as alpha grows, so the search is expected to land near the lower bound of the range (0.01). It is therefore essential to check the performance on the validation or test set to confirm that the model generalizes well and is not overfitting.

Similarly, the alpha of 0.01 selected for Lasso is exactly the lower bound of the search space and indicates that the selected model applies very little regularization. This is compatible with clear relationships between the features and the target, but on its own it does not rule out overfitting; the performance must be consistent on the validation or test set to confirm it.
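To see why a search scored on the training data gravitates toward the smallest alpha, the following sketch fits Ridge on synthetic data (an assumption, not the wind dataset) for several alphas and records the training RMSE, which does not decrease as the penalty grows.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Synthetic regression problem used only for illustration
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
true_coef = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=200)

alphas = [0.01, 0.1, 1.0, 10.0]
train_rmse = []
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    # RMSE on the same data used for fitting
    train_rmse.append(float(np.sqrt(mean_squared_error(y_demo, model.predict(X_demo)))))

for alpha, rmse in zip(alphas, train_rmse):
    print(alpha, rmse)
```

Because the training error is monotone in alpha, only a score computed on held-out data can tell whether more regularization would help.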

## **Models with the tuned hyperparameters**

In [286]:
# Define the models (the keys match the 'Modelo' labels shown in the result tables)
modelos = {
    'K-NN Regressor': KNeighborsRegressor(n_neighbors = 4, weights = 'distance'),
    'Linear Regression': LinearRegression(),
    'Ridge Regression': Ridge(alpha = 0.010133543758906174),
    'Lasso Regression': Lasso(alpha = 0.01)
}

In [251]:
resultados = []  # Global list that accumulates one row per model and test window
def entrenar_y_evaluar(X_train, y_train, X_test, y_test, modelos, k, hora):
    # Fit each model on the training window and score it on the test window
    for nombre, modelo in modelos.items(): 
        modelo.fit(X_train, y_train)
        y_pred = modelo.predict(X_test)
        # Take the square root explicitly; squared=False is deprecated in recent scikit-learn releases
        RMSE = np.sqrt(mean_squared_error(y_test, y_pred))
        MAPE = mean_absolute_percentage_error(y_test, y_pred)
        R2 = r2_score(y_test, y_pred)
        resultados.append({ 
            'Modelo': nombre,
            'Hora': hora,
            'Ventana de predicción': k,
            'MAPE': MAPE,
            'RMSE': RMSE,
            'R^2': R2
        })


In [252]:
periodos = [7, 14, 21, 28]  # Daily periods (training window lengths, in days) to evaluate


def modelo_por_hora(X, y, modelos, hora):
    for k in periodos:  # Repeat the process for each prediction window (7, 14, 21, 28)
        train_inicial = 0 
        
        while (train_inicial + (k * 24) + hora) < len(X): 
            
            train_final = train_inicial + (k * 24) 
            
            # Boolean filtering for X_train and y_train (k days of hourly data)
            entrenamiento_indices = (X.index >= train_inicial) & (X.index < train_final)
            X_train = X[entrenamiento_indices]
            y_train = y[entrenamiento_indices]

            # Boolean filtering for X_test and y_test (24 hours starting `hora` hours after training ends)
            test_indices = (X.index >= (train_final + hora)) & (X.index < (train_final + hora + 24))
            X_test = X[test_indices]
            y_test = y[test_indices]
            
            entrenar_y_evaluar(X_train, y_train, X_test, y_test, modelos, k, hora)
            train_inicial = train_final  # Update the start for the next iteration 

### **Individual models per hour (24 hours)**

In [253]:
hora1 = modelo_por_hora(X, y, modelos, 1) 
hora1 = pd.DataFrame(resultados)
hora1_mean = hora1.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora1_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,1,121276900000000.0,0.943734,-0.015716
1,K-NN Regressor,14,1,124253100000000.0,0.872554,0.128152
2,K-NN Regressor,21,1,114222500000000.0,0.841189,0.163101
3,K-NN Regressor,28,1,134688700000000.0,0.802341,0.245252
4,Lasso Regression,7,1,67134340000000.0,0.603769,0.584057
5,Lasso Regression,14,1,68366410000000.0,0.597289,0.582902
6,Lasso Regression,21,1,66136150000000.0,0.605674,0.563441
7,Lasso Regression,28,1,77033810000000.0,0.592457,0.571722
8,Linear Regression,7,1,67614800000000.0,0.673537,0.096999
9,Linear Regression,14,1,68248450000000.0,0.602642,0.571402
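MAPE values on the order of 1e14 are what scikit-learn's `mean_absolute_percentage_error` returns when the true series contains zeros, because the denominator is floored at machine epsilon. Assuming that calm hours (`vel_viento` equal to 0) fall inside the test windows, the effect can be reproduced with a toy example; the arrays below are illustrative, not taken from the dataset.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error

# One true value is exactly zero (a calm hour); the others are ordinary
y_true = np.array([0.0, 2.0, 4.0])
y_pred = np.array([0.5, 2.2, 3.6])

# The zero in y_true is replaced by machine epsilon in the denominator
mape_all = mean_absolute_percentage_error(y_true, y_pred)

# Restricting the metric to non-zero targets gives an interpretable value
mask = y_true != 0
mape_nonzero = mean_absolute_percentage_error(y_true[mask], y_pred[mask])

print(mape_all, mape_nonzero)
```

Under that assumption, RMSE and R² are more reliable than MAPE for comparing the models in these tables.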


In [254]:
hora2 = modelo_por_hora(X, y, modelos, 2)
hora2 = pd.DataFrame(resultados)
hora2_mean = hora2.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora2_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,2,121934200000000.0,0.947041,-0.020686
1,K-NN Regressor,14,2,127894100000000.0,0.874224,0.137212
2,K-NN Regressor,21,2,112477400000000.0,0.839641,0.161173
3,K-NN Regressor,28,2,136392600000000.0,0.80259,0.263748
4,Lasso Regression,7,2,67239350000000.0,0.603293,0.586117
5,Lasso Regression,14,2,70674400000000.0,0.598154,0.586768
6,Lasso Regression,21,2,65990730000000.0,0.605262,0.560894
7,Lasso Regression,28,2,76990900000000.0,0.592933,0.578071
8,Linear Regression,7,2,68228350000000.0,0.673782,0.107158
9,Linear Regression,14,2,70686600000000.0,0.60387,0.57422


In [255]:
hora3 = modelo_por_hora(X, y, modelos, 3)
hora3 = pd.DataFrame(resultados)
hora3_mean = hora3.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora3_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,3,121886800000000.0,0.948277,-0.01798
1,K-NN Regressor,14,3,130319800000000.0,0.875087,0.146059
2,K-NN Regressor,21,3,109889100000000.0,0.841777,0.157703
3,K-NN Regressor,28,3,143610200000000.0,0.804603,0.272623
4,Lasso Regression,7,3,67695160000000.0,0.606227,0.583767
5,Lasso Regression,14,3,73125620000000.0,0.599112,0.591695
6,Lasso Regression,21,3,64646900000000.0,0.608355,0.559233
7,Lasso Regression,28,3,81393570000000.0,0.591637,0.588252
8,Linear Regression,7,3,69130730000000.0,0.677723,0.139853
9,Linear Regression,14,3,72677800000000.0,0.605045,0.578554


In [256]:
hora4 = modelo_por_hora(X, y, modelos, 4)
hora4 = pd.DataFrame(resultados)
hora4_mean = hora4.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora4_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,4,126035100000000.0,0.951306,-0.009563
1,K-NN Regressor,14,4,135586800000000.0,0.875728,0.156866
2,K-NN Regressor,21,4,112268300000000.0,0.842879,0.169246
3,K-NN Regressor,28,4,152322300000000.0,0.805011,0.287179
4,Lasso Regression,7,4,67572840000000.0,0.609485,0.584992
5,Lasso Regression,14,4,72516510000000.0,0.599322,0.599461
6,Lasso Regression,21,4,65341460000000.0,0.616503,0.551002
7,Lasso Regression,28,4,80571700000000.0,0.591386,0.599248
8,Linear Regression,7,4,69447420000000.0,0.682103,0.156629
9,Linear Regression,14,4,71525530000000.0,0.605303,0.586177


In [257]:
hora5 = modelo_por_hora(X, y, modelos, 5)
hora5 = pd.DataFrame(resultados)
hora5_mean = hora5.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora5_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,5,126690200000000.0,0.955201,-0.018099
1,K-NN Regressor,14,5,136959200000000.0,0.876894,0.152561
2,K-NN Regressor,21,5,109608700000000.0,0.843961,0.164957
3,K-NN Regressor,28,5,157163000000000.0,0.804915,0.292399
4,Lasso Regression,7,5,67656610000000.0,0.614329,0.569777
5,Lasso Regression,14,5,73636340000000.0,0.599167,0.601345
6,Lasso Regression,21,5,66086170000000.0,0.618089,0.545182
7,Lasso Regression,28,5,82156590000000.0,0.593208,0.602662
8,Linear Regression,7,5,70290450000000.0,0.68786,0.099099
9,Linear Regression,14,5,72652610000000.0,0.603855,0.589571


In [258]:
hora6 = modelo_por_hora(X, y, modelos, 6)
hora6 = pd.DataFrame(resultados)
hora6_mean = hora6.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora6_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,6,131558900000000.0,0.961862,-0.027896
1,K-NN Regressor,14,6,137963400000000.0,0.88014,0.150215
2,K-NN Regressor,21,6,112205200000000.0,0.847233,0.158145
3,K-NN Regressor,28,6,157463400000000.0,0.800859,0.3053
4,Lasso Regression,7,6,68915800000000.0,0.614478,0.572064
5,Lasso Regression,14,6,72272420000000.0,0.596579,0.608869
6,Lasso Regression,21,6,65529650000000.0,0.618124,0.548455
7,Lasso Regression,28,6,80058610000000.0,0.585605,0.619249
8,Linear Regression,7,6,72026820000000.0,0.68971,0.096982
9,Linear Regression,14,6,71261710000000.0,0.601134,0.596787


In [259]:
hora7 = modelo_por_hora(X, y, modelos, 7)
hora7 = pd.DataFrame(resultados)
hora7_mean = hora7.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora7_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,7,137214100000000.0,0.966095,-0.043145
1,K-NN Regressor,14,7,143112300000000.0,0.882028,0.147069
2,K-NN Regressor,21,7,112898400000000.0,0.846647,0.170969
3,K-NN Regressor,28,7,160720700000000.0,0.802191,0.29954
4,Lasso Regression,7,7,71633020000000.0,0.615363,0.568151
5,Lasso Regression,14,7,75961740000000.0,0.597042,0.609185
6,Lasso Regression,21,7,65754080000000.0,0.619252,0.545595
7,Lasso Regression,28,7,82785540000000.0,0.58604,0.618235
8,Linear Regression,7,7,76006850000000.0,0.691748,0.091118
9,Linear Regression,14,7,74960950000000.0,0.602048,0.595951


In [260]:
hora8 = modelo_por_hora(X, y, modelos, 8)
hora8 = pd.DataFrame(resultados)
hora8_mean = hora8.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora8_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,8,145068800000000.0,0.969981,-0.061259
1,K-NN Regressor,14,8,149299600000000.0,0.886137,0.130438
2,K-NN Regressor,21,8,121714500000000.0,0.850887,0.149754
3,K-NN Regressor,28,8,165141200000000.0,0.804287,0.296143
4,Lasso Regression,7,8,75600550000000.0,0.618185,0.545448
5,Lasso Regression,14,8,79627320000000.0,0.598821,0.604814
6,Lasso Regression,21,8,69182550000000.0,0.627026,0.520015
7,Lasso Regression,28,8,85193590000000.0,0.588156,0.618501
8,Linear Regression,7,8,80550700000000.0,0.695463,0.050164
9,Linear Regression,14,8,78771160000000.0,0.603697,0.59198


In [261]:
hora9 = modelo_por_hora(X, y, modelos, 9)
hora9 = pd.DataFrame(resultados)
hora9_mean = hora9.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora9_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,9,145464800000000.0,0.973517,-0.081194
1,K-NN Regressor,14,9,151048400000000.0,0.89153,0.118211
2,K-NN Regressor,21,9,121858800000000.0,0.857025,0.116744
3,K-NN Regressor,28,9,165701100000000.0,0.808854,0.287799
4,Lasso Regression,7,9,75829520000000.0,0.618848,0.542864
5,Lasso Regression,14,9,79193660000000.0,0.600951,0.612702
6,Lasso Regression,21,9,67961230000000.0,0.626426,0.51653
7,Lasso Regression,28,9,84595380000000.0,0.589583,0.620372
8,Linear Regression,7,9,81180270000000.0,0.699925,0.013978
9,Linear Regression,14,9,78110760000000.0,0.605001,0.602758


In [262]:
hora10 = modelo_por_hora(X, y, modelos, 10)
hora10 = pd.DataFrame(resultados)
hora10_mean = hora10.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora10_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,10,145155100000000.0,0.975755,-0.087377
1,K-NN Regressor,14,10,144756400000000.0,0.894661,0.118485
2,K-NN Regressor,21,10,122261300000000.0,0.861008,0.102938
3,K-NN Regressor,28,10,156018600000000.0,0.808224,0.293168
4,Lasso Regression,7,10,74370830000000.0,0.621382,0.539224
5,Lasso Regression,14,10,75958800000000.0,0.603952,0.613311
6,Lasso Regression,21,10,67234850000000.0,0.62985,0.509029
7,Lasso Regression,28,10,81767670000000.0,0.5935,0.618654
8,Linear Regression,7,10,79682480000000.0,0.702611,0.014051
9,Linear Regression,14,10,74372270000000.0,0.609154,0.601415


In [263]:
hora11 = modelo_por_hora(X, y, modelos, 11)
hora11 = pd.DataFrame(resultados)
hora11_mean = hora11.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora11_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,11,142297400000000.0,0.979605,-0.087642
1,K-NN Regressor,14,11,138802600000000.0,0.901829,0.118864
2,K-NN Regressor,21,11,118928200000000.0,0.861928,0.122948
3,K-NN Regressor,28,11,148055000000000.0,0.812757,0.293278
4,Lasso Regression,7,11,72756890000000.0,0.621759,0.540963
5,Lasso Regression,14,11,74146750000000.0,0.60775,0.609799
6,Lasso Regression,21,11,64005800000000.0,0.627154,0.52492
7,Lasso Regression,28,11,79428400000000.0,0.599263,0.610511
8,Linear Regression,7,11,78098170000000.0,0.703517,-0.001036
9,Linear Regression,14,11,72767010000000.0,0.613858,0.596068


In [264]:
hora12 = modelo_por_hora(X, y, modelos, 12)
hora12 = pd.DataFrame(resultados)
hora12_mean = hora12.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora12_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,12,142233600000000.0,0.982342,-0.084384
1,K-NN Regressor,14,12,134687800000000.0,0.904873,0.12359
2,K-NN Regressor,21,12,123712400000000.0,0.866742,0.126037
3,K-NN Regressor,28,12,132671300000000.0,0.81471,0.297818
4,Lasso Regression,7,12,72912420000000.0,0.626272,0.5267
5,Lasso Regression,14,12,71182380000000.0,0.612494,0.594134
6,Lasso Regression,21,12,68180090000000.0,0.640287,0.507409
7,Lasso Regression,28,12,71466180000000.0,0.596205,0.616471
8,Linear Regression,7,12,78326530000000.0,0.708169,-0.051892
9,Linear Regression,14,12,69633400000000.0,0.618759,0.578007


In [265]:
hora13 = modelo_por_hora(X, y, modelos, 13)
hora13 = pd.DataFrame(resultados)
hora13_mean = hora13.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora13_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,13,138290500000000.0,0.983685,-0.089978
1,K-NN Regressor,14,13,132288200000000.0,0.908784,0.111723
2,K-NN Regressor,21,13,121821600000000.0,0.869269,0.123655
3,K-NN Regressor,28,13,135395500000000.0,0.817973,0.302365
4,Lasso Regression,7,13,71630620000000.0,0.625155,0.523976
5,Lasso Regression,14,13,70617710000000.0,0.612712,0.589672
6,Lasso Regression,21,13,67767320000000.0,0.639705,0.508965
7,Lasso Regression,28,13,73988870000000.0,0.597333,0.621081
8,Linear Regression,7,13,77013900000000.0,0.703535,-0.053644
9,Linear Regression,14,13,69211720000000.0,0.618787,0.573374


In [266]:
hora14 = modelo_por_hora(X, y, modelos, 14)
hora14 = pd.DataFrame(resultados)
hora14_mean = hora14.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora14_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,14,139378800000000.0,0.986811,-0.091791
1,K-NN Regressor,14,14,132660800000000.0,0.915545,0.110736
2,K-NN Regressor,21,14,121474500000000.0,0.865752,0.124448
3,K-NN Regressor,28,14,134387500000000.0,0.827906,0.302065
4,Lasso Regression,7,14,71849860000000.0,0.626093,0.517732
5,Lasso Regression,14,14,70784470000000.0,0.616636,0.587986
6,Lasso Regression,21,14,67245610000000.0,0.635379,0.513543
7,Lasso Regression,28,14,73017510000000.0,0.600688,0.621572
8,Linear Regression,7,14,77143850000000.0,0.696285,0.068286
9,Linear Regression,14,14,69419220000000.0,0.622957,0.571752


In [269]:
hora15 = modelo_por_hora(X, y, modelos, 15)
hora15 = pd.DataFrame(resultados)
hora15_mean = hora15.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora15_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,15,140074600000000.0,0.988414,-0.093064
1,K-NN Regressor,14,15,134642900000000.0,0.921126,0.097177
2,K-NN Regressor,21,15,125452700000000.0,0.863296,0.127174
3,K-NN Regressor,28,15,133830300000000.0,0.830816,0.297875
4,Lasso Regression,7,15,72182740000000.0,0.62718,0.516771
5,Lasso Regression,14,15,72523020000000.0,0.618142,0.58331
6,Lasso Regression,21,15,71485160000000.0,0.633482,0.514223
7,Lasso Regression,28,15,76626390000000.0,0.603053,0.618039
8,Linear Regression,7,15,78124430000000.0,0.6943,0.142528
9,Linear Regression,14,15,71901860000000.0,0.624086,0.568919


In [268]:
hora16 = modelo_por_hora(X, y, modelos, 16)
hora16 = pd.DataFrame(resultados)
hora16_mean = hora16.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora16_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,15,160327700000000.0,1.000167,-0.087113
1,K-NN Regressor,7,16,140272900000000.0,0.990186,-0.100703
2,K-NN Regressor,14,16,133881700000000.0,0.920774,0.095169
3,K-NN Regressor,21,16,130662000000000.0,0.866723,0.121657
4,K-NN Regressor,28,16,134286200000000.0,0.830442,0.299379
5,Lasso Regression,7,15,81964510000000.0,0.630331,0.503385
6,Lasso Regression,7,16,71658910000000.0,0.628749,0.515909
7,Lasso Regression,14,16,72267750000000.0,0.61843,0.585556
8,Lasso Regression,21,16,74998380000000.0,0.633322,0.52069
9,Lasso Regression,28,16,77445470000000.0,0.605091,0.619744


In [270]:
hora17 = modelo_por_hora(X, y, modelos, 17)
hora17 = pd.DataFrame(resultados)
hora17_mean = hora17.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora17_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,17,141835500000000.0,0.992145,-0.090633
1,K-NN Regressor,14,17,133982400000000.0,0.93011,0.094392
2,K-NN Regressor,21,17,129744300000000.0,0.871436,0.130616
3,K-NN Regressor,28,17,133541200000000.0,0.836227,0.299626
4,Lasso Regression,7,17,71230590000000.0,0.628158,0.523139
5,Lasso Regression,14,17,71303110000000.0,0.617044,0.59305
6,Lasso Regression,21,17,75819340000000.0,0.631762,0.533704
7,Lasso Regression,28,17,76590020000000.0,0.60086,0.630398
8,Linear Regression,7,17,81444140000000.0,0.680632,0.306037
9,Linear Regression,14,17,70906140000000.0,0.624224,0.577174


In [271]:
hora18 = modelo_por_hora(X, y, modelos, 18)
hora18 = pd.DataFrame(resultados)
hora18_mean = hora18.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora18_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,18,146662500000000.0,0.995171,-0.087958
1,K-NN Regressor,14,18,137688900000000.0,0.933889,0.09984
2,K-NN Regressor,21,18,130591900000000.0,0.872082,0.133096
3,K-NN Regressor,28,18,138414800000000.0,0.841641,0.299281
4,Lasso Regression,7,18,73297090000000.0,0.629057,0.529429
5,Lasso Regression,14,18,73795600000000.0,0.616179,0.598993
6,Lasso Regression,21,18,76722360000000.0,0.636351,0.531614
7,Lasso Regression,28,18,79988110000000.0,0.595235,0.643994
8,Linear Regression,7,18,83650910000000.0,0.676481,0.349379
9,Linear Regression,14,18,73122290000000.0,0.623705,0.581839


In [272]:
hora19 = modelo_por_hora(X, y, modelos, 19)
hora19 = pd.DataFrame(resultados)
hora19_mean = hora19.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora19_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,19,143081700000000.0,0.997863,-0.088662
1,K-NN Regressor,14,19,133717000000000.0,0.932752,0.095385
2,K-NN Regressor,21,19,129756100000000.0,0.873193,0.134773
3,K-NN Regressor,28,19,131545300000000.0,0.837212,0.299284
4,Lasso Regression,7,19,70588260000000.0,0.630283,0.51956
5,Lasso Regression,14,19,69523250000000.0,0.616468,0.592412
6,Lasso Regression,21,19,74136840000000.0,0.636106,0.526281
7,Lasso Regression,28,19,74297880000000.0,0.597247,0.630158
8,Linear Regression,7,19,81638910000000.0,0.678187,0.326025
9,Linear Regression,14,19,69377550000000.0,0.623925,0.575193


In [273]:
hora20 = modelo_por_hora(X, y, modelos, 20)
hora20 = pd.DataFrame(resultados)
hora20_mean = hora20.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora20_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,20,148942900000000.0,1.003122,-0.093877
1,K-NN Regressor,14,20,135493500000000.0,0.935183,0.090013
2,K-NN Regressor,21,20,130897700000000.0,0.872138,0.141404
3,K-NN Regressor,28,20,134208800000000.0,0.835592,0.298952
4,Lasso Regression,7,20,73262150000000.0,0.630701,0.495221
5,Lasso Regression,14,20,70915540000000.0,0.615861,0.589582
6,Lasso Regression,21,20,75359020000000.0,0.631196,0.523596
7,Lasso Regression,28,20,77350410000000.0,0.596694,0.634311
8,Linear Regression,7,20,85145350000000.0,0.72025,-0.518052
9,Linear Regression,14,20,70963080000000.0,0.623309,0.571671


In [274]:
hora21 = modelo_por_hora(X, y, modelos, 21)
hora21 = pd.DataFrame(resultados)
hora21_mean = hora21.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora21_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,21,148665600000000.0,1.005467,-0.094325
1,K-NN Regressor,14,21,137025100000000.0,0.933464,0.087675
2,K-NN Regressor,21,21,130187600000000.0,0.869334,0.155715
3,K-NN Regressor,28,21,129728900000000.0,0.834411,0.297522
4,Lasso Regression,7,21,73519240000000.0,0.631555,0.492333
5,Lasso Regression,14,21,73233930000000.0,0.616986,0.586294
6,Lasso Regression,21,21,75079130000000.0,0.631804,0.520462
7,Lasso Regression,28,21,73467440000000.0,0.59852,0.63125
8,Linear Regression,7,21,86383240000000.0,0.775889,-2.924005
9,Linear Regression,14,21,73424290000000.0,0.624798,0.563134


In [275]:
hora22 = modelo_por_hora(X, y, modelos, 22)
hora22 = pd.DataFrame(resultados)  
hora22_mean = hora22.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora22_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,22,153497300000000.0,1.006244,-0.100969
1,K-NN Regressor,14,22,140564000000000.0,0.936957,0.067622
2,K-NN Regressor,21,22,129245000000000.0,0.871138,0.156231
3,K-NN Regressor,28,22,130466600000000.0,0.838255,0.281681
4,Lasso Regression,7,22,75018290000000.0,0.63149,0.489832
5,Lasso Regression,14,22,75079640000000.0,0.61728,0.578805
6,Lasso Regression,21,22,73245490000000.0,0.634232,0.516075
7,Lasso Regression,28,22,74034710000000.0,0.599177,0.626355
8,Linear Regression,7,22,88507100000000.0,0.777165,-2.947476
9,Linear Regression,14,22,75257350000000.0,0.625743,0.550819


In [276]:
hora23 = modelo_por_hora(X, y, modelos, 23)
hora23 = pd.DataFrame(resultados)
hora23_mean = hora23.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
resultados = []
hora23_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,23,154660400000000.0,1.011643,-0.115754
1,K-NN Regressor,14,23,141511200000000.0,0.940364,0.057992
2,K-NN Regressor,21,23,127695000000000.0,0.873191,0.161573
3,K-NN Regressor,28,23,128214200000000.0,0.840258,0.273725
4,Lasso Regression,7,23,74020550000000.0,0.632531,0.490176
5,Lasso Regression,14,23,74859520000000.0,0.61925,0.573877
6,Lasso Regression,21,23,71906160000000.0,0.649079,0.481218
7,Lasso Regression,28,23,72205860000000.0,0.600801,0.625274
8,Linear Regression,7,23,87482330000000.0,0.779976,-2.936036
9,Linear Regression,14,23,74939160000000.0,0.628314,0.545692


In [277]:
hora24 = modelo_por_hora(X, y, modelos, 24)
hora24 = pd.DataFrame(resultados)
hora24_mean = hora24.groupby(['Modelo', 'Ventana de predicción', 'Hora']).mean().reset_index()
hora24_mean

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,24,155224500000000.0,1.01302,-0.128838
1,K-NN Regressor,14,24,142984800000000.0,0.941596,0.045003
2,K-NN Regressor,21,24,123238000000000.0,0.872376,0.161778
3,K-NN Regressor,28,24,131355900000000.0,0.834806,0.275989
4,Lasso Regression,7,24,74115020000000.0,0.634015,0.484499
5,Lasso Regression,14,24,76088030000000.0,0.620284,0.568026
6,Lasso Regression,21,24,71606670000000.0,0.650407,0.474015
7,Lasso Regression,28,24,73393070000000.0,0.60223,0.624888
8,Linear Regression,7,24,87897840000000.0,0.784813,-3.235478
9,Linear Regression,14,24,76072380000000.0,0.630082,0.543308


### **Overall models**

In [278]:
knn_dataframes = []
linear_dataframes = []
ridge_dataframes = []
lasso_dataframes = []

for i in range(1, 25):  
    # Use locals() to access the variables dynamically
    hora_mean = locals()[f'hora{i}_mean']  # This retrieves hora1_mean, hora2_mean, ..., hora24_mean

    # Filter and store in the corresponding lists
    knn_dataframes.append(hora_mean[hora_mean['Modelo'] == 'K-NN Regressor'])
    linear_dataframes.append(hora_mean[hora_mean['Modelo'] == 'Linear Regression'])
    ridge_dataframes.append(hora_mean[hora_mean['Modelo'] == 'Ridge Regression'])
    lasso_dataframes.append(hora_mean[hora_mean['Modelo'] == 'Lasso Regression'])

# Concatenate the DataFrames filtered by model
all_knn_data = pd.concat(knn_dataframes, ignore_index = True)
all_linear_data = pd.concat(linear_dataframes, ignore_index = True)
all_ridge_data = pd.concat(ridge_dataframes, ignore_index = True)
all_lasso_data = pd.concat(lasso_dataframes, ignore_index = True)


In [279]:
rmse_knn = all_knn_data.loc[all_knn_data['RMSE'].idxmin()]
all_knn_data

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,K-NN Regressor,7,1,1.212769e+14,0.943734,-0.015716
1,K-NN Regressor,14,1,1.242531e+14,0.872554,0.128152
2,K-NN Regressor,21,1,1.142225e+14,0.841189,0.163101
3,K-NN Regressor,28,1,1.346887e+14,0.802341,0.245252
4,K-NN Regressor,7,2,1.219342e+14,0.947041,-0.020686
...,...,...,...,...,...,...
92,K-NN Regressor,28,23,1.282142e+14,0.840258,0.273725
93,K-NN Regressor,7,24,1.552245e+14,1.013020,-0.128838
94,K-NN Regressor,14,24,1.429848e+14,0.941596,0.045003
95,K-NN Regressor,21,24,1.232380e+14,0.872376,0.161778


In [280]:
rmse_linear = all_linear_data.loc[all_linear_data['RMSE'].idxmin()]
all_linear_data

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,Linear Regression,7,1,6.761480e+13,0.673537,0.096999
1,Linear Regression,14,1,6.824845e+13,0.602642,0.571402
2,Linear Regression,21,1,6.496088e+13,0.611461,0.550204
3,Linear Regression,28,1,7.619893e+13,0.594453,0.568691
4,Linear Regression,7,2,6.822835e+13,0.673782,0.107158
...,...,...,...,...,...,...
92,Linear Regression,28,23,7.119151e+13,0.600333,0.624224
93,Linear Regression,7,24,8.789784e+13,0.784813,-3.235478
94,Linear Regression,14,24,7.607238e+13,0.630082,0.543308
95,Linear Regression,21,24,7.093243e+13,0.658181,0.451284


In [281]:
rmse_ridge = all_ridge_data.loc[all_ridge_data['RMSE'].idxmin()]
all_ridge_data

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,Ridge Regression,7,1,6.758647e+13,0.663066,0.216165
1,Ridge Regression,14,1,6.824910e+13,0.602632,0.571426
2,Ridge Regression,21,1,6.496206e+13,0.611457,0.550214
3,Ridge Regression,28,1,7.619929e+13,0.594451,0.568696
4,Ridge Regression,7,2,6.818607e+13,0.663295,0.217269
...,...,...,...,...,...,...
92,Ridge Regression,28,23,7.119086e+13,0.600207,0.624451
93,Ridge Regression,7,24,8.755558e+13,0.757605,-1.749240
94,Ridge Regression,14,24,7.602754e+13,0.630011,0.543498
95,Ridge Regression,21,24,7.093285e+13,0.658057,0.451525


In [282]:
rmse_lasso = all_lasso_data.loc[all_lasso_data['RMSE'].idxmin()]
all_lasso_data

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
0,Lasso Regression,7,1,6.713434e+13,0.603769,0.584057
1,Lasso Regression,14,1,6.836641e+13,0.597289,0.582902
2,Lasso Regression,21,1,6.613615e+13,0.605674,0.563441
3,Lasso Regression,28,1,7.703381e+13,0.592457,0.571722
4,Lasso Regression,7,2,6.723935e+13,0.603293,0.586117
...,...,...,...,...,...,...
92,Lasso Regression,28,23,7.220586e+13,0.600801,0.625274
93,Lasso Regression,7,24,7.411502e+13,0.634015,0.484499
94,Lasso Regression,14,24,7.608803e+13,0.620284,0.568026
95,Lasso Regression,21,24,7.160667e+13,0.650407,0.474015


### **Best models**

In [288]:
mejores_modelos = pd.DataFrame([rmse_knn, rmse_linear, rmse_ridge, rmse_lasso])
mejores_modelos

Unnamed: 0,Modelo,Ventana de predicción,Hora,MAPE,RMSE,R^2
23,K-NN Regressor,28,6,157463400000000.0,0.800859,0.3053
23,Linear Regression,28,6,79119160000000.0,0.586517,0.616344
23,Ridge Regression,28,6,79119830000000.0,0.586516,0.616347
23,Lasso Regression,28,6,80058610000000.0,0.585605,0.619249
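
All four models reach their lowest RMSE with a 28-day window at hour 6. Re-evaluating that configuration means training on the first `k * 24` hourly rows and testing on the 24 rows that begin `hora` rows after the training block ends. The following is a minimal sketch of that index arithmetic, assuming a 0-based integer index with one row per hour. `ventana_indices` is a hypothetical helper, not a function defined in the notebook.

```python
def ventana_indices(k, hora, inicio=0, horas_test=24):
    """Return (train, test) index ranges for a k-day training window.

    Training covers k * 24 consecutive hourly rows starting at `inicio`.
    The test block starts `hora` rows after training ends, so every
    training index precedes every test index.
    """
    train_final = inicio + k * 24
    train = range(inicio, train_final)
    test = range(train_final + hora, train_final + hora + horas_test)
    return train, test


for k in (7, 14, 21, 28):
    train, test = ventana_indices(k, hora=6)
    print(k, (train.start, train.stop), (test.start, test.stop))
```

Keeping the test block strictly after the training block follows the exercise's requirement that the future is never used to predict the past.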


In [287]:
k = 28  # Best training window (28 days)
hora = 6  # Best hour
results = []

for nombre_modelo, modelo in modelos.items():
    
    train_inicial = 0
    train_final = train_inicial + (k * 24)

    entrenamiento_indices = (X.index >= train_inicial) & (X.index < train_final)
    X_train = X[entrenamiento_indices]
    y_train = y[entrenamiento_indices]

    test_indices = (X.index >= (train_final + hora)) & (X.index < (train_final + hora + 24))
    X_test = X[test_indices]
    y_test = y[test_indices]
    
    modelo.fit(X_train, y_train)
    y_pred = modelo.predict(X_test)

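    residuals = y_test - y_pred  # Residuals

    # RMSE as the square root of MSE; the `squared=False` argument is deprecated since scikit-learn 1.4
    RMSE = mean_squared_error(y_test, y_pred) ** 0.5
    MAPE = mean_absolute_percentage_error(y_test, y_pred)
    R2 = r2_score(y_test, y_pred)
    # Residual diagnostics: Jarque-Bera (normality) and Ljung-Box (autocorrelation)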
    jb_test_stat, jb_p_value = jarque_bera(residuals)
    lb_test = acorr_ljungbox(residuals, lags = [10], return_df = True)
    ljung_box_p_value = lb_test['lb_pvalue'].values[0]

    results.append({
        "Modelo": nombre_modelo,
        "Periodo de predicción (días)": k,
        "Hora": hora,
        "RMSE": RMSE,
        "MAPE": MAPE,
        "R^2": R2,
        "Ljung-Box p-value": ljung_box_p_value,
        "Jarque-Bera p-value": jb_p_value
    })

resultados = pd.DataFrame(results)
resultados


Unnamed: 0,Modelo,Periodo de predicción (días),Hora,RMSE,MAPE,R^2,Ljung-Box p-value,Jarque-Bera p-value
0,Regresión K-NN,28,6,0.655531,0.315985,0.606036,0.321228,0.441403
1,Regresión Lineal,28,6,0.453918,0.19994,0.811103,0.539782,0.696738
2,Regresión Ridge,28,6,0.453916,0.199942,0.811105,0.539745,0.696729
3,Regresión Lasso,28,6,0.44408,0.196319,0.819202,0.482946,0.687404
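
At the 5% level none of the p-values above rejects the null hypothesis. For this single 24-hour test block, the residuals of all four models are therefore compatible with normality (Jarque-Bera) and with no autocorrelation up to lag 10 (Ljung-Box). To make the first statistic concrete, the sketch below computes it in plain Python from the sample skewness and kurtosis, using the asymptotic chi-squared distribution with 2 degrees of freedom, whose survival function is exp(-x/2). `jarque_bera_manual` is a hypothetical name and the residuals are made up.

```python
import math


def jarque_bera_manual(residuals):
    """Jarque-Bera statistic and asymptotic p-value for a sequence of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Biased central moments of order 2, 3 and 4
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Chi-squared with 2 degrees of freedom: P(X > x) = exp(-x / 2)
    p_value = math.exp(-jb / 2)
    return jb, p_value


residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # made-up, symmetric residuals
jb, p_value = jarque_bera_manual(residuals)
print(jb, p_value)
```

A large p-value means the test finds no evidence against normality. It does not prove the residuals are normal, especially with only 24 test points.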
