### A sliding-window approach is used to frame the time series as a supervised learning problem for predicting the variable y. Several ML models (LightGBM, Gradient Boosting Regressor, Linear Regression, Random Forest Regressor, and CatBoost Regressor) are then trained and compared to select the best model for the provided data. Finally, the winning model is tuned over a predefined hyperparameter grid to make it more robust in a production environment.

## Libraries

In [1]:
import pandas as pd
from utils import *  # project helpers: windowed_df, winner_model, optimizar_best_model, df_predict

## Data reading

In [2]:
data = pd.read_csv('train.csv')
# Name the first (date) column 'Dates' without mutating the Index in place
data = data.rename(columns={data.columns[0]: 'Dates'})

In [3]:
# Convert the 'Dates' column to datetime (dd.mm.yy format)
data['Dates'] = pd.to_datetime(data['Dates'], format='%d.%m.%y')

## ABT (analytical base table)

In [4]:
data = data.dropna()
df = data[['y']]  # keep only the target series
columns_final = list(df.columns)
n_windows = 6  # window length: 5 lags plus the current value y(t)
df_last = windowed_df(df, columns_final, n_windows)
df_last.head(10)

| | y(t-5) | y(t-4) | y(t-3) | y(t-2) | y(t-1) | y(t) |
|---:|---:|---:|---:|---:|---:|---:|
| 0 | 1.91157 | 1.44733 | 1.89355 | 2.03274 | 2.27843 | 1.95235 |
| 1 | 1.44733 | 1.89355 | 2.03274 | 2.27843 | 1.95235 | 2.66617 |
| 2 | 1.89355 | 2.03274 | 2.27843 | 1.95235 | 2.66617 | 2.32002 |
| 3 | 2.03274 | 2.27843 | 1.95235 | 2.66617 | 2.32002 | 1.92562 |
| 4 | 2.27843 | 1.95235 | 2.66617 | 2.32002 | 1.92562 | 2.45096 |
| 5 | 1.95235 | 2.66617 | 2.32002 | 1.92562 | 2.45096 | 1.97315 |
| 6 | 2.66617 | 2.32002 | 1.92562 | 2.45096 | 1.97315 | 1.85 |
| 7 | 2.32002 | 1.92562 | 2.45096 | 1.97315 | 1.85 | 2.49962 |
| 8 | 1.92562 | 2.45096 | 1.97315 | 1.85 | 2.49962 | 1.52566 |
| 9 | 2.45096 | 1.97315 | 1.85 | 2.49962 | 1.52566 | 1.60425 |
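`windowed_df` comes from `utils` and its code is not shown. The table above can be reproduced with pandas' `shift`: each column is the series shifted by its lag, and the first `n_windows - 1` incomplete rows are dropped. Below is a minimal sketch of that technique. `make_windows` is a hypothetical stand-in for `windowed_df`, and the sketch is checked against the first seven values of the table.

```python
import pandas as pd


def make_windows(series: pd.Series, n_windows: int, name: str = "y") -> pd.DataFrame:
    """Hypothetical helper: turn a 1-D series into rows of n_windows consecutive values.

    Columns run from f"{name}(t-{n_windows - 1})" to f"{name}(t)", oldest first.
    """
    cols = {}
    for lag in range(n_windows - 1, -1, -1):
        label = f"{name}(t)" if lag == 0 else f"{name}(t-{lag})"
        # shift(lag) moves values down, so row i holds series[i - lag]
        cols[label] = series.shift(lag)
    # The first n_windows - 1 rows contain NaNs from the shifts and are dropped
    return pd.DataFrame(cols).dropna().reset_index(drop=True)


values = pd.Series([1.91157, 1.44733, 1.89355, 2.03274, 2.27843, 1.95235, 2.66617])
windows_demo = make_windows(values, n_windows=6)
print(windows_demo)
```

Each row overlaps the previous one by `n_windows - 1` values. That overlap is what turns one series into many training examples.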


In [5]:
n_columns = len(columns_final)
# Train and compare the candidate models on the windowed data
modelos_x_vars = winner_model(df_last, columns_final, False, n_columns)

['y(t-5)', 'y(t-4)', 'y(t-3)', 'y(t-2)', 'y(t-1)', 'y(t)']
['y(t-5)', 'y(t-4)', 'y(t-3)', 'y(t-2)', 'y(t-1)', 'y(t)']
['y(t-5)', 'y(t-4)', 'y(t-3)', 'y(t-2)', 'y(t-1)', 'y(t)']
Learning rate set to 0.021508
0:	learn: 0.3556745	total: 156ms	remaining: 2m 36s
1:	learn: 0.3533804	total: 158ms	remaining: 1m 18s
2:	learn: 0.3510483	total: 159ms	remaining: 52.8s
3:	learn: 0.3489351	total: 160ms	remaining: 39.9s
4:	learn: 0.3471534	total: 162ms	remaining: 32.2s
5:	learn: 0.3450244	total: 163ms	remaining: 27s
6:	learn: 0.3426762	total: 164ms	remaining: 23.3s
7:	learn: 0.3399324	total: 165ms	remaining: 20.5s
8:	learn: 0.3383883	total: 166ms	remaining: 18.3s
9:	learn: 0.3368308	total: 167ms	remaining: 16.5s
10:	learn: 0.3346285	total: 168ms	remaining: 15.1s
11:	learn: 0.3329188	total: 169ms	remaining: 13.9s
12:	learn: 0.3317827	total: 171ms	remaining: 12.9s
13:	learn: 0.3298662	total: 172ms	remaining: 12.1s
14:	learn: 0.3275796	total: 174ms	remaining: 11.4s
15:	learn: 0.3256071	total: 175ms	remaining: ...
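`winner_model` is defined in `utils` and its internals are not shown. Below is a minimal sketch of the comparison described at the top of the notebook. It assumes a chronological 80/20 hold-out and uses MAPE as the selection metric, since the notebook later stores a `MAPE` column. To avoid LightGBM and CatBoost dependencies it includes only the three scikit-learn candidates. `pick_winner` is a hypothetical name, and the series is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_percentage_error


def pick_winner(df_windows: pd.DataFrame, target: str = "y(t)", test_size: float = 0.2):
    """Hypothetical helper: score each candidate on a chronological hold-out by MAPE."""
    X = df_windows.drop(columns=[target])
    y = df_windows[target]
    split = int(len(df_windows) * (1 - test_size))  # earliest rows train, latest rows test
    X_train, X_test = X.iloc[:split], X.iloc[split:]
    y_train, y_test = y.iloc[:split], y.iloc[split:]

    candidates = {
        "LinearRegression": LinearRegression(),
        "RandomForestRegressor": RandomForestRegressor(random_state=0),
        "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    }
    scores = {}
    for name, model in candidates.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_percentage_error(y_test, model.predict(X_test))
    best_name = min(scores, key=scores.get)  # lowest hold-out MAPE wins
    best_model = candidates[best_name].fit(X, y)  # refit the winner on all rows
    return best_name, best_model, scores


# Synthetic demo series (not the notebook's data), windowed into y(t-5) ... y(t)
rng = np.random.default_rng(0)
series = pd.Series(2 + 0.3 * np.sin(np.arange(80) / 3) + rng.normal(0, 0.1, 80))
windows = pd.concat(
    {(f"y(t-{k})" if k else "y(t)"): series.shift(k) for k in range(5, -1, -1)}, axis=1
).dropna()

best_name, best_model, scores = pick_winner(windows)
print(best_name, scores)
```

The hold-out is taken from the end of the series rather than sampled at random. This keeps future rows out of the training split.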

## Predictions

In [6]:
# Select the winning model on the historical data
n_columns = len(columns_final)
var_predict = ['y']  # variable to predict

df_info_model = winner_model(df_last, var_predict, True, n_columns)
model = df_info_model['Modelo_Ganador'][0]  # winning model object
model_name = df_info_model['model_name'][0]

# Tune the winning model on the historical data:
# features are the lag columns y(t-5)..y(t-1), target is y(t)
X = df_last.iloc[:, :-n_columns]
y = df_last[var_predict[0] + '(t)']

mejor_modelo, best_mape = optimizar_best_model(X, y, model_name)

Learning rate set to 0.021508
0:	learn: 0.3556745	total: 1.22ms	remaining: 1.22s
1:	learn: 0.3533804	total: 2.48ms	remaining: 1.24s
2:	learn: 0.3510483	total: 3.59ms	remaining: 1.19s
3:	learn: 0.3489351	total: 4.1ms	remaining: 1.02s
4:	learn: 0.3471534	total: 4.76ms	remaining: 948ms
5:	learn: 0.3450244	total: 5.64ms	remaining: 935ms
6:	learn: 0.3426762	total: 6.56ms	remaining: 930ms
7:	learn: 0.3399324	total: 7.35ms	remaining: 912ms
8:	learn: 0.3383883	total: 8.02ms	remaining: 884ms
9:	learn: 0.3368308	total: 8.73ms	remaining: 864ms
10:	learn: 0.3346285	total: 9.4ms	remaining: 845ms
11:	learn: 0.3329188	total: 10.1ms	remaining: 832ms
12:	learn: 0.3317827	total: 11.2ms	remaining: 847ms
13:	learn: 0.3298662	total: 11.8ms	remaining: 832ms
14:	learn: 0.3275796	total: 12.8ms	remaining: 843ms
15:	learn: 0.3256071	total: 13.7ms	remaining: 842ms
16:	learn: 0.3238549	total: 15.2ms	remaining: 881ms
17:	learn: 0.3219061	total: 16.2ms	remaining: 884ms
18:	learn: 0.3197480	total: 16.8ms	remaining: ...
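`optimizar_best_model` is also defined in `utils`. One common way to tune a time-series model over a predefined grid is scikit-learn's `GridSearchCV` with `TimeSeriesSplit`, which keeps every validation fold after its training fold, scored by MAPE. Below is a minimal sketch of that approach. `tune_model`, the grid values, and the synthetic data are illustrative assumptions, not the notebook's settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit


def tune_model(X: pd.DataFrame, y: pd.Series, estimator, param_grid: dict, n_splits: int = 3):
    """Hypothetical helper: grid search with forward-chaining folds, scored by MAPE."""
    search = GridSearchCV(
        estimator,
        param_grid,
        cv=TimeSeriesSplit(n_splits=n_splits),  # each validation fold follows its training fold
        scoring="neg_mean_absolute_percentage_error",  # negated because sklearn maximizes scores
        refit=True,  # refit the best parameters on all of X, y
    )
    search.fit(X, y)
    return search.best_estimator_, -search.best_score_  # (tuned model, mean CV MAPE)


# Synthetic demo data (not the notebook's): five lag features and the current value
rng = np.random.default_rng(1)
series = pd.Series(2 + 0.3 * np.sin(np.arange(80) / 3) + rng.normal(0, 0.1, 80))
X_demo = pd.concat({f"y(t-{k})": series.shift(k) for k in range(5, 0, -1)}, axis=1).iloc[5:]
y_demo = series.iloc[5:]

grid = {"n_estimators": [20, 50], "max_depth": [3, None]}  # illustrative grid only
demo_model, demo_mape = tune_model(X_demo, y_demo, RandomForestRegressor(random_state=0), grid)
print(demo_model.get_params()["n_estimators"], demo_mape)
```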

In [7]:
# Replace the stored model and its MAPE with the tuned results
df_info_model.loc[0, 'Modelo_Ganador'] = mejor_modelo
df_info_model.loc[0, 'MAPE'] = best_mape
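The `MAPE` column holds `best_mape`. The helper code is not shown, so the following assumes the standard mean absolute percentage error. For $n$ observed values $y_t$ and forecasts $\hat{y}_t$:

$$
\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right|
$$

When reported as a percentage, this value is multiplied by 100. The metric divides by $y_t$, so it is undefined whenever an observed value is zero.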

In [8]:
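# Build the lag-feature rows for the 12-period forecast horizon,
# keeping the same column order as the training features
df_para_predict = df_predict(df_last, df_info_model, n_periodos=12)
df_para_predict = df_para_predict[list(X.columns)]
df_para_predict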

y
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.000123 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 120
[LightGBM] [Info] Number of data points in the train set: 69, number of used features: 5
[LightGBM] [Info] Start training from score 2.514723


| | y(t-5) | y(t-4) | y(t-3) | y(t-2) | y(t-1) |
|---:|---:|---:|---:|---:|---:|
| 0 | 2.21851 | 2.01204 | 1.97353 | 2.19952 | 2.9674 |
| 1 | 2.01204 | 1.97353 | 2.19952 | 2.9674 | 2.61838 |
| 2 | 1.97353 | 2.19952 | 2.9674 | 2.61838 | 2.601202 |
| 3 | 2.19952 | 2.9674 | 2.61838 | 2.601202 | 2.361429 |
| 4 | 2.9674 | 2.61838 | 2.601202 | 2.361429 | 2.239809 |
| 5 | 2.61838 | 2.601202 | 2.361429 | 2.239809 | 2.556505 |
| 6 | 2.601202 | 2.361429 | 2.239809 | 2.556505 | 2.453057 |
| 7 | 2.361429 | 2.239809 | 2.556505 | 2.453057 | 2.077378 |
| 8 | 2.239809 | 2.556505 | 2.453057 | 2.077378 | 2.17527 |
| 9 | 2.556505 | 2.453057 | 2.077378 | 2.17527 | 2.213707 |
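The implementation of `df_predict` is not shown. The table above is consistent with a recursive multi-step strategy: each row shifts the previous window one step and appends the newest value. For example, 2.61838 is `y(t-1)` in row 1 and `y(t-2)` in row 2. Below is a minimal sketch of that strategy. `recursive_forecast` is a hypothetical function, and `WindowMean` is a toy model used only for illustration.

```python
import numpy as np


def recursive_forecast(model, last_window, n_periods: int) -> np.ndarray:
    """Hypothetical helper: forecast n_periods steps, feeding each forecast back in.

    last_window holds the most recent values, oldest first, in the same order as
    the training features y(t-5), ..., y(t-1).
    """
    window = [float(v) for v in last_window]
    preds = []
    for _ in range(n_periods):
        x = np.asarray(window).reshape(1, -1)  # one row of lag features
        y_hat = float(np.ravel(model.predict(x))[0])
        preds.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest lag, append the new forecast
    return np.array(preds)


class WindowMean:
    """Toy stand-in for a fitted regressor: predicts the mean of its lag features."""

    def predict(self, X):
        return np.asarray(X, dtype=float).mean(axis=1)


preds = recursive_forecast(WindowMean(), [1.0, 2.0, 3.0, 4.0, 5.0], n_periods=3)
print(preds)  # approximately [3.0, 3.4, 3.68]
```

Because later steps are built on earlier forecasts, errors can compound as the horizon grows.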


In [9]:
predicciones = mejor_modelo.predict(df_para_predict)



## Output

In [10]:
fecha_maxima = data['Dates'].max()
# Pair 12 monthly dates (1st of each month, after the last observed date) with the predictions
output_pred = pd.DataFrame()
output_pred['Dates'] = pd.date_range(start=fecha_maxima, periods=14, freq='M')[1:]
output_pred['Dates'] = output_pred['Dates'].apply(lambda x: x.replace(day=1))
output_pred['y'] = pd.DataFrame(predicciones, columns=['Valores'])
output_pred = output_pred.iloc[:-1]
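The cell above works in several steps. It generates 14 month-end dates with `freq='M'`, discards the first, and moves the rest to the 1st of their month. It then drops the final row, leaving 12 forecast dates. pandas 2.2 deprecates the `'M'` alias in favor of `'ME'`. Below is a sketch that produces month-start dates directly, assuming (as the cell does) that forecasts begin in the month after `fecha_maxima`. `forecast_dates` is a hypothetical name.

```python
import pandas as pd


def forecast_dates(last_date, n_periods: int) -> pd.DatetimeIndex:
    """Hypothetical helper: n_periods month-start dates beginning the month after last_date."""
    first = pd.Timestamp(last_date) + pd.offsets.MonthBegin(1)  # 1st of the following month
    return pd.date_range(start=first, periods=n_periods, freq="MS")


print(forecast_dates("2023-01-01", 12))
```

If `n_periods` is sized from the prediction array, the number of dates always matches the number of forecasts. This avoids relying on index alignment followed by a trailing `iloc[:-1]`.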

In [11]:
# Append the forecasts to the historical data
Final_output = pd.concat([data, output_pred], ignore_index=True)

In [12]:
# File name
file_name = 'Real&Predicctions.csv'

# Save the file
Final_output.to_csv(file_name, index=False)

In [13]:
# Save the forecasts on their own
output_pred.to_csv('test.csv', index=False)