#### This baseline trains one NeuralProphet model per product with minimal hyperparameter tuning, as a reference point for future experiments. If this approach performs well, it would be worth tuning additional hyperparameters.

#### Imports

In [8]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

In [9]:
final_dataset = pd.read_csv('../../Datasets/final_dataset.csv', sep='\t')

In [10]:
final_dataset.head()

Unnamed: 0,periodo,product_id,plan_precios_cuidados,cust_request_qty,cust_request_tn,y,cat1,cat2,cat3,brand,sku_size,stock_final,close_quarter,age
0,201701,20001,0,479,937.72717,934.77222,HC,ROPA LAVADO,Liquido,ARIEL,3000,,0,0
1,201702,20001,0,432,833.72187,798.0162,HC,ROPA LAVADO,Liquido,ARIEL,3000,,0,1
2,201703,20001,0,509,1330.74697,1303.35771,HC,ROPA LAVADO,Liquido,ARIEL,3000,,1,2
3,201704,20001,0,279,1132.9443,1069.9613,HC,ROPA LAVADO,Liquido,ARIEL,3000,,0,3
4,201705,20001,0,701,1550.68936,1502.20132,HC,ROPA LAVADO,Liquido,ARIEL,3000,,0,4


In [11]:
# Candidate feature columns; the per-product loop below only uses 'y'
columns = ['plan_precios_cuidados', 'cust_request_qty', 'cust_request_tn', 'close_quarter', 'y']

In [12]:
duplicates = final_dataset.duplicated(subset=['product_id', 'periodo'])
duplicate_rows = final_dataset[duplicates]

if duplicate_rows.empty:
    print("No duplicate records by product_id and periodo.")
else:
    print("Duplicate records found by product_id and periodo:")
    display(duplicate_rows)  # Jupyter's rich display


No duplicate records by product_id and periodo.


#### Function to create the models

#### Building the models

In [13]:
final_dataset['periodo'] = final_dataset['periodo'].astype(str)
final_dataset['periodo'] = pd.to_datetime(final_dataset['periodo'], format='%Y%m', errors='coerce')
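The cell above turns `periodo`, stored as `YYYYMM` integers such as `201701`, into month-start timestamps. A minimal sketch on toy values (the `toy` and `bad` objects are hypothetical, not part of the dataset) shows the mapping and one side effect of `errors='coerce'`: a value that does not match `%Y%m` becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Hypothetical toy data: 'periodo' stored as YYYYMM integers, like final_dataset
toy = pd.DataFrame({'periodo': [201701, 201702, 201712]})

# Same conversion as the cell above: int -> str -> month-start Timestamp
toy['periodo'] = pd.to_datetime(toy['periodo'].astype(str), format='%Y%m', errors='coerce')
print(toy['periodo'].tolist())

# With errors='coerce', a malformed value is silently turned into NaT
bad = pd.to_datetime(pd.Series(['201701', '2017AB']), format='%Y%m', errors='coerce')
print(bad.isna().tolist())
```

On the real data, running `final_dataset['periodo'].isna().sum()` right after the conversion would reveal any rows that were silently coerced.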

In [14]:
import os
import joblib
from neuralprophet import NeuralProphet

product_ids = final_dataset['product_id'].unique()
predictions = []

# Create the output folder once, outside the loop
os.makedirs('NeuralProphet_results', exist_ok=True)

for product_id in product_ids:
    # NeuralProphet expects a 'ds' (timestamp) column and a 'y' (target) column
    product_data = (
        final_dataset[final_dataset['product_id'] == product_id]
        .sort_values(by='periodo')[['periodo', 'y']]
        .rename(columns={'periodo': 'ds'})
    )
    # Model log(1 + y); the transform is undone with expm1 after predicting
    product_data['y'] = np.log1p(product_data['y'])

    model = NeuralProphet(
        # yearly_seasonality=True,
        # n_changepoints=20,
        learning_rate=0.001,
        epochs=200,
        n_forecasts=2,
        loss_func='MSE'
    )
    model.fit(product_data)

    # Append two future months after the last observed period
    future = model.make_future_dataframe(product_data, periods=2)
    forecast = model.predict(future)

    # Back to the original scale; negative quantities are clipped to 0
    forecast['yhat1'] = np.expm1(forecast['yhat1']).clip(lower=0)

    joblib.dump(model, f'NeuralProphet_results/model_product_{product_id}.pkl')
    print(forecast.head())  # Print the first rows of the forecast DataFrame

    # Last row = second future month, i.e. the 2-month-ahead prediction
    predicted_y = forecast.iloc[-1]['yhat1']
    predictions.append({'product_id': product_id, 'predicted_y': predicted_y})

    print(f'Model for product {product_id} trained and saved. '
          f'2-month-ahead prediction: {predicted_y}. '
          f'Number of predictions: {len(predictions)}')

INFO - (NP.df_utils._infer_frequency) - Major frequency MS corresponds to [97.222]% of the data.


INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as MS
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling weekly seasonality. Run NeuralProphet with weekly_seasonality=True to override this.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 8


Epoch 200: 100%|██████████| 200/200 [00:00<00:00, 5647.07it/s, loss=0.137, v_num=1572, MAE=0.284, RMSE=0.332, Loss=0.134, RegLoss=0.000]  


INFO - (NP.df_utils._infer_frequency) - Major frequency MS corresponds to [97.222]% of the data.
INFO - (NP.df_utils._infer_frequency) - Defined frequency is equal to major frequency - MS
INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column
INFO - (NP.df_utils._infer_frequency) - Major frequency MS corresponds to [50.]% of the data.
INFO - (NP.df_utils._infer_frequency) - Major frequency MS corresponds to [50.]% of the data.


Predicting DataLoader 0: 100%|██████████| 1/1 [00:00<00:00, 182.42it/s]


INFO - (NP.df_utils.return_df_in_original_format) - Returning df with no ID column
INFO - (NP.df_utils._infer_frequency) - Major frequency MS corresponds to [97.222]% of the data.
INFO - (NP.df_utils._infer_frequency) - Dataframe freq automatically defined as MS
INFO - (NP.config.init_data_params) - Setting normalization to global as only one dataframe provided for training.
INFO - (NP.utils.set_auto_seasonalities) - Disabling weekly seasonality. Run NeuralProphet with weekly_seasonality=True to override this.
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.


          ds     y        yhat1     trend  season_yearly
0 2020-01-01  None  2049.849609  7.668935      -0.042926
1 2020-02-01  None  2373.396484  7.690690       0.081809
Model for product 20001 trained and saved. 2-month-ahead prediction: 2373.396484375. Number of predictions: 1


Epoch 200: 100%|██████████| 200/200 [00:00<00:00, 5389.96it/s, loss=2.68, v_num=1573, MAE=1.880, RMSE=1.950, Loss=2.610, RegLoss=0.000]  

          ds     y       yhat1     trend  season_yearly
0 2020-01-01  None  211.054626  4.939672       0.417171
1 2020-02-01  None   59.037865  4.939366      -0.844390
Model for product 20002 trained and saved. 2-month-ahead prediction: 59.037864685058594. Number of predictions: 2


Epoch 200: 100%|██████████| 200/200 [00:00<00:00, 5124.91it/s, loss=0.962, v_num=1574, MAE=0.979, RMSE=1.110, Loss=0.923, RegLoss=0.000]  


          ds     y       yhat1     trend  season_yearly
0 2020-01-01  None  161.443176  5.469905      -0.379577
1 2020-02-01  None  303.704712  5.444809       0.274534
Model for product 20003 trained and saved. 2-month-ahead prediction: 303.7047119140625. Number of predictions: 3


Epoch 200: 100%|██████████| 200/200 [00:00<00:00, 5388.05it/s, loss=0.251, v_num=1575, MAE=0.495, RMSE=0.586, Loss=0.240, RegLoss=0.000]  

          ds     y       yhat1     trend  season_yearly
0 2020-01-01  None  295.555267  6.207010      -0.514777
1 2020-02-01  None  437.954132  6.213947      -0.129552
Model for product 20004 trained and saved. 2-month-ahead prediction: 437.9541320800781. Number of predictions: 4
Epoch 116:  58%|█████▊    | 116/200 [00:00<00:00, 69416.36it/s, loss=0.715, v_num=1576, MAE=0.843, RMSE=1.010, Loss=0.723, RegLoss=0.000]
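After `predict`, the loop back-transforms `yhat1` with `expm1`, clips negative values to zero, and keeps the last row. With `periods=2` the forecast holds two future rows (2020-01-01 and 2020-02-01 in the tables above), so `predicted_y` is the estimate for the second month. The sketch below replays those steps on a hypothetical stand-in for the `forecast` DataFrame (its `yhat1` values are made up) so it runs without NeuralProphet; the first check reuses two `y` values from the `head()` output to confirm that `expm1` undoes `log1p`.

```python
import numpy as np
import pandas as pd

# expm1 inverts log1p: values taken from the head() output of final_dataset
y = np.array([934.77222, 798.0162])
round_trip = np.expm1(np.log1p(y))

# Hypothetical stand-in for model.predict(future): two future month-start rows,
# with yhat1 still on the log1p scale (values invented for illustration)
forecast = pd.DataFrame({
    'ds': pd.date_range('2020-01-01', periods=2, freq='MS'),
    'yhat1': [-0.3, 7.77],
})

# Back-transform and clip: a negative log-scale output maps to a negative quantity,
# which clip(lower=0) replaces with 0
forecast['yhat1'] = np.expm1(forecast['yhat1']).clip(lower=0)

# The last row is the second future month, i.e. the 2-month-ahead estimate
predicted_y = forecast.iloc[-1]['yhat1']
target_month = forecast.iloc[-1]['ds']
print(target_month, predicted_y)
```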

In [None]:
predictions_df = pd.DataFrame(predictions)
predictions_df.to_csv('../../Datasets/predictions.csv', index=False)

print('All predictions have been generated and saved to predictions.csv.')

All predictions have been generated and saved to predictions.csv.
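The training log above ends at epoch 116 of 200 for the model that follows product 20004, so if the run was interrupted, `predictions.csv` may not cover every product. A quick coverage check compares the product list with the saved file; the sketch below uses a hypothetical helper, `missing_predictions`, and toy data rather than the real results.

```python
import pandas as pd

def missing_predictions(product_ids, predictions_df):
    """Hypothetical helper: return the product_ids with no row in predictions_df."""
    predicted = set(predictions_df['product_id'])
    return sorted(set(product_ids) - predicted)

# Toy example: three products, only two of them predicted
toy_ids = [20001, 20002, 20003]
toy_preds = pd.DataFrame({'product_id': [20001, 20002], 'predicted_y': [2373.4, 59.0]})
missing = missing_predictions(toy_ids, toy_preds)
print(missing)
```

On the real outputs this would be called as `missing_predictions(product_ids, pd.read_csv('../../Datasets/predictions.csv'))`; an empty list means every product received a prediction.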
