## **Applying Simple and Multiple Linear Regression**
This notebook applies simple and multiple linear regression models, along with feature engineering to capture seasonality and other characteristics of the time series.
Model performance is evaluated with the **MAE**, **MSE**, **RMSE** and **R²** metrics and compared against the baseline models **SARIMA**, **ETS** and **Seasonal Naive**.
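For reference, the sketch below shows the textbook definitions of the four metrics for a single series. The `src.evaluation.Evaluation` module is not shown in this notebook, so this is only an assumption about what it computes. The helper name `regression_metrics` is hypothetical. R² is measured relative to the mean of the observed test values.

```python
import math


def regression_metrics(y_true, y_pred):
    """Hypothetical helper: MAE, MSE, RMSE and R² for one series.

    Assumes the textbook definitions; src.evaluation.Evaluation may differ.
    y_true must not be constant, otherwise R² is undefined.
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)  # sum of squared forecast errors
    mse = ss_res / n
    rmse = math.sqrt(mse)
    mean_true = sum(y_true) / n
    # squared error of a forecast that always predicts the mean of y_true
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot  # negative when worse than predicting the mean
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]))
```

Under this definition, a negative R² means the forecast has a larger squared error than simply predicting the mean of the test period. Several SKUs show negative values for the baselines below.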

In [1]:
# configure the path so the project's src package can be imported
import sys
import os
sys.path.append(os.path.abspath('..'))

# import libraries
import warnings
import swifter  # registers the .swifter accessor (parallel apply)
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive, AutoARIMA, AutoETS
from src.evaluation import Evaluation


# notebook settings
warnings.filterwarnings("ignore")
pd.set_option('display.float_format', lambda x: '%.2f' % x)  # show floats with 2 decimals


# automatically reload edited modules such as src.evaluation
%load_ext autoreload
%autoreload 2

In [2]:
# constants: CSV file where each model's evaluation metrics are saved
EVAL_PATH = 'data/evaluation.csv'

## **Loading the Data**
Loads the training and test data exported in the notebook *analise_exploratoria.ipynb*. For each series, the first 70% of the data points were kept for training. The remainder, i.e. the most recent data, was kept for testing.
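The split itself was done in *analise_exploratoria.ipynb*, which is not part of this section. The sketch below is a hypothetical pandas version of a chronological 70/30 split per series. It assumes `unique_id` and `ds` columns and that the training size is rounded down. The helper `split_per_series` and the toy data are illustrative only.

```python
import pandas as pd


def split_per_series(df, train_frac=0.7):
    """Hypothetical helper: chronological train/test split within each series."""
    df = df.sort_values(["unique_id", "ds"])
    # position of each row inside its own series: 0, 1, 2, ...
    pos = df.groupby("unique_id").cumcount()
    # length of the series each row belongs to
    size = df.groupby("unique_id")["ds"].transform("count")
    # the first train_frac of each series (rounded down) goes to training
    is_train = pos < (size * train_frac).astype(int)
    return df[is_train], df[~is_train]


# toy data: series A with 10 days, series B with 5 days
toy = pd.DataFrame({
    "unique_id": ["A"] * 10 + ["B"] * 5,
    "ds": list(pd.date_range("2023-01-01", periods=10))
    + list(pd.date_range("2023-01-01", periods=5)),
    "y": range(15),
})
train, test = split_per_series(toy)
print(train.groupby("unique_id").size())  # A: 7 rows, B: 3 rows
```

Because each series is split on its own positions, series of different lengths get test sets of different sizes. The baseline code below handles this by forecasting `len(test)` steps per series.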

In [3]:
# load the training and test sets
train_df = pd.read_csv("data/features_train_df.csv", index_col=[0])
test_df = pd.read_csv("data/features_test_df.csv", index_col=[0])

## **Baseline Models**
ARIMA and ETS are models designed specifically for time series, so they serve as benchmarks for the machine learning models developed in this research. The SeasonalNaive model is also used. It simply repeats the values observed in the last seasonal period: here, the last 7 days, since `season_length=7` on daily data.
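In formula form, let $m$ be the season length (here $m = 7$) and $T$ the last training time step. The seasonal naive forecast is then $\hat{y}_{T+h} = y_{T+h-m\lceil h/m \rceil}$. The plain-Python sketch below illustrates that rule. It is not statsforecast's implementation, and `seasonal_naive` is a hypothetical name.

```python
def seasonal_naive(history, h, season_length=7):
    """Hypothetical helper: repeat the last full season of `history` for h steps."""
    if len(history) < season_length:
        raise ValueError("history must contain at least one full season")
    last_season = history[-season_length:]
    # step k (0-based) reuses the value at the same position in the last season
    return [last_season[k % season_length] for k in range(h)]


# two weeks of daily values 1..14; the last season is [8, ..., 14]
forecast = seasonal_naive(list(range(1, 15)), h=10)
print(forecast)  # [8, 9, 10, 11, 12, 13, 14, 8, 9, 10]
```

Since the rule has no parameters to fit, it is a useful floor: a model that cannot beat it on a series adds little over the weekly pattern alone.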

In [4]:
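# baseline models with weekly seasonality (daily data, season_length=7)
models = [
    SeasonalNaive(season_length=7),
    AutoETS(season_length=7),
    AutoARIMA(season_length=7)  # searches seasonal ARIMA (SARIMA) orders
]

sf = StatsForecast(
    models=models,
    freq='D',
    n_jobs=-1
)

def baseline_models(df):
    """
    Fits SeasonalNaive, ETS and SARIMA on the training data of one series and
    returns that series' test rows with each model's forecast as a new column.
    """
    train = df.copy().reset_index(drop=True)
    uid = train['unique_id'].iloc[0]
    test = test_df.query("unique_id == @uid").reset_index(drop=True)

    # forecast as many steps as there are test rows for this series
    sf.fit(df=train)
    predict = sf.predict(h=len(test)).reset_index(drop=True)

    # forecasts are aligned with the test rows by position
    test['SeasonalNaive'] = predict['SeasonalNaive']
    test['ETS'] = predict['AutoETS']
    test['ARIMA'] = predict['AutoARIMA']

    return test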

In [5]:
# run the baselines separately for each series; swifter parallelizes the groupby (via Ray)
baseline_fcst = train_df[['ds','y','unique_id']].swifter \
    .groupby("unique_id") \
    .apply(baseline_models) \
    .reset_index(drop=True)


2025-04-21 23:39:53,996	INFO worker.py:1821 -- Started a local Ray instance.


In [6]:
baseline_fcst.head()

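| | unique_id | ds | y | dia | feriado | dia_da_semana | semana | mes | max_outliers | min_outliers | ... | dia_da_semana_sin | dia_da_semana_cos | semana_sin | semana_cos | mes_sin | mes_cos | pandemia | SeasonalNaive | ETS | ARIMA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | SKU_01 | 2023-10-19 | 360.0 | 19 | 0 | 4 | 42 | 10 | 0 | 0 | ... | -0.43 | -0.9 | -0.96 | 0.26 | -0.87 | 0.5 | 0 | 30.0 | 332.54 | 303.26 |
| 1 | SKU_01 | 2023-10-20 | 240.0 | 20 | 0 | 5 | 42 | 10 | 0 | 0 | ... | -0.97 | -0.22 | -0.96 | 0.26 | -0.87 | 0.5 | 0 | 236.0 | 409.13 | 395.28 |
| 2 | SKU_01 | 2023-10-21 | 68.0 | 21 | 0 | 6 | 42 | 10 | 0 | 0 | ... | -0.78 | 0.62 | -0.96 | 0.26 | -0.87 | 0.5 | 0 | 41.0 | 67.18 | 53.55 |
| 3 | SKU_01 | 2023-10-22 | 56.0 | 22 | 0 | 7 | 42 | 10 | 0 | 0 | ... | -0.0 | 1.0 | -0.96 | 0.26 | -0.87 | 0.5 | 0 | 48.0 | 61.69 | 47.75 |
| 4 | SKU_01 | 2023-10-23 | 344.0 | 23 | 0 | 1 | 43 | 10 | 0 | 0 | ... | 0.78 | 0.62 | -0.93 | 0.38 | -0.87 | 0.5 | 0 | 385.0 | 454.68 | 443.33 |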
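The `*_sin`/`*_cos` columns in the table above look like the usual cyclical encoding of calendar features. Under this encoding, a value $x$ with period $P$ maps to $\sin(2\pi x/P)$ and $\cos(2\pi x/P)$. The feature-building code is not shown in this section. The displayed values appear consistent with $P = 7$ for `dia_da_semana` (coded 1 to 7), $P = 53$ for `semana` and $P = 12$ for `mes`, but those periods are an inference. The helper `cyclical_encoding` is hypothetical.

```python
import math


def cyclical_encoding(value, period):
    """Hypothetical helper: map a periodic calendar feature to (sin, cos) on the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


# first row of the table above: dia_da_semana = 4, semana = 42, mes = 10
for name, value, period in [("dia_da_semana", 4, 7), ("semana", 42, 53), ("mes", 10, 12)]:
    s, c = cyclical_encoding(value, period)
    print(name, round(s, 2), round(c, 2))
```

The pair of coordinates keeps the end and start of a cycle close together. With period 7, for example, days 7 and 1 map to neighboring points on the circle, while the raw integers are 6 apart. A linear regression can therefore model weekly or yearly patterns without treating the wrap-around as a jump.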


In [7]:
naive_eval = Evaluation(df=baseline_fcst, y_pred_col='SeasonalNaive')
naive_eval.summary()
naive_eval.save_evaluation(EVAL_PATH, 'Naive Sazonal')
naive_eval.evaluation_df

| | mae | mse | rmse | r2 |
|---|---|---|---|---|
| SKU_01 | 141.77 | 38564.95 | 196.38 | -0.39 |
| SKU_02 | 143.07 | 46634.79 | 215.95 | 0.09 |
| SKU_03 | 37.01 | 2937.74 | 54.2 | -0.04 |
| SKU_04 | 19.5 | 877.38 | 29.62 | 0.21 |
| SKU_05 | 395.71 | 287627.79 | 536.31 | -0.45 |
| SKU_06 | 17.21 | 576.0 | 24.0 | 0.08 |
| SKU_07 | 252.49 | 165274.65 | 406.54 | -0.02 |
| SKU_08 | 37.34 | 2862.31 | 53.5 | -1.22 |
| SKU_09 | 36.12 | 2801.73 | 52.93 | -0.05 |
| SKU_10 | 12.81 | 291.26 | 17.07 | -0.8 |


In [8]:
sarima_eval = Evaluation(df=baseline_fcst, y_pred_col='ARIMA')
sarima_eval.summary()
sarima_eval.save_evaluation(EVAL_PATH, 'SARIMA')
sarima_eval.evaluation_df

| | mae | mse | rmse | r2 |
|---|---|---|---|---|
| SKU_01 | 81.43 | 14647.03 | 121.02 | 0.47 |
| SKU_02 | 88.28 | 18303.74 | 135.29 | 0.64 |
| SKU_03 | 21.26 | 1061.7 | 32.58 | 0.63 |
| SKU_04 | 15.61 | 551.56 | 23.49 | 0.5 |
| SKU_05 | 229.41 | 107218.2 | 327.44 | 0.46 |
| SKU_06 | 15.46 | 459.4 | 21.43 | 0.27 |
| SKU_07 | 382.83 | 291361.6 | 539.78 | -0.79 |
| SKU_08 | 40.44 | 2485.15 | 49.85 | -0.92 |
| SKU_09 | 43.98 | 2659.13 | 51.57 | 0.0 |
| SKU_10 | 8.0 | 115.97 | 10.77 | 0.28 |


In [9]:
ets_eval = Evaluation(df=baseline_fcst, y_pred_col='ETS')
ets_eval.summary()
ets_eval.save_evaluation(EVAL_PATH, 'ETS')
ets_eval.evaluation_df

| | mae | mse | rmse | r2 |
|---|---|---|---|---|
| SKU_01 | 92.72 | 16606.96 | 128.87 | 0.4 |
| SKU_02 | 87.71 | 18118.29 | 134.6 | 0.65 |
| SKU_03 | 21.77 | 1053.49 | 32.46 | 0.63 |
| SKU_04 | 14.92 | 470.53 | 21.69 | 0.58 |
| SKU_05 | 264.45 | 129151.8 | 359.38 | 0.35 |
| SKU_06 | 14.12 | 391.93 | 19.8 | 0.38 |
| SKU_07 | 288.71 | 137181.42 | 370.38 | 0.16 |
| SKU_08 | 40.67 | 2486.51 | 49.86 | -0.93 |
| SKU_09 | 25.68 | 1250.04 | 35.36 | 0.53 |
| SKU_10 | 10.25 | 182.83 | 13.52 | -0.13 |


In [10]:
# compare the models: RMSE per series (after transposing, one row per model)
metrics = pd.read_csv(EVAL_PATH)
order = metrics['model'].sort_values().unique()
metrics = metrics.pivot_table('rmse','unique_id','model')[order]

metrics.T

| model | SKU_01 | SKU_02 | SKU_03 | SKU_04 | SKU_05 | SKU_06 | SKU_07 | SKU_08 | SKU_09 | SKU_10 | SKU_11 | SKU_12 | SKU_13 | SKU_14 | SKU_15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ETS | 128.87 | 134.6 | 32.46 | 21.69 | 359.38 | 19.8 | 370.38 | 49.86 | 35.36 | 13.52 | 262.02 | 38.21 | 260.67 | 51.57 | 151.65 |
| Naive Sazonal | 196.38 | 215.95 | 54.2 | 29.62 | 536.31 | 24.0 | 406.54 | 53.5 | 52.93 | 17.07 | 316.51 | 63.68 | 344.07 | 63.21 | 280.05 |
| SARIMA | 121.02 | 135.29 | 32.58 | 23.49 | 327.44 | 21.43 | 539.78 | 49.85 | 51.57 | 10.77 | 238.89 | 39.88 | 322.87 | 54.82 | 202.56 |
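A natural follow-up to this table is to pick, for each series, the baseline with the lowest RMSE. In the notebook, `metrics.idxmin(axis=1)` would do this on `metrics` before `.T` (rows are `unique_id`, columns are models). The self-contained sketch below instead rebuilds three rows copied from the table above.

```python
import pandas as pd

# RMSE values copied from the comparison table for three of the series
rmse = pd.DataFrame(
    {
        "ETS": [128.87, 370.38, 13.52],
        "Naive Sazonal": [196.38, 406.54, 17.07],
        "SARIMA": [121.02, 539.78, 10.77],
    },
    index=pd.Index(["SKU_01", "SKU_07", "SKU_10"], name="unique_id"),
)

# name of the model with the lowest RMSE in each row
best_model = rmse.idxmin(axis=1)
print(best_model)
```

Applied to all 15 series in the table, this rule selects ETS for 10 series and SARIMA for the other 5. SARIMA wins SKU_08 by only 0.01 in the rounded values. Seasonal Naive is never the lowest.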
