# Time series

<b>Contents</b>

- Time series example
- Date index
- Grouping by date interval and aggregation
- Visualizing time series
    - Visualizing time series with Pandas
    - Visualizing time series with Sktime
- Filtering by date range
- Time windows with shift
- Time series as supervised regression
    - Moving average
- Forecasting horizon with Sktime
- Modeling with Sktime
    - Auto ARIMA
    - Exponential Smoothing
    - Prophet (Facebook)
- Modeling with Sktime and exogenous variables
    - Auto ARIMA with exogenous variables
- Train/test split with Sktime
- Sklearn regression models with Sktime
    - Gradient boosting with only the endogenous variable
    - Gradient boosting with exogenous variables

In [1]:
# libraries

## dates
import datetime

## data structure
import numpy as np
import pandas as pd

## eda (exploratory data analysis)
import matplotlib.pyplot as plt
from sktime.utils.plotting import plot_series

## preprocessing
from sktime.forecasting.compose import make_reduction

## train test split
from sktime.forecasting.model_selection import temporal_train_test_split
from sklearn.model_selection import train_test_split

## sktime models
from sktime.forecasting.naive import NaiveForecaster
from sktime.forecasting.arima import AutoARIMA
from sktime.forecasting.exp_smoothing import ExponentialSmoothing
from sktime.forecasting.fbprophet import Prophet

## sklearn regressor models
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor

## LightGBM model
from lightgbm import LGBMRegressor

## feature selection
from sklearn.feature_selection import SelectKBest, r_regression

## forecasting horizon 
from sktime.forecasting.base import ForecastingHorizon

## metrics
from sklearn.metrics import mean_absolute_error
from sktime.performance_metrics.forecasting import (
    MeanAbsoluteError,
)

In [2]:
# Data
df = pd.DataFrame({
    'date': ['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
             '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
             '2013-01-09', '2013-01-10', '2013-01-11', '2013-01-12',
             '2013-01-13', '2013-01-14', '2013-01-15', '2013-01-16',
             '2013-01-17', '2013-01-18', '2013-01-19', '2013-01-20',
             '2013-01-21', '2013-01-22', '2013-01-23', '2013-01-24',
             '2013-01-25', '2013-01-26', '2013-01-27', '2013-01-28',
             '2013-01-29', '2013-01-30'],
    'sales': [
        2,4,5,3,5,10,2,5,3,10,
        1,5,6,2,6,11,3,5,4,10,
        2,7,8,1,5,12,3,5,2,11
    ],
    'onpromotion': [
        1,0,1,0,0,5,0,0,0,4,
        0,2,1,1,0,4,0,0,1,6,
        0,3,2,0,1,8,0,0,1,4
    ]
})

## Date as the index

In [3]:
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.to_period('D') # daily periods, an index type sktime accepts
df.set_index('date', inplace=True)
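
With the dates in the index, the series can also be grouped by a date interval and aggregated. The sketch below is a minimal illustration and not one of the original cells: it assumes a `DatetimeIndex`, and the name `sales_ts` and the 7-day interval are chosen only for the example. The values are the first 14 `sales` entries from the data above. A series with a `PeriodIndex`, such as `df['sales']`, could be converted first with `.to_timestamp()`.

```python
import pandas as pd

# First 14 days of `sales`, indexed by timestamp (assumption for this sketch)
sales_ts = pd.Series(
    [2, 4, 5, 3, 5, 10, 2, 5, 3, 10, 1, 5, 6, 2],
    index=pd.date_range("2013-01-01", periods=14, freq="D"),
    name="sales",
)

# Group into consecutive 7-day bins starting at the first date and sum each bin
weekly_sales = sales_ts.resample("7D").sum()
print(weekly_sales)
```

Other aggregations, such as `.mean()` or `.max()`, can be applied to the same resampler.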

## Filter by date

In [4]:
df[(df.index > '2013-01-20') & (df.index < '2013-01-23')]

In [5]:
df[df.index.to_series().between(
    '2013-01-13','2013-01-15')]
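
As an alternative to boolean masks, a date-indexed series can usually be sliced with `.loc` and date strings. This is a minimal sketch on a placeholder series (`s` holds the integers 0 to 29, not the sales data); both ends of the slice are included.

```python
import pandas as pd

# Placeholder series with a daily PeriodIndex, like the index built above
s = pd.Series(
    range(30),
    index=pd.period_range("2013-01-01", periods=30, freq="D"),
)

# Label-based slicing with date strings keeps both endpoints
subset = s.loc["2013-01-13":"2013-01-15"]
print(subset)
```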

## Visualizing series

In [6]:
plt.figure(figsize=(16,3))
df['sales'].plot()
plt.show()

In [7]:
plot_series(df['sales'])
plt.xticks(rotation=75)
plt.show()

## Time windows with shift

In [8]:
df['t'] = df['sales']
df['t-1'] = df['sales'].shift(1)
df['t-2'] = df['sales'].shift(2)

df_shifted = df[['t','t-1','t-2']].dropna()
df_shifted.head(5)

## Moving average of the last two days

In [9]:
df_shifted['mean'] = (
    df_shifted['t-1'] + df_shifted['t-2'])/2

In [10]:
n = 6
y_train = df_shifted['t'][:-n]
y_test = df_shifted['t'][-n:]
y_pred = df_shifted['mean'][-n:]

In [11]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.show()

In [12]:
mean_absolute_error(y_test, y_pred)
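
The mean absolute error is the average of the absolute differences between observed and predicted values. The sketch below computes it by hand with NumPy; the two arrays are the last six `sales` values and the corresponding two-day means, worked out manually from the data above rather than copied from a cell output.

```python
import numpy as np

# Last six days of `sales` (test set) and the mean of the two preceding days
y_true = np.array([5, 12, 3, 5, 2, 11], dtype=float)
y_hat = np.array([4.5, 3.0, 8.5, 7.5, 4.0, 3.5])

# MAE = mean(|y_true - y_hat|)
mae = np.mean(np.abs(y_true - y_hat))
print(mae)  # 4.5
```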

## Modeling with Sktime

### Naive model

In [13]:
# model
forecaster = NaiveForecaster()

# forecast horizon
fh = np.arange(1,7) # 1,2,3,4,5,6

# train
forecaster.fit(y_train)

# predict
y_pred = forecaster.predict(fh)
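
With its default settings, `NaiveForecaster` uses the `"last"` strategy, which repeats the last observed value over the whole horizon. The sketch below mimics that behaviour in plain pandas to make the baseline concrete; `naive_last_forecast` is a hypothetical helper written for this example, not part of sktime.

```python
import pandas as pd

def naive_last_forecast(y_train, steps):
    """Repeat the last training value for the next `steps` periods."""
    # Periods immediately after the end of the training series
    future_index = pd.period_range(start=y_train.index[-1] + 1, periods=steps)
    return pd.Series(y_train.iloc[-1], index=future_index, name=y_train.name)

# Tiny training series ending on 2013-01-24, as y_train does above
y_small = pd.Series(
    [8, 1],
    index=pd.period_range("2013-01-23", periods=2, freq="D"),
    name="sales",
)
forecast = naive_last_forecast(y_small, steps=3)
print(forecast)
```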

In [14]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [15]:
mean_absolute_error(y_test, y_pred)

### Auto ARIMA model

In [16]:
# model
forecaster = AutoARIMA()

# forecast horizon
fh = np.arange(1,7)

# train
forecaster.fit(y_train)

# predict
y_pred = forecaster.predict(fh)

In [17]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.legend()
plt.show()

In [18]:
mean_absolute_error(y_test, y_pred)

### Exponential smoothing model

In [19]:
forecaster = ExponentialSmoothing()

fh = np.arange(1,7)

forecaster.fit(y_train)

y_pred = forecaster.predict(fh)

In [20]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [21]:
mean_absolute_error(y_test, y_pred)

### Modeling with Prophet (Facebook)

In [22]:
forecaster = Prophet()

fh = np.arange(1,7)

forecaster.fit(y_train)

y_pred = forecaster.predict(fh)

In [23]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [24]:
mean_absolute_error(y_test, y_pred)

## Modeling with Sktime and exogenous variables

### Auto ARIMA

In [25]:
y = df['sales'] # endogenous variable (target)
X = df[['onpromotion']] # exogenous variable

y_train = y[:-n]
y_test = y[-n:]

X_train = X.iloc[:-n,:]
X_test = X.iloc[-n:,:]

In [26]:
# model
forecaster = AutoARIMA(suppress_warnings=True)

# train
forecaster.fit(y_train, X_train)

# forecast horizon
fh = np.arange(1,7)

# predict
y_pred = forecaster.predict(fh, X=X_test)

In [27]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

### Train/test split with Sktime

In [28]:
# forecast horizon (absolute: the six dates to hold out as the test set)
fh = ForecastingHorizon(
    pd.PeriodIndex(
        pd.date_range('2013-01-25', periods=6, freq="D")
    ),
    is_relative=False)

# train test split
y_train, y_test, X_train, X_test = temporal_train_test_split(
    y, X, fh=fh
)

# model
forecaster = AutoARIMA(suppress_warnings=True)

# train
forecaster.fit(y_train, X_train)

# forecast horizon (relative: 1 to 6 steps after the end of training)
fh = np.arange(1,7)

# predict
y_pred = forecaster.predict(fh, X=X_test)
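
The absolute horizon above covers the last six days of the data, so the split amounts to holding out the final six rows in time order. The sketch below shows that idea in plain pandas; `split_last` is a hypothetical helper, and the values in `y` and `X` are placeholders rather than the real sales data.

```python
import pandas as pd

# Placeholder data on the same 30-day index as the notebook
index = pd.period_range("2013-01-01", periods=30, freq="D")
y = pd.Series(range(30), index=index, name="sales")
X = pd.DataFrame({"onpromotion": [0] * 30}, index=index)

def split_last(y, X, n_test):
    """Hold out the last n_test rows of y and X, preserving time order."""
    return y.iloc[:-n_test], y.iloc[-n_test:], X.iloc[:-n_test], X.iloc[-n_test:]

y_train, y_test, X_train, X_test = split_last(y, X, n_test=6)

# The held-out dates match the absolute horizon starting on 2013-01-25
horizon = pd.period_range("2013-01-25", periods=6, freq="D")
print(y_test.index.equals(horizon))
```

Unlike `sklearn.model_selection.train_test_split`, which shuffles rows by default, a temporal split never lets training data come after test data.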

In [29]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [30]:
mean_absolute_error(y_test, y_pred)

## Regression with Sktime

### Gradient boosting with only the endogenous variable

In [31]:
# get model
regressor = GradientBoostingRegressor()

# reduce the series to tabular form and wrap the regressor as a forecaster
forecaster = make_reduction(
    regressor, 
    window_length=10, 
    strategy="recursive")

# fit model
forecaster.fit(y_train)

# get predictions
y_pred = forecaster.predict(fh)
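
To make the reduction idea concrete, the sketch below builds the lagged-window table by hand and forecasts recursively, feeding each prediction back in as an input for the next step. It is an illustration under stated assumptions, not sktime's implementation: `make_windows` and `recursive_forecast` are hypothetical helpers, and an ordinary least-squares fit stands in for the gradient boosting regressor to keep the example dependent on NumPy only.

```python
import numpy as np

def make_windows(y, window_length):
    """Turn a 1-D series into rows of lagged values and their next-step targets."""
    y = np.asarray(y, dtype=float)
    rows = [y[i:i + window_length] for i in range(len(y) - window_length)]
    return np.array(rows), y[window_length:]

def recursive_forecast(y, window_length, steps):
    """Fit a linear model on lagged windows, then predict one step at a time."""
    X_tab, target = make_windows(y, window_length)
    # Stand-in regressor: least squares with an intercept column
    design = np.column_stack([X_tab, np.ones(len(X_tab))])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)

    history = list(np.asarray(y, dtype=float))
    preds = []
    for _ in range(steps):
        window = np.array(history[-window_length:])
        y_hat = float(np.append(window, 1.0) @ coef)
        preds.append(y_hat)
        history.append(y_hat)  # the prediction becomes an input for the next step
    return np.array(preds)

# A linear trend is continued by the stand-in model
series = np.arange(1, 11)
X_tab, target = make_windows(series, window_length=2)
preds = recursive_forecast(series, window_length=2, steps=3)
print(X_tab[:3], target[:3], preds)
```

Because later steps are built on earlier predictions, errors can accumulate along the horizon under the recursive strategy.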

In [32]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [33]:
mean_absolute_error(y_test, y_pred)

### Gradient boosting with exogenous variables

In [34]:
# get model
regressor = GradientBoostingRegressor()

# reduce the series to tabular form and wrap the regressor as a forecaster
forecaster = make_reduction(
    regressor, 
    window_length=10, 
    strategy="recursive")

# fit model
forecaster.fit(y_train, X_train)

# get predictions
y_pred = forecaster.predict(fh, X=X_test)

In [35]:
plot_series(
    y_train, y_test, y_pred,
    labels=['train', 'test', 'pred']
)
plt.xticks(rotation=70)
plt.legend()
plt.show()

In [36]:
mean_absolute_error(y_test, y_pred)

## Exercise

In [37]:
from sktime.datasets import load_lynx, load_airline, load_macroeconomic

Train at least three models (at least one of them a regression model) on each of the datasets above; note that macroeconomic has exogenous variables. Use the last 4 periods as the test set. Which of the models you trained performed best in each case?