# Séries Temporais e Forecast - SARIMAX

Neste notebook, usaremos o pacote pmdarima para implementar o modelo estatístico SARIMAX para prever o valor das ações do Facebook usando dados entre o intervalo de 29/02/2016 e 25/02/2022.  
Fonte dos dados: https://finance.yahoo.com/quote/FB/history/

In [16]:
import pandas as pd
import plotly.express as px
import numpy as np
from pmdarima import auto_arima
from sklearn import metrics
df = pd.read_csv('FB.csv')
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2016-02-29,107.599998,108.910004,106.75,106.919998,106.919998,32779000
1,2016-03-01,107.830002,109.82,107.599998,109.82,109.82,26817300
2,2016-03-02,109.68,110.550003,108.769997,109.949997,109.949997,25670200
3,2016-03-03,110.25,110.300003,108.540001,109.580002,109.580002,21353100
4,2016-03-04,110.050003,110.050003,107.93,108.389999,108.389999,24938900


In [2]:
def timeseries_evaluation_metrics_func(y_true, y_pred):
    def mean_absolute_percentage_error(y_true, y_pred):
        y_true, y_pred = np.array(y_true), np.array(y_pred)
        return np.mean(np.abs((y_true - y_pred/y_true))) * 100
    print(f'MSE: {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE: {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE: {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE: {mean_absolute_percentage_error(y_true, y_pred)}')
    print(f'R2: {metrics.r2_score(y_true, y_pred)}')

Abaixo, faremos a previsão de valores das ações do Facebook entre o período de 13/01/2022 e 25/02/2022

In [3]:
#30 últimos registros para a previsão de valores (forecast)
#13/01/2022 - 25/02/2022
X = df['Close']
train, test = X[:-30], X[-30:]
exoX = df['Open']
exotrain, exotest = exoX[:-30], exoX[-30:]

O modelo SARIMAX é usado para séries temporais univariadas (de uma varíavel), usando uma variável exógena (externa) para fazer as predições.  
Aqui, a variável de predição é o valor de fechamento da ação, e a variável exógena é o valor de abertura da ação.

In [5]:
model = auto_arima(
    np.array(train).reshape(-1, 1),
    exogenous = np.array(exotrain).reshape(-1, 1),
    start_p = 1,
    start_q = 1,
    max_p = 7,
    max_q = 7,
    seasonal = True,
    start_P = 1,
    start_Q = 1,
    max_P = 1,
    max_D = 1,
    max_Q = 7,
    d = None,
    D = None,
    trace = True,
    error_action = 'ignore',
    suppress_warnings = True,
    stepwise = True)

Performing stepwise search to minimize aic
 ARIMA(1,0,1)(0,0,0)[0] intercept   : AIC=7643.345, Time=0.21 sec
 ARIMA(0,0,0)(0,0,0)[0] intercept   : AIC=7656.984, Time=0.13 sec
 ARIMA(1,0,0)(0,0,0)[0] intercept   : AIC=7649.841, Time=0.26 sec
 ARIMA(0,0,1)(0,0,0)[0] intercept   : AIC=7649.492, Time=0.18 sec
 ARIMA(0,0,0)(0,0,0)[0]             : AIC=7655.822, Time=0.25 sec
 ARIMA(2,0,1)(0,0,0)[0] intercept   : AIC=7644.794, Time=0.89 sec
 ARIMA(1,0,2)(0,0,0)[0] intercept   : AIC=7645.567, Time=0.44 sec
 ARIMA(0,0,2)(0,0,0)[0] intercept   : AIC=7648.233, Time=0.21 sec
 ARIMA(2,0,0)(0,0,0)[0] intercept   : AIC=7649.025, Time=0.80 sec
 ARIMA(2,0,2)(0,0,0)[0] intercept   : AIC=7645.248, Time=0.65 sec
 ARIMA(1,0,1)(0,0,0)[0]             : AIC=7641.792, Time=0.22 sec
 ARIMA(0,0,1)(0,0,0)[0]             : AIC=7647.666, Time=0.20 sec
 ARIMA(1,0,0)(0,0,0)[0]             : AIC=7648.580, Time=0.07 sec
 ARIMA(2,0,1)(0,0,0)[0]             : AIC=7649.417, Time=0.22 sec
 ARIMA(1,0,2)(0,0,0)[0]          

In [6]:
model.summary()

0,1,2,3
Dep. Variable:,y,No. Observations:,1481.0
Model:,"SARIMAX(1, 0, 1)",Log Likelihood,-3816.896
Date:,"Tue, 01 Mar 2022",AIC,7641.792
Time:,12:58:42,BIC,7662.994
Sample:,0,HQIC,7649.696
,- 1481,,
Covariance Type:,opg,,

0,1,2,3,4,5,6
,coef,std err,z,P>|z|,[0.025,0.975]
x1,1.0003,0.000,4104.867,0.000,1.000,1.001
ar.L1,0.6334,0.104,6.107,0.000,0.430,0.837
ma.L1,-0.7203,0.094,-7.645,0.000,-0.905,-0.536
sigma2,10.1599,0.217,46.813,0.000,9.735,10.585

0,1,2,3
Ljung-Box (L1) (Q):,0.0,Jarque-Bera (JB):,1245.77
Prob(Q):,0.98,Prob(JB):,0.0
Heteroskedasticity (H):,8.76,Skew:,-0.14
Prob(H) (two-sided):,0.0,Kurtosis:,7.48


In [14]:
#Gerando as predições e avaliando o modelo com várias métricas (MSE, MAE, RMSE, MAPE, R2)
forecast, conf_int = model.predict(n_periods = 30, exogenous = np.array(exotest).reshape(-1, 1), return_conf_int = True)
df_conf = pd.DataFrame(conf_int, columns = ['Upper_bound', 'Lower_bound'])
df_conf.index = range(1481, 1511)
forecast = pd.DataFrame(forecast, columns = ['Predição'])
forecast.index = range(1481, 1511)
timeseries_evaluation_metrics_func(test, forecast)
    

MSE: 55.75696148880931
MAE: 6.017359726964369
RMSE: 7.467058422753187
MAPE: 26090.352369604072
R2: 0.9762165457927177


In [15]:
#Plotando os resultados
fig1 = px.line(train.rename('Treino')[-30:], title = 'Predição de valores de ações do Facebook usando SARIMAX (12/01/2022 - 25/02/2022)')
fig1.data[0].line.color = "#0000ff"
fig1.data[0].x = pd.to_datetime(df['Date'])[-60:-30]
fig2 = px.line(test.rename('Teste'))
fig2.data[0].line.color = "#ff0000"
fig2.data[0].x = pd.to_datetime(df['Date'])[-30:]
fig1.add_trace(fig2.data[0])
fig3 = px.line(forecast)
fig3.data[0].line.color = "#ffa500"
fig3.data[0].x = pd.to_datetime(df['Date'])[-30:]
fig1.add_trace(fig3.data[0])
fig1.update_xaxes(title_text = 'Data')
fig1.update_yaxes(title_text = 'Valores (em US$)')
fig1.show()

