# Séries Temporais e Forecast - XGBoost

Neste notebook, usaremos o pacote XGBoost para prever o valor das ações do Facebook usando dados entre o intervalo de 29/02/2016 e 25/02/2022.  
Fonte dos dados: https://finance.yahoo.com/quote/FB/history/

In [12]:
import xgboost
import pandas as pd
import numpy as np
import plotly.express as px
from sklearn import metrics, preprocessing
df = pd.read_csv('FB.csv')
df.head()

Unnamed: 0,Date,Open,High,Low,Close,Adj Close,Volume
0,2016-02-29,107.599998,108.910004,106.75,106.919998,106.919998,32779000
1,2016-03-01,107.830002,109.82,107.599998,109.82,109.82,26817300
2,2016-03-02,109.68,110.550003,108.769997,109.949997,109.949997,25670200
3,2016-03-03,110.25,110.300003,108.540001,109.580002,109.580002,21353100
4,2016-03-04,110.050003,110.050003,107.93,108.389999,108.389999,24938900


In [2]:
def timeseries_evaluation_metrics_func(y_true, y_pred):
    def mean_absolute_percentage_error(y_true, y_pred):
        y_true, y_pred = np.array(y_true), np.array(y_pred)
        return np.mean(np.abs((y_true - y_pred/y_true))) * 100
    print(f'MSE: {metrics.mean_squared_error(y_true, y_pred)}')
    print(f'MAE: {metrics.mean_absolute_error(y_true, y_pred)}')
    print(f'RMSE: {np.sqrt(metrics.mean_squared_error(y_true, y_pred))}')
    print(f'MAPE: {mean_absolute_percentage_error(y_true, y_pred)}')
    print(f'R2: {metrics.r2_score(y_true, y_pred)}')

In [55]:
X = df[['Open', 'High', 'Low']]
y = df[['Close']]

Separando os conjuntos de treino e de teste (amostras de teste de 30 dias úteis entre o período de 13/01/2022 e 25/02/2022)

In [57]:
x_train, y_train = X[:-30], y[:-30]
x_test, y_test = X[-30:], y[-30:]

In [58]:
x_train = pd.DataFrame(x_train, columns = ['Open', 'High', 'Low'])
x_test = pd.DataFrame(x_test, columns = ['Open', 'High', 'Low'])
y_train = pd.DataFrame(y_train, columns = ['Close'])
y_test = pd.DataFrame(y_test, columns = ['Close'])
x_test.index = X[-30:].index
y_test.index = y[-30:].index

Usando o modelo de regressão XGBRegressor

In [59]:
model = xgboost.XGBRegressor()
model.fit(x_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=1, enable_categorical=False,
             gamma=0, gpu_id=-1, importance_type=None,
             interaction_constraints='', learning_rate=0.300000012,
             max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
             monotone_constraints='()', n_estimators=100, n_jobs=8,
             num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0,
             reg_lambda=1, scale_pos_weight=1, subsample=1, tree_method='exact',
             validate_parameters=1, verbosity=None)

In [60]:
y_pred = model.predict(x_test)
y_pred = pd.DataFrame(y_pred, columns = ['Predição'])

In [66]:
#Gerando as predições e avaliando o modelo com várias métricas (MSE, MAE, RMSE, MAPE, R2)
timeseries_evaluation_metrics_func(y_test, y_pred)

MSE: 27.928073724940717
MAE: 4.426366072330733
RMSE: 5.284701857715411
MAPE: 26093.574620958636
R2: 0.9880871187238558


In [65]:
fig1 = px.line(y_train[-30:].rename(columns = {'Close': 'Treino'}), title = 'Predição de valores de ações do Facebook usando XGBoost (12/01/2022 - 25/02/2022)')
fig1.data[0].line.color = "#0000ff"
fig1.data[0].x = pd.to_datetime(df['Date'][-60:-30])
fig2 = px.line(y_test.rename(columns = {'Close': 'Teste'}))
fig2.data[0].line.color = "#ff0000"
fig2.data[0].x = pd.to_datetime(df['Date'])[-30:]
fig1.add_trace(fig2.data[0])
fig3 = px.line(y_pred)
fig3.data[0].line.color = "#ffa500"
fig3.data[0].x = pd.to_datetime(df['Date'])[-30:]
fig1.add_trace(fig3.data[0])
fig1.update_xaxes(title_text = 'Data')
fig1.update_yaxes(title_text = 'Valores (em US$)')
fig1.show()