
# ARIMA

ARIMA stands for "Autoregressive Integrated Moving Average." It is a widely used statistical method for analyzing and forecasting univariate time series. It works well for series whose trend can be removed by differencing and whose remaining structure is a linear dependence on past values and past random shocks.

ARIMA combines three components to model the time series:

1. **Autoregressive (AR) Component:** This component models the relationship between the current value of the time series and its past values. It assumes that the current value can be predicted based on a linear combination of its own previous values.

2. **Integrated (I) Component:** This component differences the series to make it stationary, meaning its statistical properties (mean, variance, autocorrelation) do not change over time. The AR and MA parts of the model assume a stationary series, so a trending series must be transformed first. First-order differencing replaces each value with its change from the previous value, $y_t - y_{t-1}$, which removes a linear trend.

3. **Moving Average (MA) Component:** This component models the current value as a linear combination of past forecast errors, that is, the model's random shocks (white-noise error terms) at earlier time steps. It captures how a shock continues to affect the next few observations.
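To make the three components concrete, here is a minimal NumPy sketch that builds an ARIMA(1, 1, 1) series by hand. The coefficients `phi = 0.6` and `theta = 0.3`, the length and the seed are arbitrary illustrative choices. The AR and MA terms produce a stationary series `w`; cumulatively summing it ("integrating") produces the trending series `y`, and differencing `y` once recovers `w`.

```python
import numpy as np

# Illustrative ARIMA(1, 1, 1); phi, theta, n and the seed are arbitrary choices
rng = np.random.default_rng(0)
n, phi, theta = 200, 0.6, 0.3

eps = rng.normal(size=n)  # white-noise shocks epsilon_t
w = np.zeros(n)           # stationary ARMA(1, 1) series
for t in range(1, n):
    # AR part: phi * w[t-1]; MA part: theta * eps[t-1]
    w[t] = phi * w[t - 1] + eps[t] + theta * eps[t - 1]

# "Integrate" once (cumulative sum): y has a stochastic trend, so d = 1
y = 50.0 + np.cumsum(w)

# First differencing, y_t - y_{t-1}, recovers the stationary ARMA(1, 1) part
dy = np.diff(y)
print(np.allclose(dy, w[1:]))
```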

ARIMA models are typically denoted as ARIMA(p, d, q), where:
- p: The number of autoregressive terms.
- d: The number of differences needed to make the series stationary.
- q: The number of moving average terms.
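Written with the lag operator $L$ (so $L y_t = y_{t-1}$), one common form of ARIMA(p, d, q) without a constant term is shown below, where $\phi_i$ are the AR coefficients, $\theta_j$ the MA coefficients and $\varepsilon_t$ white noise. Sign conventions for the MA coefficients differ between textbooks; this sketch uses the plus sign, which is also the convention statsmodels uses.

$$
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t = \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
$$

For $d = 1$, the factor $(1 - L)\,y_t = y_t - y_{t-1}$ is the first difference.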

The process of working with ARIMA involves the following steps:

1. **Data Preparation:** Collect and preprocess the time series data. This may involve checking for missing values, outliers, and trends.

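2. **Stationarity:** Check whether the time series is stationary, for example by plotting it or running an Augmented Dickey-Fuller (ADF) test. If it is not, apply differencing until it is; the number of differences is d.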

3. **ACF and PACF Analysis:** Plot the autocorrelation function (ACF) and partial autocorrelation function (PACF) of the stationary series. As a rule of thumb, a PACF that cuts off after lag p suggests p AR terms, and an ACF that cuts off after lag q suggests q MA terms.

4. **Model Selection:** Based on the ACF and PACF analysis, choose candidate values of p, d, and q for the ARIMA model; information criteria such as AIC or BIC help compare candidates.

5. **Model Fitting:** Fit the ARIMA model to the preprocessed data using the chosen parameters.

6. **Model Evaluation:** Evaluate the model with error metrics such as Mean Absolute Error (MAE) and Mean Squared Error (MSE), and check that the residuals look like white noise (no remaining autocorrelation).

7. **Forecasting:** Use the fitted ARIMA model to make future predictions.

ARIMA models provide a useful framework for time series forecasting and analysis. However, they assume linear relationships and do not model seasonality explicitly, so they can struggle with strongly seasonal, nonlinear, or irregular series. Seasonal ARIMA (SARIMA) adds seasonal AR, differencing, and MA terms, and machine learning techniques like LSTM (Long Short-Term Memory) networks are often used for more challenging time series data.

In [None]:

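import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

# mean_squared_error is used in the evaluation cell at the end
from sklearn.metrics import mean_squared_error
from statsmodels.tsa.arima.model import ARIMA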

In [None]:
!pip install scipy



In [None]:
!pip install pmdarima



In [None]:
from pmdarima.arima import auto_arima

In [None]:
from xgboost import XGBRegressor

In [None]:
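# Assumes `df` is a monthly DataFrame indexed by DATE with a single 'producao' column.
# .copy() makes independent frames, so adding columns later does not raise SettingWithCopyWarning.
treino = df.loc[df.index <= '2016-12-31'].copy()
validacao = df.loc[df.index > '2016-12-31'].copy()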

treino.shape, validacao.shape

((384, 1), (24, 1))

In [None]:
treino.index.min(), treino.index.max()

(Timestamp('1985-01-01 00:00:00'), Timestamp('2016-12-01 00:00:00'))

In [None]:
validacao.index.min(), validacao.index.max()

(Timestamp('2017-01-01 00:00:00'), Timestamp('2018-12-01 00:00:00'))

In [None]:
treino['producao']

DATE
1985-01-01     71.5920
1985-02-01     69.7870
1985-03-01     61.6790
1985-04-01     56.7479
1985-05-01     54.6165
                ...   
2016-08-01    113.7734
2016-09-01    100.7221
2016-10-01     89.5068
2016-11-01     91.2292
2016-12-01    112.3141
Name: producao, Length: 384, dtype: float64

In [None]:
treino['producao'].shift(-1)

DATE
1985-01-01     69.7870
1985-02-01     61.6790
1985-03-01     56.7479
1985-04-01     54.6165
1985-05-01     57.3509
                ...   
2016-08-01    100.7221
2016-09-01     89.5068
2016-10-01     91.2292
2016-11-01    112.3141
2016-12-01         NaN
Name: producao, Length: 384, dtype: float64
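`shift(-1)` moves every value one row up, so each row pairs the current month with the next month's value. That pairing turns the series into a supervised-learning table (feature = this month, target = next month). A minimal standalone sketch using the first five training values shown above:

```python
import pandas as pd

# First five training values from the output above
s = pd.Series(
    [71.5920, 69.7870, 61.6790, 56.7479, 54.6165],
    index=pd.date_range("1985-01-01", periods=5, freq="MS"),
    name="producao",
)

# Feature = current month, target = next month (shift(-1) looks one step ahead)
frame = pd.DataFrame({"producao": s, "target": s.shift(-1)})

# The last month has no "next month", so its target is NaN and the row is dropped
frame = frame.dropna()
print(frame)
```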

In [None]:
treino['target'] = treino['producao'].shift(-1)
treino.head()


| DATE | producao | target |
|---|---|---|
| 1985-01-01 | 71.592 | 69.787 |
| 1985-02-01 | 69.787 | 61.679 |
| 1985-03-01 | 61.679 | 56.7479 |
| 1985-04-01 | 56.7479 | 54.6165 |
| 1985-05-01 | 54.6165 | 57.3509 |


In [None]:
treino.tail()

| DATE | producao | target |
|---|---|---|
| 2016-08-01 | 113.7734 | 100.7221 |
| 2016-09-01 | 100.7221 | 89.5068 |
| 2016-10-01 | 89.5068 | 91.2292 |
| 2016-11-01 | 91.2292 | 112.3141 |
| 2016-12-01 | 112.3141 | NaN |


In [None]:
treino = treino.dropna()
treino.tail()

| DATE | producao | target |
|---|---|---|
| 2016-07-01 | 112.4736 | 113.7734 |
| 2016-08-01 | 113.7734 | 100.7221 |
| 2016-09-01 | 100.7221 | 89.5068 |
| 2016-10-01 | 89.5068 | 91.2292 |
| 2016-11-01 | 91.2292 | 112.3141 |


In [None]:
validacao['target'] = validacao['producao'].shift(-1)
validacao.head()


| DATE | producao | target |
|---|---|---|
| 2017-01-01 | 114.8282 | 98.2191 |
| 2017-02-01 | 98.2191 | 99.6408 |
| 2017-03-01 | 99.6408 | 85.9106 |
| 2017-04-01 | 85.9106 | 89.2053 |
| 2017-05-01 | 89.2053 | 99.1945 |


In [None]:
validacao.tail()

| DATE | producao | target |
|---|---|---|
| 2018-08-01 | 113.0449 | 101.4058 |
| 2018-09-01 | 101.4058 | 94.4922 |
| 2018-10-01 | 94.4922 | 101.3895 |
| 2018-11-01 | 101.3895 | 110.5936 |
| 2018-12-01 | 110.5936 | NaN |


In [None]:
validacao = validacao.dropna()
validacao.tail()

| DATE | producao | target |
|---|---|---|
| 2018-07-01 | 113.0893 | 113.0449 |
| 2018-08-01 | 113.0449 | 101.4058 |
| 2018-09-01 | 101.4058 | 94.4922 |
| 2018-10-01 | 94.4922 | 101.3895 |
| 2018-11-01 | 101.3895 | 110.5936 |


In [None]:
X_treino = treino.loc[:, ['producao']].values
y_treino = treino.loc[:, ['target']].values
X_validacao = validacao.loc[:, ['producao']].values
y_validacao = validacao.loc[:, ['target']].values

X_treino.shape, y_treino.shape, X_validacao.shape, y_validacao.shape

((383, 1), (383, 1), (23, 1), (23, 1))

In [None]:
modelo_xgba = XGBRegressor(objective="reg:squarederror", n_estimators=1000)
modelo_xgba.fit(X_treino, y_treino)

XGBRegressor(n_estimators=1000, objective='reg:squarederror')

In [None]:
validacao.iloc[0]

producao    114.8282
target       98.2191
Name: 2017-01-01 00:00:00, dtype: float64

In [None]:
predicao = modelo_xgba.predict(X_validacao)
predicao

array([103.73107 , 103.76478 , 108.55515 ,  79.2563  ,  92.45105 ,
        92.51808 , 101.939224, 109.148705,  95.981   ,  99.392525,
        95.981   , 102.47991 , 111.25085 ,  98.55811 ,  96.829605,
        89.18475 ,  87.32575 , 100.30612 , 111.92876 , 111.92876 ,
       110.77199 ,  96.6278  , 110.77199 ], dtype=float32)

In [None]:
validacao["pred"] = predicao
validacao.head()

| DATE | producao | target | pred |
|---|---|---|---|
| 2017-01-01 | 114.8282 | 98.2191 | 103.731071 |
| 2017-02-01 | 98.2191 | 99.6408 | 103.764778 |
| 2017-03-01 | 99.6408 | 85.9106 | 108.555153 |
| 2017-04-01 | 85.9106 | 89.2053 | 79.256302 |
| 2017-05-01 | 89.2053 | 99.1945 | 92.45105 |


In [None]:
# Score the predictions against the next-month target, not against the input feature
mean_squared_error(y_validacao, predicao)
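A model's MSE is easier to judge next to a naive baseline. Because `X_validacao` holds the current month's production, predicting "next month equals this month" gives a persistence forecast that needs no training. A minimal NumPy sketch, assuming arrays shaped like `X_validacao`, `y_validacao` and `predicao`; the helper name `compare_with_persistence` is hypothetical. The self-contained check uses the first five validation rows shown above.

```python
import numpy as np

def compare_with_persistence(X, y, pred):
    """Hypothetical helper: MSE/MAE of the model and of a persistence baseline."""
    X, y, pred = (np.asarray(a, dtype=float).ravel() for a in (X, y, pred))
    naive = X  # persistence: the forecast for month t+1 is the value observed at month t
    return {
        "model_mse": float(np.mean((y - pred) ** 2)),
        "model_mae": float(np.mean(np.abs(y - pred))),
        "persistence_mse": float(np.mean((y - naive) ** 2)),
        "persistence_mae": float(np.mean(np.abs(y - naive))),
    }

# In the notebook: compare_with_persistence(X_validacao, y_validacao, predicao)

# Standalone check on the first five validation rows shown above
X5 = [114.8282, 98.2191, 99.6408, 85.9106, 89.2053]
y5 = [98.2191, 99.6408, 85.9106, 89.2053, 99.1945]
p5 = [103.731071, 103.764778, 108.555153, 79.256302, 92.45105]
print(compare_with_persistence(X5, y5, p5))
```

If the model's MSE is not clearly below the persistence MSE on the full validation set, the model adds little over simply repeating the last observation.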