Please don't forget to upvote this notebook. I am a novice, and it would really help me.

If you have any questions, do not hesitate to contact me.

# ARIMA tutorial with BTC price

# Part 1. ACF, PACF

## Stationarity in time series analysis

The concept of stationarity is important in time series analysis. Stationary processes are easier to analyze, as the way they change is predictable. [1]

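Strong stationarity

A time series is called strongly (strictly) stationary if its joint distribution does not change when the series is shifted along the time axis.

Weak stationarity

A time series is called weakly stationary if its mean and variance are constant in time and the correlation between two values depends only on the lag $\tau$ between them: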

\begin{align}
E(X) = \mu = const
\end{align}
\begin{align}
D(X) = \sigma^2 = const
\end{align}
\begin{align}
Corr(X_t, X_{t+\tau}) = \rho(\tau)
\end{align}
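
As a rough illustration of the constant-variance condition, the sketch below simulates many realisations of white noise (stationary) and of a random walk (not stationary) and estimates $D(X_t)$ at two different times. The simulated data, the seed, and the helper name `variance_at` are assumptions made for this example; none of them come from the BTC data.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n_paths, n_steps = 2000, 400

# Each row is one simulated realisation of the process
white_noise = rng.normal(0.0, 1.0, size=(n_paths, n_steps))
random_walk = white_noise.cumsum(axis=1)   # X_t = X_{t-1} + eps_t

def variance_at(paths, t):
    """Variance across realisations at time index t, an estimate of D(X_t)."""
    return paths[:, t].var()

wn_early, wn_late = variance_at(white_noise, 99), variance_at(white_noise, 399)
rw_early, rw_late = variance_at(random_walk, 99), variance_at(random_walk, 399)

print(f"white noise: D(X_100) = {wn_early:.2f}, D(X_400) = {wn_late:.2f}")
print(f"random walk: D(X_100) = {rw_early:.1f}, D(X_400) = {rw_late:.1f}")
```

For white noise both estimates stay close to one, while for the random walk the variance grows roughly in proportion to $t$, so the condition $D(X) = const$ fails.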

## AR(1)
\begin{align}
X_t = a_0 + a_1 X_{t-1} + \epsilon_t 
\end{align}

Under which condition is AR(1) stationary in the sense of the definition above? It is stationary if:

\begin{align}
\left | a_1 \right | < 1
\end{align}
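
The condition can be checked directly from the definition. As a short sketch, assume (this is not stated above) that $\epsilon_t$ is white noise with zero mean and variance $\sigma_\epsilon^2$, uncorrelated with $X_{t-1}$. Taking the mean and the variance of both sides of the AR(1) equation and requiring them to be constant gives:

\begin{align*}
\mu = a_0 + a_1 \mu ~ \Rightarrow ~ \mu = \frac{a_0}{1 - a_1} \\
\sigma^2 = a_1^2 \sigma^2 + \sigma_\epsilon^2 ~ \Rightarrow ~ \sigma^2 = \frac{\sigma_\epsilon^2}{1 - a_1^2}
\end{align*}

The variance is finite and positive only when $a_1^2 < 1$, which is the condition on $a_1$ given above.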

## AR(p)

\begin{align}
X_t = a_0 + a_1 X_{t-1} + a_2 X_{t-2} + ... + a_p X_{t-p} + \epsilon_t , ~ a_p \neq 0
\end{align}

## MA(q)

\begin{align}
X_t = \epsilon_t + b_1 \epsilon_{t-1} + b_2 \epsilon_{t-2} + ... + b_q \epsilon_{t-q} , ~ b_q \neq 0
\end{align}

MA(q) is always stationary in the sense of this definition, because it is a finite linear combination of the noise terms $\epsilon_t$.
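
To see why, here is a short worked calculation under the assumption (not stated above) that $\epsilon_t$ is white noise with zero mean and variance $\sigma_\epsilon^2$:

\begin{align*}
E(X_t) = 0 \\
D(X_t) = \sigma_\epsilon^2 (1 + b_1^2 + b_2^2 + ... + b_q^2) \\
Cov(X_t, X_{t+\tau}) = \sigma_\epsilon^2 \sum_{j=0}^{q-\tau} b_j b_{j+\tau}, ~ b_0 = 1, ~ 0 \leq \tau \leq q
\end{align*}

None of these expressions depends on $t$, and the covariance is zero for $\tau > q$, so all three conditions of weak stationarity hold for any finite coefficients $b_1, ..., b_q$.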

## ARMA(p,q)

\begin{align}
X_t = a_0 + a_1 X_{t-1} + a_2 X_{t-2} + ... + a_p X_{t-p} + \epsilon_t + b_1 \epsilon_{t-1} + b_2 \epsilon_{t-2} + ... + b_q \epsilon_{t-q} , ~ b_q \neq 0, ~ a_p \neq 0
\end{align}

Under which condition is ARMA(p,q) stationary in the sense of this definition? Rewrite the model using the lag operator $\hat{L}$:

\begin{align*}
\hat{L}X_t=X_{t-1} \\
\hat{L} ^k X_t=X_{t-k} \\
X_{t}-a_1 \hat{L}X_{t}-a_2 \hat{L}^2 X_{t} - ... - a_p \hat{L}^p X_{t} = a_0 + (\epsilon_t+b_1 \hat{L} \epsilon_t + b_2 \hat{L}^2 \epsilon_t + ... + b_q \hat{L}^q \epsilon_t)\\
(1-a_1 \hat{L}-a_2 \hat{L}^2 - ... - a_p \hat{L}^p) X_{t} = a_0 + (1+b_1 \hat{L} + b_2 \hat{L}^2 + ... + b_q \hat{L}^q)\epsilon_t \\
P_p(\hat{L})X_t = a_0 + Q_q(\hat{L}) \epsilon_t \\
P_p(\hat{L}) = 1-a_1 \hat{L}-a_2 \hat{L}^2 - ... - a_p \hat{L}^p = (1-\frac{\hat{L}}{Z_1})(1-\frac{\hat{L}}{Z_2})...(1-\frac{\hat{L}}{Z_p})
\end{align*}

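Here $Z_1, ..., Z_p$ are the roots of the polynomial $P_p$. A finite sum of stationary series is itself stationary, so the MA part $ Q_q(\hat{L}) \epsilon_t $ does not affect stationarity and we drop it from consideration.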

\begin{align*}
(1-\frac{\hat{L}}{Z_1})(1-\frac{\hat{L}}{Z_2})...(1-\frac{\hat{L}}{Z_p})X_t = a_0 \\
(1-\frac{\hat{L}}{Z_i})^{-1} = 1 + \frac{\hat{L}}{Z_i} + \frac{\hat{L}^2}{Z_{i}^{2}} + ... +\frac{\hat{L}^k}{Z_{i}^{k}} + ...
\end{align*}


Convergence of the series.

For each of these series to converge, it is necessary and sufficient that every root $Z_i$ of $P_p$ lies outside the unit circle:
\begin{align*}
\left | Z_i \right | > 1
\end{align*}
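
The root condition can be checked numerically. The following is a minimal sketch using `numpy.roots`; the function name `ar_is_stationary` and the example coefficients are made up for illustration.

```python
import numpy as np

def ar_is_stationary(a):
    """Check |Z_i| > 1 for all roots Z_i of P_p(z) = 1 - a_1 z - ... - a_p z^p.

    `a` is the sequence [a_1, ..., a_p] of AR coefficients (a_0 is not needed).
    """
    a = np.asarray(a, dtype=float)
    # np.roots expects coefficients from the highest power down to the constant
    coeffs = np.concatenate((-a[::-1], [1.0]))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) > 1.0))

print(ar_is_stationary([0.5]))        # AR(1) with a_1 = 0.5, root Z = 2
print(ar_is_stationary([1.0]))        # random walk, root Z = 1
print(ar_is_stationary([0.5, 0.3]))   # AR(2) with a_1 + a_2 < 1
print(ar_is_stationary([0.5, 0.6]))   # AR(2) with a_1 + a_2 > 1
```

For AR(1) the only root is $Z_1 = 1/a_1$, so the check reduces to the condition $\left | a_1 \right | < 1$ from the AR(1) section.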

## ACF, PACF

Autocorrelation, also known as serial correlation, is the correlation of a signal with a delayed copy of itself as a function of delay. [2]

In time series analysis, the partial autocorrelation function (PACF) gives the partial correlation of a stationary time series with its own lagged values, after regressing out the values of the time series at all shorter lags. It contrasts with the autocorrelation function, which does not control for other lags. [3]

<img src="https://slideplayer.com/slide/13084309/79/images/33/TABLE+2.1%3A+Properties+of+the+ACF+and+PACF.jpg">
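
To make the two definitions concrete, here is a hand-written sketch of the sample ACF and PACF in plain NumPy, applied to a simulated AR(1) series. The helper names `sample_acf` and `sample_pacf`, the seed, and the coefficient 0.7 are assumptions chosen for the example; the PACF is computed with the Durbin-Levinson recursion, which is one of several possible estimators, so values may differ slightly from those of a library such as statsmodels.

```python
import numpy as np

def sample_acf(x, n_lags):
    """Sample autocorrelations rho(0), ..., rho(n_lags) of a 1-D series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(n_lags + 1)])

def sample_pacf(x, n_lags):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = sample_acf(x, n_lags)
    pacf = np.zeros(n_lags + 1)
    pacf[0] = 1.0
    phi_prev = np.array([])   # coefficients of the AR(k-1) fit, empty for k = 1
    for k in range(1, n_lags + 1):
        num = rho[k] - np.dot(phi_prev, rho[k - 1:0:-1])
        den = 1.0 - np.dot(phi_prev, rho[1:k])
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones
        phi_prev = np.concatenate((phi_prev - phi_kk * phi_prev[::-1], [phi_kk]))
        pacf[k] = phi_kk
    return pacf

# Simulated AR(1): X_t = 0.7 X_{t-1} + eps_t
rng = np.random.default_rng(1)
n, a1 = 5000, 0.7
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = a1 * x[t - 1] + eps[t]

acf_vals = sample_acf(x, 5)
pacf_vals = sample_pacf(x, 5)
print(np.round(acf_vals, 2))
print(np.round(pacf_vals, 2))
```

For an AR(1) process the ACF should decay roughly geometrically (about $a_1^k$ at lag $k$), while the PACF should be close to zero after lag 1.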


In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
# Import libraries
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib as mpl
from scipy import stats
import statsmodels.api as sm
import warnings
from itertools import product
from datetime import datetime
warnings.filterwarnings('ignore')
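# 'seaborn-poster' was renamed to 'seaborn-v0_8-poster' in matplotlib 3.6
poster_style = 'seaborn-v0_8-poster' if 'seaborn-v0_8-poster' in plt.style.available else 'seaborn-poster'
plt.style.use(poster_style)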

In [None]:
# Load data
df = pd.read_csv("/kaggle/input/bitcoin-historical-data/coinbaseUSD_1-min_data_2014-12-01_to_2019-01-09.csv")
df.head()

In [None]:
# Convert Unix time (seconds) to datetime
df.Timestamp = pd.to_datetime(df.Timestamp, unit='s')

# Resampling to daily frequency
df.index = df.Timestamp
df = df.resample('D').mean()

# Resampling to monthly frequency
df_month = df.resample('M').mean()

# Resampling to annual frequency
df_year = df.resample('A-DEC').mean()

# Resampling to quarterly frequency
df_Q = df.resample('Q-DEC').mean()

In [None]:
# PLOTS
fig = plt.figure(figsize=[15, 7])
plt.suptitle('Bitcoin exchanges, mean USD', fontsize=22)

plt.subplot(211)
plt.plot(df.Weighted_Price, '-', label='By Days')
plt.legend()

plt.subplot(212)
plt.plot(df_month.Weighted_Price, '-', label='By Months')
plt.legend()

# plt.tight_layout()
plt.show()

In [None]:
n_lags = 20
lags = np.arange(1, n_lags+1)
lags

In [None]:
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
plot_acf(df_month.Weighted_Price, lags=lags, title = 'ACF BTC')
plt.xticks(np.arange(0, n_lags+1,2))
plt.show()

In [None]:
plot_pacf(df_month.Weighted_Price, lags=lags, title = 'PACF BTC')
plt.xticks(np.arange(0, n_lags+1,2))
plt.show()

Since the first autocorrelation on the correlogram of the series is close to one, we also plot the correlogram of the first difference of the series. A detailed explanation of this step will be given in the next part of the tutorial.
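
The same effect can be seen on a toy example. The sketch below uses a simulated random walk rather than the BTC data; the series, the seed, and the helper name `lag1_autocorr` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
steps = rng.normal(size=2000)
walk = np.cumsum(steps)          # random walk: X_t = X_{t-1} + eps_t

def lag1_autocorr(x):
    """Sample autocorrelation of a 1-D series at lag 1."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

r_walk = lag1_autocorr(walk)            # original series
r_diff = lag1_autocorr(np.diff(walk))   # first difference

print(f"lag-1 autocorrelation of the walk:       {r_walk:.3f}")
print(f"lag-1 autocorrelation of the difference: {r_diff:.3f}")
```

The random walk has a lag-1 autocorrelation close to one, while its first difference is just the noise term and shows almost no autocorrelation.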

In [None]:
df_month.head(5)

In [None]:
df_month.Weighted_Price[:5]

In [None]:
df_month['Weighted_Price_diff'] = df_month['Weighted_Price']
df_month.head(5)

In [None]:
df_month['Weighted_Price_diff'] = df_month.Weighted_Price_diff - df_month.Weighted_Price_diff.shift(1)
df_month.head(5)
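
The subtraction of the shifted series is the first difference $X_t - X_{t-1}$. As a small sketch on made-up numbers (the `prices` series below is hypothetical, not BTC data), pandas' built-in `diff()` gives the same result:

```python
import pandas as pd

# Small made-up monthly series, used only to illustrate the calculation
prices = pd.Series(
    [100.0, 110.0, 105.0, 120.0],
    index=pd.to_datetime(["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"]),
)

manual = prices - prices.shift(1)   # same approach as in the cell above
builtin = prices.diff()             # pandas shortcut for the first difference

print(manual.equals(builtin))       # both have NaN in the first position
print(builtin.dropna())             # dropna() removes that leading NaN
```

The first value is always `NaN` because there is no previous observation to subtract, which is why the first row has to be dropped before the differenced series is passed to `plot_acf` and `plot_pacf`.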

In [None]:
# PLOTS
fig = plt.figure(figsize=[15, 7])
plt.suptitle(r'$\Delta$BTC, USD', fontsize=22)  # raw string avoids the invalid '\D' escape

plt.subplot(111)
plt.plot(df_month.Weighted_Price_diff, '-', label=r'$\Delta$BTC')
plt.legend()

# plt.tight_layout()
plt.show()

In [None]:
df_month.Weighted_Price[:5]

In [None]:
df_month.Weighted_Price_diff[:5]

In [None]:
df_month_without_1 = df_month.Weighted_Price_diff
df_month_without_1[:5]

In [None]:
df_month_without_1 = df_month_without_1.drop(df_month_without_1.index[[0]])
df_month_without_1[:5]

In [None]:
plot_acf(df_month_without_1, lags=lags, title=r'ACF $\Delta$BTC')
plt.xticks(np.arange(0, n_lags+1,2))
plt.show()

In [None]:
plot_pacf(df_month_without_1, lags=lags, title=r'PACF $\Delta$BTC')
plt.xticks(np.arange(0, n_lags+1,2))
plt.show()

The correlograms suggest that the BTC series may contain a unit root.

The next part of my ARIMA tutorial will test this hypothesis and introduce the ARIMA model (how to go from a non-stationary series to a stationary one that an ARMA model can describe).

References

1. https://towardsdatascience.com/stationarity-in-time-series-analysis-90c94f27322

2. https://en.wikipedia.org/wiki/Autocorrelation

3. https://en.wikipedia.org/wiki/Partial_autocorrelation_function

4. https://slideplayer.com/slide/13084309/