# Time Series Analysis in Python
## Introduction
Time series analysis is the study of a sequence of data points collected over time. It is used in fields such as finance and economics, most often to describe how a quantity evolves and to forecast its future values. In this guide, we'll explore the key concepts and techniques of time series analysis using Python.
### Stationarity
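Stationarity is a crucial concept in time series analysis. A time series is said to be (weakly) stationary if its statistical properties, such as its mean, variance, and autocorrelation structure, do not change over time. Stationary time series are easier to model and analyze, and models such as ARMA assume stationarity, so a non-stationary series is usually transformed (for example, by differencing) before modeling.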
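To make the definition concrete, here is a minimal sketch (using only NumPy, with an arbitrary seed and arbitrary sizes chosen for illustration) that compares white noise with a random walk. It simulates many independent paths of each process and measures the variance across paths at two points in time: for a stationary process the variance should be roughly the same at both, while for a random walk it should grow with time.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 200  # illustrative sizes, not from the original text

# White noise: independent draws with constant mean and variance (stationary).
white_noise = rng.normal(0.0, 1.0, size=(n_paths, n_steps))

# Random walk: cumulative sum of the same shocks (has a unit root, non-stationary).
random_walk = np.cumsum(white_noise, axis=1)

# Variance across paths at time 50 and at time 200.
for t in (49, 199):
    wn_var = white_noise[:, t].var()
    rw_var = random_walk[:, t].var()
    print(f"t={t + 1:3d}  var(white noise)={wn_var:6.2f}  var(random walk)={rw_var:7.2f}")
```

The random walk's variance is proportional to the number of accumulated shocks, which is exactly the kind of time-dependent behavior that the definition of stationarity rules out.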
#### Augmented Dickey-Fuller (ADF) Test
The Augmented Dickey-Fuller (ADF) test is a statistical test used to determine if a time series is stationary. The null hypothesis of the ADF test is that the time series has a unit root, meaning it is non-stationary.

In [None]:
from statsmodels.tsa.stattools import adfuller

def test_stationarity(time_series):
    """Perform the Augmented Dickey-Fuller test on a time series."""
    adf_test = adfuller(time_series)
    print(f"Augmented Dickey-Fuller Test Statistic: {adf_test[0]}")
    print(f"p-value: {adf_test[1]}")
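    # Null hypothesis: the series has a unit root (is non-stationary).
    if adf_test[1] < 0.05:
        print("Unit-root null rejected at the 5% level: the time series appears stationary.")
    else:
        print("Unit-root null not rejected: the time series may be non-stationary.")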

### Autocorrelation and Partial Autocorrelation
Autocorrelation is the correlation of a time series with itself at different time lags. Partial autocorrelation is the correlation of a time series with its own lagged values, with the linear dependence on the intervening values removed.
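As a rough illustration of these two definitions, the sketch below computes both quantities by hand with NumPy for a simulated AR(1) process. The helper names `sample_acf` and `sample_pacf`, the coefficient 0.7, and the seed are assumptions made for this example. The partial autocorrelation at a given lag is taken here as the last coefficient of a least-squares regression of the series on its first `lag` lags, which is one common way to estimate it.

```python
import numpy as np

def sample_acf(x, lag):
    """Sample autocorrelation of x at the given lag (lag >= 1)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[lag:], x[:-lag]) / np.dot(x, x)

def sample_pacf(x, lag):
    """Partial autocorrelation at `lag`, estimated as the last OLS coefficient
    when regressing x_t on x_{t-1}, ..., x_{t-lag}."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    # Column j holds x_{t-(j+1)} for t = lag, ..., n-1.
    lagged = np.column_stack([x[lag - j - 1 : n - j - 1] for j in range(lag)])
    coef, *_ = np.linalg.lstsq(lagged, x[lag:], rcond=None)
    return coef[-1]

# Simulate an AR(1) process: y_t = phi * y_{t-1} + eps_t
rng = np.random.default_rng(1)
phi, n = 0.7, 5000
eps = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

for lag in (1, 2, 3):
    print(f"lag {lag}: ACF={sample_acf(y, lag):.3f} "
          f"(theory {phi ** lag:.3f})  PACF={sample_pacf(y, lag):.3f}")
```

For an AR(1) process the autocorrelation decays geometrically with the lag, while the partial autocorrelation should be close to zero beyond lag 1. This contrast is what makes the two plots useful for choosing model orders.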

In [None]:
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

def plot_autocorrelation(time_series):
    """Plot the autocorrelation and partial autocorrelation functions."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
    plot_acf(time_series, ax=ax1)
    plot_pacf(time_series, ax=ax2)
    ax1.set_title("Autocorrelation Function")
    ax1.set_xlabel("Lags")
    ax1.set_ylabel("Correlation")
    ax2.set_title("Partial Autocorrelation Function")
    ax2.set_xlabel("Lags")
    ax2.set_ylabel("Correlation")
    plt.show()

### Autoregressive Integrated Moving Average (ARIMA) Model
The ARIMA model is a popular time series forecasting model that combines autoregressive (AR) and moving average (MA) components. The 'I' in ARIMA stands for 'Integrated', which refers to the process of differencing the time series to make it stationary.

The parameters of the ARIMA model are denoted as $(p, d, q)$, where:

- $p$ (Autoregressive Order): The number of lagged values of the time series that are included in the model. The AR component models the dependence of the current value on the previous values of the time series.

  Example: If $p = 2$, the model includes the last two values of the time series as predictors.
  The AR component is represented as: $y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + ... + \phi_p y_{t-p} + \epsilon_t$,
  where $\phi_1, \phi_2, ..., \phi_p$ are the autoregressive coefficients.

- $d$ (Degree of Differencing): The number of times the time series is differenced to make it stationary.

  Example: If $d = 1$, the model uses the first-order difference of the time series ($y_t - y_{t-1}$) as the new time series.
  Differencing helps remove trends, a common source of non-stationarity.

- $q$ (Moving Average Order): The number of lagged forecast errors that are included in the model. The MA component models the dependence of the current value on the previous forecast errors.

  Example: If $q = 2$, the model includes the last two forecast errors as predictors.
  The MA component is represented as: $y_t = c + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q} + \epsilon_t$,
  where $\theta_1, \theta_2, ..., \theta_q$ are the moving average coefficients.
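The AR and MA recursions in the list above can be simulated directly. The following is a minimal sketch with NumPy, assuming $p = q = 2$ and arbitrary illustrative coefficients (`phi`, `theta`, `c`, and the seed are not from the original text). It then compares each sample mean with its theoretical value: $c / (1 - \phi_1 - \phi_2)$ for a stationary AR(2) process and $c$ for an MA(2) process.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20000
eps = rng.normal(0.0, 1.0, size=n)  # white-noise errors epsilon_t
c = 1.0

# AR(2): y_t = c + phi_1 * y_{t-1} + phi_2 * y_{t-2} + eps_t
phi = (0.5, 0.2)
ar = np.zeros(n)
for t in range(2, n):
    ar[t] = c + phi[0] * ar[t - 1] + phi[1] * ar[t - 2] + eps[t]

# MA(2): y_t = c + theta_1 * eps_{t-1} + theta_2 * eps_{t-2} + eps_t
theta = (0.4, 0.3)
ma = np.full(n, c)
for t in range(2, n):
    ma[t] = c + theta[0] * eps[t - 1] + theta[1] * eps[t - 2] + eps[t]

# Discard the first 100 points so the arbitrary starting values do not matter.
print(f"AR(2) sample mean: {ar[100:].mean():.2f}  "
      f"(theory: {c / (1 - sum(phi)):.2f})")
print(f"MA(2) sample mean: {ma[100:].mean():.2f}  (theory: {c:.2f})")
```

Note the structural difference: the AR recursion feeds past values of the series back into itself, so a shock persists indefinitely with decaying weight, whereas in the MA recursion a shock affects only the next $q$ values.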



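The general form of the ARIMA model applies the AR and MA terms to the differenced series. Writing $y'_t$ for the series after differencing $d$ times (so $y'_t = y_t$ when $d = 0$ and $y'_t = y_t - y_{t-1}$ when $d = 1$):
$$y'_t = c + \phi_1 y'_{t-1} + \phi_2 y'_{t-2} + ... + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + ... + \theta_q \epsilon_{t-q} + \epsilon_t$$
where $y_t$ is the time series value at time $t$, $c$ is a constant term, and $\epsilon_t$ is the error term (the difference between the actual value and the model's one-step-ahead prediction).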
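The differencing step ($d = 1$) and its inverse can be sketched with NumPy alone. The example assumes a hypothetical series with a linear trend of slope 2 plus noise; the numbers are illustrative only. First differencing turns the linear trend into a constant level, and a cumulative sum anchored at the first observation recovers the original series, which is where the term "integrated" comes from.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(200)
y = 5.0 + 2.0 * t + rng.normal(0.0, 1.0, size=t.size)  # linear trend plus noise

# d = 1: first difference y_t - y_{t-1}; the linear trend becomes a constant.
dy = np.diff(y)

# "Integrating": cumulative sum of the differences, anchored at the first value.
y_rebuilt = np.concatenate(([y[0]], y[0] + np.cumsum(dy)))

print(f"mean of the differenced series: {dy.mean():.2f} (trend slope was 2.0)")
print("original series recovered:", np.allclose(y, y_rebuilt))
```

Forecasts from an ARIMA model with $d = 1$ are produced in the same way: the model predicts the differences, and the predictions are cumulated onto the last observed value.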

In [None]:
from statsmodels.tsa.arima.model import ARIMA

def fit_arima_model(time_series, p, d, q):
    """Fit an ARIMA model to a time series."""
    model = ARIMA(time_series, order=(p, d, q))
    model_fit = model.fit()
    return model_fit

def forecast_arima(time_series, p, d, q, steps):
    """Forecast the next 'steps' values using an ARIMA model."""
    # Reuse the fitting helper above rather than duplicating the fit logic
    model_fit = fit_arima_model(time_series, p, d, q)
    return model_fit.forecast(steps=steps)

### Example: Generated Data

In [None]:
# Generate a time series with trend and seasonality
import pandas as pd
import numpy as np

np.random.seed(42)
# 'ME' (month end) requires pandas 2.2 or later; on older versions use freq='M'
time = pd.date_range(start='2020-01-01', end='2022-12-31', freq='ME')
trend = np.linspace(100, 200, len(time))
seasonality = 20 * np.sin(2 * np.pi * np.arange(len(time)) / 12)
noise = np.random.normal(0, 10, len(time))
data = trend + seasonality + noise

# Create a DataFrame with the generated time series
synthetic_data = pd.DataFrame({'value': data}, index=time)
synthetic_data.head()

In [None]:
# Style
import matplotlib as mpl
plt.style.use("default")
mpl.rcParams["text.usetex"] = True  # requires a working LaTeX installation; set to False otherwise
mpl.rcParams["font.family"] = "serif"

# Plot
plt.plot(synthetic_data, "ro-")
plt.xticks(rotation=45)
plt.xlabel("Month")
plt.ylabel("Value")

In [None]:
# Test for stationarity
test_stationarity(synthetic_data["value"])

# Plot autocorrelation and partial autocorrelation
plot_autocorrelation(synthetic_data["value"])

# Fit an ARIMA model
p, d, q = 2, 1, 2
model_fit = fit_arima_model(synthetic_data["value"], p, d, q)
print(model_fit.summary())

# Forecast future values
forecast = forecast_arima(synthetic_data["value"], p, d, q, 12)
print("Forecast for the next 12 months:")
print(forecast)

In [None]:
plt.plot(synthetic_data, "ro-")
plt.plot(forecast, "go-")
plt.xticks(rotation=45)
plt.xlabel("Month")
plt.ylabel("Value")

### Example: Real Data

In [None]:
# Load the airline passenger data
airline_data = pd.read_csv('data/AirPassengers.csv', index_col='Month')
airline_data.index = pd.to_datetime(airline_data.index)
airline_data.head()

In [None]:
# Plot against the datetime index so the x-axis actually shows months
plt.plot(airline_data, "ro-")
plt.xticks(rotation=45)
plt.xlabel("Month")
plt.ylabel("Value")

In [None]:
# Work with a 1-D Series: take the first data column (its name depends on the CSV)
passengers = airline_data.iloc[:, 0]

# Test for stationarity
test_stationarity(passengers)

# Plot autocorrelation and partial autocorrelation
plot_autocorrelation(passengers)

# Fit an ARIMA model
p, d, q = 2, 1, 2
model_fit = fit_arima_model(passengers, p, d, q)
print(model_fit.summary())

# Forecast future values
forecast = forecast_arima(passengers, p, d, q, 12)
print("Forecast for the next 12 months:")
print(forecast)

In [None]:
plt.plot(airline_data, "ro-")
plt.plot(forecast, "go-")
plt.xticks(rotation=45)
plt.xlabel("Month")
plt.ylabel("Value")