# Theoretical Background of ARMA-GARCH


## Defining ARMA and GARCH

The *autoregressive moving average* (ARMA) representation of a time series combines a stationary autoregressive (AR) component with a moving-average (MA) component in the error terms. The *generalised autoregressive conditional heteroscedasticity* (GARCH) model is made up of two equations: the conditional mean equation and the conditional variance equation. By representing the conditional mean equation as an ARMA process, one obtains the ARMA-GARCH model.


## Unconditional vs. Conditional Values

The unconditional mean and variance are the mean and variance of the distribution of the series as a whole and are assumed to be constant. The conditional mean and variance, on the other hand, can change at every point in time because they depend on historical values (i.e. they are conditioned on past information). Volatility often forms in 'clusters', meaning that periods of high volatility tend to persist for some time. This clustering is the motivation for GARCH models.

## Return Distributions

(*from this point on, 'returns' refers to log-returns*)
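
As a reminder of the definition, the log-return is $r_t = \ln(P_t / P_{t-1})$, where $P_t$ is the price at time $t$. The following is a minimal sketch using only the Python standard library; the price list and the function name `log_returns` are hypothetical and serve only as an illustration.

```python
import math

def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for each pair of consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]

# Hypothetical closing prices, for illustration only.
prices = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(prices)

# Log-returns are additive: their sum is the log-return over the whole period.
total = math.log(prices[-1] / prices[0])
```

The additivity shown in the last line is one reason log-returns are convenient for time-series modelling.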

Let $\mathcal{F}_{t-1}$ be the *filtration* of past returns, which is simply the *information set* of all observed returns up to time $t-1$. Let $r_t$ represent the series of log returns for $t = 1,\dots,T$. If the distribution of returns is assumed to be *Normal*, one can write

\begin{equation}
    r_t \vert \mathcal{F}_{t-1} \sim N(\bar{r_t}, \sigma_t^2),
\end{equation}

where $\bar{r_t}$ is the conditional mean and $\sigma_t^2$ is the conditional variance of returns. However, while returns are often assumed to follow a Normal distribution, this is not the case with our data, which exhibit clear leptokurtosis ('fat tails') and skewness. Therefore, multiple distributions forming a set $D = \{ d \}$ are considered:

- Normal Distribution (NORM)
- Generalised Error Distribution (GED)
- Student t Distribution (STD)
- Skewed Normal Distribution (SNORM)
- Skewed Generalised Error Distribution (SGED)
- Skewed Student t Distribution (SSTD)
- Generalised Hyperbolic Distribution (GHYP)
- Generalised Hyperbolic Skewed Student t Distribution (GHST)
- Normal Inverse Gaussian Distribution (NIG)

Subsequently, the Akaike information criterion (AIC) will be used to assess the quality of each model. The AIC penalises a high number of estimated parameters and is hence a good criterion for obtaining a parsimonious model, balancing goodness of fit against the number of parameters. Lower values are better, and the criterion is defined as

\begin{equation}
    AIC = 2k - 2\ln(\hat{L}),
\end{equation}

where $k$ is the number of estimated parameters and $\hat{L}$ is the maximum value of the likelihood function for the model.
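
To make the trade-off concrete, the sketch below computes the AIC for three candidate distributions and picks the one with the lowest value. The log-likelihoods and parameter counts are hypothetical numbers chosen for illustration, not results from our data, and the helper `aic` is not part of any library.

```python
def aic(log_likelihood, k):
    """AIC = 2k - 2 ln(L_hat), where log_likelihood is ln(L_hat)."""
    return 2 * k - 2 * log_likelihood

# Hypothetical maximised log-likelihoods and parameter counts (illustration only).
candidates = {
    "NORM": {"loglik": -1520.4, "k": 5},
    "STD": {"loglik": -1498.7, "k": 6},
    "SSTD": {"loglik": -1498.1, "k": 7},
}

scores = {name: aic(fit["loglik"], fit["k"]) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # the lowest AIC is preferred
```

In this made-up example SSTD has the highest likelihood, but its extra parameter does not improve the fit enough to offset the penalty, so STD is selected.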

## Conditional Mean Equation: ARMA

The conditional mean specifies the behaviour of the returns. In this case, we assume that the return series follows an ARMA model, which accounts for the possibility of autocorrelation and dependence on past error terms. The conditional mean equation is defined as

\begin{equation}
    r_t = c + \sum_{i=1}^{p} \phi_i r_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t,
\end{equation}

where $c$ is a constant term, $\phi_i$ are the autoregressive coefficients, $\theta_i$ are the moving-average coefficients and $\epsilon_t$ is the *innovation* or *shock*. We allow for

\begin{equation}
    \epsilon_{t} \vert \mathcal{F}_{t-1} \sim d(0, \sigma_t^2), \quad d \in D,
\end{equation}

where $D$ is the set of distributions defined previously.
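
The conditional mean recursion can be sketched as a small simulation. This is a minimal illustration under simplifying assumptions: the innovations are Normal with a constant standard deviation `sigma` (so there is no GARCH effect yet), pre-sample values are treated as zero, and the function name and parameter values are hypothetical.

```python
import random

def simulate_arma(c, phi, theta, sigma, n, seed=0):
    """Simulate r_t = c + sum_i phi_i r_{t-i} + sum_i theta_i eps_{t-i} + eps_t."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    r, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, sigma)  # Normal shock with constant variance (assumption)
        # Terms that would reach before the start of the sample are skipped.
        ar = sum(phi[i] * r[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[i] * eps[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        r.append(c + ar + ma + e)
        eps.append(e)
    return r

# Hypothetical ARMA(1,1) parameters.
returns = simulate_arma(c=0.1, phi=[0.5], theta=[0.3], sigma=1.0, n=20000)
sample_mean = sum(returns) / len(returns)
theoretical_mean = 0.1 / (1 - 0.5)  # c / (1 - sum(phi)) for a stationary ARMA process
```

For a stationary process the sample mean should settle near $c / (1 - \sum_i \phi_i)$, which gives a quick sanity check on the recursion.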

## Conditional Variance Equation: GARCH

GARCH is a generalisation of the ARCH model. The ARCH model makes the conditional variance a function of lagged squared innovations, and the GARCH model extends it by also including lags of the conditional variance itself. It is essentially an ARMA-type model for the conditional variance instead of the conditional mean. The conditional variance equation is hence defined as

\begin{equation}
    \sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2,
\end{equation}

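where $\omega$ is a constant term, $\alpha_i$ are the ARCH coefficients on the lagged squared innovations, and $\beta_i$ are the GARCH coefficients on the lagged conditional variances. The orders $p$ and $q$ here belong to the variance equation and need not equal the ARMA orders. In other words, $\alpha_i$ measures the *reaction* of the conditional volatility to market shocks, and $\beta_i$ measures the *persistence* of the conditional volatility. The restrictions $\omega > 0$ and $\alpha_i, \beta_i \geq 0$ are typically imposed to keep the variance positive. In addition, the parameters are constrained by: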

\begin{equation}
    \sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1
\end{equation}

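for covariance stationarity, so that the conditional variance reverts to a finite long-run level. The sum $\alpha + \beta$ (taken over all lags) determines the rate of this reversion: the closer it is to one, the more slowly the conditional volatility returns to its long-term average level.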
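
The variance recursion and the role of the stationarity constraint can be illustrated with a GARCH(1,1) simulation. This is a minimal sketch assuming Normal standardised shocks and hypothetical parameter values. It uses the standard GARCH(1,1) result, not stated above, that the long-run (unconditional) variance is $\omega / (1 - \alpha - \beta)$.

```python
import math
import random

def simulate_garch11(omega, alpha, beta, n, seed=0):
    """Simulate eps_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * eps_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)  # start at the long-run variance
    eps, variances = [], []
    for _ in range(n):
        e = math.sqrt(var) * rng.gauss(0.0, 1.0)  # Normal z_t (assumption)
        eps.append(e)
        variances.append(var)
        var = omega + alpha * e * e + beta * var  # next period's conditional variance
    return eps, variances

# Hypothetical parameters satisfying alpha + beta < 1.
omega, alpha, beta = 0.1, 0.1, 0.8
eps, variances = simulate_garch11(omega, alpha, beta, n=50000)

long_run = omega / (1.0 - alpha - beta)         # 1.0 up to floating-point rounding
sample_var = sum(e * e for e in eps) / len(eps)  # should be close to long_run
```

Moving `alpha + beta` closer to one keeps the process stationary but makes shocks to the variance die out more slowly, which shows up as longer volatility clusters in the simulated series.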