### 1. Define the GARCHX model

Mean Model
$$ r_t = \mu + \sigma_t z_t $$ 
$$z_t \sim i.i.d\ N(0, 1)$$
where $e_t = \sigma_t z_t$ and $x_t$ is the exogenous regressor.

Volatility Model 
$$ \sigma_t^2 = \omega +  \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2 + 
    \gamma x_t^2$$
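Below is a minimal simulation sketch of this model. It assumes hypothetical parameter values, an exogenous regressor drawn as $x_t \sim$ i.i.d. $N(0,1)$ (so $E[x_t^2] = 1$), and a starting value $\sigma_0^2$ equal to the unconditional variance $(\omega + \gamma E[x_t^2])/(1-\alpha-\beta)$ from Section 4.1. The name `simulate_garchx` is illustrative, not part of any library.

```python
import numpy as np


def simulate_garchx(n, omega, alpha, beta, gamma, mu=0.0, seed=0):
    """Simulate r_t = mu + sigma_t z_t with the GARCHX volatility recursion.

    Assumes x_t ~ i.i.d. N(0, 1), so E[x_t^2] = 1, and alpha + beta < 1.
    Returns (r, e, sigma2, x) as NumPy arrays of length n.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)          # hypothetical exogenous regressor
    z = rng.standard_normal(n)          # i.i.d. N(0, 1) innovations
    sigma2 = np.empty(n)
    e = np.empty(n)
    sigma2[0] = (omega + gamma * 1.0) / (1 - alpha - beta)  # unconditional variance
    e[0] = np.sqrt(sigma2[0]) * z[0]
    for t in range(1, n):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1] + gamma * x[t] ** 2
        e[t] = np.sqrt(sigma2[t]) * z[t]
    return mu + e, e, sigma2, x


r, e, sigma2, x = simulate_garchx(1_000, omega=0.05, alpha=0.05, beta=0.9, gamma=0.05, mu=0.1)
```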

### 2. Log likelihood
For simplicity, assume $\mu=0$, so that $e_t = r_t$. Let $\theta = (\omega, \alpha, \beta, \gamma)$. The log likelihood is the sum of the log conditional densities of $r_t$ given $\mathcal{F}_{t-1}$:
$$ l(\theta) = \sum_{t=1}^T \frac{1}{2} \left(-\log{2\pi} -\log{\sigma_t^2} - \frac{e_t^2}{\sigma_t^2}\right) $$
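As a sketch of how $l(\theta)$ can be evaluated, the function below rebuilds $\sigma_t^2$ from the volatility recursion and sums the log densities. It assumes $\mu = 0$, a fixed starting value $\sigma_0^2$ (an assumption; the derivation above does not specify one) and hypothetical simulated data; `garchx_loglik` is an illustrative name.

```python
import numpy as np


def garchx_loglik(theta, e, x, sigma2_0):
    """Gaussian log likelihood l(theta) with mu = 0, so e_t = r_t.

    theta = (omega, alpha, beta, gamma). sigma2_0 is a fixed starting value for
    sigma_0^2 (an assumption; the sample variance of e is a common choice).
    """
    omega, alpha, beta, gamma = theta
    e = np.asarray(e, dtype=float)
    x = np.asarray(x, dtype=float)
    sigma2 = np.empty(len(e))
    sigma2[0] = sigma2_0
    for t in range(1, len(e)):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1] + gamma * x[t] ** 2
    return 0.5 * np.sum(-np.log(2 * np.pi) - np.log(sigma2) - e**2 / sigma2)


# Hypothetical data simulated from the model with theta0 and x_t ~ N(0, 1).
rng = np.random.default_rng(42)
n, theta0, s2_start = 5_000, (0.05, 0.05, 0.9, 0.05), 2.0
x = rng.standard_normal(n)
z = rng.standard_normal(n)
e = np.empty(n)
s2 = s2_start
for t in range(n):
    if t > 0:
        s2 = theta0[0] + theta0[1] * e[t - 1] ** 2 + theta0[2] * s2 + theta0[3] * x[t] ** 2
    e[t] = np.sqrt(s2) * z[t]

ll_true = garchx_loglik(theta0, e, x, s2_start)
ll_wrong = garchx_loglik((0.05, 0.05, 0.8, 0.05), e, x, s2_start)
```

With $\alpha = \beta = \gamma = 0$ and $\omega = 1$, the likelihood reduces to that of i.i.d. $N(0,1)$ data, which gives a quick sanity check.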

The first and second partial derivatives of $l(\theta)$ are as follows:
$$ \frac{\partial }{\partial \theta} l(\theta) =  \sum_{t=1}^T {\frac{1}{2} ({\frac{e_t^2}{\sigma_t^4} - \frac{1}{\sigma_t^2}})} \frac{\partial \sigma_t^2}{\partial \theta}$$

$$ \begin{split}
\frac{\partial^2 }{\partial \theta_1 \partial \theta_2} l(\theta) &= - \frac{1}{2} \sum_{t=1}^T\left( \frac{\partial^2 \sigma_t^2}{\partial \theta_1 \partial \theta_2} \left(\frac{1}{\sigma_t^2} - \frac{e_t^2}{\sigma_t^4}\right) + 
\frac{\partial \sigma_t^2}{\partial \theta_1} \frac{\partial \sigma_t^2}{\partial \theta_2} \left(\frac{2e_t^2}{\sigma_t^6} - \frac{1}{\sigma_t^4}\right)\right) 
\end{split} $$


### 3. Partial Derivatives of $\sigma^2$

This section supplements the previous section: the first and second partial derivatives of $\sigma^2$ are used in the score function and the information matrix.

#### 3.1 First Partial Derivative

$$ \begin{split}
    \frac{\partial \sigma_{t}^2}{\partial \omega} &= 1 + \beta \frac{\partial \sigma_{t-1}^2}{\partial \omega} 
\end{split} $$


$$ \begin{split}
    \frac{\partial \sigma_{t}^2}{\partial \alpha} &= e_{t-1}^2 + \beta \frac{\partial \sigma_{t-1}^2}{\partial \alpha}
\end{split} $$

 
$$ \begin{split}
    \frac{\partial \sigma_{t}^2}{\partial \beta} &= \sigma_{t-1}^2 + \beta \frac{\partial \sigma_{t-1}^2}{\partial \beta}
\end{split} $$


$$ \begin{split}
    \frac{\partial \sigma_{t}^2}{\partial \gamma} &= x_{t}^2 + \beta \frac{\partial \sigma_{t-1}^2}{\partial \gamma}
\end{split} $$
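The recursions above can be run alongside the volatility recursion. The sketch below assumes a fixed starting value $\sigma_0^2$ (so every derivative starts at zero) and arbitrary illustrative data. It computes the analytic score from Section 2 using these derivatives and checks it against central finite differences of $l(\theta)$. The helper names are hypothetical.

```python
import numpy as np


def sigma2_and_grad(theta, e, x, sigma2_0):
    """sigma_t^2 and d sigma_t^2 / d theta from the Section 3.1 recursions.

    Columns of grad are ordered (omega, alpha, beta, gamma). sigma_0^2 is a
    fixed starting value, so grad[0] = 0.
    """
    omega, alpha, beta, gamma = theta
    n = len(e)
    s2 = np.empty(n)
    grad = np.zeros((n, 4))
    s2[0] = sigma2_0
    for t in range(1, n):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1] + gamma * x[t] ** 2
        driver = np.array([1.0, e[t - 1] ** 2, s2[t - 1], x[t] ** 2])
        grad[t] = driver + beta * grad[t - 1]
    return s2, grad


def loglik(theta, e, x, sigma2_0):
    s2, _ = sigma2_and_grad(theta, e, x, sigma2_0)
    return 0.5 * np.sum(-np.log(2 * np.pi) - np.log(s2) - e**2 / s2)


def score(theta, e, x, sigma2_0):
    """Analytic score: sum_t 0.5 * (e_t^2 / sigma_t^4 - 1 / sigma_t^2) * d sigma_t^2 / d theta."""
    s2, grad = sigma2_and_grad(theta, e, x, sigma2_0)
    weights = 0.5 * (e**2 / s2**2 - 1 / s2)
    return weights @ grad


def numerical_score(theta, e, x, sigma2_0, h=1e-5):
    """Central finite-difference gradient of loglik, for comparison."""
    theta = np.asarray(theta, dtype=float)
    g = np.empty(4)
    for j in range(4):
        step = np.zeros(4)
        step[j] = h
        g[j] = (loglik(theta + step, e, x, sigma2_0) - loglik(theta - step, e, x, sigma2_0)) / (2 * h)
    return g


rng = np.random.default_rng(0)
e = rng.standard_normal(2_000)   # illustrative data; the identity holds for any e and x
x = rng.standard_normal(2_000)
theta = (0.1, 0.1, 0.8, 0.05)
analytic = score(theta, e, x, sigma2_0=1.0)
numeric = numerical_score(theta, e, x, sigma2_0=1.0)
```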


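Unconditional expectations of the first derivatives of $\sigma_t^2$, assuming stationarity, with $\mu_{x^2} = E[x_t^2]$ and $E[e_{t-1}^2] = E[\sigma_{t-1}^2] = \frac{\omega + \gamma \mu_{x^2}}{1-\alpha-\beta}$ (Section 4.1):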

$$ E\left[\frac{\partial \sigma_{t}^2}{\partial \omega}\right] = \frac{1}{1-\beta} $$

$$ E\left[\frac{\partial \sigma_{t}^2}{\partial \alpha}\right] = \frac{\omega + \gamma \mu_{x^2}}{(1-\beta)(1-\alpha-\beta)} $$

$$ E\left[\frac{\partial \sigma_{t}^2}{\partial \beta}\right] = \frac{\omega + \gamma \mu_{x^2}}{(1-\beta)(1-\alpha-\beta)} $$

$$ E\left[\frac{\partial \sigma_{t}^2}{\partial \gamma}\right] = \frac{\mu_{x^2}}{1-\beta} $$
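These expectations can be checked by simulation. The following is a minimal sketch that assumes hypothetical parameter values, $x_t \sim$ i.i.d. $N(0,1)$ (so $\mu_{x^2} = 1$), and derivatives started at zero. It runs the Section 3.1 recursions along a long simulated path and compares their sample means with the formulas above. The function name is illustrative.

```python
import numpy as np


def simulate_sigma2_derivatives(n, omega, alpha, beta, gamma, seed=0):
    """Simulate the model and the Section 3.1 derivative recursions side by side.

    Assumes x_t ~ i.i.d. N(0, 1), so mu_{x^2} = 1. Returns an (n, 4) array of
    d sigma_t^2 / d(omega, alpha, beta, gamma), started at zero.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    z = rng.standard_normal(n)
    s2 = np.empty(n)
    e = np.empty(n)
    d = np.zeros((n, 4))
    s2[0] = (omega + gamma) / (1 - alpha - beta)
    e[0] = np.sqrt(s2[0]) * z[0]
    for t in range(1, n):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1] + gamma * x[t] ** 2
        e[t] = np.sqrt(s2[t]) * z[t]
        d[t] = np.array([1.0, e[t - 1] ** 2, s2[t - 1], x[t] ** 2]) + beta * d[t - 1]
    return d


omega, alpha, beta, gamma, mu_x2 = 0.05, 0.05, 0.9, 0.05, 1.0   # hypothetical values
d = simulate_sigma2_derivatives(100_000, omega, alpha, beta, gamma, seed=1)
sample_means = d[1_000:].mean(axis=0)   # drop a burn-in so the zero start has died out
theory = np.array([
    1 / (1 - beta),
    (omega + gamma * mu_x2) / ((1 - beta) * (1 - alpha - beta)),
    (omega + gamma * mu_x2) / ((1 - beta) * (1 - alpha - beta)),
    mu_x2 / (1 - beta),
])
print(sample_means)
print(theory)
```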


#### 3.2 Second Partial Derivatives of $\sigma^2$
There are two cases. 

a) If neither $\theta_i$ nor $\theta_j$ is $\beta$, the recursion has no forcing term, so with the derivatives initialized at zero it stays at zero:
$$ \begin{split}
    \frac{\partial^2 \sigma_{t}^2}{\partial \theta_i \partial \theta_j} &= \beta \frac{\partial^2 \sigma_{t-1}^2}{\partial \theta_i \partial \theta_j} 
\end{split} $$

and 

$$ E\left[\frac{\partial^2 \sigma_{t}^2}{\partial\theta_i \partial\theta_j}\right] = 0 $$


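b) If at least one of the parameters is $\beta$ (say $\theta_i = \beta$), then for $\theta_j \neq \beta$
$$ \begin{split}
    \frac{\partial^2 \sigma_{t}^2}{\partial \beta \partial \theta_j} &= \frac{\partial \sigma_{t-1}^2}{\partial \theta_j } + \beta \frac{\partial^2 \sigma_{t-1}^2}{\partial \beta \partial \theta_j} 
\end{split} $$

and, because $\frac{\partial \sigma_t^2}{\partial \beta}$ contains $\sigma_{t-1}^2$, which itself depends on $\beta$,
$$ \frac{\partial^2 \sigma_{t}^2}{\partial \beta^2} = 2\frac{\partial \sigma_{t-1}^2}{\partial \beta} + \beta \frac{\partial^2 \sigma_{t-1}^2}{\partial \beta^2} $$

Taking unconditional expectations of these recursions gives

$$ E\left[\frac{\partial^2 \sigma_{t}^2}{\partial\beta\partial\omega}\right] = \frac{1}{(1-\beta)^2}$$

$$ E\left[\frac{\partial^2 \sigma_{t}^2}{\partial\beta\partial\alpha}\right] = \frac{\omega + \gamma \mu_{x^2}}{(1-\beta)^2(1-\alpha-\beta)} $$

$$ E\left[\frac{\partial^2 \sigma_{t}^2}{\partial\beta^2}\right] = \frac{2(\omega + \gamma \mu_{x^2})}{(1-\beta)^2(1-\alpha-\beta)} $$

$$ E\left[\frac{\partial^2 \sigma_{t}^2}{\partial\beta\partial\gamma}\right] = \frac{\mu_{x^2}}{(1-\beta)^2}$$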
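To check the case (b) recursions, including the doubled term in $\partial^2 \sigma_t^2 / \partial \beta^2$, the sketch below compares them with central finite differences of $\partial \sigma_t^2 / \partial \beta$. It assumes a fixed $\sigma_0^2$ (so all derivatives start at zero), treats $e_t$ as fixed data, and uses illustrative random inputs. `beta_derivatives` is a hypothetical helper.

```python
import numpy as np


def beta_derivatives(theta, e, x, sigma2_0):
    """First derivatives of sigma_t^2 and second derivatives d^2 sigma_t^2 / (d beta d theta_j).

    Columns are ordered (omega, alpha, beta, gamma). Implements the case (b)
    recursion, with the extra d sigma_{t-1}^2 / d beta term when theta_j = beta.
    Assumes a fixed sigma_0^2, so all derivatives start at zero.
    """
    omega, alpha, beta, gamma = theta
    n = len(e)
    s2 = np.empty(n)
    s2[0] = sigma2_0
    d1 = np.zeros((n, 4))
    d2 = np.zeros((n, 4))
    for t in range(1, n):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1] + gamma * x[t] ** 2
        d1[t] = np.array([1.0, e[t - 1] ** 2, s2[t - 1], x[t] ** 2]) + beta * d1[t - 1]
        d2[t] = d1[t - 1] + beta * d2[t - 1]
        d2[t, 2] += d1[t - 1, 2]   # second copy of d sigma_{t-1}^2 / d beta
    return d1, d2


rng = np.random.default_rng(1)
e = rng.standard_normal(1_000)   # illustrative data
x = rng.standard_normal(1_000)
theta = np.array([0.1, 0.1, 0.8, 0.05])
d1, d2 = beta_derivatives(theta, e, x, sigma2_0=1.0)

# Central finite differences of d sigma_t^2 / d beta with respect to each theta_j.
h = 1e-6
fd = np.empty_like(d2)
for j in range(4):
    step = np.zeros(4)
    step[j] = h
    up, _ = beta_derivatives(theta + step, e, x, 1.0)
    down, _ = beta_derivatives(theta - step, e, x, 1.0)
    fd[:, j] = (up[:, 2] - down[:, 2]) / (2 * h)
```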


### 4. Stationarity of the time series
#### 4.1 Stationarity of $\sigma_t^2$
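We know that $e_t = \sigma_t z_t$. Since $z_t \sim$ i.i.d. $N(0,1)$ and, assuming $x_t$ is exogenous, $z_t$ is independent of $\sigma_t$ (which depends only on past shocks and on $x_t$),
$$ \begin{split}
E[e_t^2] &= E[\sigma_t^2 z_t^2]\\
&= E[\sigma_t^2] E[z_t^2] \\
&= E[\sigma_t^2]
\end{split} $$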

*Note that $z_t^2 \sim \chi^2_1$, so $E[z_t^2] = 1$.*

Using $E[e_{t-1}^2] = E[\sigma_{t-1}^2]$ and iterating backwards,

$$ \begin{split}
    E[\sigma_t^2] &= E[\omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma x_t^2] \\
    &= \omega + \alpha E[e_{t-1}^2] + \beta E[\sigma_{t-1}^2] + \gamma E[x_t^2] \\
    &= \omega + \gamma k + (\alpha + \beta) E[\sigma_{t-1}^2] \\
    &= \omega + \gamma k + (\alpha + \beta) E[\omega + \alpha e_{t-2}^2 + \beta \sigma_{t-2}^2 + \gamma x_{t-1}^2] \\
    &= (\omega + \gamma k)\left[1 + (\alpha + \beta)\right] + (\alpha + \beta)^2 E[\sigma_{t-2}^2] \\
    &= \ldots \\
    &= (\omega + \gamma k) [1+ (\alpha + \beta) + (\alpha + \beta)^2 + (\alpha + \beta)^3 + \ldots] \\
    &= \frac{\omega + \gamma k}{1-(\alpha+\beta)}
\end{split} $$
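where $k = E[x_t^2]$ (written $\mu_{x^2}$ in Section 3), which is constant because we assume $x_t^2$ is stationary.

The geometric series converges, so that $E[\sigma_t^2]$ is finite, only if $|\alpha + \beta| < 1$; we therefore require this condition.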
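The backward iteration above can be reproduced numerically. This small sketch uses hypothetical parameter values and iterates $E[\sigma_t^2] = \omega + \gamma k + (\alpha+\beta) E[\sigma_{t-1}^2]$ from different starting values. When $|\alpha + \beta| < 1$ it converges to the closed form, and when $\alpha + \beta = 1$ it grows without bound.

```python
def iterate_mean_variance(omega, alpha, beta, gamma, k, m0, steps):
    """Iterate E[sigma_t^2] = omega + gamma k + (alpha + beta) E[sigma_{t-1}^2] from m0."""
    m = m0
    for _ in range(steps):
        m = omega + gamma * k + (alpha + beta) * m
    return m


omega, alpha, beta, gamma, k = 0.05, 0.05, 0.9, 0.05, 1.0   # hypothetical values
closed_form = (omega + gamma * k) / (1 - (alpha + beta))
print(closed_form)
print(iterate_mean_variance(omega, alpha, beta, gamma, k, m0=0.0, steps=2_000))
print(iterate_mean_variance(omega, alpha, beta, gamma, k, m0=50.0, steps=2_000))
# With alpha + beta = 1 the series diverges: the iterate grows by omega + gamma k each step.
print(iterate_mean_variance(omega, 0.1, 0.9, gamma, k, m0=0.0, steps=2_000))
```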

Since $x_t^2$ is stationary, $E[x_t^2 x_{t-s}^2]$ does not depend on $t$. Using this, we show that the autocovariance of $\sigma_t^2$ does not depend on $t$. First,
$$ \begin{split}
Cov[\sigma_t^2,\sigma_{t-s}^2] &= E[\sigma_t^2 \sigma_{t-s}^2] - E[\sigma_t^2]E[\sigma_{t-s}^2] \\
&= E[\sigma_t^2 \sigma_{t-s}^2] - \left(\frac{\omega + \gamma k}{1-(\alpha+\beta)}\right)^2
\end{split} $$

Expanding $\sigma_t^2$ and $\sigma_{t-s}^2$ with the volatility equation, we find
$$ \begin{split}
E[\sigma_t^2 \sigma_{t-s}^2] &= E[(\omega + \alpha e_{t-1}^2 + \beta \sigma_{t-1}^2 + \gamma x_t^2)(\omega + \alpha e_{t-s-1}^2 + \beta \sigma_{t-s-1}^2 + \gamma x_{t-s}^2)] \\
&= \ldots \\
&= (\omega^2 + 2\gamma \omega k) + (2 \alpha \omega + 2 \beta \omega + 2 \alpha \gamma k + 2 \beta \gamma k)\left(\frac{\omega + \gamma k}{1-(\alpha+\beta)}\right) + (\alpha + \beta) ^ 2 E[\sigma_{t-1}^2 \sigma_{t-s-1}^2] + \gamma^2 E[x_t^2 x_{t-s}^2]
\end{split} $$
where the omitted steps treat $x_t^2$ as independent of the volatility terms and replace each $e^2$ term by the corresponding $\sigma^2$ term.

Since $x_t^2$ is stationary, $Cov[x_t^2,x_{t-s}^2] = E[x_t^2 x_{t-s}^2] - E[x_t^2]E[x_{t-s}^2]$ does not depend on $t$, and neither does $E[x_t^2]E[x_{t-s}^2] = k^2$. Thus $E[x_t^2 x_{t-s}^2]$ does not depend on $t$, so every term in the recursion above except $E[\sigma_{t-1}^2 \sigma_{t-s-1}^2]$ is constant in $t$.



The recursion for $E[\sigma_t^2 \sigma_{t-s}^2]$ is a first-order linear difference equation with constant terms, like a stationary AR(1), since $|\alpha+\beta| < 1 \implies (\alpha+\beta)^2 < 1$. Its solution therefore converges to a limit that does not depend on $t$:

$$\begin{split} &E[\sigma_t^2 \sigma_{t-s}^2] = (\omega^2 + 2\gamma \omega k) + 2(\alpha\omega + \beta\omega + \alpha\gamma k + \beta\gamma k) \frac{\omega+\gamma k}{1-(\alpha+\beta)} + (\alpha+\beta)^2 E[\sigma_{t-1}^2\sigma_{t-s-1}^2] + \gamma^2 E[x_t^2 x_{t-s}^2] \\
&\implies E[\sigma_t^2 \sigma_{t-s}^2] \text{ does not depend on }t \\
&\implies Cov[\sigma_t^2, \sigma_{t-s}^2] = E[\sigma_t^2 \sigma_{t-s}^2] - \left(\frac{\omega+\gamma k}{1-(\alpha+\beta)}\right)^2\text{ does not depend on }t\\
&\therefore \{\sigma_t^2\} \text{ is stationary}.
\end{split}$$


#### 4.2 Stationarity of $r_t$

First note that $z_t \sim$ i.i.d. $N(0,1)$ implies $E[z_t] = 0$, and, as in Section 4.1, $z_t$ is independent of $\sigma_t$.

We now show that $E[r_t]$ is constant:
$$\begin{split}
E[r_t] &= E[\mu + \sigma_t z_t] \\
        &= \mu + E[\sigma_t z_t] \\
        &= \mu + E[\sigma_t]E[z_t] \\
        &= \mu
\end{split} $$

Next, we show that $r_t$ has finite variance:
$$\begin{split}
Var(r_t) &= E[(r_t - \mu)^2] = E[\sigma_t^2 z_t^2] \\
        &= E[\sigma_t^2] \\
        &= \frac{\omega + \gamma k}{1-(\alpha+\beta)}
\end{split} $$

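Lastly, we show that the autocovariance of $r_t$ does not depend on $t$. For $s \geq 1$, $e_{t-s}$ is known at time $t-1$ and $E[e_t \mid \mathcal{F}_{t-1}] = 0$ (Section 4.3), so
$$\begin{split}
Cov(r_t, r_{t-s}) &= Cov(e_t, e_{t-s}) \\ 
        &= E[e_t e_{t-s}] - E[e_t] E[e_{t-s}] \\
        &= E\left[E[e_t \mid \mathcal{F}_{t-1}]\, e_{t-s}\right] - 0 \\
        &= 0
\end{split} $$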
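The three properties above can be checked on a long simulated path. This sketch assumes hypothetical parameter values with $\mu = 0.1$ and $x_t \sim$ i.i.d. $N(0,1)$ (so $k = 1$). `sample_autocorr` is an illustrative helper. The sample autocorrelation of $r_t$ should be close to zero, while that of $r_t^2$ is typically positive, reflecting that $r_t$ is uncorrelated but not independent.

```python
import numpy as np

# Hypothetical parameter values; x_t ~ i.i.d. N(0, 1), so k = E[x_t^2] = 1.
mu, omega, alpha, beta, gamma, k = 0.1, 0.05, 0.05, 0.9, 0.05, 1.0
n = 200_000
rng = np.random.default_rng(2)
x = rng.standard_normal(n)
z = rng.standard_normal(n)
e = np.empty(n)
s2 = (omega + gamma * k) / (1 - alpha - beta)   # start at the unconditional variance
for t in range(n):
    if t > 0:
        s2 = omega + alpha * e[t - 1] ** 2 + beta * s2 + gamma * x[t] ** 2
    e[t] = np.sqrt(s2) * z[t]
r = mu + e


def sample_autocorr(y, lag):
    """Sample autocorrelation of y at the given lag (lag >= 1)."""
    y = y - y.mean()
    return np.dot(y[lag:], y[:-lag]) / np.dot(y, y)


theory_var = (omega + gamma * k) / (1 - (alpha + beta))
print("mean", r.mean(), "vs", mu)
print("variance", r.var(), "vs", theory_var)
print("lag-1 autocorrelation of r_t", sample_autocorr(r, 1))
print("lag-1 autocorrelation of r_t^2", sample_autocorr(r**2, 1))
```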








#### 4.3 Convergence of $e_t$

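We know that $e_t = \sigma_t z_t$, where $z_t \sim$ i.i.d. $N(0, 1)$ is independent of $\mathcal{F}_{t-1}$, and $\sigma_t$ is known given $\mathcal{F}_{t-1}$ (taking $\mathcal{F}_{t-1}$ to contain the past shocks and the exogenous $x_t$). Therefore,
$$ \begin{split}
    E[e_{t}| \mathcal{F}_{t-1}] &= E[\sigma_t z_t| \mathcal{F}_{t-1}] \\
    &= \sigma_t \, E[z_{t}| \mathcal{F}_{t-1}] \\
    &= 0
\end{split} $$
Since also $E|e_t| \leq \sqrt{E[e_t^2]} < \infty$ (Section 4.2), we conclude that $e_t$ is a martingale difference sequence (MDS).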

For simplicity, we assume our time series is ergodic.



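By the martingale CLT, since $e_t$ is a stationary, ergodic martingale difference sequence with finite variance,
$$ \frac{1}{\sqrt{n}} \sum_{t=1}^n e_t \xrightarrow[]{\text{d}} N(0, \sigma^2), \qquad \sigma^2 = E[e_t^2] = \frac{\omega + \gamma k}{1-(\alpha+\beta)} $$ 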
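Here is a Monte Carlo sketch of the martingale CLT for $e_t$. It simulates many independent paths, forms $\frac{1}{\sqrt{n}}\sum_t e_t$ divided by $\sigma = \sqrt{E[e_t^2]}$, and checks that the result looks standard normal. The parameter values, $x_t \sim$ i.i.d. $N(0,1)$, and starting each path at the unconditional variance are all assumptions for illustration.

```python
import numpy as np

omega, alpha, beta, gamma, k = 0.05, 0.05, 0.9, 0.05, 1.0   # hypothetical values; x_t ~ N(0, 1)
sigma2_bar = (omega + gamma * k) / (1 - (alpha + beta))


def standardized_sums(n, reps, seed=0):
    """Return reps draws of (1 / sqrt(n)) * sum_t e_t, divided by sigma = sqrt(E[e_t^2])."""
    rng = np.random.default_rng(seed)
    s2 = np.full(reps, sigma2_bar)     # each path starts at the unconditional variance
    e = np.sqrt(s2) * rng.standard_normal(reps)
    total = e.copy()
    for _ in range(1, n):
        x = rng.standard_normal(reps)
        s2 = omega + alpha * e**2 + beta * s2 + gamma * x**2   # uses e_{t-1} and sigma_{t-1}^2
        e = np.sqrt(s2) * rng.standard_normal(reps)
        total += e
    return total / np.sqrt(n * sigma2_bar)


u = standardized_sums(n=1_000, reps=4_000, seed=3)
print(u.mean(), u.var(), np.mean(np.abs(u) < 1.96))
```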



### 5. Consistency of the MLE

Let the score function be $s_n(\theta) = \frac{\partial}{\partial\theta} l(\theta)$ and the observed information be $I_n(\theta) = -\frac{\partial^2}{\partial\theta \partial\theta'} l(\theta)$, both computed from $n$ observations.

Under suitable regularity conditions, and given that the time series is stationary and ergodic, the MLE is consistent:

$$ \hat{\theta} \xrightarrow[]{\text{p}} \theta_0$$


### 6. Asymptotic normality of the MLE

First, we consider the distribution of the true score $s_n(\theta_0)$. Under the regularity conditions, $E(s_n(\theta_0)) = 0$ and $Var(s_n(\theta_0)) = n I_E(\theta_0)$, where $I_E(\theta_0)$ is the expected information per observation. Each score contribution equals $\frac{1}{2\sigma_t^2}(z_t^2 - 1)\frac{\partial \sigma_t^2}{\partial \theta}$, which has conditional mean zero given $\mathcal{F}_{t-1}$, so the contributions form a stationary, ergodic MDS, just as $e_t$ does in Section 4.3. By the martingale CLT, the scaled score

$$\begin{split}
 \frac{1}{\sqrt{n}} s_n(\theta_0) &= \frac{1}{\sqrt{n}} \sum_{t=1}^n \frac{\partial}{\partial\theta}\log f(r_t | \mathcal{F}_{t-1}; \theta_0) \\ 
 &\xrightarrow[]{\text{d}} N(0, I_E(\theta_0))
\end{split}$$

converges in distribution to a multivariate normal distribution $N(0, I_E(\theta_0))$.


Next, consider a first order multivariate Taylor expansion of the score function at $\theta_0$ about $\hat{\theta}$:

$$ s_n(\theta_0) \approx s_n(\hat{\theta}) - I_n(\theta_0) (\theta_0 - \hat{\theta}) $$

Since $\hat{\theta}$ maximizes $l(\theta)$ at an interior point, $s_n(\hat{\theta}) = 0$, so
$$\begin{split}
 s_n(\theta_0) &\approx - I_n(\theta_0) (\theta_0 - \hat{\theta}) \\
 (\hat{\theta} - \theta_0) &\approx  I_n(\theta_0)^{-1} s_n(\theta_0) \\
 \sqrt{n}(\hat{\theta} - \theta_0) &\approx  \left(\frac{1}{n} I_n(\theta_0)\right)^{-1} \frac{1}{\sqrt{n}} s_n(\theta_0) 
\end{split} $$

As shown earlier,

$$ \frac{1}{\sqrt{n}} s_n(\theta_0)  \xrightarrow[]{\text{d}} N(0, I_E(\theta_0)) $$ 

and by the law of large numbers (the ergodic theorem),

$$\frac{1}{n} I_n(\theta_0) = -\frac{1}{n} \sum_{t=1}^n \frac{\partial^2}{\partial\theta \partial\theta'} \log f(r_t | \mathcal{F}_{t-1}; \theta_0)  \xrightarrow[]{\text{p}} I_E(\theta_0)$$ 


Therefore by Slutsky's theorem,

$$ \sqrt{n}(\hat{\theta} - \theta_0) \xrightarrow[]{\text{d}} I_E(\theta_0)^{-1} N(0, I_E(\theta_0)) = N(0, I_E(\theta_0)^{-1}) $$

---

### 7. Asymptotic distribution of the MLE

The Fisher information is given by 

$$ I(\theta) = - \begin{pmatrix}
\frac{\partial^2 l}{\partial \omega^2} & \frac{\partial^2 l}{\partial \omega \partial \alpha} & \frac{\partial^2 l}{\partial \omega \partial \beta} & \frac{\partial^2 l}{\partial \omega \partial \gamma} \\
\frac{\partial^2 l}{\partial \alpha \partial \omega} & \frac{\partial^2 l}{\partial \alpha^2} & \frac{\partial^2 l}{\partial \alpha \partial \beta} & \frac{\partial^2 l}{\partial \alpha \partial \gamma} \\
\frac{\partial^2 l}{\partial \beta \partial \omega} & \frac{\partial^2 l}{\partial \beta \partial \alpha} & \frac{\partial^2 l}{\partial \beta^2} & \frac{\partial^2 l}{\partial \beta \partial \gamma} \\
\frac{\partial^2 l}{\partial \gamma \partial \omega} & \frac{\partial^2 l}{\partial \gamma \partial \alpha} & \frac{\partial^2 l}{\partial \gamma \partial \beta} & \frac{\partial^2 l}{\partial \gamma^2}
\end{pmatrix}$$

where each second partial derivative follows from the formula in Section 2, using the derivatives of $\sigma_t^2$ from Section 3.




Given consistency (Section 5) and the regularity conditions, the MLE $\hat{\theta}$ is asymptotically multivariate normal with mean $\theta_0$ and covariance $I(\theta_0)^{-1}$. In practice the covariance is estimated by $I(\hat{\theta})^{-1}$, i.e.

$$\hat{\theta} \overset{a}{\sim} MVN_d(\theta_0, I(\hat{\theta})^{-1}), \qquad d = 4.$$
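The covariance $I(\hat\theta)^{-1}$ can be approximated numerically. The sketch below assumes simulated data, a fixed starting value $\sigma_0^2$ equal to the sample variance, and a central finite-difference Hessian (a swapped-in numerical technique, not the analytic second derivatives from Section 2). It evaluates the observed information at the true parameters as a stand-in for $\hat\theta$, which in practice would come from a numerical optimizer, and reports standard errors. All names are illustrative.

```python
import numpy as np


def garchx_loglik(theta, e, x, sigma2_0):
    """l(theta) with mu = 0; sigma_0^2 is a fixed starting value (an assumption)."""
    omega, alpha, beta, gamma = theta
    s2 = np.empty(len(e))
    s2[0] = sigma2_0
    for t in range(1, len(e)):
        s2[t] = omega + alpha * e[t - 1] ** 2 + beta * s2[t - 1] + gamma * x[t] ** 2
    return 0.5 * np.sum(-np.log(2 * np.pi) - np.log(s2) - e**2 / s2)


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta."""
    theta = np.asarray(theta, dtype=float)
    p = len(theta)
    H = np.empty((p, p))
    for i in range(p):
        for j in range(i, p):
            di = np.zeros(p)
            di[i] = h
            dj = np.zeros(p)
            dj[j] = h
            H[i, j] = (f(theta + di + dj) - f(theta + di - dj)
                       - f(theta - di + dj) + f(theta - di - dj)) / (4 * h * h)
            H[j, i] = H[i, j]
    return H


# Hypothetical data simulated from the model, with x_t ~ i.i.d. N(0, 1).
rng = np.random.default_rng(7)
n = 10_000
theta0 = np.array([0.1, 0.1, 0.8, 0.1])   # (omega, alpha, beta, gamma)
x = rng.standard_normal(n)
z = rng.standard_normal(n)
e = np.empty(n)
s2 = (theta0[0] + theta0[3]) / (1 - theta0[1] - theta0[2])
for t in range(n):
    if t > 0:
        s2 = theta0[0] + theta0[1] * e[t - 1] ** 2 + theta0[2] * s2 + theta0[3] * x[t] ** 2
    e[t] = np.sqrt(s2) * z[t]

# In practice theta_hat would come from a numerical optimizer; theta0 stands in for it here.
theta_hat = theta0
sigma2_0 = np.var(e)
info = -numerical_hessian(lambda th: garchx_loglik(th, e, x, sigma2_0), theta_hat)
cov = np.linalg.inv(info)
std_errors = np.sqrt(np.diag(cov))
print(dict(zip(["omega", "alpha", "beta", "gamma"], std_errors)))
```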


### 9. Consistency of $\sigma_t^2$ and $r_t^2$
to do

---

### Asymptotic distribution of parameters

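1. Find the first partial derivative with respect to each parameter. For example, for $\gamma$,

$$ \begin{split}
    \frac{\partial l}{\partial \gamma} &= \sum_{t=1}^T \frac{\partial l}{\partial \sigma_t^2} \frac{\partial \sigma_t^2}{\partial \gamma} \\
    &= \sum_{t=1}^T \frac{1}{2\sigma_t^2} \left(\frac{e_{t}^2}{\sigma_t^2} -1\right)\frac{\partial \sigma_t^2}{\partial \gamma},
\end{split} $$

where $\frac{\partial \sigma_t^2}{\partial \gamma} = x_t^2 + \beta \frac{\partial \sigma_{t-1}^2}{\partial \gamma}$ (Section 3.1). Replacing it by $x_t^2$ alone ignores the recursive term and is exact only when $\beta = 0$.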



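2. Find the second partial derivative with respect to each pair of parameters. With four parameters there are 16 second partial derivatives, of which 10 are distinct because the matrix is symmetric. For instance, since $\frac{\partial^2 \sigma_t^2}{\partial \gamma^2} = 0$ (Section 3.2, case a), the formula from Section 2 gives

$$ \begin{split}
    \frac{\partial^2 l}{\partial \gamma^2} &= -\frac{1}{2}\sum_{t=1}^T \left(\frac{\partial \sigma_t^2}{\partial \gamma}\right)^2 \left(\frac{2e_t^2}{\sigma_t^6} - \frac{1}{\sigma_t^4}\right) \\
    &= \sum_{t=1}^T \frac{1}{\sigma_t^4}\left(\frac{\partial \sigma_t^2}{\partial \gamma}\right)^2 \left(\frac{1}{2} - \frac{e_t^2}{\sigma_t^2}\right),
\end{split} $$

which reduces to $\sum_{t=1}^T \frac{x_t^4}{\sigma_t^4} \left(\frac{1}{2} - \frac{e_t^2}{\sigma_t^2}\right)$ when $\frac{\partial \sigma_t^2}{\partial \gamma} = x_t^2$.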

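3. Find the expectation of the negative of each second partial derivative. Question: do we need the expectation, or is the raw form of the second derivative (as in step 2) sufficient? Asymptotically either works: by the law-of-large-numbers step in Section 6, the observed information $-\frac{\partial^2 l}{\partial\theta\partial\theta'}$ divided by $n$ converges to the expected information $I_E(\theta_0)$.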

4. These entries form the $4 \times 4$ Fisher information matrix $I_E(\theta)$, where $\theta$ is the vector of parameters.

5. The asymptotic distribution of the parameters is multivariate normal with mean $(\omega_0, \alpha_0, \beta_0, \gamma_0)$ and covariance equal to the inverse of the Fisher information matrix from step 4.

6. Using the diagonal of this covariance matrix, we can compute a standard error, a z-statistic, and a p-value for each parameter.
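A sketch of step 6, assuming the usual two-sided Wald z-test with $z_j = \hat\theta_j / \mathrm{se}(\hat\theta_j)$ and $p_j = P(|Z| > |z_j|)$. The estimate and standard error in the usage line are hypothetical numbers, not results. For the non-negative parameters here, $H_0: \theta_j = 0$ lies on the boundary of the parameter space (compare condition 3 in the proof of consistency), so this p-value is only an approximation.

```python
import math


def two_sided_p_value(estimate, std_error, null_value=0.0):
    """Two-sided p-value for H0: theta_j = null_value using the asymptotic normal z-test."""
    z = (estimate - null_value) / std_error
    # P(|Z| > |z|) for Z ~ N(0, 1), written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical estimate and standard error for gamma, for illustration only.
gamma_hat, gamma_se = 0.08, 0.03
print(two_sided_p_value(gamma_hat, gamma_se))
```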

---
Questions:
1. Do we need the regularity conditions for the log likelihood function to hold in order for asymptotic normality and consistency to hold? If so, is this where our assumptions of parameter boundaries and distributions come into place?  


## Functions for first derivative

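```python
import numpy as np


def _lag(a):
    """Return a_{t-1} aligned with t; the t = 0 slot is a placeholder (0.0) that is never used."""
    a = np.asarray(a, dtype=float)
    return np.concatenate(([0.0], a[:-1]))


def _dsigma2(driver, beta):
    """Section 3.1 recursion d_t = driver_t + beta * d_{t-1}.

    Assumes sigma_0^2 is a fixed starting value, so d_0 = 0 and driver[0] is ignored.
    """
    d = np.zeros(len(driver))
    for t in range(1, len(driver)):
        d[t] = driver[t] + beta * d[t - 1]
    return d


def _score(sigma_squared, e, dsigma2):
    """Sum over t of (1 / (2 sigma_t^2)) * (e_t^2 / sigma_t^2 - 1) * d sigma_t^2 / d theta.

    sigma_squared : np.array, the variance values sigma_t^2 for each time step.
    e : np.array, the error terms e_t for each time step.
    Returns the derivative value as a float.
    """
    sigma_squared = np.asarray(sigma_squared, dtype=float)
    e = np.asarray(e, dtype=float)
    return np.sum((1 / (2 * sigma_squared)) * ((e**2 / sigma_squared) - 1) * dsigma2)


### a.
def partial_l_gamma(x, sigma_squared, e, beta):
    """Partial derivative of l w.r.t. gamma; d sigma_t^2/d gamma = x_t^2 + beta * (previous)."""
    x = np.asarray(x, dtype=float)
    return _score(sigma_squared, e, _dsigma2(x**2, beta))


### b.
def partial_l_omega(sigma_squared, e, beta):
    """Partial derivative of l w.r.t. omega; d sigma_t^2/d omega = 1 + beta * (previous)."""
    return _score(sigma_squared, e, _dsigma2(np.ones(len(e)), beta))


### c.
def partial_l_alpha(sigma_squared, e, beta):
    """Partial derivative of l w.r.t. alpha; d sigma_t^2/d alpha = e_{t-1}^2 + beta * (previous)."""
    return _score(sigma_squared, e, _dsigma2(_lag(e) ** 2, beta))


### d.
def partial_l_beta(sigma_squared, e, beta):
    """Partial derivative of l w.r.t. beta; d sigma_t^2/d beta = sigma_{t-1}^2 + beta * (previous)."""
    return _score(sigma_squared, e, _dsigma2(_lag(sigma_squared), beta))
```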


Proving consistency: <a href='https://en.wikipedia.org/wiki/Maximum_likelihood_estimation#Consistency'>Consistency</a>
Regularity conditions: <a href='https://en.wikipedia.org/wiki/Fisher_information#Regularity_conditions'>Regularity, Fisher Information</a>

# Proof of consistency
1. Ergodicity,
2. Stationarity,
3. $\theta_0$ is not on the boundary of the parameter space.

If a time series $X_t$ is stationary and ergodic, then, by the ergodic theorem,
$$\frac{1}{T} \sum_{t=1}^T X_t \xrightarrow[]{\text{p}} \mu $$ 
where  $\mu = E[X_t] < \infty$.
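A quick illustration of this result, using a hypothetical stationary, ergodic AR(1) process with $|\phi| < 1$ in place of a GARCHX series: the running sample mean settles at $\mu$ as $T$ grows.

```python
import numpy as np

# Hypothetical stationary, ergodic AR(1): X_t = mu + phi (X_{t-1} - mu) + eps_t, |phi| < 1.
mu, phi, n = 2.0, 0.5, 200_000
rng = np.random.default_rng(3)
eps = rng.standard_normal(n)
X = np.empty(n)
X[0] = mu
for t in range(1, n):
    X[t] = mu + phi * (X[t - 1] - mu) + eps[t]

running_mean = np.cumsum(X) / np.arange(1, n + 1)
for T in (100, 10_000, n):
    print(T, running_mean[T - 1])
```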

### Ergodic Theorem


### Other Notes
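$x_t$ stationary and ergodic $\implies$ $x_t^2$ stationary and ergodic, since a measurable function of a stationary ergodic process is itself stationary and ergodic.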


### Backlog:
- Testing raw data for stationarity and ergodicity.

```python
import numpy as np

s0 = 90
u = 1.2
d = 0.8
r = 0.1
K = 100

s0 * (d - u * np.e**(-1 * r)) - K
# Out: -125.72244114788363

90**2 * 1.2 * 0.8 * np.e**(-0.1)
# Out: 7036.015762647621

1.2 * 0.8
# Out: 0.96
```