# Teaching Maximum Likelihood via Applications
$$
\\
$$

- (A) **Detecting ARCH effects**

$$
\\
$$

- (B) **Fitting AR(1)-ARCH(1) using 2-pass regressions**

$$
\\
$$

- (C) **Fitting AR(1)-ARCH(1) using MLE**


$$
\\
$$

- (D) **Fitting AR(1)-GARCH(1,1) using MLE**

$$
\\
$$

- (E) **Comparison of Local and Global Optimization Routines**

# A. Detecting ARCH Effects  

We check whether there is evidence of heteroscedasticity in returns. To do so, we fit an AR(1) to the return series and ask whether the resulting squared residuals are autocorrelated at lags up to m. Note: you could fit any other mean equation.

In [None]:
# Necessary Python packages
import pandas as pd
import matplotlib.pyplot as plt

**We load the ES 50 return panel**



In [None]:
r_d = pd.read_csv('r_ES50_d_cleaned_realized_Nov2020.csv')

In [None]:
r_d.head(1)

**We select one time series, say the equal-weight portfolio's return, and ask whether there is evidence of time variation in its second moment**

We answer this question by applying the Portmanteau test to the squared residuals of an AR(1) fitted to the chosen return series. Note: the conclusion is very unlikely to depend on the lag structure of the mean equation.

The Portmanteau test checks whether there is sufficient evidence of autocorrelation up to lag m in the squared residuals. As you already know from our vol class, squared residuals are a noisy proxy for variance.
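
The helper `Portmanteau_Test` used here is imported from a separate notebook, so its source is not shown. As a rough sketch of what such a test could compute, assume it is a Ljung-Box statistic on the squared AR(p) residuals with a $\chi^2_m$ reference distribution. The function name `ljung_box_squared_resid`, its return order, and the simulated ARCH(1) series are illustrative assumptions, not the helper's actual interface.

```python
import numpy as np
from scipy import stats


def ljung_box_squared_resid(r, p=1, m=10):
    """Ljung-Box Q statistic and p-value for squared AR(p) residuals (illustrative)."""
    r = np.asarray(r, dtype=float)
    y = r[p:]
    # Design matrix: constant plus lags 1..p of r
    lags = [r[p - k:len(r) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(len(y))] + lags)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ beta) ** 2

    # Sample autocorrelations of the squared residuals at lags 1..m
    d = e2 - e2.mean()
    T = len(d)
    ks = np.arange(1, m + 1)
    rho = np.array([np.sum(d[k:] * d[:-k]) for k in ks]) / np.sum(d ** 2)

    Q = T * (T + 2) * np.sum(rho ** 2 / (T - ks))
    p_value = stats.chi2.sf(Q, df=m)
    return Q, p_value


# Simulated ARCH(1) series (made-up parameters): the test should flag heteroscedasticity
rng = np.random.default_rng(0)
T_sim = 3000
eps_sim = np.zeros(T_sim)
for t in range(1, T_sim):
    eps_sim[t] = np.sqrt(0.5 + 0.5 * eps_sim[t - 1] ** 2) * rng.standard_normal()

Q_stat, p_val = ljung_box_squared_resid(eps_sim, p=1, m=10)
print(Q_stat, p_val)
```

A small p-value speaks against the null of no autocorrelation in the squared residuals, i.e. against constant volatility.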

In [None]:
r_t = r_d["1/N"]

## Portmanteau Test on Squared AR(1) Residuals

In [None]:
from ipynb.fs.defs.Helper_TestingForARCHEffects import Portmanteau_Test

In [None]:
p = 1

In [None]:
m = 10

In [None]:
PT = Portmanteau_Test(r_t.values, p, m)

In [None]:
PT_pvl = PT[1]

In [None]:
print(PT_pvl)

### Observation:
    
- **We reject $H_0$ of a constant vol in $r_t$**
$$
\\
$$

- **We conclude $r_t$ is heteroscedastic**
$$
\\
$$

- **We now have to think about how best to account for stochastic volatility in returns**

# B. Fitting AR(1)-ARCH(1) using 2-Pass Regression

- **A quick and slightly dirty approach**

$$
\\
$$

- **Yet, it is robust and provides good starting values for MLE**

In [None]:
#necessary packages
import numpy as np
import pandas as pd
import statsmodels.api as sm
import math
import scipy.optimize

## "Pass 1":  Fit AR(1)

We fit the mean equation and keep its residuals.

In [None]:
y = r_t.iloc[1:].values

In [None]:
X = np.ones((y.shape[0], 2))

In [None]:
X[:,1] = r_t.iloc[:-1].values

In [None]:
ar1 = sm.OLS(y,X).fit()

In [None]:
eps = ar1.resid

**summary stats of regression**

In [None]:
print(ar1.summary())

### Observation:
    
- **Daily equity returns are NOT predictable**
$$
\\
$$

- **R2 of 0**
$$
\\
$$

- **AR(1) coefficient of -0.0017 with a t-stat of -0.1**
$$
\\
$$

- **Semi-strong Form of Market Efficiency: "Current and past information does not help to predict future prices"**

## "Pass 2":  Fit ARCH(1)

We fit an AR(1) to the squared residuals of the mean equation.

**Visualize Squared Residuals of Mean Equation**

In [None]:
plt.plot(eps**2)

**fit ARCH(1) part**

In [None]:
y = eps[1:]**2

In [None]:
X = np.ones((y.shape[0], 2))

In [None]:
X[:,1] = eps[:-1]**2

In [None]:
arch_1 = sm.OLS(y, X).fit()

**summary stats of ARCH(1) regression**

In [None]:
print(arch_1.summary())

### Observations

- **AR(1) coefficient in eps^2 is significant (t-stat of 9.3)**
$$
\\
$$

- **R2 is low (noisy variance measurement)**
$$
\\
$$

- **Huge kurtosis and sizeable positive skew**


## Are ARCH(1) Residuals Homoscedastic? 

In [None]:
print( Portmanteau_Test(arch_1.resid, 1, 10)[1]) #p-value of H_0 of homoscedastic innovations


**Observation**

- **ARCH(1) residuals do still exhibit heteroscedasticity**
$$
\\
$$

- **Ergo (1): ARCH(1) was not sufficient to eliminate all heteroscedasticity**
$$
\\
$$

- **Ergo (2): ARCH(m), m>1 might be successful. MU: I doubt it (see skew and high excess kurtosis)**
 $$
\\
$$
     
- **Ergo (3): GARCH(m,s) might be successful. MU: I doubt it (see skew and high excess kurtosis)**
$$
\\
$$
        
- **What to do? It depends on the precise question. If you want the residuals to be homoscedastic, you likely need upward jumps in vol (skew) and vol of vol (excess kurtosis)**
$$
\\
$$

- **.... But why do you really need homoscedastic residuals?**

    - coefficients of a linear regression ('return forecasting etc') remain unbiased; only their standard errors need correcting (e.g. use Newey-West or other robust standard errors)
    $$
    \\
    $$
    
    - for trading vol or careful risk management you need a precise vol estimate

In [None]:
plt.plot(arch_1.resid)

# C. Fitting AR(1)-ARCH(1) using MLE

Remember: assuming Gaussian innovations one ends up with

$$
L_T(\phi_0, \phi_1, \alpha_0, \alpha_1) = \prod_{t=2}^T \frac{1}{\sqrt{ 2 \pi (\alpha_0 + \alpha_1 \epsilon^2_{t-1})}} \times \exp\left( -\frac{(r_t - [\phi_0 + \phi_1 r_{t-1}])^2}{2 (\alpha_0 + \alpha_1 \epsilon^2_{t-1})} \right)
$$

Remember:
$$
\ln (L_T(.)) = \sum_{t=2}^T \left[ -\frac{1}{2} \ln(2\pi [\alpha_0 + \alpha_1 \epsilon^2_{t-1}]) - \frac{(r_t - [\phi_0 + \phi_1 r_{t-1}])^2}{2 (\alpha_0 + \alpha_1 \epsilon^2_{t-1})} \right]
$$
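
Each summand is simply the log-density of a normal distribution with mean $\phi_0 + \phi_1 r_{t-1}$ and variance $\alpha_0 + \alpha_1 \epsilon^2_{t-1}$. The sketch below checks the closed-form expression against `scipy.stats.norm.logpdf` on placeholder data. The series `r_sim` and the parameter values are made up for illustration. In code the sum starts one observation later than $t=2$, because $\epsilon_{t-1}$ itself requires $r_{t-2}$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
r_sim = 0.01 * rng.standard_normal(500)                   # placeholder returns, not real data
phi_0, phi_1, alpha_0, alpha_1 = 0.0, 0.05, 5e-5, 0.2     # illustrative parameter values

mean_t = phi_0 + phi_1 * r_sim[:-1]                # conditional means for t = 2..T
eps_t = r_sim[1:] - mean_t                         # residuals for t = 2..T
var_t = alpha_0 + alpha_1 * eps_t[:-1] ** 2        # conditional variances for t = 3..T

# Closed-form Gaussian log-likelihood
ll_formula = np.sum(-0.5 * np.log(2 * np.pi * var_t) - eps_t[1:] ** 2 / (2 * var_t))
# Same quantity via scipy's normal log-density
ll_scipy = np.sum(stats.norm.logpdf(r_sim[2:], loc=mean_t[1:], scale=np.sqrt(var_t)))

print(ll_formula, ll_scipy)
```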

In [None]:
# -ln( L_T(.) )

def Neg_loglikelihood_ar1_arch1(parameters):   # parameters: [phi_0, phi_1, alpha_0, alpha_1]
    phi_0   = parameters[0]
    phi_1   = parameters[1]
    alpha_0 = parameters[2]
    alpha_1 = parameters[3]

    means = phi_0 + phi_1 * r_t.iloc[:-1].values    # conditional means for t = 2..T
    eps   = r_t.iloc[1:].values - means             # residuals eps_t for t = 2..T
    vars_  = alpha_0 + alpha_1 * eps[:-1]**2        # conditional variances for t = 3..T

    # Gaussian log-likelihood; eps[1:] equals r_t.iloc[2:].values - means[1:]
    loglikeli = np.sum(-0.5 * np.log(2 * math.pi * vars_) - eps[1:]**2 / (2 * vars_))

    return -loglikeli

## Numerical Optimization of AR(1)-ARCH(1)- ln (L_T(.)) 

**Smart Starting Values: here from 2-pass estimation**

In [None]:
ar1_arch1_params_start = [ar1.params[0], ar1.params[1], arch_1.params[0], arch_1.params[1]]

In [None]:
print(ar1_arch1_params_start)

### Nelder-Mead (Local) Optimization
see https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

In [None]:
ar1_arch1_params_optimal = scipy.optimize.minimize(Neg_loglikelihood_ar1_arch1, ar1_arch1_params_start, method = 'Nelder-Mead', options={'disp':True})

**Print optimal AR(1)-ARCH(1) params**

In [None]:
print(ar1_arch1_params_optimal.x)
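
Nelder-Mead is unconstrained, so a trial point can make $\alpha_0 + \alpha_1 \epsilon^2_{t-1}$ non-positive, in which case `np.log` returns `nan`. One possible guard, sketched here on simulated data, is to return a large penalty for such points and to inspect `success` and `message` on the result afterwards. The function `neg_loglik_guarded`, the penalty value `1e10`, and the simulation parameters are assumptions for illustration, not part of the notebook's pipeline.

```python
import numpy as np
import scipy.optimize

# Simulate an AR(1)-ARCH(1) series with made-up parameters
rng = np.random.default_rng(3)
T_sim = 2000
r_sim = np.zeros(T_sim)
eps_prev = 0.0
for t in range(1, T_sim):
    sigma2_t = 0.2 + 0.3 * eps_prev ** 2
    eps_prev = np.sqrt(sigma2_t) * rng.standard_normal()
    r_sim[t] = 0.05 + 0.1 * r_sim[t - 1] + eps_prev


def neg_loglik_guarded(params, r):
    phi_0, phi_1, alpha_0, alpha_1 = params
    eps = r[1:] - (phi_0 + phi_1 * r[:-1])
    var = alpha_0 + alpha_1 * eps[:-1] ** 2
    if np.any(var <= 0):
        return 1e10   # penalty: treat this trial point as infeasible
    return -np.sum(-0.5 * np.log(2 * np.pi * var) - eps[1:] ** 2 / (2 * var))


start = [0.0, 0.0, np.var(r_sim), 0.1]
fit = scipy.optimize.minimize(neg_loglik_guarded, start, args=(r_sim,), method="Nelder-Mead")

print(fit.success, fit.message)   # did the routine report convergence?
print(fit.x, fit.fun)
```

Passing the data through `args` instead of reading a global keeps the objective reusable for other series.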

#  D. Fitting AR(1)-GARCH(1,1) using MLE

Note: assuming Gaussian innovations one ends up with

$$
L_T(\phi_0, \phi_1, \alpha_0, \alpha_1, \beta_1, \sigma^2_1) = \prod_{t=2}^T \frac{1}{\sqrt{ 2 \pi (\alpha_0 + \alpha_1 \epsilon^2_{t-1} + \beta_1 \sigma^2_{t-1})}} \times \exp\left( -\frac{(r_t - [\phi_0 + \phi_1 r_{t-1}])^2}{2 (\alpha_0 + \alpha_1 \epsilon^2_{t-1}+ \beta_1 \sigma^2_{t-1})} \right)
$$
with $\sigma^2_t = \alpha_0 + \alpha_1 \epsilon^2_{t-1} + \beta_1 \sigma^2_{t-1}$, where the initial variance $\sigma^2_1$ is passed in as a parameter.

Note:
$$
\ln (L_T(.)) = \sum_{t=2}^T \left[ -\frac{1}{2} \ln(2\pi [\alpha_0 + \alpha_1 \epsilon^2_{t-1}+ \beta_1 \sigma^2_{t-1}]) - \frac{(r_t - [\phi_0 + \phi_1 r_{t-1}])^2}{2 (\alpha_0 + \alpha_1 \epsilon^2_{t-1}+ \beta_1 \sigma^2_{t-1})} \right]
$$

In [None]:
#calculate sigma^2_t parametrically using the GARCH(1,1) recursion

def garch11_variance(alpha_0, alpha_1, beta_1, sigma2_1, epsilon):
    sigma2 = np.zeros(epsilon.shape[0] - 1)
    sigma2[0] = alpha_0 + alpha_1 * epsilon[0]**2 + beta_1 * sigma2_1 
    for i in range(1, sigma2.shape[0]):
        sigma2[i] = alpha_0 + alpha_1 * epsilon[i]**2 + beta_1 * sigma2[i-1]
    
    return sigma2
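
One way to convince yourself that the indexing of such a recursion is right is to simulate a GARCH(1,1) path with known parameters and confirm that filtering the simulated innovations reproduces the true variance path. The sketch below does this with a local restatement of the recursion. The name `garch11_variance_sketch` and all parameter values are illustrative assumptions.

```python
import numpy as np


def garch11_variance_sketch(alpha_0, alpha_1, beta_1, sigma2_1, epsilon):
    # sigma2[i] is the conditional variance one step after epsilon[i]
    sigma2 = np.zeros(epsilon.shape[0] - 1)
    prev = sigma2_1
    for i in range(sigma2.shape[0]):
        sigma2[i] = alpha_0 + alpha_1 * epsilon[i] ** 2 + beta_1 * prev
        prev = sigma2[i]
    return sigma2


# Simulate a GARCH(1,1) path with known (made-up) parameters
alpha_0, alpha_1, beta_1 = 0.05, 0.10, 0.85
rng = np.random.default_rng(4)
T_sim = 1000
true_sigma2 = np.zeros(T_sim)
eps_sim = np.zeros(T_sim)
true_sigma2[0] = alpha_0 / (1 - alpha_1 - beta_1)   # start at the unconditional variance
eps_sim[0] = np.sqrt(true_sigma2[0]) * rng.standard_normal()
for t in range(1, T_sim):
    true_sigma2[t] = alpha_0 + alpha_1 * eps_sim[t - 1] ** 2 + beta_1 * true_sigma2[t - 1]
    eps_sim[t] = np.sqrt(true_sigma2[t]) * rng.standard_normal()

# Filter the simulated innovations and compare with the true variances
filtered = garch11_variance_sketch(alpha_0, alpha_1, beta_1, true_sigma2[0], eps_sim)
print(np.max(np.abs(filtered - true_sigma2[1:])))
```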

In [None]:
# -ln( L_T(.) )

def Neg_loglikelihood_ar1_Garch11(parameters):   # parameters: [phi_0, phi_1, alpha_0, alpha_1, beta_1, sigma2_1]
    phi_0   = parameters[0]
    phi_1   = parameters[1]
    alpha_0 = parameters[2]
    alpha_1 = parameters[3]
    beta_1  = parameters[4]
    sigma2_1= parameters[5]

    means = phi_0 + phi_1 * r_t.iloc[:-1].values                         # conditional means for t = 2..T
    eps   = r_t.iloc[1:].values - means                                  # residuals eps_t for t = 2..T
    vars_  = garch11_variance(alpha_0, alpha_1, beta_1, sigma2_1, eps)   # conditional variances for t = 3..T

    # Gaussian log-likelihood; eps[1:] equals r_t.iloc[2:].values - means[1:]
    loglikeli = np.sum(-0.5 * np.log(2 * math.pi * vars_) - eps[1:]**2 / (2 * vars_))

    return -loglikeli

## Numerical Optimization of AR(1)-GARCH(1,1)- ln (L_T(.)) 

**Smart Starting Values: here: AR(1)-ARCH(1) 2-pass Estimates**

In [None]:
ar1_Garch11_params_start = [ar1.params[0], ar1.params[1], arch_1.params[0], arch_1.params[1], 0.01,1]

In [None]:
print(ar1_Garch11_params_start)

### Nelder-Mead (Local Optimization)

In [None]:
ar1_Garch11_params_optimal = scipy.optimize.minimize(Neg_loglikelihood_ar1_Garch11, ar1_Garch11_params_start, method = 'Nelder-Mead', options={'disp':True})

**Print Optimal AR(1)-GARCH(1,1) Params**

In [None]:
print(ar1_Garch11_params_optimal.x)

# E. Comparing Optimization Routines, applied to AR(1)-GARCH(1,1)

**3 (4) Local Routines**

- Nelder Mead
$$
\\
$$

- SLSQP
$$
\\
$$

- BFGS
$$
\\
$$

- Python Package for ARMA-GARCH

**2 Global Routines**

- Dual Annealing
$$
\\
$$

- Evolutionary Algorithm


**package to add bounds to local optimization**

In [None]:
from scipy.optimize import Bounds
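
The box $[0,1]^6$ used in the routines below rules out negative mean parameters, although the Pass-1 AR(1) coefficient reported above is slightly negative (-0.0017). As a hedged alternative, bounds can be set per parameter. In the sketch below the limits, the floor `tiny`, and every entry of `x0` other than the -0.0017 coefficient are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import Bounds

# Parameter order: [phi_0, phi_1, alpha_0, alpha_1, beta_1, sigma2_1]
tiny = 1e-8   # hypothetical floor that keeps the variance level strictly positive
lower = np.array([-1.0, -1.0, tiny, 0.0, 0.0, tiny])
upper = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
bounds_sketch = Bounds(lower, upper)

# Example start vector with a slightly negative AR(1) coefficient
x0 = np.array([0.0005, -0.0017, 1e-4, 0.1, 0.01, 1.0])
inside_sketch = bool(np.all((x0 >= bounds_sketch.lb) & (x0 <= bounds_sketch.ub)))
inside_unit_box = bool(np.all((x0 >= 0.0) & (x0 <= 1.0)))
print(inside_sketch, inside_unit_box)

# The same limits as a list of (low, high) pairs, the format some routines expect
bounds_pairs = list(zip(lower, upper))
```

With these limits the start vector is feasible, whereas the unit box would exclude it.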

## E.1 Local Optimization: SLSQP

In [None]:
ar1_Garch11_params_optimal_constr = scipy.optimize.minimize(Neg_loglikelihood_ar1_Garch11,
                                                            ar1_Garch11_params_start, method='SLSQP',
                                                            bounds=Bounds(np.zeros(6),np.ones(6)*1), options={'disp':True})

## E.2 Local Optimization: Quasi-Newton of Broyden, Fletcher, Goldfarb, Shanno (here the bounded, limited-memory variant L-BFGS-B)

In [None]:
ar1_Garch11_params_optimal_constr_bfgs_x, f_bgfs, _ = scipy.optimize.fmin_l_bfgs_b(Neg_loglikelihood_ar1_Garch11, x0=ar1_Garch11_params_start, bounds=[(0,1)]*6, approx_grad=True)

## E.3 Python's ARCH Package

$$r_t=const+ \phi r_{t-1}+\epsilon_t$$
$$\sigma^2_t = \omega + \alpha \epsilon_{t-1}^2 +\beta \sigma^2_{t-1}$$
$$\epsilon_t= \sigma_t e_t,\ e_t \sim \mathcal{N}(0,1)$$

In [None]:
from arch import arch_model

In [None]:
#Instantiate the AR(1)-GARCH(1,1) Model
am = arch_model(r_t, lags=1, mean="AR", vol="Garch", dist="Normal", rescale=False)

In [None]:
#fit the AR(1)-GARCH(1,1)
res = am.fit()

## E.4 Global Optimization: DUAL ANNEALING

In [None]:
ar1_Garch11_params_optimal_constr_go_ann = scipy.optimize.dual_annealing(Neg_loglikelihood_ar1_Garch11, [(0,1)]*6, x0=ar1_Garch11_params_start,  seed=123)

## E.5 Global Optimization: Evolutionary Algorithm

In [None]:
ar1_Garch11_params_optimal_constr_go_de = scipy.optimize.differential_evolution(Neg_loglikelihood_ar1_Garch11, [(0,1)]*6, seed=123)

## E.6 Compare L_opt for AR(1)-GARCH(1,1) Across Optimization Routines

In [None]:
df_Lopt = pd.DataFrame({"BFGS: Quasi-Newton": -f_bgfs,
                   "Dual Annealing": -ar1_Garch11_params_optimal_constr_go_ann.fun ,
                   "arch_pckge": res.loglikelihood,
                  "Nelder-Mead ": -ar1_Garch11_params_optimal.fun,
                   "Evolut.Algo" : -ar1_Garch11_params_optimal_constr_go_de.fun ,
                   "SLSQP": -ar1_Garch11_params_optimal_constr.fun,    
                   }, index=["L_opt"])
 
df_Lopt

## E.7 Compare x_opt for AR(1)-GARCH(1,1) Across Optimization Routines

In [None]:
df_x = pd.DataFrame({"BFGS: Quasi-Newton": ar1_Garch11_params_optimal_constr_bfgs_x[:-1].round(5), 
                     "Dual Annealing": ar1_Garch11_params_optimal_constr_go_ann.x[:-1].round(5),
                     "arch_pckge": res.params.values,
                     "Nelder-Mead": ar1_Garch11_params_optimal.x[:-1].round(5),
                     "Evolut.Algo": ar1_Garch11_params_optimal_constr_go_de.x[:-1].round(5),
                     "SLSQP": ar1_Garch11_params_optimal_constr.x[:-1].round(5) ,            
                     },
                    index=["phi_0","phi_1","alpha_0","alpha_1","beta_1"])
df_x

## Observation w.r.t. Choice of Optimizer

- **Choice of optimizer is crucial**
$$
\\
$$

- **Different optimization routines succeed for different problems**
$$
\\
$$

- **Try at least one global routine (like Dual Annealing), especially for high-dimensional problems (more than 5 variables)**
$$
\\
$$

- **If you use local optimizers: randomize start values**
$$
\\
$$

- **USE SMART (INFORMATIVE) START VALUES**
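
The advice to randomize start values can be sketched as a simple multi-start loop: run a local optimizer from several random starting points and keep the best result. The toy objective `bumpy`, the search box $[-3, 3]^2$, and the number of restarts are assumptions chosen for illustration. For the GARCH problem one would draw starts from a plausible parameter range instead.

```python
import numpy as np
import scipy.optimize


def bumpy(x):
    # Toy objective with several local minima (not the GARCH likelihood)
    return np.sum(x ** 2) + 2.0 * np.sum(1 - np.cos(3 * x))


rng = np.random.default_rng(5)
starts = rng.uniform(-3, 3, size=(20, 2))   # 20 random start vectors in [-3, 3]^2

# Run the local optimizer from every start and keep the best outcome
results = [scipy.optimize.minimize(bumpy, s, method="Nelder-Mead") for s in starts]
best = min(results, key=lambda res: res.fun)

# Count how many distinct end points the local runs reached
n_distinct = len({tuple(np.round(res.x, 2)) for res in results})

print(best.x, best.fun)
print(n_distinct)
```

If the local runs end at several distinct points, that is a sign the objective is multimodal and a single local run could be misleading.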