# Simulated Maximum likelihood

This notebook simulates panel dataset from the model 
$$ 
y_{it} = \mathbf{x}_{it} \boldsymbol{\beta} + c_i + u_{it},  \quad c_i \sim \text{IID}\mathcal{N}(0,\sigma_c^2), \quad u_{it} \sim \text{IID}\mathcal{N}(0,\sigma_u^2).
$$

The likelihood function needs to integrate out the unobserved $c_i$, taking the form 
$$
\ell_i(\theta) = \log \int \prod_{t=1}^T \frac{1}{\sigma_u} \phi\left( \frac{y_{it} - \mathbf{x}_{it} \boldsymbol{\beta} - \sigma_c c}{\sigma_u} \right) \phi(c) \text{d} c.
$$

The integral (the expectation wrt. $c_i$) can be computed either using *simulation* or *quadrature*. Both are *approximations* to the integral.

* **Simulation:** Take $R$ draws, $c^{(r)} \sim \mathcal{N}(0,1)$, and compute 
$$
\ell_i(\theta) \cong \log R^{-1}\sum_{r=1}^R  \left[ \prod_{t=1}^T \frac{1}{\sigma_u} \phi\left( \frac{y_{it} - \mathbf{x}_{it} \boldsymbol{\beta} - \sigma_c \color{red}{ c^{(r)}} }{\sigma_u} \right) \right].
$$

* **Quadrature:** using $Q$ quadrature nodes and weights, $\{n_q, w_q\}_{q=1}^Q$, compute
$$
\ell_i(\theta) \cong \log \sum_{q=1}^Q \color{red}{w_q}  \left[ \prod_{t=1}^T \frac{1}{\sigma_u} \phi\left( \frac{y_{it} - \mathbf{x}_{it} \boldsymbol{\beta} - \sigma_c \color{red}{n_q} }{\sigma_u} \right) \right].
$$

In [1]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns 
sns.set_theme()

%load_ext autoreload
%autoreload 2

import sml 
import estimation

# Simulate data

In [2]:
np.random.seed(1)
N = 100
T = 10 

The parameter vector to be estimated consists of three: $$ \theta = (\beta, \sigma_u, \sigma_c). $$

In [3]:
betao = np.array([1.,1.])
K = betao.size
sigma_c = 1.
sigma_u = 1.
thetao = np.append(betao, [sigma_u, sigma_c])
y,x,c = sml.sim_data(thetao, N, T)
# note: it is "cheating" that we return c, since in a real 
# dataset we do not observe c; here, it is done so that we 
# can make illustrative plots

In [4]:
theta_lab = ['beta1', 'beta2', 'sigma_c', 'sigma_u']

# Estimate

## Estimate with Simulation

Here, we estimate the model using the criterion function where we compute the integral by simulation. 

In [19]:
R = 1000
theta0 = thetao*1.0
q = lambda theta,y,x : sml.q(theta, y, x, R=R, seed=8) # seed=None: use equiprobably grid points on (0;1)
res = estimation.estimate(q, theta0, y, x)

Optimization terminated successfully.
         Current function value: 15.627367
         Iterations: 6
         Function evaluations: 45
         Gradient evaluations: 9


In [20]:
# print a nice table 
pd.DataFrame({'start': theta0, 
              'truth': thetao,
              'thetahat': res['theta'],
              't': res['t']}, 
            index=theta_lab).round(4)

Unnamed: 0,start,truth,thetahat,t
beta1,1.0,1.0,1.0542,9.5096
beta2,1.0,1.0,0.9783,27.319
sigma_c,1.0,1.0,1.0227,40.4082
sigma_u,1.0,1.0,1.0407,11.6331


***Warning!*** If the number of simulation draws, `R` is too low, we get a bias that is most clearly visible in the estimate of `sigma_u`. This bias occurs because for low `R`, the law of large numbers has not kicked in, so our approximation, $\mathbb{E}[f(...,c)] \cong R^{-1} \sum_{r=1}^R f(...,c^r)$, is not very good.

**Minimum R:** We need $R \ge 100$ before we have a reasonably good approximation. 

**Other options:** Go to `sml.py` to see what options you have for the draws. 

## Estimate with Quadrature

Now, we compute the integral by *quadrature* instead. 

In [7]:
R = 20 # no. quadrature points 
theta0 = thetao*1.0
q = lambda theta,y,x : -sml.loglikelihood_quad(theta, y, x, R) 
res = estimation.estimate(q, theta0, y, x)

Optimization terminated successfully.
         Current function value: 18.619909
         Iterations: 6
         Function evaluations: 45
         Gradient evaluations: 9


In [8]:
pd.DataFrame({'start': theta0, 
              'truth': thetao,
              'thetahat': res['theta'],
              't': res['t']}, 
            index=theta_lab).round(4)

Unnamed: 0,start,truth,thetahat,t
beta1,1.0,1.0,1.0507,9.7926
beta2,1.0,1.0,0.9756,27.438
sigma_c,1.0,1.0,1.0213,41.1359
sigma_u,1.0,1.0,1.0529,11.6141
