# Generalized Method of Moments (GMM)  

### This code estimates parameters from the following model (moment conditions): 
$$ \mathbb{E}\left[\theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^i_{t+1,t})-1\right]=0,$$
### where $i = 1,...,N$ indexes the assets and $N$ is the number of assets. In other words, this notebook applies GMM to the estimation of a consumption CAPM model in which the stochastic discount factor $m_{t+1}$ is `nonlinear` and equal to $m_{t+1} = \theta_1\left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}.$
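
To make the moment condition concrete, the sketch below evaluates the stochastic discount factor and one pricing error for a single observation. The parameter values `theta1_demo = 0.99` and `theta2_demo = 2.0` are illustrative assumptions, not estimates, and the helper names `sdf` and `pricing_error` are hypothetical. The consumption growth (1.00203) and return (0.030531) are the first observation of `CONS_GROWTH` and `R1` in the data shown in the data section of this notebook.

```python
import numpy as np

def sdf(cons_growth, theta1, theta2):
    """Stochastic discount factor m = theta1 * (c_{t+1}/c_t)^(-theta2)."""
    return theta1 * np.power(cons_growth, -theta2)

def pricing_error(cons_growth, ret, theta1, theta2):
    """Pricing error m * (1 + R) - 1 for one asset (zero in expectation under the model)."""
    return sdf(cons_growth, theta1, theta2) * (1.0 + ret) - 1.0

# Illustrative (assumed) parameter values, NOT estimates
theta1_demo, theta2_demo = 0.99, 2.0

# First observation of CONS_GROWTH and R1 from the data set
m_demo = sdf(1.00203, theta1_demo, theta2_demo)
e_demo = pricing_error(1.00203, 0.030531, theta1_demo, theta2_demo)
print(f"SDF = {m_demo:.5f}, pricing error = {e_demo:.5f}")
```

GMM chooses $\theta_1$ and $\theta_2$ so that the sample averages of such pricing errors, one average per asset, are jointly as close to zero as possible.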

### Let us begin by importing the main libraries.

In [1]:
import os
import pandas as pd
import numpy as np
import scipy.optimize
from scipy.stats import t, norm, chi2
from platform import python_version

In [2]:
# The recommended python version is 3.8 or 3.9
print(python_version())
# Check current directory
os.getcwd()

3.8.5


'/Users/Fbandi/Dropbox/nonlineareconometrics2024/FEDERICO/Python_codes'

# The data
### We use data on 10 risky assets and consumption growth. The data are monthly observations.

In [3]:
data = pd.read_excel('ccapmmonthlydata.xls')

data

| | Date | CONS_GROWTH | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1959-02-01 | 1.00203 | 0.030531 | 0.015071 | 0.029324 | 0.023168 | 0.035988 | 0.032759 | 0.022402 | 0.033360 | 0.014899 | 0.001814 |
| 1 | 1959-03-01 | 1.01293 | 0.014044 | 0.018926 | 0.021002 | 0.018656 | 0.005316 | 0.007779 | 0.010434 | 0.000907 | 0.005017 | 0.000685 |
| 2 | 1959-04-01 | 0.99169 | 0.024539 | 0.012107 | 0.007940 | 0.032005 | 0.026305 | 0.032149 | 0.033886 | 0.040101 | 0.015802 | 0.042236 |
| 3 | 1959-05-01 | 1.00867 | -0.000550 | 0.018403 | -0.007679 | 0.002192 | 0.011910 | -0.003016 | 0.014792 | 0.007357 | -0.005574 | 0.025607 |
| 4 | 1959-06-01 | 0.99797 | -0.013110 | 0.001369 | 0.003227 | 0.005999 | 0.005040 | 0.011527 | 0.006842 | 0.001895 | 0.011515 | -0.005856 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 413 | 1993-07-01 | 1.00086 | 0.046263 | 0.009054 | 0.009871 | -0.000613 | 0.008385 | 0.003841 | -0.006115 | 0.005908 | 0.004158 | -0.007578 |
| 414 | 1993-08-01 | 0.99947 | 0.010932 | 0.019995 | 0.035390 | 0.033687 | 0.044615 | 0.031195 | 0.045107 | 0.044392 | 0.037157 | 0.032150 |
| 415 | 1993-09-01 | 1.00059 | 0.001390 | 0.003805 | 0.008751 | 0.017508 | 0.016118 | 0.005107 | 0.004222 | -0.003743 | -0.000390 | -0.011184 |
| 416 | 1993-10-01 | 0.99784 | 0.063450 | 0.032265 | 0.024299 | 0.016981 | 0.010179 | 0.007463 | 0.007426 | 0.004722 | 0.006056 | 0.017435 |


In [4]:
data.describe()  # This command provides descriptive statistics for each numeric column in the dataframe.

| | CONS_GROWTH | R1 | R2 | R3 | R4 | R5 | R6 | R7 | R8 | R9 | R10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 | 418.0 |
| mean | 0.997737 | 0.008989 | 0.007844 | 0.007141 | 0.007168 | 0.006303 | 0.006555 | 0.005906 | 0.0061 | 0.005045 | 0.003555 |
| std | 0.005565 | 0.07029 | 0.06163 | 0.058507 | 0.05577 | 0.05344 | 0.05194 | 0.050378 | 0.049068 | 0.046384 | 0.041435 |
| min | 0.9738 | -0.30557 | -0.30801 | -0.29946 | -0.28881 | -0.28535 | -0.28206 | -0.26206 | -0.2661 | -0.22999 | -0.20057 |
| 25% | 0.994768 | -0.027987 | -0.023862 | -0.022993 | -0.024535 | -0.022658 | -0.023896 | -0.022692 | -0.023787 | -0.023269 | -0.019272 |
| 50% | 0.998265 | 0.004034 | 0.00792 | 0.007194 | 0.008396 | 0.008287 | 0.008145 | 0.006734 | 0.00739 | 0.008249 | 0.004682 |
| 75% | 1.001478 | 0.042789 | 0.038898 | 0.038224 | 0.039484 | 0.038534 | 0.038316 | 0.038691 | 0.035739 | 0.033055 | 0.029285 |
| max | 1.01612 | 0.573527 | 0.426797 | 0.367947 | 0.302337 | 0.254217 | 0.253527 | 0.224877 | 0.213347 | 0.170157 | 0.174916 |


In [5]:
# The 10 columns of asset return data
ret = np.array(data.iloc[:, 2:])

# consumption growth data (c_{t+1}/c_{t}) is in the CONS_GROWTH column
cons = np.array(data.CONS_GROWTH)

# The number of assets
number_assets = 10

# The number of observations
T = len(cons)
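
The positional slice `data.iloc[:, 2:]` relies on the returns occupying every column after the second one. As a hedged alternative, the sketch below selects the same columns by name, which keeps working if the column order changes. It runs on a small synthetic DataFrame called `toy` (an assumption made so the example is self-contained; the values are random and are not the real data), using the column names shown in the table above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data set (random values, for illustration only)
rng = np.random.default_rng(0)
n_obs, n_assets = 6, 10
return_cols = [f"R{i}" for i in range(1, n_assets + 1)]   # 'R1', ..., 'R10'

toy = pd.DataFrame(rng.normal(0.005, 0.05, size=(n_obs, n_assets)), columns=return_cols)
toy.insert(0, "CONS_GROWTH", 1.0 + rng.normal(0.0, 0.005, size=n_obs))
toy.insert(0, "Date", pd.date_range("1959-02-01", periods=n_obs, freq="MS"))

# Selection by column name
ret_by_name = toy[return_cols].to_numpy()
cons_by_name = toy["CONS_GROWTH"].to_numpy()

# Selection by position, as in the cell above
ret_by_pos = np.array(toy.iloc[:, 2:])

# Deriving the dimensions from the arrays avoids hard-coding the number of assets
T_toy, number_assets_toy = ret_by_name.shape
print(T_toy, number_assets_toy, np.array_equal(ret_by_name, ret_by_pos))
```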

## Now that we have the data, we are ready to begin. Let us go step-by-step.

## (1) We define the GMM criterion in a `Python function` called `gmm`. 

The criterion is:

\begin{eqnarray*}
Q_T(\theta) = \underbrace{g_{T}(\theta )^{\top }}_{1\times N}\underbrace{W_{T}}_{N\times N}\underbrace{g_{T}(\theta )}_{N\times1},
\end{eqnarray*}

where

\begin{eqnarray*}
\underbrace{g_T(\theta)}_{N\times1} &=& \frac{1}{T}\sum_{t=1}^{T-1}\underbrace{g(X_{t+1},\theta)}_{N\times1} \\
&=&\frac{1}{T}\sum_{t=1}^{T-1}\begin{pmatrix} g^{1}(X_{t+1},\theta) \\ g^{2}(X_{t+1},\theta) \\ ... \\g^{N}(X_{t+1},\theta)\end{pmatrix} \\
&=& \frac{1}{T}\sum_{t=1}^{T-1}\begin{pmatrix} \theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^1_{t+1,t})-1 \\ \theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^2_{t+1,t})-1 \\ ... \\\theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^N_{t+1,t})-1\end{pmatrix} 
\end{eqnarray*}

and $W_T$ is a square symmetric matrix of weights that we choose (see below).

In [6]:
def gmm(parameters, cons, ret, W, flag):

    p_error = np.zeros([T, number_assets])          # The matrix in which we are going to store the pricing errors.
                                                    # The rows are time periods, the columns are assets.

    # The following loop creates the pricing errors for each period and each asset    
        
    for j in range(number_assets):      
        p_error[:,j] = parameters[0] * np.power(cons, -parameters[1]) * (1 + ret[:,j]) -1

    g = np.mean(p_error,axis=0)
    
    if flag == 1:
        f = g @ W @ g.T  
    else:
        f = p_error
    return f
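
The loop over assets in `gmm` can also be written with NumPy broadcasting. The sketch below is one possible vectorized variant, not the notebook's own implementation: the names `pricing_errors` and `gmm_criterion` are hypothetical, the data are synthetic, and the function takes its dimensions from its arguments rather than from the global variables `T` and `number_assets`.

```python
import numpy as np

def pricing_errors(parameters, cons, ret):
    """Return the T x N matrix of pricing errors without an explicit loop."""
    m = parameters[0] * np.power(cons, -parameters[1])   # SDF for each period, shape (T,)
    return m[:, None] * (1.0 + ret) - 1.0                # broadcast the SDF across the N assets

def gmm_criterion(parameters, cons, ret, W):
    """Return the scalar criterion g' W g, where g holds the average pricing errors."""
    g = pricing_errors(parameters, cons, ret).mean(axis=0)
    return g @ W @ g

# Synthetic data (assumed values, for illustration only)
rng = np.random.default_rng(1)
cons_toy = 1.0 + rng.normal(0.0, 0.005, size=50)
ret_toy = rng.normal(0.005, 0.05, size=(50, 3))
theta_toy = np.array([0.95, 2.0])

# Reference computation with the same loop structure as gmm
loop_errors = np.zeros_like(ret_toy)
for j in range(ret_toy.shape[1]):
    loop_errors[:, j] = theta_toy[0] * np.power(cons_toy, -theta_toy[1]) * (1 + ret_toy[:, j]) - 1

same = np.allclose(loop_errors, pricing_errors(theta_toy, cons_toy, ret_toy))
Q_toy = gmm_criterion(theta_toy, cons_toy, ret_toy, np.eye(3))
print(same, Q_toy)
```

Splitting the pricing errors and the criterion into two functions also removes the need for the `flag` argument, at the cost of a different interface from the one used in the rest of this notebook.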


## (2) We find the parameters by `minimizing the GMM criterion`:

\begin{eqnarray*}
\widehat{\theta }_{GMM} &=&\underset{\theta }{\text{ }\arg \min }\left[\underbrace{g_{T}(\theta )^{\top }}_{1\times N}\underbrace{W_{T}}_{N\times N}\underbrace{g_{T}(\theta )}_{N\times1}\right] \\
&=&\text{ }\underset{\theta }{\arg \min }\left[\underbrace{Q_{T}(\theta )}_{1\times 1}\right],
\end{eqnarray*}

where $\widehat{\theta}_{GMM} = (\widehat{\theta}_1, \widehat{\theta}_2)$ in our bivariate case. 

### In order to do the minimization we now have to feed the criterion (in the function `gmm` above) into a minimizer (another function which will compute the minimum of the criterion). A possible function is `scipy.optimize.fmin`. 

The inputs of the `scipy.optimize.fmin` function are:

1. `func`. The function to minimize - in our case `gmm` - as defined in the previous snippet.
2. `x0`. The initial guess of the parameters $\theta_1$ and $\theta_2$: `initial_guess`. This is just our initial guess of the parameters for evaluating the function `gmm` at the beginning of the minimization.
3. `args`. The arguments of the `gmm` function that are not parameters. For our problem, these are the data `cons` and `ret`, the weight matrix `W`, and the `flag` that selects what `gmm` returns.

Additional inputs that are optional:

4. `xtol` and `ftol`. The convergence tolerances on the parameter vector and on the function value, respectively. The algorithm declares convergence once the parameter values change by no more than `xtol` and the criterion value changes by no more than `ftol` between iterations.
5. `maxiter`. The maximum number of iterations to perform. The algorithm stops after `maxiter` iterations, even if it has not converged to a minimum.
6. `disp`. A variable indicating whether we want to see convergence messages. `disp=0` suppresses them, `disp=1` prints them.
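
The inputs listed above can be tried out on a problem whose minimum is known. The sketch below is a toy illustration, not part of the estimation: the function `quadratic` and its arguments `target` and `scale` are assumptions made for the example. It also passes `full_output=True`, an option of `scipy.optimize.fmin` that returns the criterion value, the iteration and function-evaluation counts, and a warning flag, which is useful when `disp=0` hides the convergence messages.

```python
import numpy as np
import scipy.optimize

def quadratic(x, target, scale):
    """Toy criterion scale * ||x - target||^2, minimized at x = target."""
    return scale * np.sum((x - target) ** 2)

target_toy = np.array([1.0, -2.0])

x_min, f_min, n_iter, n_calls, warnflag = scipy.optimize.fmin(
    func=quadratic,               # function to minimize
    x0=[4.0, 5.0],                # initial guess
    args=(target_toy, 3.0),       # extra arguments passed to quadratic after x
    xtol=1e-8,
    ftol=1e-8,
    maxiter=10000,
    full_output=True,             # also return fopt, iterations, funcalls, warnflag
    disp=0,
)

# warnflag == 0 means the tolerances were met; 1 or 2 means an evaluation or iteration limit was hit
print(x_min, f_min, n_iter, warnflag)
```

Checking `warnflag` after each estimation stage is a cheap way to confirm that the reported estimates come from a converged run rather than from an exhausted iteration budget.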

# First-stage estimation 

We are going to use the identity matrix as the initial weight matrix: 

\begin{equation*}
\underbrace{W_T}_{N \times N} = \begin{pmatrix} 1 & 0 & ... & 0 \\ 0 & 1 & ... & 0 \\ ... &...&...& ... \\ 0 & 0 & ... & 1 \end{pmatrix}.
\end{equation*}


After obtaining the first-stage GMM estimates, we will calculate the `optimal` weight matrix (using the first-stage estimates) and obtain our `final` second-stage estimates.

In [7]:
# First-stage weight matrix (the identity matrix)
W = np.eye(number_assets)
                         
# parameters used to initialize the optimization
initial_guess = [4, 5]

# minimize the gmm criterion to find the parameters estimates
estimates = scipy.optimize.fmin(func=gmm, 
                                  x0=initial_guess, 
                                  args=(cons, ret, W, 1), 
                                  xtol=1e-5, 
                                  ftol=1e-5,
                                  maxiter = 100000,
                                  disp=0)

# The first-stage parameter estimates
print(f'The first-stage estimate of the first parameter is {estimates[0]:.3f}')
print(f'The first-stage estimate of the second parameter is {estimates[1]:.3f}')

The first-stage estimate of the first parameter is 0.577
The first-stage estimate of the second parameter is 123.265


# Second-stage estimation

Now that we have the first-stage estimates ($\widehat{\theta}^1_{GMM}$), we will be able to compute the `optimal second-stage estimates` ($\widehat{\theta}^2_{GMM}$) using the `optimal weight matrix` $W_{T}=\widehat{\Phi }_{0}^{-1}$ with

\begin{equation*}
W_{T}=\left( \frac{1}{T}\sum \limits_{t=1}^{T-1}\left( g(X_{t+1},\widehat{%
\theta }_{GMM}^{1})g(X_{t+1},\widehat{\theta }_{GMM}^{1})^{^{\top }}\right)
\right) ^{-1}.
\end{equation*}


In [8]:
####################################################################
# We compute the optimal weight matrix using first-stage estimates
####################################################################

# The pricing errors evaluated at the first-stage estimates
# (flag=2 makes gmm return the T x N matrix of pricing errors instead of the criterion)
g_opt = gmm(estimates, cons, ret, W, 2)

# Phi_hat0 is an average of outer products of pricing errors. We do a loop to compute this average.
Phi_hat0 = np.zeros([number_assets, number_assets])
for j in range(T):
    Phi_hat0 = Phi_hat0 + np.outer(g_opt[j,:], g_opt[j,:]) / T

# The optimal weight matrix is just the inverse of Phi_hat0    
W_opt = np.linalg.inv(Phi_hat0)
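
The average of outer products computed in the loop above equals a single matrix product, $\frac{1}{T}G^{\top}G$, where $G$ is the $T \times N$ matrix of pricing errors. The sketch below checks this equivalence on synthetic pricing errors (`errors_toy` is an assumed stand-in for `g_opt`, filled with random numbers).

```python
import numpy as np

# Synthetic T x N matrix of pricing errors (random values, for illustration only)
rng = np.random.default_rng(2)
errors_toy = rng.normal(0.0, 0.05, size=(200, 4))
T_toy, N_toy = errors_toy.shape

# Loop version: average of the outer products, one per time period
Phi_loop = np.zeros((N_toy, N_toy))
for t in range(T_toy):
    Phi_loop += np.outer(errors_toy[t], errors_toy[t]) / T_toy

# Vectorized version: G'G / T
Phi_vec = errors_toy.T @ errors_toy / T_toy

# The weight matrix is the inverse of Phi
W_vec = np.linalg.inv(Phi_vec)

print(np.allclose(Phi_loop, Phi_vec))
```

Both versions use the uncentered second-moment matrix, which matches the formula for $W_T$ given above.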

In [9]:
#############################
# Second-stage estimation 
#############################

estimates_opt = scipy.optimize.fmin(func=gmm, 
                                    x0=initial_guess, 
                                    args=(cons, ret, W_opt, 1), 
                                    xtol=1e-5, 
                                    ftol=1e-5, 
                                    disp=0)

# The second-stage parameter estimates
print(f'The second-stage estimate of the first parameter is {estimates_opt[0]:.3f}')
print(f'The second-stage estimate of the second parameter is {estimates_opt[1]:.3f}')

The second-stage estimate of the first parameter is 0.870
The second-stage estimate of the second parameter is 47.848


## (3) Finally, we do statistical inference.

When using the `optimal weight matrix`, we know that the GMM estimator is asymptotically normal with a very specific asymptotic variance (the smallest possible):

\begin{equation}
\sqrt{T}(\widehat{\theta}_{GMM} - \theta_0) \overset{d}{\rightarrow}N(0,\mathbb{V})
\end{equation}

with $\mathbb{V} = \left( \Gamma _{0}^{^{\top }}\Phi_{0}^{-1}\Gamma _{0}\right)^{-1}.$ Thus,

\begin{equation*}
\boxed{\mathbb{V}(\widehat{\theta}_{GMM}) = \frac{1}{T}\left( \Gamma _{0}^{^{\top }}\Phi
_{0}^{-1}\Gamma _{0}\right) ^{-1}.}
\end{equation*}

The relevant quantities are $\Phi_0$ and $\Gamma_0$. We begin with $\Phi_0.$ 

\begin{equation*}
\underbrace{\Phi_0}_{N \times N}=\mathbb{E}\left(g(X_{t+1},\theta _{0})g(X_{t+1},\theta _{0})^{\top}\right).
\end{equation*}

This is the expected value of the outer product of the pricing errors: 

\begin{equation*}
\underbrace{\Phi_0}_{N \times N} = \mathbb{E}\left( \begin{bmatrix}\theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^1_{t+1,t})-1 \\ \theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^2_{t+1,t})-1 \\...\\ ... \end{bmatrix} \begin{bmatrix}\theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^1_{t+1,t})-1 & \theta_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\theta_2}(1+R^2_{t+1,t})-1 & ... & ... \end{bmatrix} \right).
\end{equation*}

`Notice that we have already computed this quantity when calculating the optimal weight matrix. We simply have to recompute it using the second-stage (optimal) GMM estimates.` 

As for $\Gamma_0$:

\begin{equation*}
\underbrace{\Gamma_0}_{N \times d} = \mathbb{E}\left( \frac{\partial g(X_{t+1},\theta _{0})}{\partial \theta ^{\top }}\right).
\end{equation*}

This is a matrix in which every row is a moment and every column is the derivative of the moment with respect to the corresponding parameter. For our problem, we have $N=10$ moments and $d=2$ parameters, thus


\begin{eqnarray*}
\underbrace{\Gamma_{0}}_{N \times d} 
  =  \left[\begin{array}{cc}\mathbb{E}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{1})\right) & -\mathbb{E}\left(\theta_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{1})\right)\\
\mathbb{E}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{2})\right) & -\mathbb{E}\left(\theta_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{2})\right) \\
... \\
\mathbb{E}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{10})\right) & -\mathbb{E}\left(\theta_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\theta_2}(1+R_{t+1}^{10})\right) \\
\end{array}\right].
\end{eqnarray*}
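
Analytical derivatives such as the ones above are easy to get wrong by a sign or a factor, so it can help to compare them with a numerical approximation. The sketch below does this with central finite differences on synthetic data. All names (`mean_moments`, `gamma_analytic`, `gamma_numeric`) and the parameter values are assumptions made for the illustration.

```python
import numpy as np

def mean_moments(theta, cons, ret):
    """Sample average of the N pricing errors at parameter vector theta."""
    m = theta[0] * np.power(cons, -theta[1])
    return (m[:, None] * (1.0 + ret) - 1.0).mean(axis=0)

def gamma_analytic(theta, cons, ret):
    """N x 2 matrix of averaged analytical derivatives (the sample analogue of the formulas above)."""
    growth_pow = np.power(cons, -theta[1])[:, None]
    gross = 1.0 + ret
    d_theta1 = (growth_pow * gross).mean(axis=0)
    d_theta2 = (-theta[0] * np.log(cons)[:, None] * growth_pow * gross).mean(axis=0)
    return np.column_stack([d_theta1, d_theta2])

def gamma_numeric(theta, cons, ret, step=1e-6):
    """Central finite-difference approximation of the same N x 2 matrix."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for k in range(theta.size):
        bump = np.zeros_like(theta)
        bump[k] = step
        diff = mean_moments(theta + bump, cons, ret) - mean_moments(theta - bump, cons, ret)
        columns.append(diff / (2.0 * step))
    return np.column_stack(columns)

# Synthetic data and assumed parameter values (for illustration only)
rng = np.random.default_rng(3)
cons_toy = 1.0 + rng.normal(0.0, 0.005, size=100)
ret_toy = rng.normal(0.005, 0.05, size=(100, 3))
theta_toy = np.array([0.9, 5.0])

G_a = gamma_analytic(theta_toy, cons_toy, ret_toy)
G_n = gamma_numeric(theta_toy, cons_toy, ret_toy)
print(np.max(np.abs(G_a - G_n)))
```

A maximum absolute difference close to zero suggests the analytical expressions and the moment function agree.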

### Estimation of $\widehat{\mathbb{V}}(\widehat{\theta}_{GMM}).$ 

We have:

\begin{equation*}
\boxed{\widehat{\mathbb{V}}(\widehat{\theta}_{GMM}) = \frac{1}{T}\left( \widehat{\Gamma} _{0}^{^{\top }}\widehat{\Phi}
_{0}^{-1}\widehat{\Gamma} _{0}\right) ^{-1}.}
\end{equation*}

The estimates of $\Phi_0$ and $\Gamma_0$ are, respectively:

\begin{equation*}
\widehat{\Phi}_0 = \frac{1}{T}\sum_{t=1}^{T-1}\left( \begin{bmatrix}\widehat{\theta}_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\widehat{\theta}_2}(1+R^1_{t+1,t})-1 \\ \widehat{\theta}_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\widehat{\theta}_2}(1+R^2_{t+1,t})-1 \\...\\ ... \end{bmatrix} \begin{bmatrix}\widehat{\theta}_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\widehat{\theta}_2}(1+R^1_{t+1,t})-1 & \widehat{\theta}_1 \left(\frac{c_{t+1}}{c_t}\right)^{-\widehat{\theta}_2}(1+R^2_{t+1,t})-1 & ... & ... \end{bmatrix} \right)
\end{equation*}

and

\begin{eqnarray*}
\widehat{\Gamma}_0
  =  \left[\begin{array}{cc}\frac{1}{T}\sum_{t=1}^{T-1}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{1})\right) & -\frac{1}{T}\sum_{t=1}^{T-1}\left(\widehat{\theta}_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{1})\right)\\
\frac{1}{T}\sum_{t=1}^{T-1}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{2})\right) & -\frac{1}{T}\sum_{t=1}^{T-1}\left(\widehat{\theta}_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{2})\right) \\
... \\
\frac{1}{T}\sum_{t=1}^{T-1}\left(\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{10})\right) & -\frac{1}{T}\sum_{t=1}^{T-1}\left(\widehat{\theta}_1\log\left[\frac{c_{t+1}}{c_{t}}\right]\left(\frac{c_{t+1}}{c_{t}}\right)^{-\widehat{\theta}_2}(1+R_{t+1}^{10})\right) \\
\end{array}\right].
\end{eqnarray*}
Notice that, as always, the expectation was replaced by an arithmetic average over the data and the true parameter value $\theta_0$ was replaced by the second-stage optimal GMM estimates.


In [10]:
#############################
# We begin with Phi_hat0
##############################

# The pricing errors evaluated at the optimal SECOND-STAGE estimates
g_opt = gmm(estimates_opt, cons, ret, W_opt, 2)

# Phi_hat0 is an average of outer products of pricing errors. We do a loop to compute this average.
Phi_hat0 = np.zeros([number_assets, number_assets])
for j in range(T):
    Phi_hat0 = Phi_hat0 + np.outer(g_opt[j, :], g_opt[j, :]) / T

invPhi_hat0 = np.linalg.inv(Phi_hat0)  # This is the inverse of Phi_hat

#############################
# We now turn to Gamma_hat0
##############################

# we use the analytical derivatives directly in the loop
Gamma_hat0 = np.zeros([number_assets, 2])
for i in range(number_assets):
    Gamma_hat0[i, 0] = np.mean(np.power(cons, -estimates_opt[1]) * (1 + ret[:, i]))
    Gamma_hat0[i, 1] = np.mean(-estimates_opt[0] * np.log(cons) * np.power(cons,-estimates_opt[1]) * (1 + ret[:, i]))

################################
# Putting everything together: The estimated variance
################################

# compute the variance-covariance matrix of the parameter estimates
VarCov = (1 / T) * np.linalg.inv(Gamma_hat0.T @ invPhi_hat0 @ Gamma_hat0)


##################################
# Standard errors and t-statistics
###################################

# the variances are on the diagonal
var_diag = np.diag(VarCov)
std_error = np.sqrt(var_diag)

# t-statistics
t_stats = estimates_opt/std_error     

# table of parameters, standard errors and t statistics
table_estimates = pd.DataFrame({'Stage 1': estimates, 'Stage 2': estimates_opt, 'Std Errors': std_error,
                                't stats': t_stats }, index = ['theta1', 'theta2'])
table_estimates


| | Stage 1 | Stage 2 | Std Errors | t stats |
|---|---|---|---|---|
| theta1 | 0.577245 | 0.870383 | 0.2025 | 4.298178 |
| theta2 | 123.265097 | 47.848226 | 61.291257 | 0.78067 |
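
Because the estimator is asymptotically normal, the standard errors can be turned into approximate confidence intervals and two-sided p-values. The sketch below does so using the second-stage estimates and standard errors reported in the table above, typed in by hand. The variable names are hypothetical, and the choice of a 95% level is an assumption.

```python
import numpy as np
from scipy.stats import norm

# Second-stage estimates and standard errors copied from the table above
estimates_reported = np.array([0.870383, 47.848226])    # theta1, theta2
std_errors_reported = np.array([0.2025, 61.291257])

# Two-sided 95% confidence intervals based on the normal approximation
z = norm.ppf(0.975)
ci_lower = estimates_reported - z * std_errors_reported
ci_upper = estimates_reported + z * std_errors_reported

# Two-sided p-values for the null hypothesis that each parameter equals zero
t_stats_reported = estimates_reported / std_errors_reported
p_values = 2 * norm.sf(np.abs(t_stats_reported))

for name, lo, hi, p in zip(["theta1", "theta2"], ci_lower, ci_upper, p_values):
    print(f"{name}: 95% CI = [{lo:.3f}, {hi:.3f}], p-value = {p:.4f}")
```

With these numbers the interval for `theta2` contains zero, in line with its t-statistic of about 0.78, while the interval for `theta1` does not.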


## Test of overidentifying restrictions

In [11]:
# Test of over-identifying restrictions, applicable when there are more moment conditions than parameters (N > d)
Overid = gmm(estimates_opt, cons, ret, W_opt, 1)  # criterion evaluated at the optimal second-stage estimates
test = T * Overid  # Hansen's test 
Pvalue = 1 - chi2.cdf(test, number_assets - 2)  # compute p-value according to a Chi-Squared distribution

print(f'Overidentification test: p-value = {Pvalue:.4f}')

Overidentification test: p-value = 0.9169
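
The test above can be wrapped in a small helper that also reports the critical value, so the decision rule is explicit. The sketch below is one possible packaging, not the notebook's own code: the function name `hansen_j_test`, the 5% level, and the criterion value `0.01` used in the demo call are all assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2

def hansen_j_test(criterion_value, n_obs, n_moments, n_params, level=0.05):
    """Return (J statistic, p-value, critical value) for the overidentification test.

    criterion_value is the GMM criterion evaluated at the second-stage estimates
    with the optimal weight matrix.
    """
    dof = n_moments - n_params
    if dof <= 0:
        raise ValueError("The test requires more moment conditions than parameters.")
    j_stat = n_obs * criterion_value
    p_value = chi2.sf(j_stat, dof)               # survival function, equal to 1 - cdf
    critical_value = chi2.ppf(1.0 - level, dof)
    return j_stat, p_value, critical_value

# Demo call with a made-up criterion value; 418 observations, 10 moments, 2 parameters
j_demo, p_demo, crit_demo = hansen_j_test(0.01, n_obs=418, n_moments=10, n_params=2)
print(f"J = {j_demo:.3f}, p-value = {p_demo:.4f}, 5% critical value = {crit_demo:.3f}")
```

The null hypothesis that all moment conditions hold is rejected when the J statistic exceeds the critical value, or equivalently when the p-value falls below the chosen level. The p-value of 0.9169 reported above is far from any conventional rejection threshold.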
