# Generalized Method of Moments (GMM) Estimation

This post is largely a translation of [Generalized Method of Moments (GMM) Estimation](https://notes.quantecon.org/submission/5b3b1856b9eab00015b89f90), published on QuantEcon in July 2018 by Richard W. Evans of Rice University, with some content revised or added where needed.

<br>

## 1. Introduction: MLE vs. GMM

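The accuracy and efficiency of the maximum likelihood (ML) estimator and the GMM estimator have been compared by many researchers in a variety of contexts. For example, Fuhrer et al. (1995) showed, in the context of an inventory model, the advantages MLE has when the model is simple and the sample is small. In this introduction we compare the strengths and weaknesses of MLE and GMM from a more general perspective.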

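1.1. Strengths of MLE

* Estimates obtained by MLE tend to have higher statistical significance. This is because MLE rests on much stronger distributional assumptions than GMM.
* ML estimates are less sensitive to normalizations of the parameters or the model.
* ML estimates usually have smaller bias and are more efficient in small samples.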


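1.2. Weaknesses of MLE

* MLE requires strong distributional assumptions. To use MLE, the data generating process (DGP) must be fully specified, and the assumptions made along the way almost always differ from reality.
* MLE is not well suited to rational expectations models. Imposing consistent beliefs introduces nonlinearity into the likelihood function, which makes the global optimum hard to find.
* By extension, MLE is not well suited to nonlinear models. Even for a linear model, the likelihood function can easily become nonlinear when the data are irregular, and the difficulty of estimation is compounded when the model itself is complex and nonlinear.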

1.3. Strengths of GMM

* GMM allows for the most flexible identification. GMM estimates can be identified by any set of moments from the data, as long as you have at least as many moments as parameters to estimate and those moments are independent enough to identify the parameters. (The parameters must also be independent enough of each other to be separately identified.)
* Good large sample properties. The GMM estimator is strongly consistent and asymptotically normal. GMM will likely be the best estimator if you have a lot of data.
* GMM requires minimal assumptions about the DGP. In GMM, you need not specify the distributions of the error terms in your model of the DGP. This is often a strength, given that most errors are not observed and most models are gross approximations of the true DGP.


1.4. Weaknesses of GMM

* GMM estimates are usually less statistically significant than ML estimates. This comes from the minimal distributional assumptions. GMM parameter estimates usually are measured with more error.
* GMM estimates can be sensitive to normalizations of the model or parameters.
* GMM estimates have bad small sample properties. GMM estimates can have large bias and inefficiency in small samples.

1.5. Key questions to answer when deciding between MLE and GMM

- How much data is available for the estimation? Large data samples will make GMM relatively more attractive than MLE because of the nice large sample properties of GMM and the fewer assumptions it requires of the model.
- How complex is the model? Linear models or quadratic models are much easier to do using MLE than are more highly nonlinear models. Rational expectations models (macroeconomics) create an even more difficult level of nonlinearity that pushes you toward GMM estimation.
- How comfortable are you making strong distributional assumptions? MLE requires a complete specification of all distributional assumptions of the model DGP. If you think these assumptions are too strong, you should use GMM.

<br>

## 2. GMM

GMM was first formalized by Hansen (1982). A strength of GMM estimation is that the econometrician can remain completely agnostic as to the distribution of the random variables in the DGP. For identification, the econometrician simply needs at least as many moment conditions from the data as he has parameters to estimate.

A moment of the data is broadly defined as any statistic that summarizes the data to some degree. A data moment could be as narrow as an individual observation from the data or as broad as the sample average. GMM estimates the parameters of a model or data generating process to make the model moments as close as possible to the corresponding data moments. See Davidson and MacKinnon (2004, ch. 9) for a more detailed treatment of GMM. The estimation methods of linear least squares, nonlinear least squares, generalized least squares, and instrumental variables estimation are all specific cases of the more general GMM estimation method.
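
To make the least squares claim concrete, here is a minimal sketch in Python. It assumes the standard textbook orthogonality conditions $E[x_i (y_i - x_i'\beta)] = 0$, which the text above does not spell out, and it uses synthetic data. The names `beta_true` and `moment_conditions` are hypothetical and chosen only for this illustration.

```python
import numpy as np

# Synthetic data, for illustration only (not from the original post).
rng = np.random.default_rng(0)
N = 200
X = np.column_stack([np.ones(N), rng.normal(size=N)])  # constant plus one regressor
beta_true = np.array([1.0, 2.0])                       # hypothetical true parameters
y = X @ beta_true + rng.normal(scale=0.5, size=N)


def moment_conditions(beta):
    """Sample analogue of E[x_i * (y_i - x_i' beta)], one moment per regressor."""
    return X.T @ (y - X @ beta) / N


# With as many moments as parameters, the moments can be set exactly to zero:
# X'(y - X beta) = 0  is the same as  (X'X) beta = X'y.
beta_gmm = np.linalg.solve(X.T @ X, X.T @ y)

# Ordinary least squares computed independently, for comparison.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_gmm, beta_ols)
```

In this exactly identified case the two estimates coincide up to floating point error, because the first-order conditions of least squares are the same equations as the moment conditions.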

Let $m(x)$ be an $R \times 1$ vector of moments from the real world data $x$. And let $x$ be an $N \times K$ matrix of data with $K$ columns representing $K$ variables and $N$ observations. Let the model DGP be characterized as $F(x,\theta)$, where $F$ is a vector of equations, each of which is a function of the data $x$ and the $K \times 1$ parameter vector $\theta$. Then define $m(x|\theta)$ as a vector of $R$ moments from the model that correspond to the real-world moment vector $m(x)$. Note that GMM requires both real world data $x$ and deterministic moments from the model $m(x|\theta)$ in order to estimate $\hat\theta_{GMM}$. There is also a stochastic way to generate moments from the model, which we discuss later in our section on Simulated Method of Moments (SMM).

The GMM approach of estimating the parameter vector $\hat\theta_{GMM}$ is to choose $\theta$ to minimize some distance measure of the data moments $m(x)$ from the model moments $m(x|\theta)$.

\begin{equation}
    \hat\theta_{GMM} := \arg\min_{\theta}{|| m(x|\theta) - m(x) ||}
\end{equation}

The distance measure $||m(x|\theta)-m(x)||$ can be any kind of norm. But it is important to recognize that your estimates $\hat\theta_{GMM}$ will be dependent on what distance measure (norm) you choose. The most widely studied and used distance metric in GMM estimation is the $L^2$ norm or the sum of squared errors in moments. Define the moment error function $e(x|\theta)$ as the percent difference in the vector of model moments from the data moments.

\begin{equation}
    e(x|\theta) = \frac{m(x|\theta)-m(x)}{m(x)}
\end{equation}
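
As a small numerical illustration of the percent-deviation error vector and its sum of squares, consider the sketch below. The moment values are hypothetical numbers chosen only for the example, and the helper name `pct_moment_errors` is an assumption, not part of the original post.

```python
import numpy as np


def pct_moment_errors(model_moms, data_moms):
    """Elementwise percent deviation of model moments from data moments."""
    model_moms = np.asarray(model_moms, dtype=float)
    data_moms = np.asarray(data_moms, dtype=float)
    return (model_moms - data_moms) / data_moms


# Hypothetical moments (a mean and a variance), for illustration only.
data_moms = np.array([340.0, 7800.0])
model_moms = np.array([357.0, 7020.0])

err_pct = pct_moment_errors(model_moms, data_moms)  # percent deviations
err_simple = model_moms - data_moms                 # simple differences

# Sum of squared errors (squared L2 norm) under each definition.
crit_pct = float(err_pct @ err_pct)
crit_simple = float(err_simple @ err_simple)

# Share of the simple-difference criterion that comes from the variance moment.
share_var = err_simple[1] ** 2 / crit_simple

print(err_pct, crit_pct, share_var)
```

With simple differences, the variance moment accounts for almost all of the criterion because of its scale. Percent deviations put the two moments on a comparable footing, although they become unstable when a data moment is close to zero.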

<br>

## 3. Comparing Data and Distributions

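Let us work through a simple hands-on exercise that applies these ideas to real data. The data are the scores that students earned in the macroeconomics course taught by Richard, the author of the original post. We first load the libraries and the data needed for the exercise.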

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import scipy.integrate as intgr
import scipy.stats as sts

data = pd.read_csv("https://raw.githubusercontent.com/rickecon/Notebooks/master/MLE/data/Econ381totpts.txt",
                   header=None, names=["points"])
data

| | points |
|---|---|
| 0 | 275.50 |
| 1 | 351.50 |
| 2 | 346.25 |
| 3 | 228.25 |
| 4 | 108.25 |
| ... | ... |
| 156 | 235.00 |
| 157 | 102.20 |
| 158 | 112.30 |
| 159 | 130.60 |


In [None]:
# PDF of the truncated normal distribution (truncated from above at cutoff)
def trunc_norm_pdf(xvals, mu, sigma, cutoff):
    if cutoff is None:
        prob_notcut = 1
    else:
        prob_notcut = sts.norm.cdf(cutoff, loc=mu, scale=sigma)
            
    pdf_vals = ((1 / (sigma * np.sqrt(2*np.pi)) * np.exp(- (xvals-mu)**2 / (2*sigma**2))) / prob_notcut)
    
    return pdf_vals

In [None]:
# computes the two data moments for GMM: (mean(data), variance(data))
def data_moments(xvals):
    mean_data = xvals.mean()
    var_data = xvals.var()
    
    return mean_data, var_data

# computes the two model moments for GMM: (mean(model data), variance(model data))
def model_moments(mu, sigma, cutoff):
    '''
    mean_model = scalar, mean value of test scores from model
    m_m_err    = scalar > 0, estimated error in the computation of the
                 integral for the mean of the distribution
    var_model  = scalar > 0, variance of test scores from model
    v_m_err    = scalar > 0, estimated error in the computation of the
                 integral for the variance of the distribution
    '''
    # integrate over the whole real line when no cutoff is given
    upper = np.inf if cutoff is None else cutoff
    xfx = lambda x: x * trunc_norm_pdf(x, mu, sigma, cutoff)
    (mean_model, m_m_err) = intgr.quad(xfx, -np.inf, upper)
    
    x2fx = lambda x: ((x - mean_model) ** 2) * trunc_norm_pdf(x, mu, sigma, cutoff)
    (var_model, v_m_err) = intgr.quad(x2fx, -np.inf, upper)
    
    return mean_model, var_model
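
As a cross-check on the numerical integration in `model_moments`, the same mean and variance can be obtained from `scipy.stats.truncnorm`, which takes its truncation bounds in standardized units. The following is a sketch under that assumption. The function name `model_moments_truncnorm` and the parameter values are hypothetical and not part of the original post.

```python
import numpy as np
import scipy.integrate as intgr
import scipy.stats as sts


def model_moments_truncnorm(mu, sigma, cutoff):
    """Mean and variance of a normal(mu, sigma) truncated from above at cutoff."""
    # truncnorm expects bounds measured in standard deviations from loc
    b = (cutoff - mu) / sigma
    mean, var = sts.truncnorm.stats(-np.inf, b, loc=mu, scale=sigma, moments="mv")
    return float(mean), float(var)


# Hypothetical parameter values, for illustration only.
mu, sigma, cutoff = 300.0, 50.0, 450.0
mean_tn, var_tn = model_moments_truncnorm(mu, sigma, cutoff)


# Independent check by direct integration of the truncated density.
def pdf(x):
    return sts.norm.pdf(x, loc=mu, scale=sigma) / sts.norm.cdf(cutoff, loc=mu, scale=sigma)


lower = mu - 10 * sigma  # the probability mass below this point is negligible
mean_quad, _ = intgr.quad(lambda x: x * pdf(x), lower, cutoff)
var_quad, _ = intgr.quad(lambda x: (x - mean_quad) ** 2 * pdf(x), lower, cutoff)

print(mean_tn, mean_quad, var_tn, var_quad)
```

If the two approaches disagree noticeably for some parameter values, that is a hint that the quadrature is struggling, for example when `sigma` is very small relative to the integration range.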

In [None]:
# computes the vector of moment errors for GMM (simple differences or percent deviations from the data moments)
def err_vec(xvals, mu, sigma, cutoff, simple):
    '''
    simple = boolean, =True if errors are simple difference, =False if
             errors are percent deviation from data moments
    
    err_vec    = (2, 1) matrix, column vector of two moment error
                 functions    
    '''
    mean_data, var_data = data_moments(xvals)
    moms_data = np.array([[mean_data], [var_data]])
    
    mean_model, var_model = model_moments(mu, sigma, cutoff)
    moms_model = np.array([[mean_model], [var_model]])
    
    if simple:
        err_vec = moms_model - moms_data
    else:
        err_vec = (moms_model - moms_data) / moms_data
    
    return err_vec
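
The error vector can then be turned into a scalar criterion and minimized. The sketch below is one possible way to do this and is not taken from the code above: it assumes an identity weighting matrix `W`, uses `scipy.optimize.minimize` with the L-BFGS-B method, and replaces `err_vec` with a hypothetical toy error function `toy_err` (untruncated normal, made-up data moments) so that the block runs on its own.

```python
import numpy as np
import scipy.optimize as opt


def criterion(params, err_func, W):
    """Weighted sum of squared moment errors, e' W e."""
    err = np.asarray(err_func(params), dtype=float).ravel()
    return float(err @ W @ err)


def toy_err(params):
    """Hypothetical stand-in for err_vec: percent moment errors of an
    untruncated normal(mu, sigma) against made-up data moments."""
    mu, sigma = params
    data_moms = np.array([340.0, 7800.0])    # hypothetical mean and variance
    model_moms = np.array([mu, sigma ** 2])  # model mean and variance
    return (model_moms - data_moms) / data_moms


W = np.eye(2)                    # identity weighting matrix (an assumption)
guess = np.array([300.0, 60.0])  # hypothetical starting values
res = opt.minimize(criterion, guess, args=(toy_err, W), method="L-BFGS-B",
                   bounds=((None, None), (1e-10, None)), tol=1e-12)
mu_hat, sigma_hat = res.x

print(mu_hat, sigma_hat, res.fun)
```

To use this with the functions defined above, one could pass something like `lambda p: err_vec(xvals, p[0], p[1], cutoff, simple=False)` as `err_func`. Since this toy problem has two moments and two parameters, the criterion can be driven essentially to zero. The choice of optimizer, bounds, and tolerance here is illustrative.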

## References

* Fuhrer JC, Moore GR, Schuh SD. (1995). "Estimating the Linear-quadratic Inventory Model: Maximum Likelihood versus Generalized Method of Moments". <i>Journal of Monetary Economics</i> 35(1) 115-157.

<br>