# Tutorial 1: GMM Basics

This tutorial introduces the **Generalized Method of Moments (GMM)** and demonstrates how to use `momentest` for GMM estimation.

## What You'll Learn

1. What GMM is and when to use it
2. How to define moment conditions
3. How to estimate parameters using `gmm_estimate()`
4. How to interpret results and diagnostics

## Prerequisites

- Basic knowledge of econometrics (OLS, IV)
- Familiarity with Python and NumPy

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Import momentest
from momentest import (
    gmm_estimate,
    linear_iv,
    j_test,
    table_estimates,
    confidence_interval,
    plot_moment_comparison,
)

np.random.seed(42)

## 1. What is GMM?

**Generalized Method of Moments (GMM)** is an estimation method based on **moment conditions**: equations that hold in expectation at the true parameter values.

### The Key Idea

Suppose we have a model with parameters $\theta$ and data $\{z_i\}_{i=1}^n$. We define **moment conditions**:

$$E[g(z_i, \theta_0)] = 0$$

where $\theta_0$ is the true parameter value. GMM finds $\hat{\theta}$ that makes the sample moments as close to zero as possible:

$$\hat{\theta} = \arg\min_\theta \bar{g}(\theta)' W \bar{g}(\theta)$$

where $\bar{g}(\theta) = \frac{1}{n}\sum_{i=1}^n g(z_i, \theta)$ and $W$ is a weighting matrix.
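
The objective above can be written in a few lines of NumPy. This is a minimal sketch, not `momentest`'s internal code: `gmm_objective` and `mean_moment` are hypothetical helpers, and the toy problem (estimating a mean from $E[z - \mu] = 0$) is chosen only because its solution is known.

```python
import numpy as np

def gmm_objective(theta, moments_fn, data, W):
    """Quadratic form g_bar(theta)' W g_bar(theta). Hypothetical helper."""
    g = moments_fn(data, theta)   # (n, k) per-observation moments
    g_bar = g.mean(axis=0)        # (k,) sample moments
    return float(g_bar @ W @ g_bar)

def mean_moment(data, theta):
    """Toy moment condition E[z - mu] = 0, returned with shape (n, 1)."""
    return (data - theta[0]).reshape(-1, 1)

rng = np.random.default_rng(0)
z = rng.normal(loc=3.0, scale=1.0, size=500)
W = np.eye(1)  # identity weighting; with one moment the choice is immaterial

obj_at_mean = gmm_objective([z.mean()], mean_moment, z, W)
obj_elsewhere = gmm_objective([0.0], mean_moment, z, W)
print(obj_at_mean, obj_elsewhere)
```

The objective is numerically zero at the sample mean and strictly positive away from it, which is what the minimization in the formula exploits.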

### When to Use GMM

- **Instrumental Variables (IV)**: When you have endogenous regressors
- **Overidentification**: When you have more moment conditions than parameters
- **Robust estimation**: When you want to avoid distributional assumptions

## 2. Example: Linear IV Model

Let's start with the classic **linear IV model**:

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where $X$ is **endogenous** (correlated with $\varepsilon$). We have an instrument $Z$ that satisfies two conditions:
1. **Relevance**: $Cov(Z, X) \neq 0$
2. **Exclusion**: $Cov(Z, \varepsilon) = 0$
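
The two conditions can be checked numerically on simulated data where the error term is observable. The sketch below assumes a toy DGP with arbitrary coefficients; it is not the `linear_iv` DGP, whose internals are not shown in this tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
rho = 0.5  # assumed correlation between the structural and first-stage errors

Z = rng.normal(size=n)
eps = rng.normal(size=n)
# First-stage error v has unit variance and correlation rho with eps
v = rho * eps + np.sqrt(1 - rho**2) * rng.normal(size=n)

X = 0.8 * Z + v            # the instrument shifts X (relevance)
Y = 1.0 + 2.0 * X + eps    # Z affects Y only through X (exclusion)

cov_zx = np.cov(Z, X)[0, 1]    # relevance: should be near 0.8
cov_ze = np.cov(Z, eps)[0, 1]  # exclusion: should be near 0
cov_xe = np.cov(X, eps)[0, 1]  # endogeneity: should be near rho
print(f"Cov(Z,X)={cov_zx:.3f}  Cov(Z,eps)={cov_ze:.3f}  Cov(X,eps)={cov_xe:.3f}")
```

With real data $\varepsilon$ is unobserved, so exclusion cannot be checked this way; it has to be argued from the setting.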

### Generate Data

We'll use the built-in `linear_iv` DGP (Data Generating Process):

In [None]:
# Generate data from linear IV model
dgp = linear_iv(n=1000, seed=42, beta0=1.0, beta1=2.0, rho=0.5)

# Print info about the DGP
dgp.info()

In [None]:
# Extract data
Y = dgp.data['Y']
X = dgp.data['X']
Z = dgp.data['Z']

print(f"True parameters: β₀ = {dgp.true_theta[0]}, β₁ = {dgp.true_theta[1]}")
print(f"Sample size: n = {dgp.n}")

### Why OLS Fails

Let's first see what happens with OLS:

In [None]:
# OLS estimation (biased due to endogeneity)
X_ols = np.column_stack([np.ones(len(Y)), X])
beta_ols = np.linalg.lstsq(X_ols, Y, rcond=None)[0]

print(f"OLS estimates: β₀ = {beta_ols[0]:.4f}, β₁ = {beta_ols[1]:.4f}")
print(f"True values:   β₀ = {dgp.true_theta[0]:.4f}, β₁ = {dgp.true_theta[1]:.4f}")
print("\nOLS is biased because X is endogenous (correlated with ε).")
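
To see where the bias comes from, here is the standard probability-limit calculation for the slope, written in the notation of the model above (a textbook result, not specific to `momentest`):

$$\hat{\beta}_1^{OLS} \xrightarrow{p} \frac{Cov(X, Y)}{Var(X)} = \beta_1 + \frac{Cov(X, \varepsilon)}{Var(X)}$$

$$\hat{\beta}_1^{IV} \xrightarrow{p} \frac{Cov(Z, Y)}{Cov(Z, X)} = \beta_1 + \frac{Cov(Z, \varepsilon)}{Cov(Z, X)} = \beta_1$$

The OLS bias has the sign of $Cov(X, \varepsilon)$ and does not shrink as $n$ grows. The IV ratio removes it because exclusion makes the numerator of the extra term zero, and relevance keeps its denominator away from zero.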

## 3. GMM Moment Conditions

For the linear IV model, the moment conditions are:

1. $E[\varepsilon] = 0$ → $E[Y - \beta_0 - \beta_1 X] = 0$
2. $E[Z \cdot \varepsilon] = 0$ → $E[Z(Y - \beta_0 - \beta_1 X)] = 0$
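
Because this model has as many moments as parameters, the two sample moment conditions can be solved exactly as a linear system, with no numerical optimizer. A small sketch, assuming a toy simulated DGP (the coefficients 0.8 and 0.5 are arbitrary, and this is not the `linear_iv` DGP):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
Z = rng.normal(size=n)
eps = rng.normal(size=n)
X = 0.8 * Z + 0.5 * eps + rng.normal(size=n)  # endogenous regressor (toy DGP)
Y = 1.0 + 2.0 * X + eps

ones = np.ones(n)
Zmat = np.column_stack([ones, Z])  # instruments: constant and Z
Xmat = np.column_stack([ones, X])  # regressors: constant and X

# Setting both sample moments to zero gives (Z'X) beta = Z'Y
beta_iv = np.linalg.solve(Zmat.T @ Xmat, Zmat.T @ Y)

# Sample moments evaluated at the solution
resid = Y - Xmat @ beta_iv
g_bar = np.array([resid.mean(), (Z * resid).mean()])
print("beta_iv:", beta_iv)
print("sample moments:", g_bar)
```

The slope from this system equals the ratio of sample covariances $Cov(Z, Y) / Cov(Z, X)$, and the sample moments at the solution are zero up to floating-point error.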

Let's define the moment function:

In [None]:
def moment_func(data, theta):
    """
    GMM moment conditions for linear IV.
    
    Args:
        data: Dictionary with 'Y', 'X', 'Z' arrays
        theta: [beta0, beta1] parameters
    
    Returns:
        Moment conditions of shape (n, k) where k=2
    """
    beta0, beta1 = theta
    
    # Residual
    residual = data['Y'] - beta0 - beta1 * data['X']
    
    # Moment conditions
    moments = np.column_stack([
        residual,              # E[ε] = 0
        residual * data['Z'], # E[Zε] = 0 (exclusion restriction)
    ])
    
    return moments

## 4. GMM Estimation with momentest

Now let's estimate using `gmm_estimate()`. This is the simple, high-level API:

In [None]:
# GMM estimation - just a few lines!
result = gmm_estimate(
    data=dgp.data,           # Pass the data dictionary
    moment_func=moment_func, # Our moment function
    bounds=[(-10, 10), (-10, 10)],  # Parameter bounds
    k=2,                     # Number of moment conditions
    weighting="optimal",    # Two-step optimal weighting
)

print(result)

In [None]:
# Compare estimates
print("\n" + "="*60)
print("COMPARISON")
print("="*60)
print(f"{'Method':<15} {'β₀':>12} {'β₁':>12}")
print("-"*40)
print(f"{'True':<15} {dgp.true_theta[0]:>12.4f} {dgp.true_theta[1]:>12.4f}")
print(f"{'OLS (biased)':<15} {beta_ols[0]:>12.4f} {beta_ols[1]:>12.4f}")
print(f"{'GMM':<15} {result.theta[0]:>12.4f} {result.theta[1]:>12.4f}")
print("="*60)
print("\nGMM is consistent, so its estimates land close to the true values (up to sampling error). OLS is biased upward.")

## 5. Understanding the Results

### Parameter Estimates and Standard Errors

In [None]:
# Formatted table of estimates
ci_lower, ci_upper = confidence_interval(result.theta, result.se)

print(table_estimates(
    theta=result.theta,
    se=result.se,
    param_names=["β₀ (intercept)", "β₁ (slope)"],
    ci_lower=ci_lower,
    ci_upper=ci_upper,
))
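
For intuition about where standard errors and intervals like these come from, the sketch below computes the textbook GMM variance $(G' S^{-1} G)^{-1} / n$ and a normal-approximation 95% interval by hand. It assumes a toy simulated DGP rather than `linear_iv`, and it is not claimed to reproduce how `momentest` computes `result.se` or `confidence_interval`.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000
Z = rng.normal(size=n)
eps = rng.normal(size=n)
X = 0.8 * Z + 0.5 * eps + rng.normal(size=n)  # toy endogenous regressor
Y = 1.0 + 2.0 * X + eps

Zmat = np.column_stack([np.ones(n), Z])
Xmat = np.column_stack([np.ones(n), X])
theta_hat = np.linalg.solve(Zmat.T @ Xmat, Zmat.T @ Y)  # just-identified IV

resid = Y - Xmat @ theta_hat
g = Zmat * resid[:, None]     # (n, k) per-observation moments
S = g.T @ g / n               # estimated moment covariance
G = -(Zmat.T @ Xmat) / n      # Jacobian of the sample moments w.r.t. theta

# Asymptotic variance of theta_hat under efficient weighting
V = np.linalg.inv(G.T @ np.linalg.solve(S, G)) / n
se = np.sqrt(np.diag(V))

z_crit = 1.959964             # 97.5% quantile of the standard normal
ci_lower = theta_hat - z_crit * se
ci_upper = theta_hat + z_crit * se
print("theta_hat:", theta_hat)
print("se:", se)
```

Using the outer product of the moments for $S$ makes these standard errors robust to heteroskedasticity, at the cost of assuming independent observations.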

### Moment Fit

At the estimated parameters, the sample moments should be close to zero:

In [None]:
# Check moment fit
print(f"Sample moments at θ̂: {result.sample_moments}")
print("\nThese should be close to zero; in a just-identified model they are zero up to optimizer tolerance.")
print(f"Objective value: {result.objective:.2e}")

## 6. Identity vs Optimal Weighting

GMM allows different weighting matrices $W$:

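- **Identity**: $W = I$ (simple, but generally inefficient when the model is overidentified)
- **Optimal**: $W = S^{-1}$, where $S$ is the covariance matrix of the moments (asymptotically efficient)

With $k = p$ (just-identified), as in the model above, the sample moments can be driven to zero for any positive definite $W$, so both choices should give the same estimates up to optimizer tolerance. Efficiency gains from optimal weighting appear only when $k > p$. Let's compare: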

In [None]:
# Identity weighting (one-step)
result_identity = gmm_estimate(
    data=dgp.data,
    moment_func=moment_func,
    bounds=[(-10, 10), (-10, 10)],
    k=2,
    weighting="identity",
)

# Optimal weighting (two-step)
result_optimal = gmm_estimate(
    data=dgp.data,
    moment_func=moment_func,
    bounds=[(-10, 10), (-10, 10)],
    k=2,
    weighting="optimal",
)

print(f"{'Weighting':<15} {'β₀':>10} {'β₁':>10} {'SE(β₁)':>10}")
print("-"*50)
print(f"{'Identity':<15} {result_identity.theta[0]:>10.4f} {result_identity.theta[1]:>10.4f} {result_identity.se[1]:>10.4f}")
print(f"{'Optimal':<15} {result_optimal.theta[0]:>10.4f} {result_optimal.theta[1]:>10.4f} {result_optimal.se[1]:>10.4f}")
print(f"{'True':<15} {dgp.true_theta[0]:>10.4f} {dgp.true_theta[1]:>10.4f} {'-':>10}")
print("\nHere k = p, so both weightings should agree up to optimizer tolerance; optimal weighting lowers the asymptotic variance only when k > p.")
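
To make the two-step procedure concrete, here is a sketch for linear moment conditions, where the minimizer has a closed form. `linear_gmm` and `two_step` are hypothetical helpers written for this illustration, and the DGP with two instruments is an assumption; none of this is `momentest`'s implementation.

```python
import numpy as np

def linear_gmm(Y, Xmat, Zmat, W):
    """Closed-form minimizer of g_bar' W g_bar for g_bar = Z'(Y - X theta) / n."""
    n = len(Y)
    A = Zmat.T @ Xmat / n   # (k, p)
    b = Zmat.T @ Y / n      # (k,)
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

def two_step(Y, Xmat, Zmat):
    """Step 1 uses W = I; step 2 uses W = S^{-1} estimated at the step-1 estimate."""
    n, k = Zmat.shape
    theta1 = linear_gmm(Y, Xmat, Zmat, np.eye(k))
    g = Zmat * (Y - Xmat @ theta1)[:, None]  # per-observation moments
    S = g.T @ g / n                          # moment covariance estimate
    theta2 = linear_gmm(Y, Xmat, Zmat, np.linalg.inv(S))
    return theta1, theta2

rng = np.random.default_rng(3)
n = 5_000
Z1, Z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
X = 0.6 * Z1 + 0.4 * Z2 + 0.5 * eps + rng.normal(size=n)
Y = 1.0 + 2.0 * X + eps
Xmat = np.column_stack([np.ones(n), X])

# Just-identified (k = p = 2): the weighting matrix drops out
t1_just, t2_just = two_step(Y, Xmat, np.column_stack([np.ones(n), Z1]))

# Overidentified (k = 3 > p = 2): the two steps can differ
t1_over, t2_over = two_step(Y, Xmat, np.column_stack([np.ones(n), Z1, Z2]))

print("just-identified:", t1_just, t2_just)
print("overidentified: ", t1_over, t2_over)
```

In the just-identified case the two steps return the same vector, because the moments can be set exactly to zero whatever $W$ is. Any difference between weightings only shows up once there are more moments than parameters.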

## 7. Overidentification and the J-Test

When we have more moments than parameters ($k > p$), the model is **overidentified**. We can test whether all moment conditions hold using the **J-test** (Hansen-Sargan test).

Let's add another instrument to create an overidentified model:

In [None]:
# Add a second instrument (Z squared, for illustration)
dgp.data['Z2'] = dgp.data['Z']**2

def moment_func_overid(data, theta):
    """
    Overidentified GMM: 3 moments, 2 parameters.
    """
    beta0, beta1 = theta
    residual = data['Y'] - beta0 - beta1 * data['X']
    
    moments = np.column_stack([
        residual,               # E[ε] = 0
        residual * data['Z'],   # E[Zε] = 0
        residual * data['Z2'],  # E[Z²ε] = 0 (additional moment)
    ])
    
    return moments

# Estimate overidentified model
result_overid = gmm_estimate(
    data=dgp.data,
    moment_func=moment_func_overid,
    bounds=[(-10, 10), (-10, 10)],
    k=3,  # Now 3 moments
    weighting="optimal",
)

print(f"Overidentified estimates: β₀ = {result_overid.theta[0]:.4f}, β₁ = {result_overid.theta[1]:.4f}")

In [None]:
# J-test for overidentifying restrictions
j_result = j_test(
    objective=result_overid.objective,
    n=dgp.n,
    k=3,  # 3 moments
    p=2,  # 2 parameters
)

print(j_result)

### Interpreting the J-Test

- **H₀**: All moment conditions are valid
- **H₁**: At least one moment condition is invalid

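If we **fail to reject** H₀ (high p-value), the data show no evidence against the moment conditions. This does not prove the instruments are valid: the test only checks whether the moment conditions are mutually consistent, and it can have low power.
If we **reject** H₀ (low p-value), at least one moment condition is likely violated, though the test does not say which one.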
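
The statistic behind this test is $J = n \, \bar{g}(\hat{\theta})' \hat{S}^{-1} \bar{g}(\hat{\theta})$, which is asymptotically $\chi^2_{k-p}$ under H₀. The sketch below computes it by hand for a linear model with one overidentifying restriction, once with valid instruments and once with an instrument that enters the outcome directly. `j_statistic` and `chi2_sf_df1` are hypothetical helpers and the DGP is an assumption; this is not the code behind `j_test`.

```python
import math
import numpy as np

def j_statistic(Y, Xmat, Zmat):
    """Two-step GMM J statistic for the linear moments E[Z'(Y - X theta)] = 0."""
    n, k = Zmat.shape
    A, b = Zmat.T @ Xmat / n, Zmat.T @ Y / n

    def solve(W):
        return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)

    theta1 = solve(np.eye(k))                      # step 1: identity weighting
    g = Zmat * (Y - Xmat @ theta1)[:, None]
    W_opt = np.linalg.inv(g.T @ g / n)             # step 2 weighting: S^{-1}
    g_bar = b - A @ solve(W_opt)                   # sample moments at theta_hat
    return n * float(g_bar @ W_opt @ g_bar)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

rng = np.random.default_rng(5)
n = 5_000
Z1, Z2 = rng.normal(size=n), rng.normal(size=n)
eps = rng.normal(size=n)
X = 0.6 * Z1 + 0.4 * Z2 + 0.5 * eps + rng.normal(size=n)
Xmat = np.column_stack([np.ones(n), X])
Zmat = np.column_stack([np.ones(n), Z1, Z2])       # k = 3, p = 2

Y_valid = 1.0 + 2.0 * X + eps                # both instruments excluded from Y
Y_invalid = 1.0 + 2.0 * X + 0.5 * Z2 + eps   # Z2 violates exclusion

J_valid = j_statistic(Y_valid, Xmat, Zmat)
J_invalid = j_statistic(Y_invalid, Xmat, Zmat)
p_valid, p_invalid = chi2_sf_df1(J_valid), chi2_sf_df1(J_invalid)
print(f"valid:   J={J_valid:.2f}, p={p_valid:.3f}")
print(f"invalid: J={J_invalid:.2f}, p={p_invalid:.3g}")
```

The closed-form p-value via `math.erfc` works only for one degree of freedom. For general $k - p$ a chi-square survival function from a statistics library would be needed.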

## 8. Visualization

In [None]:
# Plot moment comparison
fig = plot_moment_comparison(
    data_moments=np.zeros(2),  # Target is zero for GMM
    model_moments=result.sample_moments,
    moment_names=["E[ε]", "E[Zε]"],
)
plt.suptitle("GMM Moment Fit", y=1.02)
plt.tight_layout()
plt.show()

## 9. Exercises

Try these exercises to deepen your understanding:

### Exercise 1: Change the Endogeneity
Regenerate data with different `rho` values (0.1, 0.3, 0.7). How does OLS bias change? Does GMM still work?

### Exercise 2: Weak Instruments
Modify the DGP to have a weak instrument (low correlation between Z and X). What happens to GMM estimates?
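
As a possible starting point, instrument strength is often summarized by the first-stage F statistic, with values below about 10 commonly read as a sign of weakness. The sketch below uses a hypothetical helper `first_stage_F` and a toy first stage. Whether `linear_iv` exposes a parameter for instrument strength is not shown in this tutorial, so the data here are simulated directly.

```python
import numpy as np

def first_stage_F(X, Z):
    """F statistic for H0: slope = 0 when regressing X on a constant and one instrument."""
    n = len(X)
    Zmat = np.column_stack([np.ones(n), Z])
    coef, *_ = np.linalg.lstsq(Zmat, X, rcond=None)
    resid = X - Zmat @ coef
    sigma2 = resid @ resid / (n - 2)                       # residual variance
    var_slope = sigma2 * np.linalg.inv(Zmat.T @ Zmat)[1, 1]
    return coef[1] ** 2 / var_slope                        # squared t statistic

rng = np.random.default_rng(4)
n = 1_000
Z = rng.normal(size=n)
v = rng.normal(size=n)

F_strong = first_stage_F(0.8 * Z + v, Z)    # strong first stage
F_weak = first_stage_F(0.02 * Z + v, Z)     # nearly irrelevant instrument
print(f"F (strong) = {F_strong:.1f}, F (weak) = {F_weak:.2f}")
```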

### Exercise 3: Invalid Instrument
Add an instrument that violates the exclusion restriction. Does the J-test detect it?

### Exercise 4: Real Data
Try the `load_labor_supply()` dataset for a real-world IV example.

In [None]:
# Exercise 1 starter code
for rho in [0.1, 0.3, 0.5, 0.7]:
    dgp_ex = linear_iv(n=1000, seed=42, rho=rho)
    
    # OLS (separate names so the earlier X_ols and beta_ols are not overwritten)
    X_ols_ex = np.column_stack([np.ones(dgp_ex.n), dgp_ex.data['X']])
    beta_ols_ex = np.linalg.lstsq(X_ols_ex, dgp_ex.data['Y'], rcond=None)[0]
    
    # GMM
    result_ex = gmm_estimate(
        data=dgp_ex.data,
        moment_func=moment_func,
        bounds=[(-10, 10), (-10, 10)],
        k=2,
        weighting="optimal",
    )
    
    print(f"ρ={rho}: OLS β₁={beta_ols_ex[1]:.3f}, GMM β₁={result_ex.theta[1]:.3f}, True β₁={dgp_ex.true_theta[1]:.1f}")

## Summary

In this tutorial, you learned:

1. **GMM basics**: Estimation based on moment conditions $E[g(z, \theta)] = 0$
2. **Moment functions**: How to define `moment_func(data, theta)` returning an $(n, k)$ array
3. **Simple API**: `gmm_estimate()` handles optimization, weighting, and inference
4. **Weighting**: Identity vs optimal (two-step efficient)
5. **J-test**: Testing overidentifying restrictions when $k > p$

### Key Takeaways

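- GMM handles IV estimation without distributional assumptions on the errors
- The moment function is the key input: it encodes your economic model
- Optimal weighting gives asymptotically efficient estimates, and it only matters when the model is overidentified
- The J-test checks the overidentifying restrictions; passing it does not prove that the instruments are valid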

### Next Steps

- **Tutorial 2**: SMM basics - when you need to simulate moments
- **Tutorial 3**: Optimal weighting in depth
- **Tutorial 4**: Bootstrap inference