# Assignment 1 - Part 1: Frisch-Waugh-Lovell (FWL) Theorem
## Math (3 points)

This notebook contains the mathematical proof and numerical verification of the Frisch-Waugh-Lovell theorem.

The FWL theorem is a fundamental result in econometrics that shows how to isolate the effect of specific variables by "partialling out" the effects of other variables.

## Import Required Libraries

In [1]:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

## Mathematical Proof of the FWL Theorem

The FWL theorem states that, when X = [X₁ X₂] has full column rank, the OLS estimate of β₁ in the regression of y on [X₁ X₂] is equal to the estimate obtained from the following residual-on-residual procedure:

1. Regress y on X₂ and obtain the residuals ỹ = M_{X₂}y, where M_{X₂} = I - X₂(X₂'X₂)⁻¹X₂'
2. Regress X₁ on X₂ and obtain the residuals X̃₁ = M_{X₂}X₁
3. Regress ỹ on X̃₁; the resulting coefficient vector equals β̂₁ from the full regression.

Formally, we need to show that: β̂₁ = (X̃₁'X̃₁)⁻¹X̃₁'ỹ
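
As a quick sanity check on the annihilator matrix M_{X₂} defined in step 1, the sketch below builds it for a small random X₂ and confirms the properties the proof relies on: symmetry, idempotence, and M_{X₂}X₂ = 0. The sizes, the seed, and the `_demo` variable names are illustrative assumptions chosen so the block does not clash with the notebook's own variables.

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed (assumption)
n_demo, k2_demo = 8, 3           # small illustrative sizes (assumption)
X2_demo = rng.standard_normal((n_demo, k2_demo))

# Annihilator (residual-maker) matrix: M = I - X2 (X2'X2)^{-1} X2'
P_demo = X2_demo @ np.linalg.solve(X2_demo.T @ X2_demo, X2_demo.T)
M_demo = np.eye(n_demo) - P_demo

# Properties used in the proof, checked up to floating-point tolerance
is_symmetric = np.allclose(M_demo, M_demo.T)
is_idempotent = np.allclose(M_demo @ M_demo, M_demo)
annihilates_X2 = np.allclose(M_demo @ X2_demo, 0.0)

print(is_symmetric, is_idempotent, annihilates_X2)
```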

In [2]:
def fwl_theorem_proof():
    """
    Mathematical proof of the Frisch-Waugh-Lovell theorem.
    """
    print("=== FRISCH-WAUGH-LOVELL THEOREM PROOF ===\n")
    
    print("Mathematical Proof:")
    print("==================")
    print()
    print("Consider the linear regression model:")
    print("y = X₁β₁ + X₂β₂ + u")
    print()
    print("Where:")
    print("- y is an n×1 vector of outcomes")
    print("- X₁ is an n×k₁ matrix of regressors of interest")
    print("- X₂ is an n×k₂ matrix of control variables")
    print("- u is an n×1 vector of errors")
    print()
    
    print("Step 1: Full regression")
    print("The full regression in matrix form is:")
    print("y = [X₁ X₂][β₁; β₂] + u = Xβ + u")
    print()
    print("The OLS estimator is:")
    print("β̂ = (X'X)⁻¹X'y")
    print()
    print("Partitioning X'X and X'y:")
    print("X'X = [X₁'X₁  X₁'X₂]")
    print("      [X₂'X₁  X₂'X₂]")
    print()
    print("X'y = [X₁'y]")
    print("      [X₂'y]")
    print()
    
    print("Step 2: Using the partitioned inverse formula")
    print("For a partitioned matrix [A B; C D], if D is invertible:")
    print("The (1,1) block of the inverse is (A - BD⁻¹C)⁻¹")
    print()
    print("Applying this to our case:")
    print("β̂₁ = [(X₁'X₁ - X₁'X₂(X₂'X₂)⁻¹X₂'X₁)]⁻¹[X₁'y - X₁'X₂(X₂'X₂)⁻¹X₂'y]")
    print()
    
    print("Step 3: Factoring out the projection matrix")
    print("Let M_{X₂} = I - X₂(X₂'X₂)⁻¹X₂' (the annihilator matrix)")
    print("Note that M_{X₂} is idempotent: M_{X₂}M_{X₂} = M_{X₂}")
    print("And symmetric: M_{X₂}' = M_{X₂}")
    print()
    print("Then:")
    print("X₁'X₁ - X₁'X₂(X₂'X₂)⁻¹X₂'X₁ = X₁'[I - X₂(X₂'X₂)⁻¹X₂']X₁ = X₁'M_{X₂}X₁")
    print("X₁'y - X₁'X₂(X₂'X₂)⁻¹X₂'y = X₁'[I - X₂(X₂'X₂)⁻¹X₂']y = X₁'M_{X₂}y")
    print()
    
    print("Step 4: Final form")
    print("Therefore:")
    print("β̂₁ = (X₁'M_{X₂}X₁)⁻¹X₁'M_{X₂}y")
    print()
    print("Let X̃₁ = M_{X₂}X₁ and ỹ = M_{X₂}y")
    print("Then: β̂₁ = (X̃₁'X̃₁)⁻¹X̃₁'ỹ")
    print()
    print("This shows that β̂₁ from the full regression equals the OLS coefficient")
    print("from regressing the residuals ỹ on the residuals X̃₁.")
    print()
    print("Q.E.D.")
    print()

# Display the mathematical proof
fwl_theorem_proof()

=== FRISCH-WAUGH-LOVELL THEOREM PROOF ===

Mathematical Proof:
==================

Consider the linear regression model:
y = X₁β₁ + X₂β₂ + u

Where:
- y is an n×1 vector of outcomes
- X₁ is an n×k₁ matrix of regressors of interest
- X₂ is an n×k₂ matrix of control variables
- u is an n×1 vector of errors

Step 1: Full regression
The full regression in matrix form is:
y = [X₁ X₂][β₁; β₂] + u = Xβ + u

The OLS estimator is:
β̂ = (X'X)⁻¹X'y

Partitioning X'X and X'y:
X'X = [X₁'X₁  X₁'X₂]
      [X₂'X₁  X₂'X₂]

X'y = [X₁'y]
      [X₂'y]

Step 2: Using the partitioned inverse formula
For a partitioned matrix [A B; C D], if D is invertible:
The (1,1) block of the inverse is (A - BD⁻¹C)⁻¹

Applying this to our case:
β̂₁ = [(X₁'X₁ - X₁'X₂(X₂'X₂)⁻¹X₂'X₁)]⁻¹[X₁'y - X₁'X₂(X₂'X₂)⁻¹X₂'y]

Step 3: Factoring out the projection matrix
Let M_{X₂} = I - X₂(X₂'X₂)⁻¹X₂' (the annihilator matrix)
Note that M_{X₂} is idempotent: M_{X₂}M_{X₂} = M_{X₂}
And symmetric: M_{X₂}' = M_{X₂}

Then:
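X₁'X₁ - X₁'X₂(X₂'X₂)⁻¹X₂'X₁ = X₁'[I - X₂(X₂'X₂)⁻¹X₂']X₁ = X₁'M_{X₂}X₁
X₁'y - X₁'X₂(X₂'X₂)⁻¹X₂'y = X₁'[I - X₂(X₂'X₂)⁻¹X₂']y = X₁'M_{X₂}y

Step 4: Final form
Therefore:
β̂₁ = (X₁'M_{X₂}X₁)⁻¹X₁'M_{X₂}y

Let X̃₁ = M_{X₂}X₁ and ỹ = M_{X₂}y
Then: β̂₁ = (X̃₁'X̃₁)⁻¹X̃₁'ỹ

This shows that β̂₁ from the full regression equals the OLS coefficient
from regressing the residuals ỹ on the residuals X̃₁.

Q.E.D.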
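
Two steps of the proof are stated without detail. Step 2 quotes only the (1,1) block of the partitioned inverse, although the expression for β̂₁ also uses the (1,2) block. The final form relies on the symmetry and idempotence of M_{X₂} without saying so. A sketch of both steps follows, writing A = X₁'X₁, B = X₁'X₂, C = X₂'X₁, D = X₂'X₂ and S = A − BD⁻¹C (the symbol S is introduced here for illustration), and assuming D and S are both invertible:

$$
\begin{aligned}
(X'X)^{-1} &= \begin{bmatrix} S^{-1} & -S^{-1} B D^{-1} \\ -D^{-1} C S^{-1} & D^{-1} + D^{-1} C S^{-1} B D^{-1} \end{bmatrix}, \\
\hat{\beta}_1 &= S^{-1} X_1'y - S^{-1} B D^{-1} X_2'y \\
&= \left[X_1'X_1 - X_1'X_2 (X_2'X_2)^{-1} X_2'X_1\right]^{-1} \left[X_1'y - X_1'X_2 (X_2'X_2)^{-1} X_2'y\right], \\
X_1' M_{X_2} X_1 &= X_1' M_{X_2}' M_{X_2} X_1 = \tilde{X}_1'\tilde{X}_1, \qquad
X_1' M_{X_2} y = X_1' M_{X_2}' M_{X_2} y = \tilde{X}_1'\tilde{y}.
\end{aligned}
$$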

## Numerical Verification

Now let's verify the FWL theorem numerically using simulated data. We'll generate data with known parameters and compare the results from:
1. Full regression: y ~ [X₁ X₂]
2. FWL two-step procedure: residuals of y on residuals of X₁

In [3]:
def numerical_verification():
    """
    Generate the simulated data used to verify the FWL theorem numerically.
    """
    print("=== NUMERICAL VERIFICATION ===\n")
    
    # Set random seed for reproducibility
    np.random.seed(42)
    
    # Generate data
    n = 1000  # Sample size
    k1 = 2    # Number of variables of interest
    k2 = 3    # Number of control variables
    
    # Generate X1, X2, and error term
    X1 = np.random.randn(n, k1)
    X2 = np.random.randn(n, k2)
    u = np.random.randn(n, 1)
    
    # True parameters
    beta1_true = np.array([[1.5], [2.0]])
    beta2_true = np.array([[0.5], [-1.0], [0.8]])
    
    # Generate y
    y = X1 @ beta1_true + X2 @ beta2_true + u
    
    print(f"Sample size: {n}")
    print(f"X1 dimensions: {X1.shape} (variables of interest)")
    print(f"X2 dimensions: {X2.shape} (control variables)")
    print(f"True β₁: {beta1_true.ravel()}")
    print(f"True β₂: {beta2_true.ravel()}")
    print()
    
    return X1, X2, y, beta1_true, beta2_true, n, k1, k2

# Generate the data
X1, X2, y, beta1_true, beta2_true, n, k1, k2 = numerical_verification()

=== NUMERICAL VERIFICATION ===

Sample size: 1000
X1 dimensions: (1000, 2) (variables of interest)
X2 dimensions: (1000, 3) (control variables)
True β₁: [1.5 2. ]
True β₂: [ 0.5 -1.   0.8]



### Method 1: Full Regression

First, let's estimate the full regression model with all variables.

In [4]:
# Method 1: Full regression
X_full = np.column_stack([X1, X2])
beta_full = np.linalg.inv(X_full.T @ X_full) @ X_full.T @ y
beta1_full = beta_full[:k1]

print("Method 1: Full regression")
print(f"β̂₁ from full regression: {beta1_full.ravel()}")
print()

Method 1: Full regression
β̂₁ from full regression: [1.52360737 1.99063613]
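
Forming the explicit inverse of X'X works for a small, well-conditioned problem like this one, but a least-squares solver is usually the more numerically stable choice. The following is a minimal sketch using `np.linalg.lstsq` on freshly simulated data. The `_demo` names, seed, and sizes are assumptions chosen so that the block runs on its own, and it is not meant to replace the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)            # illustrative seed (assumption)
n_demo, k1_demo, k2_demo = 200, 2, 3      # illustrative sizes (assumption)
X_demo = rng.standard_normal((n_demo, k1_demo + k2_demo))
beta_demo = np.array([1.5, 2.0, 0.5, -1.0, 0.8])
y_demo = X_demo @ beta_demo + rng.standard_normal(n_demo)

# Normal equations with an explicit inverse (as in Method 1)
beta_inv = np.linalg.inv(X_demo.T @ X_demo) @ X_demo.T @ y_demo

# Least-squares solver: no explicit inverse of X'X is formed
beta_lstsq, *_ = np.linalg.lstsq(X_demo, y_demo, rcond=None)

# The two estimates should agree up to floating-point error
print(np.max(np.abs(beta_inv - beta_lstsq)))
```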



### Method 2: FWL Two-Step Procedure

Now let's implement the FWL two-step procedure:
1. Residualize y and X₁ with respect to X₂
2. Regress the residualized y on the residualized X₁

In [5]:
# Method 2: FWL two-step procedure

# Step 1: Regress y on X2 and get residuals
P_X2 = X2 @ np.linalg.inv(X2.T @ X2) @ X2.T
M_X2 = np.eye(n) - P_X2
y_tilde = M_X2 @ y

# Step 2: Regress X1 on X2 and get residuals
X1_tilde = M_X2 @ X1

# Step 3: Regress y_tilde on X1_tilde
beta1_fwl = np.linalg.inv(X1_tilde.T @ X1_tilde) @ X1_tilde.T @ y_tilde

print("Method 2: FWL two-step procedure")
print("Step 1: Residualize y on X₂")
print("Step 2: Residualize X₁ on X₂")
print("Step 3: Regress residuals")
print(f"β̂₁ from FWL method: {beta1_fwl.ravel()}")
print()

Method 2: FWL two-step procedure
Step 1: Residualize y on X₂
Step 2: Residualize X₁ on X₂
Step 3: Regress residuals
β̂₁ from FWL method: [1.52360737 1.99063613]
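
The cell above builds the n×n matrix M_{X₂} explicitly, which needs memory proportional to n². The same residuals can be obtained by regressing on X₂ directly. The sketch below wraps this in a hypothetical helper `residualize` (a name introduced here, not part of the assignment) and re-checks the FWL equality on separately simulated data with assumed sizes and seed.

```python
import numpy as np

def residualize(Z, X2):
    """Residuals from regressing each column of Z on X2 (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(X2, Z, rcond=None)
    return Z - X2 @ coef

rng = np.random.default_rng(1)            # illustrative seed (assumption)
n_demo = 500                              # illustrative sample size (assumption)
X1_demo = rng.standard_normal((n_demo, 2))
X2_demo = rng.standard_normal((n_demo, 3))
y_demo = (X1_demo @ np.array([[1.5], [2.0]])
          + X2_demo @ np.array([[0.5], [-1.0], [0.8]])
          + rng.standard_normal((n_demo, 1)))

# FWL: regress residualized y on residualized X1, without forming M
y_res = residualize(y_demo, X2_demo)
X1_res = residualize(X1_demo, X2_demo)
beta1_fwl_demo, *_ = np.linalg.lstsq(X1_res, y_res, rcond=None)

# Full regression for comparison; the first two rows are the X1 coefficients
X_full_demo = np.column_stack([X1_demo, X2_demo])
beta_full_demo, *_ = np.linalg.lstsq(X_full_demo, y_demo, rcond=None)

print(np.max(np.abs(beta1_fwl_demo - beta_full_demo[:2])))
```

Only n×k arrays are created here, so the approach scales to sample sizes where an n×n matrix would not fit in memory.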



### Comparison and Verification

Let's check if both methods produce identical results (within numerical precision).

In [6]:
# Check if they are equal (within numerical precision)
difference = np.abs(beta1_full - beta1_fwl)
max_diff = np.max(difference)

print("Verification:")
print(f"Maximum absolute difference: {max_diff:.2e}")
print(f"Are they equal (within 1e-10)? {max_diff < 1e-10}")
print()

Verification:
Maximum absolute difference: 4.44e-16
Are they equal (within 1e-10)? True



### Alternative Verification using Scikit-learn

Let's also verify using scikit-learn to ensure our manual implementation is correct.

In [7]:
# Alternative verification using sklearn
print("Alternative verification using sklearn:")

# Full regression with sklearn
reg_full = LinearRegression(fit_intercept=False)
reg_full.fit(X_full, y.ravel())
beta1_sklearn_full = reg_full.coef_[:k1]

# FWL with sklearn
# Step 1: Get residuals y_tilde
reg_y_on_x2 = LinearRegression(fit_intercept=False)
reg_y_on_x2.fit(X2, y.ravel())
y_tilde_sklearn = y.ravel() - reg_y_on_x2.predict(X2)

# Step 2: Get residuals X1_tilde
X1_tilde_sklearn = np.zeros_like(X1)
for i in range(k1):
    reg_x1_on_x2 = LinearRegression(fit_intercept=False)
    reg_x1_on_x2.fit(X2, X1[:, i])
    X1_tilde_sklearn[:, i] = X1[:, i] - reg_x1_on_x2.predict(X2)

# Step 3: Final regression
reg_fwl = LinearRegression(fit_intercept=False)
reg_fwl.fit(X1_tilde_sklearn, y_tilde_sklearn)
beta1_sklearn_fwl = reg_fwl.coef_

print(f"β̂₁ from sklearn full regression: {beta1_sklearn_full}")
print(f"β̂₁ from sklearn FWL method: {beta1_sklearn_fwl}")

diff_sklearn = np.abs(beta1_sklearn_full - beta1_sklearn_fwl)
max_diff_sklearn = np.max(diff_sklearn)
print(f"Maximum absolute difference (sklearn): {max_diff_sklearn:.2e}")
print(f"Are they equal (within 1e-10)? {max_diff_sklearn < 1e-10}")

Alternative verification using sklearn:
β̂₁ from sklearn full regression: [1.52360737 1.99063613]
β̂₁ from sklearn FWL method: [1.52360737 1.99063613]
Maximum absolute difference (sklearn): 2.00e-15
Are they equal (within 1e-10)? True


### Summary of Results

Let's create a summary table of all our results.

In [8]:
# Create results summary
results_df = pd.DataFrame({
    'Method': ['True Values', 'Full Regression (Manual)', 'FWL Method (Manual)', 
               'Full Regression (Sklearn)', 'FWL Method (Sklearn)'],
    'β₁[0]': [beta1_true[0,0], beta1_full[0,0], beta1_fwl[0,0], 
              beta1_sklearn_full[0], beta1_sklearn_fwl[0]],
    'β₁[1]': [beta1_true[1,0], beta1_full[1,0], beta1_fwl[1,0], 
              beta1_sklearn_full[1], beta1_sklearn_fwl[1]]
})

print("\n=== RESULTS SUMMARY ===")
print(results_df.to_string(index=False, float_format='%.6f'))

print(f"\nMaximum difference between methods: {max(max_diff, max_diff_sklearn):.2e}")
print("\n✅ FWL Theorem verification SUCCESSFUL!")
print("The full regression and FWL two-step procedure produce identical results.")


=== RESULTS SUMMARY ===
                   Method    β₁[0]    β₁[1]
              True Values 1.500000 2.000000
 Full Regression (Manual) 1.523607 1.990636
      FWL Method (Manual) 1.523607 1.990636
Full Regression (Sklearn) 1.523607 1.990636
     FWL Method (Sklearn) 1.523607 1.990636

Maximum difference between methods: 2.00e-15

✅ FWL Theorem verification SUCCESSFUL!
The full regression and FWL two-step procedure produce identical results.


## Conclusion

We have successfully:

1. **Provided a complete mathematical proof** of the Frisch-Waugh-Lovell theorem using partitioned matrix algebra
2. **Numerically verified** the theorem using simulated data with both manual implementations and scikit-learn
3. **Demonstrated** that both the full regression and the FWL two-step procedure produce identical estimates (within machine precision)

The FWL theorem is a powerful tool in econometrics that allows us to:
- Isolate the effect of specific variables by "partialling out" control variables
- Understand the mechanics of multiple regression
- Reduce computation on large datasets, for example by partialling out a large set of controls once before estimating the coefficients of interest
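
A familiar special case of partialling out, sketched here on assumed simulated data: when X₂ is a single column of ones, M_{X₂} simply subtracts the sample mean, so the slope from a regression with an intercept equals the slope from regressing demeaned y on demeaned x. All names and parameter values in the block are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)            # illustrative seed (assumption)
n_demo = 300                              # illustrative sample size (assumption)
x_demo = rng.standard_normal(n_demo) + 3.0
y_demo = 4.0 + 1.5 * x_demo + rng.standard_normal(n_demo)

# Full regression: y on [x, 1]
X_with_const = np.column_stack([x_demo, np.ones(n_demo)])
coef_full, *_ = np.linalg.lstsq(X_with_const, y_demo, rcond=None)
slope_full = coef_full[0]

# FWL with X2 = column of ones: residualizing is the same as demeaning
x_dm = x_demo - x_demo.mean()
y_dm = y_demo - y_demo.mean()
slope_fwl = (x_dm @ y_dm) / (x_dm @ x_dm)

print(slope_full, slope_fwl)
```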

This completes Part 1 of Assignment 1.