# Econometrics seminar

### Wooldridge
We will need data sets from the `wooldridge` package. See the [PyPI page](https://pypi.org/project/wooldridge/) for installation instructions.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import wooldridge

In [2]:
wooldridge.data()

  J.M. Wooldridge (2019) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93    

1. The data in 401K.RAW are a subset of data analyzed by Papke (1995) to study the relationship between participation in a 401(k) pension plan and the generosity of the plan. The variable *prate* is the percentage of eligible workers with an active account; this is the variable we would like to explain. The measure of generosity is the plan match rate, *mrate*. This variable gives the average amount the firm contributes to each worker’s plan for each 100¢ contribution by the worker. For example, if mrate = 0.50, then a 100¢ contribution by the worker is matched by a 50¢ contribution by the firm.
    1. Find the average participation rate and the average match rate in the sample of plans.
    2. Now, estimate the simple regression equation $\widehat{prate} = \widehat{\beta_0} + \widehat{\beta_1} mrate$, and report the results along with the sample size and R-squared.
    3. Interpret the intercept in your equation. Interpret the coefficient on mrate.
    4. Find the predicted prate when $mrate = 3.5$. Is this a reasonable prediction? Explain what is happening here.
    5. How much of the variation in prate is explained by mrate? Is this a lot in your opinion?

<p id="average"><b>1</b></p>
The data in 401K.RAW

In [3]:
wooldridge.data('401k', description=True)

name of dataset: 401k
no of variables: 8
no of observations: 1534

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| prate    | participation rate, percent     |
| mrate    | 401k plan match rate            |
| totpart  | total 401k participants         |
| totelg   | total eligible for 401k plan    |
| age      | age of 401k plan                |
| totemp   | total number of firm employees  |
| sole     | = 1 if 401k is firm's sole plan |
| ltotemp  | log of totemp                   |
+----------+---------------------------------+

L.E. Papke (1995), “Participation in and Contributions to 401(k)
Pension Plans:Evidence from Plan Data,” Journal of Human Resources 30,
311-325. Professor Papke kindly provided these data. She gathered them
from the Internal Revenue Service’s Form 5500 tapes.


In [4]:
df = wooldridge.data('401k')
df

Unnamed: 0,prate,mrate,totpart,totelg,age,totemp,sole,ltotemp
0,26.100000,0.21,1653.0,6322.0,8,8709.0,0,9.072112
1,100.000000,1.42,262.0,262.0,6,315.0,1,5.752573
2,97.599998,0.91,166.0,170.0,10,275.0,1,5.616771
3,100.000000,0.42,257.0,257.0,7,500.0,0,6.214608
4,82.500000,0.53,591.0,716.0,28,933.0,1,6.838405
...,...,...,...,...,...,...,...,...
1529,85.099998,0.33,553.0,650.0,24,907.0,0,6.810143
1530,100.000000,2.52,142.0,142.0,17,197.0,1,5.283204
1531,100.000000,2.27,1928.0,1928.0,35,2171.0,0,7.682943
1532,100.000000,0.58,166.0,166.0,8,931.0,1,6.836259


<p id="average"><b>1.A</b></p>
Find the average participation rate and the average match rate in the sample of plans.

In [6]:
print(f'The average participation rate is {df.prate.mean().round(3)}')

The average participation rate is 87.363


In [7]:
print(f'The average match rate is {df.mrate.mean().round(3)}')

The average match rate is 0.732


<p id="average"><b>1.B</b></p>
Now, estimate the simple regression equation $$\widehat{prate} =\widehat{ \beta_0} + \widehat{ \beta_1}mrate ,$$ and report the results along with the sample size and $R$-squared.

In [20]:
import statsmodels.formula.api as smf
mod = smf.ols(formula='prate ~ mrate', data=df)
res = mod.fit()
print(res.summary())

                            OLS Regression Results                            
Dep. Variable:                  prate   R-squared:                       0.075
Model:                            OLS   Adj. R-squared:                  0.074
Method:                 Least Squares   F-statistic:                     123.7
Date:                Wed, 04 Jun 2025   Prob (F-statistic):           1.10e-27
Time:                        13:27:16   Log-Likelihood:                -6437.0
No. Observations:                1534   AIC:                         1.288e+04
Df Residuals:                    1532   BIC:                         1.289e+04
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     83.0755      0.563    147.484      0.0
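
For reference, the two coefficients in a simple regression have closed-form expressions: $\widehat{\beta_1}$ is the sample covariance of the regressor and the outcome divided by the sample variance of the regressor, and $\widehat{\beta_0} = \bar{y} - \widehat{\beta_1}\bar{x}$. The sketch below applies those formulas to a small made-up data set (not the 401k sample; the arrays `x` and `y` are illustrative stand-ins for mrate and prate) and cross-checks them against `np.polyfit`.

```python
import numpy as np

# Made-up illustrative data, NOT the 401k sample.
x = np.array([0.2, 0.5, 0.9, 1.4, 2.5])         # stand-in for mrate
y = np.array([70.0, 85.0, 90.0, 95.0, 100.0])   # stand-in for prate

x_bar, y_bar = x.mean(), y.mean()

# Slope: sum of cross-deviations divided by sum of squared deviations of x.
beta1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
# Intercept: forces the fitted line through the point of sample means.
beta0_hat = y_bar - beta1_hat * x_bar

# Cross-check with NumPy's least-squares polynomial fit (degree 1).
slope_check, intercept_check = np.polyfit(x, y, deg=1)

print(f"beta0_hat = {beta0_hat:.4f}, beta1_hat = {beta1_hat:.4f}")
print(f"polyfit   = {intercept_check:.4f}, {slope_check:.4f}")
```

Run on the real data, the same formulas should reproduce the `Intercept` and `mrate` coefficients that `smf.ols` reports.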

<p id="average"><b>1.C</b></p>
Interpret the intercept in your equation. Interpret the coefficient on mrate.
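
One possible reading, offered as a sketch rather than an official answer: the intercept is the predicted participation rate for a plan with no match ($mrate = 0$), and the coefficient on mrate is the predicted change in prate, in percentage points, when the match rate rises by one unit. The coefficient table in 1.B is cut off, so the snippet below backs the slope out of two numbers that do appear in this notebook: the intercept (83.0755) and the prediction at $mrate = 3.5$ reported in 1.D (103.589233). The variable names are illustrative.

```python
# Values copied from this notebook's printed output.
intercept = 83.0755        # Intercept coefficient in the OLS summary (1.B)
pred_at_3_5 = 103.589233   # res.predict(...) output at mrate = 3.5 (1.D)

# The fitted line is prate_hat = intercept + slope * mrate,
# so one prediction is enough to recover the slope.
slope = (pred_at_3_5 - intercept) / 3.5

# Effect of raising the match by 0.10 (ten more cents per 100 cents contributed).
effect_of_10_cents = 0.10 * slope

print(f"implied slope on mrate: {slope:.3f}")
print(f"predicted prate at mrate = 0: {intercept:.2f}")
print(f"change in predicted prate for a 0.10 rise in mrate: {effect_of_10_cents:.3f}")
```

Under these numbers the slope is about 5.86, so a plan with no match is predicted to have roughly 83% participation, and each additional 0.10 of match rate adds a little under 0.6 percentage points.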

<p id="average"><b>1.D</b></p>
Find the predicted *prate* when $mrate = 3.5$. Is this a reasonable prediction?
Explain what is happening here.

In [62]:
res.predict(df.loc[df['mrate']==3.5])

730    103.589233
dtype: float64
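
The prediction above exceeds 100, which is not possible for a participation rate measured in percent. A straight line fitted by OLS is not constrained to the range of the dependent variable, so extreme values of the regressor can push predictions past the bound. The sketch below reproduces the effect on made-up data (not the 401k sample) in which the outcome is capped at 100; the arrays and the cap are assumptions chosen for illustration.

```python
import numpy as np

# Made-up illustrative data, NOT the 401k sample.
x = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0, 3.5])
# Outcome rises with x but can never exceed 100 (like a percentage).
y = np.minimum(100.0, 80.0 + 20.0 * x)

# Fit a straight line by least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Predict at the largest x value in the sample.
pred_at_3_5 = intercept + slope * 3.5

print(f"largest observed y: {y.max():.1f}")
print(f"linear prediction at x = 3.5: {pred_at_3_5:.2f}")
```

Every observed outcome is at most 100, yet the fitted line predicts more than 100 at the high end, which mirrors what happens with prate at $mrate = 3.5$.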

<p id="average"><b>1.E</b></p>
How much of the variation in prate is explained by mrate? Is this a lot in your opinion?

In [66]:
print(f'Share of the variation in prate explained by mrate (R-squared): {res.rsquared.round(3)}')

Share of the variation in prate explained by mrate (R-squared): 0.075
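
An R-squared of 0.075 means the match rate accounts for about 7.5% of the sample variation in prate, leaving most of the variation to other factors. As a reminder of where the number comes from, $R^2 = 1 - SSR/SST$, and in a simple regression it equals the squared sample correlation between the regressor and the outcome. The sketch below checks both on made-up data (not the 401k sample); the arrays are illustrative.

```python
import numpy as np

# Made-up illustrative data, NOT the 401k sample.
x = np.array([0.2, 0.5, 0.9, 1.4, 2.5])         # stand-in for mrate
y = np.array([70.0, 85.0, 90.0, 95.0, 100.0])   # stand-in for prate

# Least-squares line and its fitted values.
slope, intercept = np.polyfit(x, y, deg=1)
fitted = intercept + slope * x

ssr = np.sum((y - fitted) ** 2)      # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)    # total sum of squares
r_squared = 1.0 - ssr / sst

# In simple regression, R-squared equals the squared correlation of x and y.
corr_squared = np.corrcoef(x, y)[0, 1] ** 2

print(f"R-squared from SSR/SST:  {r_squared:.4f}")
print(f"squared correlation:     {corr_squared:.4f}")
```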
