## Question 01: Interpreting $\beta$ estimates

#### 1. Load the wooldridge package

In [3]:
import wooldridge
wooldridge.data()

  J.M. Wooldridge (2019) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93    

#### 2. Load and review the data description

In [6]:
wooldridge.data('wage1', description=True)

name of dataset: wage1
no of variables: 24
no of observations: 526

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| wage     | average hourly earnings         |
| educ     | years of education              |
| exper    | years potential experience      |
| tenure   | years with current employer     |
| nonwhite | =1 if nonwhite                  |
| female   | =1 if female                    |
| married  | =1 if married                   |
| numdep   | number of dependents            |
| smsa     | =1 if live in SMSA              |
| northcen | =1 if live in north central U.S |
| south    | =1 if live in southern region   |
| west     | =1 if live in western region    |
| construc | =1 if work in construc. indus.  |
| ndurman  | =1 if in nondur. manuf. indus.  |
| trcommpu | =1 if in trans, commun, pub ut  |
| trade    | =1 if in wholesale or retail    |
| services | =1 if in services indus.  

#### 3. Load the wage1 dataset

In [9]:
dta = wooldridge.data('wage1')
dta.dtypes

wage        float64
educ          int64
exper         int64
tenure        int64
nonwhite      int64
female        int64
married       int64
numdep        int64
smsa          int64
northcen      int64
south         int64
west          int64
construc      int64
ndurman       int64
trcommpu      int64
trade         int64
services      int64
profserv      int64
profocc       int64
clerocc       int64
servocc       int64
lwage       float64
expersq       int64
tenursq       int64
dtype: object

#### 4. Produce and review summary statistics of the data

In [12]:
dta.describe().T

          count       mean        std   min    25%    50%    75%    max
wage      526.0   5.896103   3.693086  0.53   3.33   4.65   6.88  24.98
educ      526.0  12.562738   2.769022   0.0   12.0   12.0   14.0   18.0
exper     526.0   17.01711   13.57216   1.0    5.0   13.5   26.0   51.0
tenure    526.0   5.104563   7.224462   0.0    0.0    2.0    7.0   44.0
nonwhite  526.0   0.102662   0.303805   0.0    0.0    0.0    0.0    1.0
female    526.0   0.479087   0.500038   0.0    0.0    0.0    1.0    1.0
married   526.0   0.608365    0.48858   0.0    0.0    1.0    1.0    1.0
numdep    526.0   1.043726   1.261891   0.0    0.0    1.0    2.0    6.0
smsa      526.0   0.722433   0.448225   0.0    0.0    1.0    1.0    1.0
northcen  526.0   0.250951   0.433973   0.0    0.0    0.0   0.75    1.0


#### 5. Regress the natural logarithm of wages (lwage) on 'exper', 'educ', and 'tenure'

In [43]:
import statsmodels.api as sm

# Regressors and dependent variable
X = dta[['exper', 'educ', 'tenure']]
y = dta['lwage']

# add_constant prepends a 'const' column so the model includes an intercept
X = sm.add_constant(X)
model = sm.OLS(y, X).fit()
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Tue, 05 Nov 2024   Prob (F-statistic):           9.13e-43
Time:                        10:15:06   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2844      0.104      2.729      0.0

#### 6. A
The estimated coefficient on 'educ' is $0.092$. Holding 'exper' and 'tenure' fixed, one additional year of education raises the predicted log wage by 0.092 on average, which corresponds to an increase in the hourly wage of approximately 9.2%. The answer is A.
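
The 9.2% figure rests on the approximation $\%\Delta wage \approx 100\,\beta$, which works well when the coefficient is small. A minimal sketch comparing it with the exact log-level formula $100\,(e^{\beta}-1)$, using the rounded 'educ' coefficient quoted above (the variable names are illustrative, not part of the assignment):

```python
import math

# Rounded 'educ' coefficient quoted in the answer above
beta_educ = 0.0920

# Approximate percentage effect: 100 * beta
approx_pct = 100 * beta_educ

# Exact percentage effect in a log-level model: 100 * (exp(beta) - 1)
exact_pct = 100 * (math.exp(beta_educ) - 1)

print(f"approximate: {approx_pct:.2f}%")
print(f"exact:       {exact_pct:.2f}%")
```

The gap between the two numbers grows with the size of the coefficient, so the approximation should be used with more caution for large estimates.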

#### 7. A, B
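The p-value for 'exper' is 0.017, which is below 0.05 but above 0.01. The coefficient is therefore statistically significant at the 5% level (and the 10% level) but not at the 1% level.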
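
As a quick sanity check, the reported p-value can be compared with the conventional significance levels. This is only a sketch: the choice of levels (10%, 5%, 1%) and the variable names are assumptions, not something the assignment specifies.

```python
# p-value for 'exper' as reported in the regression output
p_exper = 0.017

# Conventional significance levels (an illustrative choice)
alphas = (0.10, 0.05, 0.01)

# The null hypothesis (coefficient equals zero) is rejected when p < alpha
decisions = {alpha: p_exper < alpha for alpha in alphas}

for alpha, reject in decisions.items():
    outcome = "reject H0" if reject else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: {outcome}")
```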

#### 8. Estimate the log wage and wage based on the average values for exper, educ, and tenure using the predict function.

In [20]:
import numpy as np

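# Column means of the design matrix; the 'const' column has mean 1
mean_values0 = X.mean()
# Predicted log wage at the sample means
lwageHat0 = model.predict(mean_values0).iloc[0]
print("Estimation of the log wage: ", lwageHat0)
# Exponentiate to move from log wage back to the wage level
print("Estimation of the wage: ", np.exp(lwageHat0))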

Estimation of the log wage:  1.623268444558513
Estimation of the wage:  5.069633081941634
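
The prediction at the means is simply the dot product of the coefficient vector with the vector of regressor means (with 1 for the constant). A rough hand check, assuming the rounded coefficients reported by `model.summary()` and the means from `dta.describe()`; because of that rounding, the result only approximately matches the value printed above:

```python
import math

# Rounded coefficients as reported by model.summary()
coefs = {"const": 0.2844, "exper": 0.0041, "educ": 0.0920, "tenure": 0.0221}

# Sample means from dta.describe(); the constant column is always 1
means = {"const": 1.0, "exper": 17.01711, "educ": 12.562738, "tenure": 5.104563}

# Predicted log wage = sum over regressors of coefficient * value
lwage_hat = sum(coefs[name] * means[name] for name in coefs)

# Exponentiating gives the implied wage level
wage_hat = math.exp(lwage_hat)

print("log wage:", round(lwage_hat, 3))
print("wage:    ", round(wage_hat, 2))
```

Note that the implied wage is below the sample mean of 'wage' (5.896), since exponentiating a predicted log wage does not recover the arithmetic mean of the wage itself.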


#### 9. Predict the wage again at the average values, but with one more year of education.

In [23]:
# X's columns are ordered [const, exper, educ, tenure], so the third entry adds one year of 'educ'
lwageHat1 = model.predict(mean_values0 + [0, 0, 1, 0]).iloc[0]
print("Estimation of the log wage: ", lwageHat1)
print("Estimation of the wage: ", np.exp(lwageHat1))

Estimation of the log wage:  1.7152974329923563
Estimation of the wage:  5.558328496800482


#### 10. Compute the percentage increase in wages

In [26]:
((np.exp(lwageHat1) - np.exp(lwageHat0)) / np.exp(lwageHat0)) * 100

9.639660444058816

#### 11. Leverage the regression results to compute the approximate increase in wages due to one additional year of education.

In [29]:
model.params['educ'] * 100

9.202898843384322

In [67]:
# Change in the predicted hourly wage level (not a percentage) from one additional year of education, evaluated at the sample means
np.exp(lwageHat1) - np.exp(lwageHat0)

0.4886954148588476
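
The percentage effect of an extra year of education is the same at every wage level in this model, but the change in the wage level depends on the baseline wage. A small sketch, assuming the unrounded coefficient and baseline prediction printed in the cells above; the second baseline of 10.0 is a hypothetical value added purely for contrast:

```python
import math

# Unrounded 'educ' coefficient (model.params['educ']) from the output above
beta_educ = 0.09202898843384322

# Predicted wage at the sample means, from the earlier prediction cell
wage_at_means = 5.069633081941634

# Exact proportional change implied by one more year of education
growth = math.exp(beta_educ) - 1

# Level change at the sample means
change_at_means = wage_at_means * growth

# Level change at a hypothetical higher baseline wage (illustrative only)
change_at_ten = 10.0 * growth

print(f"exact percentage change: {100 * growth:.2f}%")
print(f"level change at the means: {change_at_means:.4f}")
print(f"level change at a wage of 10.0: {change_at_ten:.4f}")
```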

#### 12. E

In [70]:
model.summary()

                            OLS Regression Results
Dep. Variable:                  lwage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Tue, 05 Nov 2024   Prob (F-statistic):           9.13e-43
Time:                        10:22:46   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3
Covariance Type:            nonrobust
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2844      0.104      2.729      0.007       0.080       0.489
exper          0.0041      0.002      2.391      0.017       0.001       0.008
educ           0.0920      0.007     12.555      0.000       0.078       0.106
tenure         0.0221      0.003      7.133      0.000       0.016       0.028
------------------------------------------------------------------------------
Omnibus:                       11.534   Durbin-Watson:                   1.769
Prob(Omnibus):                  0.003   Jarque-Bera (JB):               20.941
Skew:                           0.021   Prob(JB):                     2.84e-05
Kurtosis:                       3.977   Cond. No.                        135.0
