# Linear Probability Model

R Package requirements:
* `car`
* `broom`

The `Mroz` data frame contains data on 753 married women, collected as part of the Panel Study of Income Dynamics (PSID). Of the 753 observations, the first 428 are for women with positive hours worked in 1975, while the remaining 325 are for women who did not work for pay in 1975. A more complete discussion of the data is found in Mroz (1987), Appendix 1.

References: Mroz, T.A. (1987). The Sensitivity of an Empirical Model of Married Women's Hours of Work to Economic and Statistical Assumptions. *Econometrica*, **55**, 765–799.



This data frame contains the following columns:

* **lfp**: labor-force participation; a factor with levels: no; yes.
* **k5**: number of children 5 years old or younger.
* **k618**: number of children 6 to 18 years old.
* **age**: in years.
* **wc**: wife's college (university) attendance; a factor with levels: no; yes.
* **hc**: husband's college attendance; a factor with levels: no; yes.
* **lwg**: log expected wage rate; for women in the labor force, the actual wage rate; for women not in the labor force, an imputed value based on the regression of lwg on the other variables.
* **inc**: family income exclusive of wife's income.

In [2]:
library(car)
library(broom)

In [3]:
mroz <- Mroz
head(mroz)

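| | lfp | k5 | k618 | age | wc | hc | lwg | inc |
|---|---|---|---|---|---|---|---|---|
| | `<fct>` | `<int>` | `<int>` | `<int>` | `<fct>` | `<fct>` | `<dbl>` | `<dbl>` |
| 1 | yes | 1 | 0 | 32 | no | no | 1.2101647 | 10.91 |
| 2 | yes | 0 | 2 | 30 | no | no | 0.3285041 | 19.5 |
| 3 | yes | 1 | 3 | 35 | no | no | 1.5141279 | 12.04 |
| 4 | yes | 0 | 3 | 34 | no | no | 0.0921151 | 6.8 |
| 5 | yes | 1 | 2 | 31 | yes | no | 1.5242802 | 20.1 |
| 6 | yes | 0 | 0 | 54 | no | no | 1.5564855 | 9.859 |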


In [4]:
# Recode the yes/no factors as 0/1 dummy variables
mroz$lfp <- ifelse(mroz$lfp == "yes", 1, 0)
mroz$wc <- ifelse(mroz$wc == "yes", 1, 0)
mroz$hc <- ifelse(mroz$hc == "yes", 1, 0)
summary(mroz)

      lfp               k5              k618            age       
 Min.   :0.0000   Min.   :0.0000   Min.   :0.000   Min.   :30.00  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:36.00  
 Median :1.0000   Median :0.0000   Median :1.000   Median :43.00  
 Mean   :0.5684   Mean   :0.2377   Mean   :1.353   Mean   :42.54  
 3rd Qu.:1.0000   3rd Qu.:0.0000   3rd Qu.:2.000   3rd Qu.:49.00  
 Max.   :1.0000   Max.   :3.0000   Max.   :8.000   Max.   :60.00  
       wc               hc              lwg               inc        
 Min.   :0.0000   Min.   :0.0000   Min.   :-2.0541   Min.   :-0.029  
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.8181   1st Qu.:13.025  
 Median :0.0000   Median :0.0000   Median : 1.0684   Median :17.700  
 Mean   :0.2815   Mean   :0.3918   Mean   : 1.0971   Mean   :20.129  
 3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 1.3997   3rd Qu.:24.466  
 Max.   :1.0000   Max.   :1.0000   Max.   : 3.2189   Max.   :96.000  

OLS Model
$$
y_{i} =\beta _{1} +\beta _{2} x_{i2} +\beta _{3} x_{i3} +\beta _{4} x_{i4} +u_{i} 
$$ 

$$
y_{i} =\left \{ \begin{array}{ll} 1 \quad \textrm{if individual i works outside the home}\\
0 \quad \textrm{otherwise}
\end{array} \right.
$$
* $x_{2}$: other income (including partner's earnings, etc.), measured in thousands of dollars
* $x_{3}$: university education (1 if she attended, 0 otherwise)
* $x_{4}$: number of children less than 6 years old

What do you expect the signs of $\beta$'s to be?

In [5]:
# Bare column names with data = mroz keep the term labels clean
mroz.lpm <- lm(lfp ~ inc + wc + k5, data = mroz)
tidy(mroz.lpm)
#summary(mroz.lpm)

| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| `<chr>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| (Intercept) | 0.698271765 | 0.035043283 | 19.92598 | 3.281793e-71 |
| inc | -0.007141772 | 0.001538121 | -4.643181 | 4.051074e-06 |
| wc | 0.233627958 | 0.039979959 | 5.843627 | 7.618145e-09 |
| k5 | -0.218319205 | 0.033015707 | -6.612586 | 7.180021e-11 |


The estimated probabilities can be written as:
$$
\widehat{\Pr }\left(y_{i} =1\right)=\hat{y}_{i} =\hat{\beta }_{1} +\hat{\beta }_{2} x_{i2} +\hat{\beta }_{3} x_{i3} +\hat{\beta }_{4} x_{i4} 
$$ 
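One way to see why the fitted value can be read as a probability is the short derivation below. It assumes the usual zero conditional mean condition, $E\left(u_{i} |x_{i} \right)=0$, which the notebook does not state explicitly. Because $y_{i}$ takes only the values 0 and 1:

$$
E\left(y_{i} |x_{i} \right)=1\cdot \Pr \left(y_{i} =1|x_{i} \right)+0\cdot \Pr \left(y_{i} =0|x_{i} \right)=\Pr \left(y_{i} =1|x_{i} \right)
$$

$$
\Pr \left(y_{i} =1|x_{i} \right)=E\left(y_{i} |x_{i} \right)=\beta _{1} +\beta _{2} x_{i2} +\beta _{3} x_{i3} +\beta _{4} x_{i4}
$$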
Interpreting the coefficients and the probabilities
$$
\frac{\partial P\left(y=1|x\right)}{\partial x_{j} } =\beta _{j} \quad j=2,3,4
$$
1. Interpret the constant term, $\beta_{1}$.
2. Interpret Other Income, $\beta_{2}$. What would you predict if she obtains \$10,000 more of Other Income?
3. Interpret University Education, $\beta_{3}$.
4. Interpret # kids $<6$ yrs, $\beta_{4}$. What would you predict if the family has an additional kid $<6$ yrs?
5. What is the probability for a woman with $0$ other income, university education and $1$ kid $<6$ yrs?
6. What is the probability for a woman with $0$ other income, $15$ years of education and $3$ kids $<6$ yrs?
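Questions 2 and 4 reduce to scaling a single coefficient. The sketch below hard-codes the point estimates reported by `tidy()` so that it runs on its own. The names `b`, `delta_inc` and `delta_k5` are illustrative and not part of the notebook.

```r
# Point estimates copied from the tidy() output above
# (inside the notebook, coef(mroz.lpm) would supply the same numbers).
b <- c(intercept = 0.698271765, inc = -0.007141772,
       wc = 0.233627958, k5 = -0.218319205)

# Question 2: inc is measured in thousands of dollars,
# so $10,000 more of other income is a 10-unit change.
delta_inc <- 10 * b[["inc"]]

# Question 4: one more child under 6 shifts the predicted
# probability by exactly the k5 coefficient.
delta_k5 <- 1 * b[["k5"]]

# Express both changes in percentage points.
round(100 * c(inc_plus_10k = delta_inc, one_more_k5 = delta_k5), 1)
```

Holding the other regressors fixed, the predicted participation probability falls by about 7.1 percentage points for the income change and by about 21.8 percentage points for the additional young child.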

In [6]:
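# Question 5: regressor values in coefficient order
# (intercept = 1, inc = 0, wc = 1, k5 = 1)
coefs <- mroz.lpm$coefficients[1:4]
values <- c(1, 0, 1, 1)
probhat <- sum(coefs * values)
print(probhat)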

[1] 0.7135805
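The manual dot product can be cross-checked against `predict()`. The sketch below is a minimal illustration on synthetic toy data, not the Mroz data: the data frame `toy` and its values are made up, and only the column names mirror the notebook. Note that `predict()` with `newdata` relies on the formula using bare column names together with `data =`.

```r
# Synthetic toy data (assumption: invented values, for illustration only)
set.seed(1)
toy <- data.frame(
  inc = runif(50, 0, 50),     # other income, thousands of dollars
  wc  = rbinom(50, 1, 0.3),   # university attendance dummy
  k5  = rpois(50, 0.3)        # number of children under 6
)
toy$lfp <- rbinom(50, 1, 0.5) # 0/1 outcome

# Linear probability model fitted by OLS
fit <- lm(lfp ~ inc + wc + k5, data = toy)

# Profile from question 5: no other income, university, one young child
new <- data.frame(inc = 0, wc = 1, k5 = 1)

p_pred   <- predict(fit, newdata = new)      # model-based prediction
p_manual <- sum(coef(fit) * c(1, 0, 1, 1))   # same profile by hand

all.equal(unname(p_pred), p_manual)
```

Passing several rows in `new` returns one predicted probability per profile, which avoids building a separate `values` vector for each question.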
