# Binary Dependent Variable Models

- The outcomes or responses are  discrete
    - 1 or 0
    - E.g. Decision to take a job or not; to pursue higher studies or not etc.

   

### Linear Probability Model


- Dependent variable is a binary variable

- $y_{i}=X_{i}\beta +\varepsilon_{i}$ where $E(y_{i}|X_{i})=X_{i}\beta =\beta _{0}+\beta _{1}x_{1i}+...+\beta _{k}x_{ki}$

$y_{i}=\left\{ 
\begin{array}{l}
1-\text{if some event occurs} \\
0-\text{otherwise}
\end{array}
\right.$

- Consider $\ y_{i}=\alpha +\beta x_{i}+\varepsilon _{i},$ where $
y_{i}=\left\{ 
\begin{array}{l}
1-\text{if  ith woman is in paid employment} \\ 
0-\text{otherwise}
\end{array}
\right. $

    - Unconditional $E(y_{i})=[1\times P(y_{i}=1)]+[0\times P(y_{i}=0)]=P(y_{i}=1)$
    - Conditional $E(y_{i}|x_{i})=[1\times P(y_{i}=1|x_{i})]+[0\times P(y_{i}=0|x_{i})]=P(y_{i}=1|x_{i})$

- Hence $E(y_{i}|x_{i})=\alpha +\beta x_{i}=P(y_{i}=1|x_{i})=P_{i}$

- $\beta $ measures the expected change in probability of being in paid
employment due to one unit change in $x_{i}$

- In a multivariate case


- $E(y_{i}|X_{i})=X_{i}\beta =P_{i}$ and

- $\beta _{k}$ measures the expected change in the probability of occurrence of an event due to one unit change in $x_{k}$ holding all other
variables constant

- A unit change in $x_{k}$ always results in the same change in the probability, hence the name 'Linear Probability Model'

#### Problems with LPM

##### Properties of errors

- $\varepsilon _{i}$ can take two values

- $\varepsilon _{i}=\left\{ 
\begin{array}{l}
1-X_{i}\beta \text{\  if }y_{i}=1\text{ with probability }P_{i} \\ 
-X_{i}\beta \text{ \ if }y_{i}=0\text{ with probability }1-P_{i}
\end{array}
\right. $

- Errors are not normal

- $E(\varepsilon _{i}|X_{i})=P_{i}(1-X_{i}\beta )+(1-P_{i})(-X_{i}\beta
) $; substituting $P_{i}=X_{i}\beta $ we get

- $E(\varepsilon _{i}|X_{i})=0$

- $var(\varepsilon _{i}|X_{i})=P_{i}(1-X_{i}\beta
)^{2}+(1-P_{i})(-X_{i}\beta )^{2}=X_{i}\beta (1-X_{i}\beta
)^{2}+(1-X_{i}\beta )(-X_{i}\beta )^{2}=X_{i}\beta (1-X_{i}\beta )$

- Heteroskedastic errors - variance of error depends on $X_{i}$

- OLS is unbiased but inefficient, and the usual standard errors are biased


##### Interpretation of coefficients

- $E(y_{i}|X_{i})=X_{i}\beta $ is interpreted as a probability, but in many cases the fitted $X_{i}\beta $ lies outside the interval $[0,1]$

- Functional form: the probability is assumed to be linear in $X_{i}$, which is often unrealistic
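
Both problems can be seen in a small simulation. The sketch below is illustrative only: the data are simulated, and every name and parameter value (`x`, `p_true`, `lpm_sim`, the slope 0.8) is a hypothetical choice, not part of the example used later in these notes.

```r
# Simulated data: all names and parameter values are hypothetical
set.seed(123)
n <- 1000
x <- rnorm(n, mean = 0, sd = 2)
p_true <- pnorm(0.8 * x)                 # true P(y = 1 | x), nonlinear in x
y <- rbinom(n, size = 1, prob = p_true)

# Linear probability model fitted by OLS
lpm_sim <- lm(y ~ x)
p_hat <- fitted(lpm_sim)

# Share of fitted "probabilities" that fall outside [0, 1]
share_outside <- mean(p_hat < 0 | p_hat > 1)

# Implied error variance X*beta*(1 - X*beta): it changes with x
# (heteroskedasticity) and turns negative when the fit leaves [0, 1]
implied_var <- p_hat * (1 - p_hat)

share_outside
range(implied_var)
```

A nonzero `share_outside` and a negative lower bound in `range(implied_var)` are the symptoms listed above.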

##  Latent variable interpretation

- e.g.: Participation in the labour market

- $y_{i}=\left\{ 
\begin{array}{l}
1-\text{if \ ith woman is in paid employment} \\ 
0-\text{otherwise}
\end{array}
\right. $

- $y_{i}$ is the observed variable, but we can assume that the value of $y_{i}$ depends on some unobserved variable
- Assume woman $i$ has a reservation wage (the minimum wage she expects) and decides to take up employment if the market wage is greater than it
- Let $w_{i}=$ market wage, $w_{i}^{\ast }=$ reservation wage

- Now $\ y_{i}^{\ast }=w_{i}- w_{i}^{\ast }$

- $y_{i}=\left\{ 
\begin{array}{l}
1\text{ \ if }y_{i}^{\ast }>0 \\ 
0\text{ if }y_{i}^{\ast }\leq 0
\end{array}
\right. $

- One observes only the participation or nonparticipation ($y_{i}$)

- The choice depends on an unobserved variable, called the 'latent' variable $y_{i}^{\ast }$


- $y_{i}^{\ast }$ can be influenced by other factors too
- In general; $y_{i}^{\ast }=X_{i}\beta +\varepsilon _{i}$ and

- $y_{i}=\left\{ 
\begin{array}{l}
1\ \text{if }y_{i}^{\ast }>\tau \\ 
0\ \text{if }y_{i}^{\ast }\leq \tau
\end{array}
\right. $

- $E(y_{i}^{\ast }|X_{i})=X_{i}\beta $ (it captures the determinants of $
y^{\ast })$

- Cannot use OLS since dependent variable is unobserved

- Maximum likelihood estimation has to be used, hence an assumption about the distribution of the errors is needed
    - Normal errors - Probit model
    - Logistic errors - Logit model
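
The latent-variable setup can be mimicked on simulated data. This is a minimal sketch under stated assumptions: the "true" parameters `beta0` and `beta1` and all variable names are hypothetical, the threshold is set to zero, and the errors are drawn from a standard normal, so the matching ML estimator is the probit.

```r
set.seed(42)
n <- 5000
x <- rnorm(n)
beta0 <- -0.5                      # hypothetical "true" intercept
beta1 <- 1.0                       # hypothetical "true" slope

# Latent index with standard normal errors; only its sign is observed
y_star <- beta0 + beta1 * x + rnorm(n)
y <- as.integer(y_star > 0)

# y_star is unobserved in practice, so OLS on it is infeasible;
# ML under the normality assumption is the probit model
probit_sim <- glm(y ~ x, family = binomial(link = "probit"))
coef(probit_sim)
```

The estimates should be close to the hypothetical values used to generate the data. Drawing the errors with `rlogis(n)` and using `link = "logit"` gives the logit counterpart.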

### Probit  and Logit Model


- Given $y^{\ast }=X\beta +\varepsilon $ \ where

- $y=\left\{ 
\begin{array}{l}
1\ \text{if }y^{\ast }>0 \\ 
0\ \text{if }y^{\ast }\leq 0
\end{array}
\right. $ and $\varepsilon $ $\sim (0,\sigma ^{2})$

- Errors are normally distributed with $\sigma ^{2}=1$ for probit model


- The pdf of errors is :$\phi (\varepsilon )=\frac{1}{\sqrt{2\pi }}\exp
\left( -\frac{\varepsilon ^{2}}{2}\right) $

- cdf: $\Phi (\varepsilon )=\int\limits_{-\infty }^{\varepsilon }
\frac{1}{\sqrt{2\pi }}\exp \left( -\frac{z_{i}^{2}}{2}\right) dz_{i}$


- Errors are logistically distributed with $\sigma ^{2}=\pi ^{2}/3\approx 3.29$ for a logit model


- pdf: $\lambda (\varepsilon )=\frac{\exp (\varepsilon )}{[1+\exp
(\varepsilon )]^{2}}$

- cdf: $\Lambda (\varepsilon )=\frac{\exp (\varepsilon )}{1+\exp (\varepsilon )}$

- Hence the latent variable $y_{i}^{\ast }$ is assumed to be
normally/logistically distributed, but not its realization $y_{i}$

- But we can get the $P(y_{i}=1|X)$ using the error distribution

- Since $y_{i}=1$ if $y^{\ast }>0$, and we know the distribution of $y^{\ast }$ by virtue of the error distribution, it is possible to get $P(y_{i}=1|X)=P(y^{\ast }>0|X)$

- $y^{\ast }$ can be estimated using ML as a function of independent
variables

- If we use the standard normal distribution to link $y_{i}$ to $y^{\ast }$, it's a
'probit link'

- If we use the logistic distribution to link $y_{i}$ to $y^{\ast }$, it's a 'logit
link'

- Given $y^{\ast }=X_{i}\beta +\varepsilon $ \ where

- $y=\left\{ 
\begin{array}{l}
1\ \text{if }y^{\ast }>0 \\ 
0\ \text{if }y^{\ast }\leq 0
\end{array}%
\right.  $and $\varepsilon \sim (0,\sigma ^{2})$

$$\begin{aligned}
P(y_{i}=1|X)&=P(y^{\ast }>0|X)\\
&=P(X_{i}\beta +\varepsilon _{i}>0|X)\\
&=P(\varepsilon _{i}>-X_{i}\beta |X)\\
&=P(\varepsilon _{i}\leq X_{i}\beta |X)
\end{aligned}$$

- The last step uses the symmetry of the normal and logistic distributions around zero


- This gives the probability of getting an error less than or equal to some value

- Which is nothing but the cdf of the error evaluated at $X_{i}\beta $,
hence


- $P(y_{i}=1|X)=F(X_{i}\beta )=P(\varepsilon _{i}\leq X_{i}\beta
)=\int\limits_{-\infty }^{X_{i}\beta }f(z_{i})dz_{i}$


- where $F$ is the normal cdf for the probit model and the logistic cdf for the logit model
    - Probit:$ F(X_{i}\beta )=\int\limits_{-\infty }^{X_{i}\beta }\frac{1
}{\sqrt{2\pi }}\exp (-\frac{1}{2}z_{i}^{2})dz_{i}=\Phi (X_{i}\beta )$
    - Logit:$F(X_{i}\beta )=\frac{\exp (X_{i}\beta )}{1+\exp (X_{i}\beta )}
=\Lambda (X_{i}\beta )$


- In both cases $F$ returns a value between 0 and 1

- In a probit/logit model, the value of $X_{i}\beta $ is taken to be the
z-value of a normal/logistic distribution

- Higher values of $X_{i}\beta $ means that the event is more likely to
happen

- Have to be careful about the interpretation of estimation results here

- A one unit change in $x_{k}$ leads to a $\beta _{k}$ change in the z-score of $P(y=1)$

- The estimated curve is S-shaped: the cumulative normal distribution for probit, the cumulative logistic distribution for logit

- Consider a single variable case

- $P(y_{i}=1|x_{i})=F(\alpha +\beta x_{i})$

- For each one unit increase in $x_{i}$ the argument of $F$, $E(y^{\ast
}|X)$, increases by $\beta$ units
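
The mapping from the index $X_{i}\beta$ to a probability can be checked numerically with base R's `pnorm()` and `plogis()`. The index values in `xb` are hypothetical; the sketch also verifies the symmetry step of the derivation and the integral form of the normal cdf.

```r
xb <- c(-2, -0.5, 0, 0.5, 2)        # hypothetical values of the index X*beta

# Probit uses the standard normal cdf, logit the logistic cdf
p_probit <- pnorm(xb)
p_logit  <- plogis(xb)

# The logistic cdf written out as exp(xb) / (1 + exp(xb))
p_logit_manual <- exp(xb) / (1 + exp(xb))

# Symmetry step used in the derivation: P(e > -xb) = P(e <= xb)
sym_probit <- pnorm(-xb, lower.tail = FALSE)

# The normal cdf as the integral of the pdf from -Inf up to xb
p_integral <- sapply(xb, function(b) integrate(dnorm, -Inf, b)$value)

cbind(xb, p_probit, p_logit)
```

Both columns of probabilities stay strictly between 0 and 1 and increase with the index, which is the S-shape described above.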

### ML estimation


- Given $p_{i}=P(y_{i}=1|X_{i})$ and $1-p_{i}=P(y_{i}=0|X_{i})$

- The likelihood is the product of $p_{i}$ for all observations for which $y_{i}=1$ multiplied by $1-p_{i}$ for which $y_{i}=0$

- $L(\beta|y,X)=\prod\limits_{i=1}^{n_{1}}p_{i}\prod\limits_{i=1}^{n_{2}}(1-p_{i})$ where $n_{1}$ = number of observations with $y_{i}=1$ and $n_{2}$ = number of observations with $y_{i}=0$
- $=\prod\limits_{i=1}^{n_{1}}F(X_{i}\beta )\prod\limits_{i=1}^{n_{2}}[1-F(X_{i}\beta )]$
- Taking logs we get the log-likelihood function
- $\ln (L)=\sum\limits_{i=1}^{n_{1}}\ln [F(X_{i}\beta )]+\sum\limits_{i=1}^{n_{2}}\ln [1-F(X_{i}\beta )]$

- Maximizing $\ln (L)$ w.r.t $\beta $ gives ML estimator of $\beta $
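
To make the maximization concrete, the following sketch writes the logit log-likelihood by hand and maximizes it with `optim()`, then compares the result with `glm()`. The data are simulated and the parameter values 0.3 and 0.9 are hypothetical; `negloglik` is an illustrative helper, not a function from any package.

```r
set.seed(1)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.9 * x))   # hypothetical true parameters
X <- cbind(1, x)

# Negative log-likelihood:
# -[ sum over y=1 of ln F(Xb) + sum over y=0 of ln(1 - F(Xb)) ]
negloglik <- function(b) {
  xb <- drop(X %*% b)
  -sum(y * plogis(xb, log.p = TRUE) +
       (1 - y) * plogis(xb, lower.tail = FALSE, log.p = TRUE))
}

# Numerical maximization (optim minimizes, hence the sign change)
ml  <- optim(c(0, 0), negloglik, method = "BFGS")
fit <- glm(y ~ x, family = binomial(link = "logit"))

rbind(optim = ml$par, glm = coef(fit))
```

Replacing `plogis` with `pnorm` in `negloglik` (and the link with `"probit"`) gives the probit version of the same exercise.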

#### Interpretation

- For each observation of independent variables we can estimate the predicted probabilities


- Predicted probabilities can be summarized as a range( min- max)

- Predicted probability for ith individual:  $\widehat{P}(y_{i}=1|X)=F(X
\widehat{\beta })$

- For positive $\beta$'s, $F(X\widehat{\beta })$ is at its minimum when we take the
lowest value of $x_{i}$

- If $\beta $ is negative, the lowest value of the variable will return
the maximum probability


- Caution: extreme values (outliers) can give misleading interpretations

- If there are outliers, consider the 5th and 95th percentiles instead of the minimum and maximum

#### Interpretation: Example (Probit)


- $y_{i}=1$ if the ith individual travels by own vehicle, 0 otherwise

- $x_{i}=$ difference in travel time between public transport and own
vehicle (in minutes)

- $p_{i}=\Phi (X_{i}\beta )=\Phi (\alpha +\beta x_{i})=\Phi (Z)$

- $\widehat{p}_{i}=\Phi (\underset{(-0.161)}{-0.0644}+\underset{(2.95)}{0.0299}x_{i})$


- If $x_{i}=20:\Phi (0.54),$ hence $P(z\leq 0.54)=0.7054$

- If $x_{i}=5:\Phi (0.085),P(z\leq 0.085)=0.53387$
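
The two predicted probabilities can be reproduced with `pnorm()` instead of a normal table. The coefficients are the ones reported above; the variable names are illustrative. Results should be close to the quoted values, with small differences because the notes round $z$ to two decimals before the table lookup.

```r
alpha_hat <- -0.0644               # estimated intercept from the example
beta_hat  <-  0.0299               # estimated slope from the example

x <- c(5, 20)                      # travel-time differences in minutes
z <- alpha_hat + beta_hat * x      # estimated index
p_hat <- pnorm(z)                  # predicted P(y = 1 | x)

data.frame(x, z, p_hat)
```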

#### Interpretation:Marginal effects 


- In linear regression, if the coefficient on $x_{1}$ is $\beta _{1}$,
then a 1-unit increase in $x_{1}$ increases $y$ by $\beta _{1}$.

- In a probit model, a 1-unit increase in $x_{1}$ increases the z-score
of $P(y=1)$ by $\beta _{1}$

- An increase in $x_{1}$ has a constant effect on $y^{\ast }$, but this
doesn't translate into a constant effect on the original $y$ or its
probability

- For a logit/probit model

- $p=P(y=1|X)=F(X\beta )$

- where $F$ is either the cdf $\Phi $ of the normal distribution or the cdf $\Lambda $
of the logistic distribution

- The marginal effect of $x_{k}$-partial change in the prob due to a
change in $x_{k}$ is

- $\frac{\partial p_{i}}{\partial x_{k}}=\frac{\partial F(X_{i}\beta )}{
\partial x_{k}}=\frac{dF(X_{i}\beta )}{d(X_{i}\beta )}\frac{\partial (X_{i}\beta )}{
\partial x_{k}}=f(X_{i}\beta )\beta _{k}$, where $f$ is the pdf

- For a probit model $\frac{\partial p_{i}}{\partial x_{k}}=\phi (X\beta
)\beta _{k}$

- Since $f(X_{i}\beta )$ is always positive, the sign of $\widehat{\beta }_{k}$ indicates the direction of the relation between $x_{k}$ and $p_{i}$

- The magnitude of the change in the probability given a change in $x_{k}$ is determined by the magnitudes of $\widehat{\beta }_{k}$ and $f(X_{i}\beta )$

- Consequently $\frac{\partial p_{i}}{\partial x_{k}}$ varies with $x_{k}$ and with the values of the other independent variables


- Example:

- $\widehat{p}_{i}=\Phi (\underset{(-0.161)}{-0.0644}+\underset{(2.95)}{%
0.0299}x_{i})$

- $\frac{\partial \widehat{p}_{i}}{\partial x_{i}}=(\widehat{\beta }%
)\times \phi (\widehat{\alpha }+\widehat{\beta }x_{i})=(\widehat{\beta }%
)\times \phi (z)$


- $=(0.0299)\times \underset{\text{Changes with }x_{i}}{\underbrace{\phi
(-0.0644+0.0299x_{i})}}$


- If $x_{i}=20$


- $\frac{\partial \widehat{p}_{i}}{\partial x_{i}}=0.0299\times \phi
(0.54)\approx 0.0103$ where $\phi (0.54)=0.3448$



- Since marginal effects depend on the level of all variables, we
need a way to summarize them.

- There are two prominent methods

- One method is to compute average over all observations


- Mean $\frac{\partial \widehat{p}_{i}}{\partial x_{ik}}=\frac{1}{N}
\sum\limits_{i=1}^{N}f(X_{i}\beta )\beta _{k}$


- Another method is to compute the ME at the mean of the independent
variables


- $\frac{\partial \widehat{p}}{\partial x_{k}}=f(\overline{X}\beta )\beta _{k}$


- Both methods usually give similar, though not identical, results

- Marginal effect of dummy variable: $\widehat{p}_{i}^{1}-\widehat{p}%
_{i}^{0}$


- Exactly the same issues arise with logit as with probit model (but
formulae differ)


- Predicted probability :$\widehat{P}(y_{i}=1|X)=F(X\widehat{\beta })$
where $F=\Lambda $

- Marginal effects:$\frac{\partial \widehat{p}_{i}}{\partial x_{ik}}=%
\frac{\partial \Lambda (X\widehat{\beta })}{\partial x_{ik}}=\lambda
(X_{i}\beta )\beta _{k}$


- $=\frac{\exp (X_{i}\beta )}{[1+\exp (X_{i}\beta )]^{2}}\beta _{k}$

- $=\left( \frac{\exp (X_{i}\beta )}{1+\exp (X_{i}\beta )}\right) \left( 
\frac{1}{1+\exp (X_{i}\beta )}\right) \beta _{k}$
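
The two summaries of the marginal effect, and the dummy-variable case, can be computed directly from a fitted model. The sketch uses simulated data with hypothetical names (`x1`, `d`) and parameter values, and a probit fit; it is meant as an illustration of the formulas above rather than a replacement for a dedicated package.

```r
set.seed(7)
n  <- 2000
x1 <- rnorm(n)                                 # continuous regressor
d  <- rbinom(n, 1, 0.4)                        # dummy regressor
y  <- rbinom(n, 1, pnorm(-0.2 + 0.7 * x1 + 0.5 * d))

fit <- glm(y ~ x1 + d, family = binomial(link = "probit"))
b   <- coef(fit)
X   <- model.matrix(fit)
xb  <- drop(X %*% b)

# Average marginal effect of x1: mean over i of phi(X_i b) * b_k
ame_x1 <- mean(dnorm(xb)) * b["x1"]

# Marginal effect at the mean: phi(Xbar b) * b_k
mem_x1 <- dnorm(sum(colMeans(X) * b)) * b["x1"]

# Dummy variable: average of p-hat with d = 1 minus p-hat with d = 0
X1 <- X; X1[, "d"] <- 1
X0 <- X; X0[, "d"] <- 0
me_d <- mean(pnorm(drop(X1 %*% b)) - pnorm(drop(X0 %*% b)))

# Logit pdf identity used above: lambda(z) = Lambda(z) * (1 - Lambda(z))
lambda_ok <- isTRUE(all.equal(dlogis(xb), plogis(xb) * (1 - plogis(xb))))

c(AME = unname(ame_x1), MEM = unname(mem_x1), dummy = me_d)
```

For a logit fit the same code applies with `dnorm`/`pnorm` replaced by `dlogis`/`plogis`. The two summaries for `x1` are typically close but not equal.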

### Logit in terms of odds

- Logit was derived from a latent variable perspective so far

- There is an alternative derivation based on the odds

- If some event occurs with probability $p=P(y=1|X)$, then the odds of
it happening are $O(p)=p/(1-p)$

- Odds indicate how often something happens relative to how often it
doesn't happen
    - $p=0\Rightarrow O(p)=0$
    - $p=0.25\Rightarrow O(p)=1/3$ Odds are 1-to-3 against
    - $p=0.75\Rightarrow O(p)=3/1$  Odds are 3-to-1 in favor


- Original $y\in \{0,1\}$; $y$ as a probability lies in $(0,1)$; the odds of $y$ lie in $(0,\infty )$; the log odds of $y$ lie in $(-\infty ,\infty )$


- $p_{i}=P(y_{i}=1|X_{i})=\frac{\exp (X_{i}\beta )}{1+\exp (X_{i}\beta )}%
;\Rightarrow $(0 to 1)

- $1-p_{i}=1-P(y_{i}=1|X_{i})=\frac{1}{1+\exp (X_{i}\beta )};$


- In particular odds


- $\Omega (X)=\frac{p_{i}}{1-p_{i}}=\exp (X_{i}\beta )\Rightarrow (0,\infty )$


- The log odds are known as the logit


- $\ln \Omega (X)=\ln \left( \frac{p_{i}}{1-p_{i}}\right) =X_{i}\beta
\Rightarrow (-\infty ,\infty )$


- Since the logit is the log of the odds, logistic slope coefficients can be
interpreted as the effect of a unit change in the X variable on the
predicted logit, with the other variables in the model held constant

- $\ln \Omega (X)=X\beta $

- $\frac{\partial \ln \Omega (X)}{\partial x_{k}}=\beta _{k}$

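- We can also derive the impact of a change in one variable on the odds: a one unit increase in $x_{k}$ multiplies the odds by $\exp (\beta _{k})$, which is the odds ratio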
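
The relations between probability, odds and log odds are easy to verify numerically. In this sketch the index parameters `alpha` and `beta` are hypothetical, and `odds_at` is an illustrative helper.

```r
p <- c(0.25, 0.5, 0.75)
odds <- p / (1 - p)                 # odds implied by each probability
log_odds <- log(odds)               # the logit; same as qlogis(p)

# Hypothetical logit index: alpha + beta * x
alpha <- -1
beta  <- 0.4
odds_at <- function(x) exp(alpha + beta * x)

# A one-unit increase in x multiplies the odds by exp(beta), whatever x is
ratio_low  <- odds_at(1)  / odds_at(0)
ratio_high <- odds_at(11) / odds_at(10)

c(ratio_low, ratio_high, exp(beta))
```

Unlike the marginal effect on the probability, the multiplicative effect on the odds does not depend on the starting value of $x$.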


##  The Logit Classifier
- Classification maps probabilities into 0-1 classifications.
    - The “Bayes classifier” uses a cutoff of 0.5.
- Decision boundary:
    - Suppose we use a Bayes classifier. We predict 1 when $\Lambda(X\beta)> \frac{1}{2}$.
      - But that’s the same as predicting 1 when $X\beta>0$, since $\Lambda(0)= \frac{1}{2}$.
    - Points on one side of the line $X\beta=0$ will be classified as 0, and points on the other side will be classified as 1. That line is the “decision boundary”.
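
The equivalence of the two decision rules can be checked on a fitted logit. The data below are simulated, and the variable names and coefficients are hypothetical; `predict()` with `type = "response"` returns $\Lambda(X\widehat{\beta})$ and `type = "link"` returns $X\widehat{\beta}$.

```r
set.seed(11)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 + 1.2 * x1 - 0.8 * x2))

fit   <- glm(y ~ x1 + x2, family = binomial(link = "logit"))
p_hat <- predict(fit, type = "response")   # fitted probabilities
xb    <- predict(fit, type = "link")       # fitted index

# Rule 1: probability above 0.5; rule 2: index above 0
class_prob  <- as.integer(p_hat > 0.5)
class_index <- as.integer(xb > 0)

identical(class_prob, class_index)

# Confusion table of predicted class against the observed outcome
table(predicted = class_prob, observed = y)
```

With two regressors the boundary $\widehat{\beta}_0+\widehat{\beta}_1 x_1+\widehat{\beta}_2 x_2=0$ is a straight line in the $(x_1,x_2)$ plane.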

## An example (taken from Gujarati)

The data used here is a random sample of 1,196 US males.
The variables used in the analysis are as follows:

- _Smoker_ = 1 for smokers and 0 for nonsmokers
- _Age_ = age in years
- _Education_ = number of years of schooling
- _Income_ = family income
- _Pcigs_ = price of cigarettes in individual states in 1979

### Linear Probability model

The model to be estimated:
$$y_i=\beta_0+ \beta_1 Age_i + \beta_2 Educ_i + \beta_3 Income_i+ \beta_4 Pcigs_i+u_i$$
where
$y_i=1$ if the individual is a smoker and 0 otherwise

In [24]:
## Please install these packages if they are not installed on your system
library(huxtable)
library(knitr)
library(tidyverse)
library(broom)   # provides tidy(); library(tidyverse) does not attach it

In [15]:
data <- read.csv("logit_smoke.csv", sep=",")
lpm <- lm(smoker~age+educ+income+pcigs79, data=data)
kable(tidy(lpm), digits=2, align='c', caption=
        "LPM model of to smoke or not to smoke.")




|    term     | estimate | std.error | statistic | p.value |
|:-----------:|:--------:|:---------:|:---------:|:-------:|
| (Intercept) |   1.12   |   0.19    |   5.96    |  0.00   |
|     age     |   0.00   |   0.00    |   -5.70   |  0.00   |
|    educ     |  -0.02   |   0.00    |   -4.47   |  0.00   |
|   income    |   0.00   |   0.00    |   0.63    |  0.53   |
|   pcigs79   |  -0.01   |   0.00    |   -1.80   |  0.07   |

Notice that all the variables, except income, are individually statistically significant
at least at the 10% level of significance.

The interpretation of the regression coefficients is as follows. Holding all other variables constant, the probability of smoking decreases by $\approx$ 0.005 for each additional year of age, probably due to the adverse impact of smoking on health. Likewise, other things being equal, an increase
in schooling by one year decreases the probability of smoking by 0.02. Similarly, if
the price of cigarettes goes up by a dollar, the probability of smoking decreases by
$\approx$ 0.005, holding all other variables constant.

In [16]:
summary(lpm)$r.squared

The $R^2$ value of $\approx$ 0.038 seems very low,
but one should not attach much importance to this because the dependent variable is
nominal, taking only the values 1 and 0. The LPM is not the preferred choice for modeling indicator variables, for the reasons I mentioned in class. Let us estimate the model via probit and logit instead. Use the **glm()** function to estimate these models.
### Logit model
The results of the logit model are given below:

In [17]:
s_logit<- glm(smoker ~ age + educ + income + pcigs79, family=binomial(link="logit"), 
                   data=data)
kable(tidy(s_logit), digits=4, align='c', caption=
        " 1. Logit model of to smoke or not to smoke")



|    term     | estimate | std.error | statistic | p.value |
|:-----------:|:--------:|:---------:|:---------:|:-------:|
| (Intercept) |  2.7451  |  0.8292   |  3.3105   | 0.0009  |
|     age     | -0.0209  |  0.0037   |  -5.5773  | 0.0000  |
|    educ     | -0.0910  |  0.0207   |  -4.4021  | 0.0000  |
|   income    |  0.0000  |  0.0000   |  0.6583   | 0.5104  |
|   pcigs79   | -0.0223  |  0.0125   |  -1.7895  | 0.0735  |

The interpretation of the logit model in Table 1 is as follows: each slope coefficient
shows how the log of the odds in favor of smoking changes as the value of the X
variable changes by a unit.

Let us examine these results. The variables age and education are highly statistically significant and have the expected signs. As age increases, the value of the logit
decreases, perhaps due to health concerns – that is, as people age, they are less likely
to smoke. Likewise, more educated people are less likely to smoke, perhaps due to
the ill effects of smoking. The price of cigarettes has the expected negative sign and
is significant at about the 7% level. Keeping all other variables constant, the higher the price of cigarettes,
the lower is the probability of smoking. Income has no statistically visible impact on
smoking, perhaps because expenditure on cigarettes may be a small proportion of
family income.

The interpretation of the various coefficients is as follows: holding other variables
constant, if, for example, education increases by one year, the average logit value goes
down by $\approx$ 0.09, that is, the log of odds in favor of smoking goes down by about 0.09.
Other coefficients are interpreted similarly.

#### Marginal effects
In the LPM the slope coefficient measures the marginal effect of a unit change in
the explanatory variable on the probability of smoking, holding other variables
constant. This is not the case with the logit model, where the marginal effect of a unit change in the explanatory variable depends not only on the coefficient of
that variable but also on the level of probability from which the change is measured. The latter in turn depends on the values of all the explanatory variables in the
model. You can use functions from the *mfx* package to get the marginal effects.



In [20]:
 library("mfx")
mlogit <-logitmfx(smoker ~ age + educ + income + pcigs79,data=data)

In [22]:
mlogit

Call:
logitmfx(formula = smoker ~ age + educ + income + pcigs79, data = data)

Marginal Effects:
              dF/dx   Std. Err.       z     P>|z|    
age     -4.8903e-03  8.7359e-04 -5.5979 2.170e-08 ***
educ    -2.1334e-02  4.8365e-03 -4.4111 1.029e-05 ***
income   1.1069e-06  1.6814e-06  0.6583   0.51033    
pcigs79 -5.2340e-03  2.9242e-03 -1.7899   0.07347 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interpretation of these marginal effects, which `logitmfx()` evaluates at the sample means of the regressors by default, is as follows: keeping all other variables constant, an additional year of
age reduces the probability of smoking by approximately 0.005.
Likewise, an additional year of education lowers the probability of smoking
by about 0.021, keeping all other variables constant. A unit change in income has no statistically significant impact on the probability of smoking. A unit
increase in the price of a pack of cigarettes reduces the probability of smoking
by about 0.005, which is significant at about the 7% level.
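
A quick consistency check links these marginal effects to the logit coefficients in Table 1. For a logit model the marginal effect is $\lambda(\cdot)\beta_k$, so, assuming all effects are evaluated at one common point such as the sample means, the ratio of marginal effect to coefficient should be nearly the same for every continuous regressor. The numbers below are copied from the two outputs above; income is left out because its coefficient is rounded to zero in Table 1. Picking the root below 0.5 for the implied probability is an assumption of this sketch.

```r
# Logit coefficients (Table 1) and marginal effects from the output above
beta <- c(age = -0.0209, educ = -0.0910, pcigs79 = -0.0223)
me   <- c(age = -4.8903e-03, educ = -2.1334e-02, pcigs79 = -5.2340e-03)

# ME_k / beta_k estimates the logistic pdf at the evaluation point
scale_factor <- me / beta
scale_factor

# lambda = p * (1 - p): solve for the implied probability
# (assumption: take the root below 0.5; 1 - p_implied also solves it)
lam <- mean(scale_factor)
p_implied <- (1 - sqrt(1 - 4 * lam)) / 2
p_implied
```

The three ratios agree up to the rounding of the coefficients, which is what the formula predicts.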


#### The language of the odds ratio (OR)
Instead of expressing logits in terms of probabilities, we can express them in terms
of the odds ratios (OR). For our example, the table below, obtained with the `logitor()` function from the *mfx* package, gives the odds ratios, their standard errors, and their z values (instead
of the t statistic).

In [25]:
options(scipen=999)
orlogit <-logitor(smoker ~ age + educ + income + pcigs79,data=data)
(orlogit)

Call:
logitor(formula = smoker ~ age + educ + income + pcigs79, data = data)

Odds Ratio:
           OddsRatio    Std. Err.       z         P>|z|    
age     0.9793627211 0.0036617871 -5.5773 0.00000002443 ***
educ    0.9130425607 0.0188687267 -4.4021 0.00001072081 ***
income  1.0000047202 0.0000071705  0.6583       0.51036    
pcigs79 0.9779283903 0.0121969923 -1.7895       0.07354 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

These odds ratios are obtained by exponentiating the coefficients given in Table 1. For example,
the coefficient of age in Table 1 is -0.020853. Therefore, we compute $e^{-0.020853}$ =
0.9793627, which gives the value of the odds ratio. To interpret the odds ratios, keep in mind that an OR greater than 1 indicates an increased chance that an event (smoking) will occur rather than not occur. An odds ratio of less than 1 suggests a decreased chance of the event occurring versus
not occurring. An OR of 1 means that the chances of an event occurring or not
occurring are even. Another property of the odds ratio is that it is invariant with
respect to the ordering of the variables. Thus, ORa/ORb = 1/(ORb/ORa), where a and
b are two events. You will observe that the odds ratios for age and education
are less than one, suggesting that the probability of smoking decreases with respect
to these variables; not a surprising finding. More specifically, the odds of smoking
decrease by about 2% for each additional year of age, ceteris paribus. Similarly, an additional year of
education reduces the odds of smoking by about 8.7%, ceteris paribus. The OR for income is not much different from 1. The OR for pcigs79 also does not seem much
different from one.
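
The exponentiation can be reproduced from the rounded coefficients in Table 1. The vector `b` below is typed in by hand from that table, so the results match the `logitor()` output only up to rounding.

```r
# Coefficients copied from Table 1 (rounded to four decimals)
b <- c(age = -0.0209, educ = -0.0910, income = 0.0000, pcigs79 = -0.0223)

odds_ratio <- exp(b)
pct_change <- 100 * (odds_ratio - 1)   # percentage change in the odds per unit

round(cbind(odds_ratio, pct_change), 4)
```

With the fitted model object at hand, `exp(coef(s_logit))` gives the same odds ratios without the rounding.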

### Probit Model

As we have discussed in class, in the probit model the error term has the normal distribution. Although the numerical values of the logit and probit coefficients are different, qualitatively the results are similar: the coefficients of age, education, and price of cigarettes are individually significant at least at the 10% level. The income coefficient, however, is not significant.

In [26]:
s_probit <- glm(smoker ~ age + educ + income + pcigs79, family=binomial(link="probit"), 
                   data=data)
kable(tidy(s_probit), digits=2, align='c', caption=
        "Probit model of to smoke or not to smoke")



|    term     | estimate | std.error | statistic | p.value |
|:-----------:|:--------:|:---------:|:---------:|:-------:|
| (Intercept) |   1.70   |   0.51    |   3.34    |  0.00   |
|     age     |  -0.01   |   0.00    |   -5.70   |  0.00   |
|    educ     |  -0.06   |   0.01    |   -4.46   |  0.00   |
|   income    |   0.00   |   0.00    |   0.62    |  0.54   |
|   pcigs79   |  -0.01   |   0.01    |   -1.79   |  0.07   |

#### Marginal effects
The interpretation is similar to that of the logit model.

In [28]:
mprobit <-probitmfx(smoker ~ age + educ + income + pcigs79,data=data)

mprobit

Call:
probitmfx(formula = smoker ~ age + educ + income + pcigs79, data = data)

Marginal Effects:
                dF/dx     Std. Err.       z         P>|z|    
age     -0.0049204438  0.0008628909 -5.7023 0.00000001182 ***
educ    -0.0213399178  0.0047859798 -4.4588 0.00000824044 ***
income   0.0000010324  0.0000016740  0.6168       0.53740    
pcigs79 -0.0052348582  0.0029164385 -1.7949       0.07266 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1