# MIT 15.071x - Unit 3 - Logistic Regression

Logistic regression extends the idea of linear regression to settings where the dependent variable is categorical.

The example in this case is an analysis of claims data, in which the dependent variable is modeled as a binary variable:
+  1 for low-quality care (_PoorCare_)
+  0 for high-quality care (_GoodCare_)

The probability that the outcome variable is 0 is just 1 minus the probability that the outcome variable is 1.

To predict the probability that y = 1, we use the _Logistic Response Function_.

$$P(y=1)= \frac{1}{1 + e^{-(\beta _{0} + \beta _{1}x _{1} + \beta _{2}x _{2} + ... +\beta _{k} x_{k})}}$$
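
The Logistic Response Function can be sketched in R as below. The helper name `logistic_response` is hypothetical (not from the course materials); it assumes `beta` holds $\beta_0, \dots, \beta_k$ and `x` holds $x_1, \dots, x_k$ for a single observation. If you already have the linear regression piece $z$, base R's `plogis(z)` computes the same $1/(1+e^{-z})$ transform.

```r
# Hypothetical helper: P(y = 1) for one observation.
# beta: c(beta0, beta1, ..., betak); x: c(x1, ..., xk)
logistic_response <- function(beta, x) {
  z <- beta[1] + sum(beta[-1] * x)   # the linear regression piece
  1 / (1 + exp(-z))                  # squash it into the interval (0, 1)
}

# With every coefficient equal to zero, z = 0 and P(y = 1) = 0.5
logistic_response(c(0, 0, 0), c(1, 5))
# [1] 0.5
```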

The _Coefficients_, or _Betas_, are selected to predict a high probability for the actual poor care cases, and to predict a low probability for the actual good care cases.

+  A positive coefficient means that increasing that variable increases the linear regression piece, which increases the probability that y = 1, i.e., the probability of poor care.

+  A negative coefficient means that increasing that variable decreases the linear regression piece, which decreases the probability that y = 1 and in turn increases the probability of good care.
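
To see the sign effect described in the two bullets, here is a small sketch with a single hypothetical variable $x$, an intercept of 0, and coefficients of +1 and -1 (values chosen only for illustration; `p_of` is a made-up helper name):

```r
# Hypothetical one-variable model: P(y = 1) = 1 / (1 + e^-(b0 + b1*x))
p_of <- function(b0, b1, x) 1 / (1 + exp(-(b0 + b1 * x)))

x <- c(0, 1, 2, 3)
p_pos <- p_of(0,  1, x)   # positive coefficient
p_neg <- p_of(0, -1, x)   # negative coefficient

round(p_pos, 3)   # 0.500 0.731 0.881 0.953: rises toward 1 (poor care more likely)
round(p_neg, 3)   # 0.500 0.269 0.119 0.047: falls toward 0 (good care more likely)
```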

Another useful way to think about the logistic response function is in terms of Odds. The Odds are the probability of 1 divided by the probability of 0.

$$\text{Odds} = \frac{P(y=1)}{P(y=0)}$$

+ If y = 1 is more likely: Odds > 1
+ If y = 0 is more likely: Odds < 1
+ If outcomes are equally likely: Odds = 1
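
A minimal sketch of the Odds computation, using the fact that $P(y=0) = 1 - P(y=1)$. The helper name `odds` is hypothetical, and the three probabilities are arbitrary examples, one for each case in the list:

```r
# Hypothetical helper: Odds = P(y = 1) / P(y = 0), with P(y = 0) = 1 - P(y = 1)
odds <- function(p) p / (1 - p)

odds(0.8)   # 4: y = 1 more likely, Odds > 1
odds(0.2)   # 0.25: y = 0 more likely, Odds < 1
odds(0.5)   # 1: equally likely, Odds = 1
```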

When P(y = 1) and P(y = 0) in the Odds are replaced with the Logistic Response Function, the Odds equal _e_ raised to the power of the linear regression equation.

$$\text{Odds} = e^{(\beta _{0} + \beta _{1}x _{1} + \beta _{2}x _{2} + ... +\beta _{k} x_{k})}$$
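
Writing $z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$ for the linear regression piece, the substitution works out as follows:

$$
\begin{aligned}
P(y=1) &= \frac{1}{1+e^{-z}}, \qquad P(y=0) = 1 - \frac{1}{1+e^{-z}} = \frac{e^{-z}}{1+e^{-z}} \\
\text{Odds} &= \frac{P(y=1)}{P(y=0)} = \frac{1/(1+e^{-z})}{e^{-z}/(1+e^{-z})} = \frac{1}{e^{-z}} = e^{z}
\end{aligned}
$$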

Taking the natural log of both sides shows that the _log(Odds)_, which we call the _Logit_, is exactly the linear regression equation.

$$\log(\text{Odds}) = \beta _{0} + \beta _{1}x _{1} + \beta _{2}x _{2} + ... +\beta _{k} x_{k}$$
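
As a numerical sanity check (a sketch using arbitrary values of the linear regression piece $z$), applying the Logistic Response Function and then taking log(Odds) gives back $z$. Base R's `qlogis()` implements the same log-odds transform, and R's `log()` is the natural log.

```r
z <- c(-2, 0, 1.5)              # arbitrary linear-regression-piece values
p <- 1 / (1 + exp(-z))          # Logistic Response Function: P(y = 1)
logit <- log(p / (1 - p))       # log(Odds)

all.equal(logit, z)             # TRUE, up to floating-point tolerance
all.equal(qlogis(p), z)         # TRUE: qlogis() is the same logit transform
```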

+  A positive beta value increases the Logit, which in turn increases the Odds of 1.
+  A negative beta value decreases the Logit, which in turn decreases the Odds of 1.
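
The two bullets can be checked with a short sketch. The coefficients below are hypothetical (one positive, one negative), and `logit_of` is a made-up helper: raising $x_1$ by one unit moves the Logit up by $\beta_1$, raising $x_2$ by one unit moves it down by $|\beta_2|$, and the Odds $e^{\text{Logit}}$ move in the same direction.

```r
# Hypothetical coefficients: beta0, beta1 > 0, beta2 < 0
beta <- c(0.2, 0.7, -0.4)
logit_of <- function(x) beta[1] + sum(beta[-1] * x)   # log(Odds) for c(x1, x2)

base  <- c(1, 1)
up_x1 <- c(2, 1)   # x1 increased by one unit
up_x2 <- c(1, 2)   # x2 increased by one unit

logit_of(up_x1) - logit_of(base)             # about +0.7: Logit rises by beta1
logit_of(up_x2) - logit_of(base)             # about -0.4: Logit falls by |beta2|
exp(logit_of(up_x1)) > exp(logit_of(base))   # TRUE: Odds of 1 increase
exp(logit_of(up_x2)) < exp(logit_of(base))   # TRUE: Odds of 1 decrease
```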

Suppose the coefficients of a logistic regression model with two independent variables are as follows:
$\beta_{0}=-1.5, \beta_1 = 3, \beta_2 = -0.5$

And we have an observation with the following values for the independent variables:
$x_1=1, x_2=5$

What is the value of the Logit for this observation? Recall that the Logit is log(Odds).

```r
# Logit = beta0 + beta1*x1 + beta2*x2
-1.5 + 3*1 + (-0.5)*5
# [1] -1
```

What is the value of the Odds for this observation? Since the Odds are _e_ raised to the Logit, we need $e^{-1}$. You can compute $e^x$ for some number x in your R console by typing `exp(x)`; the function `exp()` computes the exponential of its argument.

```r
Odds <- exp(-1)   # Odds = e^Logit, with Logit = -1 from above
print(Odds)
# [1] 0.3678794
```

What is the value of P(y = 1) for this observation? Because Odds = P(y = 1) / (1 - P(y = 1)), solving for P(y = 1) gives P(y = 1) = Odds / (1 + Odds).

```r
PoorCare <- Odds / (1 + Odds)   # P(y = 1), the probability of poor care
print(PoorCare)
# [1] 0.2689414
```
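
The three steps of the worked example can be run end to end as a single sketch, using the coefficients and observation given above. The final two lines cross-check that computing P(y = 1) via the Odds agrees with applying the Logistic Response Function to the Logit directly, and with base R's `plogis()`:

```r
# Worked example: beta0 = -1.5, beta1 = 3, beta2 = -0.5; x1 = 1, x2 = 5
beta <- c(-1.5, 3, -0.5)
x    <- c(1, 5)

logit <- beta[1] + sum(beta[-1] * x)   # log(Odds), equal to -1
odds  <- exp(logit)                    # Odds, about 0.368
p     <- odds / (1 + odds)             # P(y = 1), about 0.269

# Cross-checks against the Logistic Response Function and base R's plogis()
all.equal(p, 1 / (1 + exp(-logit)))    # TRUE
all.equal(p, plogis(logit))            # TRUE
```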
