# Chapter 15 - Models with Categorical Outcomes

Thus far in the course, we've explored a great many types of models that the general linear model framework can handle. This gives us a large amount of flexibility in the types of research questions we can answer with statistics. Hopefully you are beginning to appreciate how understanding the fundamentals of the general linear model can open a great many avenues of inquiry for you. The general linear model is not the only way to do statistics (there are more advanced statistics classes you can take that will teach you different approaches for specific problems), but it is an extremely powerful one to start with.

The last major class of general linear model that we will learn in this course will address one thing we've been leaving out so far. Across all the regressions, interactions, nonparametric tests with all sorts of predictors and relationships between predictors, one thing has been constant - we've always been predicting a *continuous* outcome variable. Of course, there are research questions concerning categorical outcome variables as well: what political party someone is likely to join, whether or not someone is admitted to college, and so on. Models for these sorts of outcomes are called **logistic regression**, and we will cover them here.

## 15.1 Non-continuous outcomes

Standard regression models are linear combinations of parameters estimated from the data. Multiplying these parameters by different values of the predictor variables (or by each other) gives estimates of the outcome.

However, because there’s no hard limit on the range of predictor variables (at least, no limit coded into the model itself), the predictions of a linear model can in theory range between -∞ and +∞. Although values approaching infinity might be very unlikely, there is no hard limit on either the parameters we fit (the regression coefficients) or the predictor values themselves.

When outcome data are continuous or somewhat like a continuous variable this isn’t usually a problem. Although our models might predict some improbable values (for example, that someone is 8 feet tall), they will not often be strictly impossible.

It might occur to you at this point that, if a model predicted a height of -8 feet or a temperature below absolute zero, then this *would* be impossible. This is true, and it is a theoretical violation of the linear model's assumption that the outcome can range between -∞ and +∞ (we'll talk more about the assumptions the GLM makes, and what violates them, in chapter 20). Despite this, researchers use linear regression to predict many outcomes which have this type of range restriction and, although models can make strange predictions in edge cases, they are useful and can make good predictions most of the time.

However, for other types of outcomes like categories, this often won’t be the case. Standard linear regression will fail to make sensible predictions even in cases that are not unusual.

For binary data we want to predict the probability of a positive response, and this can range between 0 and 1. For count data, predicted outcomes must always be non-negative (i.e. zero or greater). For these data, the lack of constraint on predictions from linear regression is a problem.

## 15.2 Link functions

One of the most common cases where this occurs is when you're trying to predict a binary outcome, yes or no. We already learned in simple regression how expressing this as a dummy-coded number, 1 or 0, lets us do math with it. That is also why we choose to express categorical outcome variables with dummy codes.

But hang on. When a dummy variable is an outcome, no other option outside the range of 0 to 1 is possible, and no other option *between* these values is possible either. This poses a problem for predictions based off a continuous regression line like in linear regression, where all possible inputs can be operated on to produce theoretically all possible output values. 

The answer to this in the GLM framework is to still make continuous predictions, but then translate those predictions into being either a 1 or 0 final answer. The mathematical tool for doing this kind of translation is called a **link function**. A link function, like the name suggests, is a separate mathematical function that *links* the output of a linear model to a corresponding value that makes sense in terms of the actual outcome variable. 

Simple linear regression actually has a link function too. It's just a trivial one, called the **identity function** - the linear model's prediction is mapped to an outcome value by its identity, its already-existing value. For other types of model different functions are used.

## 15.3 Log odds

The link function in logistic regression is called the **logit function** (hence the name logistic regression). The logit function expresses category membership in terms of the *log odds* of being in that category. If we're predicting whether or not someone is admitted to college, their *likelihood* of admittance is what the prediction means.

### Log odds conversion - step 1
The first step in converting a binary categorical variable into log odds is to consider the probability of someone being in one category versus the other. Recall our discussion of probabilities back in chapter 8. Under a probability model, we can't determine the value of any one datapoint, but we know over the whole population of datapoints how many end up in one category versus another. We know the *probability* of category membership. When 9% of students are admitted to Pomona and 91% are not, there is a 9% likelihood that any one person will end up in the category "admitted." 

Probabilities thus allow us to express a binary outcome - admitted or not - as a probability of category membership. Thus, we can now express this variable as a continuous variable between 0 and 1. True data will always have a 0 or 1 value - when someone has *already* been admitted to college or rejected, the probability of their admittance is 100% or 0%. But when making predictions about new data points, their predicted probability can be anything between 0 and 1. Predictors in a logistic model help us make guesses about these probabilities.

### Log odds conversion - step 2
We now have a binary variable expressed as a continuous variable, but it is still bounded between 0 and 1. We can transform a probability on the 0 to 1 scale to a 0 to ∞ scale by converting it to **odds**, which are expressed as a ratio:

$$odds = \frac{p}{1-p} $$

Probabilities and odds are two equivalent ways of expressing the same idea. So a probability of P(X) = 0.5 equates to odds of 1 (i.e. 1 to 1 or 1/1, which is what 0.5/(1-0.5) reduces to). P(X) = 0.6 equates to odds of 1.5 (that is, 1.5 to 1, or 3 to 2). And P(X) = 0.95 equates to odds of 19 (19 to 1).

Odds convert or map probabilities from 0 to 1 onto the real numbers from 0 to ∞.

<img src="images/ch15-probtoodds.png" width="500">

We can reverse the transformation (which is important later) like so:

$$probability = \frac{odds}{1 + odds} $$
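The two formulas can be checked with a short sketch. The function names `prob_to_odds` and `odds_to_prob` are made up for this illustration. The probabilities are the ones used as examples in this step.

```python
def prob_to_odds(p):
    """Convert a probability (0 <= p < 1) to odds."""
    return p / (1 - p)


def odds_to_prob(odds):
    """Convert odds (0 or greater) back to a probability."""
    return odds / (1 + odds)


# Convert each example probability to odds, then back again
for p in (0.5, 0.6, 0.95):
    odds = prob_to_odds(p)
    back = odds_to_prob(odds)
    print(f"p = {p:.2f} -> odds = {odds:.2f} -> p = {back:.2f}")
```

Each probability comes back unchanged after the round trip, which is what it means for the two formulas to be inverses of each other.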

### Log odds conversion - step 3
When we convert a probability to odds, the odds can go up to infinity, but will never be less than 0. This is still a problem for our linear model. We’d like the outcome our linear model predicts to be able to vary between -∞ and ∞.

To avoid this restriction, we can take the *logarithm* of the odds, sometimes called the **logit** for conciseness. The figure below shows the transformation of probabilities between 0 and 1 to the log-odds scale.
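Written with the same symbol *p* as the odds formula in step 2, and assuming the natural logarithm (the usual convention, although nothing here depends on the base), this transformation is:

$$logit(p) = \ln\left(\frac{p}{1-p}\right)$$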

<img src="images/ch15-probtologit.png" width="500">

The logit has two nice properties:

- It converts odds of less than one to negative numbers, because the log of a number between 0 and 1 is always negative.
- It flattens the rather square curve for the odds in the figure above, giving you more interpretability among different logit values. 

By doing these mathematical conversion steps, we can now express a binary 0-or-1 variable as a continuous variable, possibly ranging from -∞ to ∞.
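As a rough sketch of the full conversion, the function below goes from a probability to log odds in one step. The name `logit` and the use of the natural logarithm via `math.log` are choices made for this illustration.

```python
import math


def logit(p):
    """Log odds of a probability strictly between 0 and 1."""
    return math.log(p / (1 - p))


# Probabilities below 0.5 give negative log odds, above 0.5 positive
for p in (0.05, 0.25, 0.5, 0.75, 0.95):
    print(f"p = {p:.2f} -> log odds = {logit(p):+.2f}")
```

A probability of 0.5 lands exactly at 0 on the log-odds scale, and probabilities the same distance either side of 0.5 (such as 0.25 and 0.75) get log odds of the same size with opposite signs.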


## 15.4 Interpreting a logistic model

As we’ve seen here, the logit or logistic link function transforms probabilities between 0 and 1 to the range from negative to positive infinity. This means logistic regression coefficients are in log-odds units, so we must interpret logistic regression coefficients differently from regular regression with continuous outcomes. Consider the equation form of a simple logistic regression model:

$$logit(P(Y_i = 1)) = b_0 + b_1X_i$$

The right side of the equation, the actual model side, is built the same way in a linear model and a logistic model. We still combine the values of predictors with coefficients in order to make predictions. However, in linear regression, a coefficient like b<sub>1</sub> means the change in the outcome (expressed with the outcome's units) for a unit change in the predictor.

For logistic regression, the same coefficient means the change in the *log odds* of the outcome being 1, for a unit change in the predictor.
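A short derivation shows what this implies on the odds scale. Here *logit(X)* and *odds(X)* are shorthand, introduced only for this derivation, for the predicted log odds and predicted odds at a given predictor value:

$$logit(X + 1) - logit(X) = (b_0 + b_1(X + 1)) - (b_0 + b_1X) = b_1$$

Because a difference of logarithms is the logarithm of a ratio, exponentiating both sides gives:

$$\frac{odds(X + 1)}{odds(X)} = e^{b_1}$$

So a one-unit increase in the predictor *adds* b<sub>1</sub> to the log odds, which is the same as *multiplying* the odds by e<sup>b<sub>1</sub></sup>.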

If we want to interpret logistic regression in terms of probabilities, we need to undo the transformations described in steps 2 and 3, in reverse order. To do this:

- First, we exponentiate the logit to ‘undo’ the log transformation. This gives us the predicted odds.
- Then, we convert the odds back to a probability.
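The back-conversion might be sketched as follows. The coefficients `b0` and `b1` are hypothetical values chosen only to make the arithmetic easy to follow. They are not estimates from any real model.

```python
import math

# Hypothetical coefficients, chosen only for illustration
b0 = -2.0  # intercept, in log-odds units
b1 = 0.5   # change in log odds for a one-unit change in X


def predicted_probability(x):
    """Turn the linear model's log-odds prediction into a probability."""
    log_odds = b0 + b1 * x     # the linear model's prediction
    odds = math.exp(log_odds)  # undo the log to get odds
    return odds / (1 + odds)   # convert odds to a probability


for x in (0, 4, 8):
    print(f"X = {x}: predicted probability = {predicted_probability(x):.2f}")
```

With these made-up coefficients, X = 4 gives log odds of 0, which corresponds to odds of 1 and a probability of 0.5. The other two predictions fall on either side of that, and all three stay between 0 and 1.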