# Logistic Regression

The FBI have a [forensic glass service](https://www.fbi.gov/about-us/lab/forensic-science-communications/fsc/april2009/review/2009_04_review01.htm). They have a large database of the chemical composition of many types of glass, and what the glass was used for.

- 1 building_windows_float_processed 
- 2 building_windows_non_float_processed 
- 3 vehicle_windows_float_processed 
- 4 vehicle_windows_non_float_processed (none in this database) 
- 5 containers 
- 6 tableware 
- 7 headlamps

They analysed these glassware products for their refractive index (ri) and the composition of various elements (Na, Fe, K, etc).

The raw data file has no column headers, so we have to supply them ourselves:

In [None]:
import pandas as pd
url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/glass/glass.data'
col_names = ['id','ri','na','mg','al','si','k','ca','ba','fe','glass_type']
glass = pd.read_csv(url, names=col_names, index_col='id')


Pretend we are investigating a crime scene, and we want to know the probability that some
glass fragments we have found came from assorted glassware (containers, tableware or headlamps) rather than from a building or vehicle window.

Remember that a pandas Series has a .map() method, which accepts either a function or a dictionary.
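As a rough illustration of both forms of .map(), here is a minimal sketch on a made-up Series (the names `sizes`, `as_number` and `as_upper` are hypothetical and unrelated to the glass data):

```python
import pandas as pd

# Hypothetical toy Series, not the glass data
sizes = pd.Series(['s', 'm', 'l', 'm'])

# Dictionary form: each value is looked up as a key in the dictionary
as_number = sizes.map({'s': 1, 'm': 2, 'l': 3})

# Function form: the function is called once per value
as_upper = sizes.map(str.upper)

print(as_number.tolist())
print(as_upper.tolist())
```

With the dictionary form, any value that is not a key in the dictionary becomes NaN, so it is worth checking that every category is covered.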



In [None]:
# Create a Python dictionary that has the value 0 for the keys 1, 2, 3 and 4, and the
# value 1 for the keys 5, 6 and 7

In [None]:
# Create a new column called "assorted" in the "glass" data frame by calling glass.glass_type.map() with your dictionary as an argument



In [None]:
# Make sure that this data looks right (e.g. use the head() method and look at it)

Let's try to predict **assorted** using **al**. Let's visualize the relationship to figure out how to do this:

In [None]:
# Do a scatter plot of *al* on the x-axis, and *assorted* on the y-axis.


In [None]:
# create a logistic regression model with incredibly weak regularisation
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression(C=1e9)


In [None]:
# use logreg to fit the data. You will need an X built from the *al* column (X must be 2D, e.g. glass[['al']]) and a y from the *assorted* column


In [None]:
# use this regressor to predict from your X data


In [None]:
# plot the class predictions on top of the actual data


What if we wanted the **predicted probabilities** instead of just the **class predictions**, to understand how confident we are in a given prediction?

In [None]:
# your model (logreg) can predict probabilities with .predict_proba(). Store the class 1 probabilities
# as assorted_pred_prob (it is used again in Part 4) and plot them; the result should look like a sigmoid curve
# You will probably need to .reshape() or similar in order to get it into a form that you can use

What does .predict_proba() return? A two-column array: the first column holds the predicted probability of **class 0**, and the second column holds the predicted probability of **class 1**.
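To make the two-column layout concrete, here is a small sketch on hypothetical toy data (`X_toy`, `y_toy` and `toy_model` are made-up names, and the numbers are not from the glass data set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature and a binary target
X_toy = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0]])
y_toy = np.array([0, 0, 1, 0, 1, 1])

toy_model = LogisticRegression(C=1e9)
toy_model.fit(X_toy, y_toy)

# One row per sample, one column per class (ordered as in toy_model.classes_)
proba = toy_model.predict_proba(X_toy)
p_class1 = proba[:, 1]   # probability of class 1 only

print(proba.shape)
print(toy_model.classes_)
```

Each row sums to 1, so for a binary problem the second column alone carries all the information.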

## Part 2: Probability, odds, e, log, log-odds

If you are interested in the mathematics behind this in more detail, carry on with the following sections.

$$probability = \frac {one\ outcome} {all\ outcomes}$$

$$odds = \frac {one\ outcome} {all\ other\ outcomes}$$

Examples:

- Dice roll of 1: probability = 1/6, odds = 1/5
- Even dice roll: probability = 3/6, odds = 3/3 = 1
- Dice roll less than 5: probability = 4/6, odds = 4/2 = 2
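The dice examples can be checked with exact fractions. This is only a sketch; the helper name `probability_and_odds` is made up for illustration and assumes all outcomes are equally likely:

```python
from fractions import Fraction

def probability_and_odds(favourable, total):
    """Return (probability, odds) when all outcomes are equally likely."""
    probability = Fraction(favourable, total)
    odds = Fraction(favourable, total - favourable)
    return probability, odds

roll_of_one = probability_and_odds(1, 6)      # probability 1/6, odds 1/5
even_roll = probability_and_odds(3, 6)        # probability 3/6 = 1/2, odds 1
less_than_five = probability_and_odds(4, 6)   # probability 4/6 = 2/3, odds 2

print(roll_of_one, even_roll, less_than_five)
```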

$$odds = \frac {probability} {1 - probability}$$

In [None]:
# create a table of probability versus odds
table = pd.DataFrame({'probability':[0.1, 0.2, 0.25, 0.5, 0.6, 0.8, 0.9, 0.99]})
table['odds'] = table.probability/(1 - table.probability)
table

What is **e**? It is the base rate of growth shared by all continually growing processes:

In [None]:
# exponential function: e^1
import numpy as np
np.exp(1)

What is a **(natural) log**? It gives you the time needed to reach a certain level of growth:

In [None]:
# time needed to grow 1 unit to 2.718 units
np.log(2.718)

It is also the **inverse** of the exponential function:

In [None]:
np.log(np.exp(5))

In [None]:
# add log-odds to the table
table['logodds'] = np.log(table.odds)
table

## Part 3: What is Logistic Regression?

**Linear regression:** continuous response is modeled as a linear combination of the features:

$$y = \beta_0 + \beta_1x$$

**Logistic regression:** log-odds of a categorical response being "true" (1) is modeled as a linear combination of the features:

$$\log \left({p\over 1-p}\right) = \beta_0 + \beta_1x$$

This is called the **logit function**.

Probability is sometimes written as pi:

$$\log \left({\pi\over 1-\pi}\right) = \beta_0 + \beta_1x$$

The equation can be rearranged into the **logistic function**:

$$\pi = \frac{e^{\beta_0 + \beta_1x}} {1 + e^{\beta_0 + \beta_1x}}$$
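One way to see the rearrangement, writing $z = \beta_0 + \beta_1x$ purely as shorthand (the symbol $z$ is introduced here only for brevity): exponentiate both sides, then solve for $\pi$.

$$\log \left({\pi\over 1-\pi}\right) = z \;\Rightarrow\; {\pi\over 1-\pi} = e^{z} \;\Rightarrow\; \pi = e^{z}(1-\pi) \;\Rightarrow\; \pi\,(1 + e^{z}) = e^{z} \;\Rightarrow\; \pi = \frac{e^{z}} {1 + e^{z}}$$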

In other words:

- Logistic regression outputs the **probabilities of a specific class**
- Those probabilities can be converted into **class predictions**

The **logistic function** has some nice properties:

- Takes on an "s" shape
- Output is bounded by 0 and 1
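A minimal sketch of the logistic function and of turning its output into class predictions. The function name `logistic` and the sample log-odds values are chosen here for illustration, and the 0.5 cut-off is the conventional default rather than the only option:

```python
import math

def logistic(z):
    """Map a log-odds value z to a probability between 0 and 1."""
    return math.exp(z) / (1 + math.exp(z))

# Hypothetical log-odds values spanning negative to positive
log_odds_values = [-10, -2, 0, 2, 10]
probabilities = [logistic(z) for z in log_odds_values]

# Class prediction: 1 when the probability exceeds 0.5, otherwise 0
class_predictions = [1 if p > 0.5 else 0 for p in probabilities]

print(probabilities)
print(class_predictions)
```

The probabilities rise monotonically with the log-odds and never reach 0 or 1, which is what gives the curve its "s" shape.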

Notes:

- **Multinomial logistic regression** is used when there are more than 2 classes.
- Coefficients are estimated using **maximum likelihood estimation**, meaning that we choose parameters that maximize the likelihood of the observed data.
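To give a feel for what "maximizing the likelihood" means, the sketch below computes the log-likelihood of some hypothetical toy data under two hand-picked parameter settings. The data, the parameter values and the function name `log_likelihood` are all assumptions made for illustration; this is not how scikit-learn's solver works internally, which searches for the best parameters numerically.

```python
import math

def log_likelihood(beta0, beta1, xs, ys):
    """Sum of log-probabilities the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))  # P(y = 1 | x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

# Hypothetical toy data: larger x values tend to have y = 1
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]

ll_flat = log_likelihood(0.0, 0.0, xs, ys)      # every prediction is 0.5
ll_better = log_likelihood(-3.5, 2.0, xs, ys)   # probability rises with x

print(ll_flat, ll_better)
```

The second setting gives a higher (less negative) log-likelihood, so maximum likelihood estimation would prefer it over the flat model.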

## Part 4: Interpreting Logistic Regression Coefficients

In [None]:
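# plot the predicted probabilities again
# (assorted_pred_prob holds the class 1 probabilities from the predict_proba exercise above)
# if the red line zigzags, sort glass by al before fitting and predicting
import matplotlib.pyplot as plt
plt.scatter(glass.al, glass.assorted)
plt.plot(glass.al, assorted_pred_prob, color='red')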

In [None]:
# compute predicted log-odds for al=2 using the equation
logodds = logreg.intercept_ + logreg.coef_ * 2
logodds

In [None]:
# convert log-odds to odds
odds = np.exp(logodds)
odds

In [None]:
# convert odds to probability
prob = odds/(1 + odds)
prob

In [None]:
# compute predicted probability for al=2 using the predict_proba method
# (scikit-learn expects a 2D array: one row per sample, one column per feature)
logreg.predict_proba([[2]])[:, 1]

In [None]:
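# examine the coefficient for al
# (feature_cols is assumed to be the list of feature names used to build X)
feature_cols = ['al']
list(zip(feature_cols, logreg.coef_[0]))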

**Interpretation:** A 1 unit increase in 'al' is associated with a 4.18 unit increase in the log-odds of 'assorted'.

In [None]:
# increasing al by 1 (so that al=3) increases the log-odds by 4.18
logodds = 0.64722323 + 4.1804038614510901
odds = np.exp(logodds)
prob = odds/(1 + odds)
prob

In [None]:
# compute predicted probability for al=3 using the predict_proba method
logreg.predict_proba([[3]])[:, 1]

**Bottom line:** Positive coefficients increase the log-odds of the response (and thus increase the probability), and negative coefficients decrease the log-odds of the response (and thus decrease the probability).
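Another way to read a coefficient: adding a constant to the log-odds multiplies the odds by a constant factor. The sketch below uses the rounded intercept (-7.71) and coefficient (4.18) quoted in this section; the variable and function names are made up for illustration.

```python
import math

# Rounded values quoted in the text for the single-feature model
intercept = -7.71
coef_al = 4.18

def odds_at(al):
    """Predicted odds of assorted=1 for a given al value."""
    return math.exp(intercept + coef_al * al)

# Increasing al by 1 multiplies the odds by exp(coef_al), whatever the starting point
odds_ratio_2_to_3 = odds_at(3) / odds_at(2)
odds_ratio_0_to_1 = odds_at(1) / odds_at(0)

print(odds_ratio_2_to_3, odds_ratio_0_to_1, math.exp(coef_al))
```

The multiplier on the odds is constant, but the change in probability is not: it depends on where on the curve you start.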

In [None]:
# examine the intercept
logreg.intercept_

**Interpretation:** For an 'al' value of 0, the log-odds of 'assorted' is -7.71.

In [None]:
# convert log-odds to probability
logodds = logreg.intercept_
odds = np.exp(logodds)
prob = odds/(1 + odds)
prob



That makes sense from the plot above, because the probability of assorted=1 should be very low for such a low 'al' value.

![](images/logistic_betas.png)

Changing the $\beta_0$ value shifts the curve **horizontally**, whereas changing the $\beta_1$ value changes the **slope** of the curve.
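A small numerical sketch of both effects, again using the rounded values from this section (-7.71 and 4.18). The helper names are hypothetical. The curve crosses a probability of 0.5 where $\beta_0 + \beta_1x = 0$, and its slope at that point is $\beta_1/4$.

```python
import math

def logistic_curve(x, beta0, beta1):
    """Predicted probability at x for the given intercept and coefficient."""
    return 1 / (1 + math.exp(-(beta0 + beta1 * x)))

def midpoint(beta0, beta1):
    """x value at which the predicted probability is 0.5."""
    return -beta0 / beta1

beta0, beta1 = -7.71, 4.18

x_half = midpoint(beta0, beta1)
shifted_half = midpoint(beta0 + 2, beta1)   # larger beta0 moves the midpoint left

# Numerical slope at the midpoint (central difference), to compare with beta1 / 4
h = 1e-5
slope = (logistic_curve(x_half + h, beta0, beta1)
         - logistic_curve(x_half - h, beta0, beta1)) / (2 * h)

print(x_half, shifted_half, slope, beta1 / 4)
```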

## Part 5: Comparing Logistic Regression with Other Models

Advantages of logistic regression:

- Highly interpretable (if you remember how)
- Model training and prediction are fast
- No tuning is required (excluding regularization)
- Features don't need scaling
- Can perform well with a small number of observations
- Outputs well-calibrated predicted probabilities

Disadvantages of logistic regression:

- Presumes a linear relationship between the features and the log-odds of the response
- Performance is (generally) not competitive with the best supervised learning methods
- Sensitive to irrelevant features
- Can't automatically learn feature interactions

## Bonus: Confusion Matrix



In [None]:
from sklearn import metrics
prds = logreg.predict(X)
print(metrics.confusion_matrix(y, prds))


scikit-learn puts the actual classes in rows and the predicted classes in columns, so for a binary problem:

- Top left: true negatives
- Top right: false positives
- Bottom left: false negatives
- Bottom right: true positives


Meaning:

- Accuracy = (157 + 28) / 214 ≈ 0.8645
- Sensitivity = 28 / (23 + 28) ≈ 0.5490
- Specificity = 157 / (157 + 6) ≈ 0.9632
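The same three figures can be reproduced from the four counts. This is a sketch that hard-codes the counts appearing in the formulas above; the variable names are chosen here for illustration.

```python
# Counts taken from the formulas above (rows = actual, columns = predicted)
tn, fp = 157, 6    # actual 0: predicted 0, predicted 1
fn, tp = 23, 28    # actual 1: predicted 0, predicted 1

accuracy = (tn + tp) / (tn + fp + fn + tp)   # share of all predictions that are correct
sensitivity = tp / (tp + fn)                 # share of actual positives that were found
specificity = tn / (tn + fp)                 # share of actual negatives that were found

print(accuracy, sensitivity, specificity)
```

With your own model, the four counts can be unpacked directly from the 2x2 array with `metrics.confusion_matrix(y, prds).ravel()`, which returns them in the order tn, fp, fn, tp.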