# 25: Logistic Regression



![](https://wordstream-files-prod.s3.amazonaws.com/s3fs-public/styles/simple_image/public/images/machine-learning1.png?SnePeroHk5B9yZaLY7peFkULrfW8Gtaf&itok=yjEJbEKD)

## Questions:

**1. What is the difference between inferential modeling and predictive modeling?**

**2. What is the difference between supervised learning and unsupervised learning?**

**3. What is the difference between classification and regression?**

Please type your answers below


*YOUR ANSWER HERE*

## What is Logistic Regression? 

![](https://miro.medium.com/max/400/1*zLfpo6F_Bfi6uvRL6iLX_Q.jpeg)
Logistic regression belongs to a class of predictive models called _Generalized Linear Models_ (GLMs). These models have two things in common: they describe relationships between the independent and dependent variables, and they indicate the strength of those relationships.

Unlike linear regression, logistic regression predicts the probability of a binary outcome: **a success or a failure**. Is this email likely spam? What is the probability that this citizen will vote Republican? Is this homeowner likely to default on their mortgage? Is this person likely to buy our product? Is this tumor likely to be cancerous or benign?

### Assumptions 
**Logistic Regression Assumptions:**

* Binary logistic regression requires the dependent variable to be binary.
* Only the meaningful variables should be included.
* The independent variables should be independent of each other. That is, the model should have little or no multi-collinearity.
* The independent variables are linearly related to the log odds.  For more about log odds vs probability, check out [this link.](https://www.statisticshowto.com/log-odds/)
* Logistic regression requires quite large sample sizes.
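
One common way to screen for the multicollinearity mentioned above is a correlation matrix of the features. The sketch below does this for the iris features that the coding section of this notebook works with. The 0.8 cutoff is an illustrative assumption, not a rule from this lesson.

```python
import numpy as np
from sklearn import datasets

iris = datasets.load_iris()
feature_names = iris["feature_names"]

# Pearson correlation between every pair of feature columns
corr = np.corrcoef(iris["data"], rowvar=False)

# Assumed cutoff for "highly correlated"; choose one that suits your problem
threshold = 0.8

# Collect each pair of distinct features whose correlation exceeds the cutoff
high_pairs = [
    (feature_names[i], feature_names[j], round(float(corr[i, j]), 2))
    for i in range(len(feature_names))
    for j in range(i + 1, len(feature_names))
    if abs(corr[i, j]) > threshold
]

for first, second, r in high_pairs:
    print(f"{first} vs {second}: r = {r}")
```

Pairs that show up here are candidates for dropping or combining before fitting, though a pairwise check like this will not catch every form of collinearity.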

### Key differences from Linear Regression:
* A GLM does not assume a linear relationship between the dependent and independent variables. In the logit model, it instead assumes a linear relationship between the link function (the log odds) and the independent variables.

* The dependent variable need not be normally distributed.

* It does not use ordinary least squares (OLS) for parameter estimation. Instead, it uses maximum likelihood estimation (MLE).

* Errors need to be independent but need not be normally distributed.
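
To make the maximum likelihood idea concrete, here is a minimal sketch that fits a one-feature logistic model by maximizing the log-likelihood directly. The toy data, the function names, and the use of plain gradient ascent are all assumptions made for illustration; the `LogisticRegression` call later in this notebook uses the `lbfgs` solver instead.

```python
import math

# Hypothetical toy data: one feature x and a binary label y (not from the lesson)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 1, 0, 1, 1, 0, 1]


def prob_positive(b0, b1, x):
    """Probability that y = 1 under the logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(b0, b1):
    """Sum of the log-probabilities the model assigns to the observed labels."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = prob_positive(b0, b1, x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit(learning_rate=0.1, steps=5000):
    """Plain gradient ascent on the average log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient components: averages of (y - p) and (y - p) * x
        grad0 = sum(y - prob_positive(b0, b1, x) for x, y in zip(xs, ys)) / n
        grad1 = sum((y - prob_positive(b0, b1, x)) * x for x, y in zip(xs, ys)) / n
        b0 += learning_rate * grad0
        b1 += learning_rate * grad1
    return b0, b1


ll_start = log_likelihood(0.0, 0.0)
b0_hat, b1_hat = fit()
ll_fit = log_likelihood(b0_hat, b1_hat)

print(f"log-likelihood at (0, 0):     {ll_start:.3f}")
print(f"log-likelihood after fitting: {ll_fit:.3f}")
print(f"estimated coefficients: b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

The fitted coefficients are the ones under which the observed labels are most probable, which is a different criterion from minimizing squared error.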

### Logistic Regression Equation

![](https://miro.medium.com/max/571/0*tGVPGu3aa1rhTdfl.png)
Let's say we've constructed our best-fit line, i.e. our linear predictor, $\hat{L} = \beta_0 + \beta_1x_1 + ... + \beta_nx_n$.

#### The Sigmoid Function

Consider the following transformation:
$\large\hat{y} = \Large\frac{1}{1 + e^{-\hat{L}}} \large= \Large\frac{1}{1 + e^{-(\beta_0 + ... + \beta_nx_n)}}$. This is called the sigmoid function.

This function squeezes our predictions between 0 and 1. 
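
A small sketch of the sigmoid in plain Python shows the squeezing effect. The function name and the branch for negative inputs (which only avoids numerical overflow) are choices made here for illustration.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real number z to a value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an equivalent form that avoids overflow in exp(-z)
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Large negative inputs approach 0, large positive inputs approach 1
for z in (-10, -1, 0, 1, 10):
    print(f"sigmoid({z:>3}) = {sigmoid(z):.5f}")
```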

Suppose I'm building a model to predict whether a plant is poisonous or not, based perhaps on certain biological features of its leaves. 
* I'll let '1' indicate a poisonous plant and '0' indicate a non-poisonous plant.
* Now I'm forcing my predictions to be between 0 and 1, so suppose for test plant $P$ I get some value like 0.19.
* I can naturally understand this as the probability that $P$ is poisonous.
* If I truly want a binary prediction, I can simply round my score appropriately.
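
The rounding step can be sketched as a simple threshold rule. The helper name `to_label` and the default cutoff of 0.5 are assumptions for illustration.

```python
def to_label(probability: float, threshold: float = 0.5) -> int:
    """Return 1 (poisonous) if the probability reaches the threshold, else 0."""
    return 1 if probability >= threshold else 0


score_for_p = 0.19  # the example score for test plant P from the text

print(to_label(score_for_p))                 # 0, i.e. predicted non-poisonous
print(to_label(score_for_p, threshold=0.1))  # 1 under a more cautious cutoff
```

A cutoff of 0.5 corresponds to ordinary rounding. When a missed positive is costly, as it could be with poisonous plants, a lower cutoff may be a reasonable choice.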

How do we fit a line to our dependent variable if its values are already stored as probabilities? We can use the inverse of the sigmoid function, and just set our regression equation equal to that. The inverse of the sigmoid function is called the logit function, and it looks like this:

$$\large f(y) = \ln\left(\frac{y}{1 - y}\right)$$

Notice that the domain of this function is $(0, 1)$.

Quick proof that logit and sigmoid are inverse functions:

$$\begin{aligned}
x &= \frac{1}{1 + e^{-y}} \\
\text{so}\quad 1 + e^{-y} &= \frac{1}{x} \\
\text{so}\quad e^{-y} &= \frac{1 - x}{x} \\
\text{so}\quad -y &= \ln\left(\frac{1 - x}{x}\right) \\
\text{so}\quad y &= \ln\left(\frac{x}{1 - x}\right)
\end{aligned}$$
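
The inverse relationship can also be checked numerically. This is a minimal sketch with function names chosen here for illustration: applying `logit` to the output of `sigmoid` should return the original input, up to floating-point error.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p: float) -> float:
    """Log odds of p; only defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit is only defined on the open interval (0, 1)")
    return math.log(p / (1.0 - p))


for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
    round_trip = logit(sigmoid(z))
    print(f"z = {z:+.1f}  sigmoid(z) = {sigmoid(z):.4f}  logit(sigmoid(z)) = {round_trip:+.4f}")
```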

Our regression equation will now look like this:

$\large\ln\left(\frac{y}{1 - y}\right) = \beta_0 + \beta_1x_1 + ... + \beta_nx_n$.
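
To see what this equation implies, the sketch below evaluates a one-feature model at a few inputs. The coefficient values are hypothetical and chosen only for illustration. Because the log odds are linear in $x$, each one-unit increase in $x$ adds $\beta_1$ to the log odds, which multiplies the odds by $e^{\beta_1}$.

```python
import math

# Hypothetical coefficients for a one-feature model (for illustration only)
beta_0 = -3.0
beta_1 = 1.2


def log_odds(x: float) -> float:
    """Right-hand side of the regression equation."""
    return beta_0 + beta_1 * x


def probability(x: float) -> float:
    """Sigmoid of the log odds, i.e. the predicted probability."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


for x in (1.0, 2.0, 3.0):
    p = probability(x)
    odds = p / (1.0 - p)
    print(f"x = {x:.0f}  log odds = {log_odds(x):+.2f}  odds = {odds:.3f}  probability = {p:.3f}")

# Ratio of the odds at x = 3 to the odds at x = 2 equals exp(beta_1)
odds_at_2 = probability(2.0) / (1.0 - probability(2.0))
odds_at_3 = probability(3.0) / (1.0 - probability(3.0))
print(f"odds ratio = {odds_at_3 / odds_at_2:.4f}, exp(beta_1) = {math.exp(beta_1):.4f}")
```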

## Coding Logistic Regression

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# import some data to play with
from sklearn import datasets

# For our modeling steps
from sklearn.preprocessing import normalize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [None]:
iris = datasets.load_iris()

df = pd.DataFrame(
    data= np.c_[iris['data'], iris['target']],
    columns= iris['feature_names'] + ['target']
)

df.head()

In [None]:
# Creating a large figure
fig = plt.figure(figsize=(15, 8))

# Iterating over the different features
for i in range(0, 4):
    # Figure number starts at 1
    ax = fig.add_subplot(2, 2, i+1)
    # Add a title to make it clear what each subplot shows
    plt.title(df.columns[i])
    # Use alpha to better see overlapping points
    ax.scatter(df['target'], df.iloc[:,i], c='teal', alpha=0.1)
    # Only show the tick marks for each target
    plt.xticks(df.target.unique())

In [None]:
X = df.iloc[:,:-1]
y = df.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=27)


In [None]:
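# C is the inverse of regularization strength, so a very large C effectively turns regularization off.
# fit_intercept=False fits the model without an intercept term.
logreg = LogisticRegression(fit_intercept = False, C = 1e12, solver='lbfgs', multi_class='auto')
logreg.fit(X_train, y_train)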


**What do you think 'multi_class=' means in the LogisticRegression() arguments?**

In [None]:
y_hat_test = logreg.predict(X_test)
y_hat_train = logreg.predict(X_train)

In [None]:
logreg.predict_proba(X_train)
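
The output of `predict_proba` is an array with one row per sample and one column per class. The self-contained sketch below illustrates how it relates to `predict`. It refits a model with scikit-learn's default settings plus a higher `max_iter` (an assumption made here to help convergence), so its numbers will differ from those of the `logreg` object above.

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.2, random_state=27
)

# Default settings apart from a higher iteration cap (an assumption)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

proba = model.predict_proba(X_train)

print(proba.shape)         # (number of training rows, number of classes)
print(proba[:3].round(3))  # class probabilities for the first three rows

# Each row sums to 1, and the most probable class is what predict() returns
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)
labels_from_proba = model.classes_[proba.argmax(axis=1)]
matches_predict = bool((labels_from_proba == model.predict(X_train)).all())
print(rows_sum_to_one, matches_predict)
```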

In [None]:
y_hat_test

In [None]:
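# Boolean Series: True where the training prediction matches the true label
# (these are correctness flags, not residuals in the regression sense)
correct_train = y_train == y_hat_train

print('Number of training values correctly predicted (True) vs. not (False):')
print(pd.Series(correct_train).value_counts())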

In [None]:
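# Boolean Series: True where the test prediction matches the true label
correct_test = y_test == y_hat_test

print('Number of test values correctly predicted (True) vs. not (False):')
print(pd.Series(correct_test).value_counts())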

In [None]:
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_hat_test)

In [None]:
accuracy_score(y_train, y_hat_train)
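
The accuracy score is the fraction of predictions that match the true labels, which ties it to the True/False counts printed earlier. The sketch below checks this on a small set of hypothetical labels and predictions made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical labels and predictions, made up for illustration
y_true = np.array([0, 1, 2, 2, 1, 0, 1, 2, 0, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 1, 2, 0, 2])

# True where the prediction matches the label
correct = y_true == y_pred

print(correct.sum(), "of", len(correct), "correct")
print(correct.mean())                  # fraction of matches
print(accuracy_score(y_true, y_pred))  # same value
```

Comparing the training and test accuracy gives a rough sense of whether the model is overfitting: a training score far above the test score is a warning sign.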