## CHAPTER 16
---
# LOGISTIC REGRESSION

---
- Logistic regression and its extensions, like multinomial logistic regression, allow us to predict the probability that an observation is of a certain class using a straightforward and well-understood approach.
- In this chapter, we will cover training a variety of classifiers using scikit-learn.

## 16.1 Training a Binary Classifier

- You need to train a simple classifier model.
- Train a logistic regression in scikit-learn using `LogisticRegression`.

In [1]:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data with only two classes
iris = datasets.load_iris()
features = iris.data[:100,:]
target = iris.target[:100]

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0)

# Train model
model = logistic_regression.fit(features_standardized, target)

# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
print('Predicted Class:', model.predict(new_observation))

# View predicted probabilities
print('Predicted Probabilities:', model.predict_proba(new_observation))

Predicted Class: [1]
Predicted Probabilities: [[0.17738424 0.82261576]]


#### Discussion:
In a logistic regression, a linear model (e.g., $\beta_0 + \beta_1 x$) is included in a logistic (also called sigmoid) function, $\frac{1}{1+e^{-z}}$, such that:  
$$
P(y_i = 1 | X) = \frac{1}{1+e^{-(\beta_0 + \beta_1x)}}
$$  
where 
- $P(y_i = 1 | X)$ is the probability of the ith observation's target value, $y_i$, being class 1, 
- $X$ is the training data, 
- $\beta_0$ and $\beta_1$ are the parameters to be learned, and 
- $e$ is Euler's number.   

The effect of the logistic function is to constrain the value of the function's output to between 0 and 1 so that it can be interpreted as a probability. If $P(y_i = 1 | X)$ is greater than 0.5, class 1 is predicted; otherwise, class 0 is predicted.
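To make the formula concrete, here is a minimal pure-Python sketch of the logistic function and the 0.5 decision rule for a single feature. The parameter values `beta_0` and `beta_1` are hypothetical, chosen only for illustration; they are not the coefficients learned by the model above.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(beta_0, beta_1, x, threshold=0.5):
    """Return (probability of class 1, predicted class) for one feature value."""
    probability = logistic(beta_0 + beta_1 * x)
    # Class 1 only when the probability is strictly greater than the threshold
    predicted = 1 if probability > threshold else 0
    return probability, predicted

# Hypothetical parameter values, for illustration only
beta_0, beta_1 = -1.0, 2.0

for x in (-2.0, 0.5, 3.0):
    probability, predicted = predict_class(beta_0, beta_1, x)
    print(f"x={x:+.1f}  P(y=1)={probability:.3f}  class={predicted}")
```

Note that an observation whose linear score is exactly zero gets a probability of exactly 0.5 and therefore falls into class 0 under the "greater than 0.5" rule.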

## 16.2 Training a Multiclass Classifier

- Given more than two classes, you need to train a classifier model.
- Train a logistic regression in scikit-learn with `LogisticRegression` using one-vs-rest or multinomial methods.

In [2]:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create one-vs-rest logistic regression object
# (the multi_class argument is deprecated as of scikit-learn 1.5)
logistic_regression = LogisticRegression(random_state=0, multi_class="ovr")

# Train model
model = logistic_regression.fit(features_standardized, target)

# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
print('Predicted Class:', model.predict(new_observation))

# View predicted probabilities
print('Predicted Probabilities:', model.predict_proba(new_observation))

Predicted Class: [2]
Predicted Probabilities: [[0.0387617  0.40669108 0.55454723]]


#### Discussion:
On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes. However, two clever extensions to logistic regression do just that. 
- First, in one-vs-rest logistic regression (OVR), a separate model is trained for each class to predict whether an observation is that class or not (thus making it a binary classification problem). It assumes that each classification problem (e.g., class 0 or not) is independent.

- Alternatively in multinomial logistic regression (MLR) the logistic function we saw in section 16.1 is replaced with a softmax function:$$
P(y_i = k | X) = \frac{e^{\beta_k x_i}}{\sum_{j=1}^{K}{e^{\beta_j x_i}}}
$$where $P(y_i = k | X)$ is the probability that the ith observation's target value, $y_i$, is class k, and K is the total number of classes. One practical advantage of MLR is that its predicted probabilities from the `predict_proba` method are more reliable (i.e., better calibrated).
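The difference between the two approaches can be sketched in plain Python. The `scores` list below is hypothetical and stands in for the per-class linear terms $\beta_k x_i$ of one observation. Using the same scores for both methods is a simplification: in practice the one-vs-rest scores come from separately trained binary models. Rescaling the one-vs-rest sigmoids so they sum to 1 reflects my understanding of how scikit-learn reports OVR probabilities and should be read as an assumption.

```python
import math

def softmax(scores):
    """Multinomial (softmax) probabilities from one linear score per class."""
    shift = max(scores)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def ovr_probabilities(scores):
    """One-vs-rest: an independent sigmoid per class, rescaled to sum to 1."""
    sigmoids = [1.0 / (1.0 + math.exp(-s)) for s in scores]
    total = sum(sigmoids)
    return [p / total for p in sigmoids]

# Hypothetical per-class scores for a single observation, K = 3
scores = [-1.0, 2.0, 0.5]

print("softmax:    ", [round(p, 3) for p in softmax(scores)])
print("one-vs-rest:", [round(p, 3) for p in ovr_probabilities(scores)])
```

Both functions rank the classes in the same order here, but the softmax couples the classes through a shared denominator, whereas one-vs-rest treats each class as its own binary problem and only normalizes afterward.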

We can switch to an MLR by setting `multi_class='multinomial'`:

In [3]:
# Create multinomial logistic regression object
logistic_regression = LogisticRegression(random_state=0, multi_class="multinomial")

# Train model
model = logistic_regression.fit(features_standardized, target)

# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
print('Predicted Class:', model.predict(new_observation))

# View predicted probabilities
print('Predicted Probabilities:', model.predict_proba(new_observation))

Predicted Class: [1]
Predicted Probabilities: [[0.01982185 0.74491886 0.23525928]]


## 16.3 Reducing Variance Through Regularization

- You need to reduce the variance of your logistic regression model.
- Tune the regularization strength hyperparameter, `C`.

In [4]:
# Load libraries
from sklearn.linear_model import LogisticRegressionCV
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create cross-validated logistic regression object
logistic_regression = LogisticRegressionCV(
    penalty='l2', Cs=10, random_state=0, n_jobs=-1)

# Train model
model = logistic_regression.fit(features_standardized, target)

# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
print('Predicted Class:', model.predict(new_observation))

# View predicted probabilities
print('Predicted Probabilities:', model.predict_proba(new_observation))

Predicted Class: [1]
Predicted Probabilities: [[5.96244929e-04 9.70140320e-01 2.92634349e-02]]


#### Discussion:
Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 or L2 penalty.

- In the L1 penalty:$$
\alpha \sum_{j=1}^{p}{|\hat\beta_j|}
$$where $\hat\beta_j$ is the parameter of the jth of p features being learned and $\alpha$ is a hyperparameter denoting the regularization strength.

- With the L2 penalty:$$
\alpha \sum_{j=1}^{p}{\hat\beta_j^2}
$$
Higher values of $\alpha$ increase the penalty for larger parameter values (i.e., more complex models).
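As a small worked example, the two penalty terms can be computed directly from a coefficient vector. The coefficient values below are hypothetical, and the intercept is left out because the sums above run over the p feature parameters only.

```python
def l1_penalty(coefficients, alpha):
    """L1 penalty: alpha times the sum of absolute coefficient values."""
    return alpha * sum(abs(b) for b in coefficients)

def l2_penalty(coefficients, alpha):
    """L2 penalty: alpha times the sum of squared coefficient values."""
    return alpha * sum(b ** 2 for b in coefficients)

# Hypothetical learned coefficients for p = 4 features
coefficients = [0.5, -2.0, 0.0, 3.0]

for alpha in (0.1, 1.0, 10.0):
    print(f"alpha={alpha:5.1f}  "
          f"L1={l1_penalty(coefficients, alpha):7.3f}  "
          f"L2={l2_penalty(coefficients, alpha):7.3f}")
```

Because the L2 penalty squares each coefficient, the largest coefficient dominates it far more than it dominates the L1 penalty.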

scikit-learn follows the common convention of using C instead of $\alpha$, where C is the inverse of the regularization strength: $C = \frac{1}{\alpha}$. To reduce variance while using logistic regression, we can treat C as a hyperparameter and tune it to find the value that creates the best model. In scikit-learn, we can use the `LogisticRegressionCV` class to efficiently tune C.
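To see what a search over C might look like, the sketch below builds ten candidate values on a logarithmic scale and prints the regularization strength $\alpha = 1/C$ that each one implies. The range from 1e-4 to 1e4 is an assumption based on my reading of the scikit-learn documentation for an integer `Cs`; the helper `log_spaced_grid` is hypothetical and not part of scikit-learn.

```python
def log_spaced_grid(low_exponent, high_exponent, count):
    """Return `count` values evenly spaced on a log10 scale."""
    step = (high_exponent - low_exponent) / (count - 1)
    return [10 ** (low_exponent + i * step) for i in range(count)]

# Assumed search range: ten values between 1e-4 and 1e4
candidate_Cs = log_spaced_grid(-4, 4, 10)

for C in candidate_Cs:
    alpha = 1 / C  # smaller C means stronger regularization
    print(f"C={C:12.4f}  alpha={alpha:12.4f}")
```

Cross-validation then amounts to fitting the model once per candidate (and per fold) and keeping the C with the best held-out score.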

## 16.4 Training a Classifier on Very Large Data

- You need to train a simple classifier model on a very large set of data
- Train a logistic regression in scikit-learn with `LogisticRegression` using the stochastic average gradient (SAG) solver.

In [5]:
# Load libraries
from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load data
iris = datasets.load_iris()
features = iris.data
target = iris.target

# Standardize features
scaler = StandardScaler()
features_standardized = scaler.fit_transform(features)

# Create logistic regression object
logistic_regression = LogisticRegression(random_state=0, solver="sag")

# Train model
model = logistic_regression.fit(features_standardized, target)

# Create new observation
new_observation = [[.5, .5, .5, .5]]

# Predict class
print('Predicted Class:', model.predict(new_observation))

# View predicted probabilities
print('Predicted Probabilities:', model.predict_proba(new_observation))

Predicted Class: [1]
Predicted Probabilities: [[0.01982892 0.74492178 0.23524931]]


#### Discussion:
Stochastic average gradient descent allows us to train a model much faster than other solvers when our data is very large. However, it is also very sensitive to feature scaling, so standardizing our features is particularly important.
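The idea behind the solver can be illustrated with a simplified pure-Python sketch: each step looks at a single randomly chosen observation, refreshes that observation's stored gradient, and moves the parameters along the average of all stored gradients. This is not scikit-learn's implementation. The toy data, the constant learning rate, the number of steps, and the absence of regularization are all assumptions made to keep the example short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(params, xs, ys):
    """Average logistic loss for params = [bias, weight] on one feature."""
    bias, weight = params
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(bias + weight * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(xs)

def sag_fit(xs, ys, learning_rate=0.1, n_steps=2000, seed=0):
    """Simplified stochastic average gradient for one-feature logistic regression."""
    rng = random.Random(seed)
    n = len(xs)
    params = [0.0, 0.0]                      # [bias, weight]
    stored = [[0.0, 0.0] for _ in range(n)]  # last gradient seen per observation
    gradient_sum = [0.0, 0.0]                # running sum of the stored gradients
    for _ in range(n_steps):
        i = rng.randrange(n)                 # touch only one observation per step
        error = sigmoid(params[0] + params[1] * xs[i]) - ys[i]
        fresh = [error, error * xs[i]]       # gradient of observation i's loss
        for j in range(2):
            # Swap the stale gradient for the fresh one, then step along the average
            gradient_sum[j] += fresh[j] - stored[i][j]
            params[j] -= learning_rate * gradient_sum[j] / n
        stored[i] = fresh
    return params

# Toy data with a single feature already scaled to [-1, 1] (illustrative only)
xs = [-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

params = sag_fit(xs, ys)
print("loss before:", mean_log_loss([0.0, 0.0], xs, ys))
print("loss after: ", mean_log_loss(params, xs, ys))
```

Each step costs the same regardless of how many observations there are, which is where the speed advantage on large data comes from. A single learning rate is shared by every parameter, so features on very different scales would make that one rate too large for some and too small for others.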

## 16.5 Handling Imbalanced Classes

- You need to train a simple classifier model
- Train a logistic regression in scikit-learn using LogisticRegression