# Logistic Regression

Despite its name, logistic regression is a widely used supervised classification technique, not a regression technique.

It allows us to predict the probability that an observation belongs to a certain class using a straightforward and well-understood approach.


# Training a Binary Classifier

You need to train a simple classifier model.

The target vector can only take two values.

In logistic regression, a linear model is wrapped in a logistic function (also called the sigmoid function).

$$P(y_i = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$$

___

$P(y_i = 1 \mid X)$: probability that the ith observation's target value $y_i$ is class 1

$X$: the training data

$\beta_0, \beta_1$: parameters to be learned

$e$: Euler's number

---

The effect of the logistic function is to constrain the value of the function's output to between 0 and 1 so that it can be interpreted as a probability.

If $P(y_i = 1 \mid X)$ is greater than 0.5, class 1 is predicted; otherwise, class 0 is predicted.
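
The squashing effect and the 0.5 threshold can be sketched with plain NumPy. The parameter values `beta_0` and `beta_1` and the inputs `x` below are hypothetical, chosen only for illustration; they are not learned from any data.

```python
import numpy as np


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical parameters, chosen only for illustration (not learned)
beta_0, beta_1 = -1.0, 2.0

# A few hypothetical values of a single feature
x = np.array([-2.0, 0.0, 0.5, 3.0])

# Linear model wrapped in the logistic function
probabilities = sigmoid(beta_0 + beta_1 * x)

# Predict class 1 only when the probability is strictly greater than 0.5
predicted_classes = (probabilities > 0.5).astype(int)

print(probabilities)
print(predicted_classes)
```

Note that a probability of exactly 0.5 (here at x = 0.5) falls on the class 0 side of the rule as stated above.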

In [2]:
# Load libraries

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [3]:
# load data

iris = load_iris()

features = iris.data[:100,:] #shape = (100,4)
target = iris.target[:100]   #shape = (100,)

In [4]:
# Standardize features

standardizer = StandardScaler()

features_standardized = standardizer.fit_transform(features)

In [5]:
# Create logistic regression object

logistic_regression = LogisticRegression(random_state = 0)
logistic_regression

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=0, solver='lbfgs', tol=0.0001, verbose=0,
                   warm_start=False)

In [6]:
# Train a model 

model = logistic_regression.fit(features_standardized, target)

In [7]:
# Create new observation

observation = [[0.5, 0.5, 0.5, 0.5]]

In [8]:
# Predict class

model.predict(observation)

array([1])

In [9]:
# View predicted probabilities

model.predict_proba(observation)

array([[0.17738424, 0.82261576]])

In [10]:
print("Our observation had a 17.74% chance of being class 0 and an 82.26% chance of being class 1")

Our observation had a 17.74% chance of being class 0 and an 82.26% chance of being class 1
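
To connect the predicted probability back to the logistic function, the sketch below recomputes the class 1 probability by hand from the fitted `intercept_` and `coef_` attributes. It assumes the default binary setup used above, where `predict_proba` applies the sigmoid to the linear decision value; the variable names `z` and `manual_probability` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Same binary setup as above: first 100 iris observations (classes 0 and 1)
iris = load_iris()
features_standardized = StandardScaler().fit_transform(iris.data[:100, :])
target = iris.target[:100]

model = LogisticRegression(random_state=0).fit(features_standardized, target)

observation = np.array([[0.5, 0.5, 0.5, 0.5]])

# Linear part: intercept plus the dot product of coefficients and features
z = model.intercept_ + observation @ model.coef_.T

# Logistic function applied to the linear part
manual_probability = (1.0 / (1.0 + np.exp(-z))).ravel()

# Probability of class 1 as reported by scikit-learn
sklearn_probability = model.predict_proba(observation)[:, 1]

print(manual_probability, sklearn_probability)
```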


# Training a Multiclass Classifier

Given more than two classes, you need to train a classifier model.

### One-vs-Rest / Multinomial methods

On their own, logistic regressions are only binary classifiers, meaning they cannot handle target vectors with more than two classes.

One-vs-rest (OVR): A separate model is trained for each class to predict whether an observation belongs to that class or not (for example, class 0 or not), which turns each one into a binary classification problem. It assumes that each classification problem is independent.

Multinomial logistic regression (MLR): The logistic function is replaced with a softmax function. One advantage of MLR is that the probabilities it returns through predict_proba are better calibrated.

The multi_class argument can be changed depending on which method we choose.

multi_class = "multinomial" / "ovr"



In [11]:
# Load libraries 

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

In [12]:
# Load data

iris = load_iris()

features = iris.data
target = iris.target

In [13]:
# Create Standardizer

scaler = StandardScaler()

In [14]:
# Standardize features

features_standardized = scaler.fit_transform(features)

In [15]:
# Create OneVSRest Logistic Regression object

logistic_regression = LogisticRegression(random_state = 0, multi_class = "ovr")

In [16]:
# Create model

model = logistic_regression.fit(features_standardized, target)
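
A minimal sketch of another route to one-vs-rest, assuming the `OneVsRestClassifier` wrapper from `sklearn.multiclass` is acceptable in place of the `multi_class` argument (which newer scikit-learn releases may flag as deprecated). It fits one binary logistic regression per class and exposes them through `estimators_`. The observation is hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Load and standardize all three iris classes
iris = load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target

# Wrap a binary logistic regression so one model is trained per class
ovr_model = OneVsRestClassifier(LogisticRegression(random_state=0))
ovr_model.fit(features_standardized, target)

# Hypothetical observation in standardized feature space
observation = [[0.5, 0.5, 0.5, 0.5]]
ovr_probabilities = ovr_model.predict_proba(observation)

print(len(ovr_model.estimators_))  # one binary classifier per class
print(ovr_probabilities)           # one probability per class
```

The per-class probabilities come from independent binary models and are then normalized to sum to 1, which is one reason they can be less well calibrated than softmax output.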

# Reducing Variance Through Regularization

You need to reduce the variance of your logistic regression model.

Tune the regularization strength hyperparameter, C. scikit-learn uses C as the inverse of the regularization strength alpha: C = 1 / alpha.

### Regularization

Regularization is a method of penalizing complex models to reduce their variance. Specifically, a penalty term is added to the loss function we are trying to minimize, typically the L1 or L2 penalty.

L1 penalty:

$$\alpha \sum_{j=1}^{p} |\beta_j|$$

where $\beta_j$ is the parameter of the jth of p features being learned and $\alpha$ is a hyperparameter denoting the regularization strength.

L2 penalty:

$$\alpha \sum_{j=1}^{p} \beta_j^2$$

Higher values of $\alpha$ increase the penalty for larger parameter values (that is, for more complex models).
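
As a small worked example of the two penalties, the sketch below evaluates both for a made-up coefficient vector. The values of `coefficients` and `alpha` are hypothetical and only serve to make the arithmetic concrete.

```python
import numpy as np

# Hypothetical learned parameters for p = 4 features
coefficients = np.array([0.5, -2.0, 0.0, 1.5])

# Hypothetical regularization strength
alpha = 0.1

# L1 penalty: alpha times the sum of absolute parameter values
l1_penalty = alpha * np.sum(np.abs(coefficients))

# L2 penalty: alpha times the sum of squared parameter values
l2_penalty = alpha * np.sum(coefficients ** 2)

# scikit-learn's C is the inverse of alpha
C = 1 / alpha

print(l1_penalty, l2_penalty, C)
```

The large coefficient (-2.0) contributes 2.0 to the L1 sum but 4.0 to the L2 sum, so the L2 penalty punishes large parameters more heavily.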

We can treat C as a hyperparameter to be tuned. In scikit-learn, the LogisticRegressionCV class lets us tune C efficiently.

#### Cs

The Cs parameter can accept either a range of values for C to search over (if a list of floats is supplied as an argument) or an integer.

If an integer is supplied, it will generate a list of that many candidate values drawn from a logarithmic scale between 1e-4 and 1e4.

It would also be a good idea to apply model selection to the penalty term.

In [17]:
# load libraries 

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

In [18]:
# Load data

iris = load_iris()

features = iris.data
target = iris.target

In [19]:
# Standardize features

scaler = StandardScaler()

features_standardized = scaler.fit_transform(features)

In [20]:
# Create a cross-validated logistic regression object

logistic_regression = LogisticRegressionCV(penalty="l2", Cs = 10, random_state = 0, n_jobs = -1)

In [21]:
# Train model

model = logistic_regression.fit(features_standardized, target)
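
To see which candidate values were searched and which one was kept, the fitted model can be inspected. This is a sketch assuming the fitted attributes `Cs_` (the candidate grid) and `C_` (the selected value or values) are available; `max_iter=1000` is an added setting, used here only to avoid convergence warnings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

# Load and standardize the data
iris = load_iris()
features_standardized = StandardScaler().fit_transform(iris.data)
target = iris.target

# Cs=10 asks for 10 candidate values of C on a logarithmic scale
model = LogisticRegressionCV(Cs=10, random_state=0, max_iter=1000)
model.fit(features_standardized, target)

# Candidate grid that was searched during cross-validation
print(model.Cs_)

# Value(s) of C selected by cross-validation
print(model.C_)
```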

# Training a Classifier on Very Large Data

**SOLVER** Stochastic Average Gradient (SAG)

Generally, scikit-learn will select the best solver automatically for us, or warn us that we cannot do something with the chosen solver.

The SAG solver allows us to train a model much faster than other solvers when our data is very large.

It is also very sensitive to feature scaling, so it is very important to standardize the features.
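
Because SAG depends on scaled features, one way to make sure standardization is never skipped is to bundle the scaler and the classifier together. The sketch below assumes `make_pipeline` from `sklearn.pipeline` is an acceptable way to do this. Iris is far too small to show any speed benefit, so it only illustrates the wiring.

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Standardization and the SAG-based classifier are fitted as one unit,
# so the raw features are always scaled before reaching the solver
sag_pipeline = make_pipeline(
    StandardScaler(),
    LogisticRegression(solver="sag", max_iter=10000, random_state=0),
)
sag_pipeline.fit(iris.data, iris.target)

# Accuracy on the training data, only as a sanity check
training_accuracy = sag_pipeline.score(iris.data, iris.target)
print(training_accuracy)
```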

In [22]:
# load libraries 

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

In [23]:
# Load data

iris = load_iris()

features = iris.data
target = iris.target

In [24]:
# Standardize features

scaler = StandardScaler()

features_standardized = scaler.fit_transform(features)

In [25]:
# Create a cross-validated logistic regression object using the SAG solver

logistic_regression = LogisticRegressionCV(max_iter = 10000, solver = "sag", random_state = 0, n_jobs = -1)

In [26]:
# Train model

model = logistic_regression.fit(features_standardized, target)

# Handling Imbalanced Classes

You need to train a simple classifier model.

If we have highly imbalanced classes and have not addressed it during preprocessing, we have the option of using the class_weight parameter to weight the classes so that we get a balanced mix of each class.

class_weight = "balanced" automatically weights classes inversely proportional to their frequency:

$$w_j = \frac{n}{k \, n_j}$$

$w_j$: weight of class j

$n$: number of observations

$k$: total number of classes

$n_j$: number of observations in class j
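
The formula can be checked by hand on a target vector with the same class counts as the example that follows (10 observations of class 0 and 100 of class 1). This sketch builds that vector directly and compares the manual weights with `compute_class_weight` from `sklearn.utils.class_weight`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Imbalanced binary target: 10 observations of class 0, 100 of class 1
target = np.array([0] * 10 + [1] * 100)

n = len(target)                 # number of observations
classes = np.unique(target)     # distinct class labels
k = len(classes)                # total number of classes
counts = np.bincount(target)    # number of observations in each class

# w_j = n / (k * n_j)
manual_weights = n / (k * counts)

# The same weights as computed by scikit-learn's "balanced" heuristic
sklearn_weights = compute_class_weight(
    class_weight="balanced", classes=classes, y=target
)

print(manual_weights)
print(sklearn_weights)
```

The rare class gets a weight of 5.5 and the common class 0.55, so each class contributes equally to the loss overall.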


In [37]:
# load libraries 

import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegressionCV

In [38]:
# Load data

iris = load_iris()

features = iris.data
target = iris.target

In [39]:
# Make class highly imbalanced by removing first 40 observations

features = features[40:,:] #shape = (110,4)
target = target[40:]      # shape = (110,)

In [40]:
target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [41]:
# Create target vector indicating if class 0, otherwise 1

target = np.where((target == 0),0,1)
target

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

In [42]:
# Standardize features

scaler = StandardScaler()

features_standardized = scaler.fit_transform(features)

In [43]:
# Create a cross-validated logistic regression object with balanced class weights

logistic_regression = LogisticRegressionCV(max_iter = 10000, class_weight = "balanced", random_state = 0, n_jobs = -1)

In [44]:
# Train model

model = logistic_regression.fit(features_standardized, target)