<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#First-Classification:-Logistic-Regression-from-Linear-Regression" data-toc-modified-id="First-Classification:-Logistic-Regression-from-Linear-Regression-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>First Classification: Logistic Regression from Linear Regression</a></span><ul class="toc-item"><li><span><a href="#How-is-it-used?" data-toc-modified-id="How-is-it-used?-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>How is it used?</a></span><ul class="toc-item"><li><span><a href="#Linear-Regression" data-toc-modified-id="Linear-Regression-1.1.1"><span class="toc-item-num">1.1.1&nbsp;&nbsp;</span>Linear Regression</a></span></li><li><span><a href="#Classification" data-toc-modified-id="Classification-1.1.2"><span class="toc-item-num">1.1.2&nbsp;&nbsp;</span>Classification</a></span></li></ul></li><li><span><a href="#Recall-Linear-Regression" data-toc-modified-id="Recall-Linear-Regression-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Recall Linear Regression</a></span><ul class="toc-item"><li><span><a href="#Formula" data-toc-modified-id="Formula-1.2.1"><span class="toc-item-num">1.2.1&nbsp;&nbsp;</span>Formula</a></span></li></ul></li><li><span><a href="#Classification:-Use-Logistic-Regression" data-toc-modified-id="Classification:-Use-Logistic-Regression-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Classification: Use Logistic Regression</a></span></li><li><span><a href="#Implementing-Logistic-Regression" data-toc-modified-id="Implementing-Logistic-Regression-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Implementing Logistic Regression</a></span></li></ul></li><li><span><a href="#Evaluating-Classifications" data-toc-modified-id="Evaluating-Classifications-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Evaluating Classifications</a></span><ul class="toc-item"><li><span><a href="#Confusion-Matrices" data-toc-modified-id="Confusion-Matrices-2.1"><span 
class="toc-item-num">2.1&nbsp;&nbsp;</span>Confusion Matrices</a></span></li><li><span><a href="#ROC-&amp;-AUC" data-toc-modified-id="ROC-&amp;-AUC-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>ROC &amp; AUC</a></span></li></ul></li></ul></div>

# First Classification: Logistic Regression from Linear Regression

## How is it used?

### Linear Regression

Linear regression tries to find the **relationship** between the input features and a continuous target value.

### Classification

Classification is really more of a "yes" or "no" question: which group does an observation belong to?

> _"You're either with us, or against us"_

## Recall Linear Regression

### Formula

$$ \hat y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_N x_N = \sum_{i=0}^{N} \beta_i x_i $$

Here $x_0 = 1$, so that $\beta_0$ acts as the intercept inside the sum.
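As a minimal sketch of this formula, the prediction is just a dot product once a constant $x_0 = 1$ is prepended to the features. The coefficient and feature values below are made up purely for illustration.

```python
import numpy as np

# Hypothetical coefficients: beta_0 (intercept), beta_1, beta_2
beta = np.array([0.5, 2.0, -1.0])

# One observation with two features (illustrative values only)
x = np.array([3.0, 4.0])

# Prepend x_0 = 1 so the intercept becomes part of the sum
x_with_bias = np.concatenate(([1.0], x))

# y_hat = sum over i of beta_i * x_i
y_hat = beta @ x_with_bias
print(y_hat)
```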

## Classification: Use Logistic Regression

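Logistic regression predicts the probability of belonging to a particular group.

Transform from linear regression! Pass the linear output $\hat y$ through the sigmoid (logistic) function, which maps any real number into the interval $(0, 1)$: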

$$ \hat y = \sum_{i=0}^{N} \beta_i x_i $$

$$ P = \displaystyle \frac{1}{1+e^{-\hat y}} = \frac{1}{1+e^{-\sum_{i=0}^{N} \beta_i x_i}} $$

$$ = \frac{1}{1+e^{-\beta_0}e^{-\beta_1 x_1}\ldots e^{-\beta_N x_N}} $$
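The transformation can be sketched in a few lines of NumPy. The `sigmoid` helper and the coefficient values are assumptions for illustration, not part of any library; the last lines check that the product form of the denominator gives the same probability as the sum form.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients and one observation with x_0 = 1 prepended
beta = np.array([0.5, 2.0, -1.0])
x = np.array([1.0, 3.0, 4.0])

# Linear regression output, then the logistic transform
y_hat = beta @ x
p = sigmoid(y_hat)

# Same probability written with a product of exponentials
p_product_form = 1.0 / (1.0 + np.prod(np.exp(-beta * x)))

print(p, p_product_form)
```

A linear output of 0 maps to a probability of exactly 0.5, large positive outputs approach 1, and large negative outputs approach 0.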

## Implementing Logistic Regression

[Let's implement this in another notebook](../../MachineLearning/LogisticRegression/logistic_regression.ipynb)

# Evaluating Classifications

## Confusion Matrices

[Metrics & Confusion Matrices](../../EvaluatingModels/evaluation_metrics.ipynb)

## ROC & AUC

[ROC Curve & AUC for Evaluation](../../EvaluatingModels/evaluation_curves.ipynb)

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Implementing-Logistic-Regression" data-toc-modified-id="Implementing-Logistic-Regression-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Implementing Logistic Regression</a></span><ul class="toc-item"><li><span><a href="#Play-with-some-data" data-toc-modified-id="Play-with-some-data-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Play with some data</a></span></li><li><span><a href="#Prepare-the-data-to-do-the-classification" data-toc-modified-id="Prepare-the-data-to-do-the-classification-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Prepare the data to do the classification</a></span></li><li><span><a href="#Create-the-logistic-regression-model" data-toc-modified-id="Create-the-logistic-regression-model-1.3"><span class="toc-item-num">1.3&nbsp;&nbsp;</span>Create the logistic regression model</a></span></li><li><span><a href="#Evaluate-the-model" data-toc-modified-id="Evaluate-the-model-1.4"><span class="toc-item-num">1.4&nbsp;&nbsp;</span>Evaluate the model</a></span><ul class="toc-item"><li><span><a href="#Training-Set" data-toc-modified-id="Training-Set-1.4.1"><span class="toc-item-num">1.4.1&nbsp;&nbsp;</span>Training Set</a></span></li><li><span><a href="#Testing-Set" data-toc-modified-id="Testing-Set-1.4.2"><span class="toc-item-num">1.4.2&nbsp;&nbsp;</span>Testing Set</a></span></li></ul></li></ul></li></ul></div>

# Implementing Logistic Regression

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# import some data to play with
from sklearn import datasets

# For our modeling steps
from sklearn.preprocessing import normalize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

## Play with some data

In [None]:
# Built-in dataset from sklearn
iris = datasets.load_iris()

df = pd.DataFrame(
    data= np.c_[iris['data'], iris['target']],
    columns= iris['feature_names'] + ['target']
)

In [None]:
display(df.head())
display(df.describe())

In [None]:
# Note how many different targets there are
df.target.unique()
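To make the numeric targets easier to read, here is a small sketch that maps each target code to its species name using the dataset's own `target_names`. The `label_names` dictionary is just a helper name chosen for this example.

```python
from sklearn import datasets

iris = datasets.load_iris()

# Map each numeric target code to its species name
label_names = {code: str(name) for code, name in enumerate(iris.target_names)}
print(label_names)
```

Three distinct targets means this is a multi-class problem rather than a single "yes" or "no".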

We can plot each feature against the target to show why a linear regression doesn't make sense here: the target only takes a few discrete values instead of a continuous range.


In [None]:
# Creating a large figure
fig = plt.figure(figsize=(15, 8))

# Iterating over the four feature columns
for i in range(0, 4):
    # Subplot numbering starts at 1
    ax = fig.add_subplot(2, 2, i+1)
    # Add a title to make it clear what each subplot shows
    ax.set_title(df.columns[i])
    # Use alpha to better see overlapping points
    ax.scatter(df['target'], df.iloc[:,i], c='teal', alpha=0.1)
    # Only show the tick marks for each target
    ax.set_xticks(df.target.unique())

## Prepare the data to do the classification

In [None]:
# Get the features and then the target
X = df.iloc[:,:-1]
y = df.target

In [None]:
# Rescale each row (sample) to unit L2 norm to help the model
# Note: this is per-sample scaling, not per-feature standardization
X = normalize(X)
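It helps to see what `normalize` actually does. With its default settings it rescales each row so that the row has an L2 norm of 1. A small sketch on a made-up array (`X_demo` is illustrative, not the iris data):

```python
import numpy as np
from sklearn.preprocessing import normalize

# Two made-up samples with two features each
X_demo = np.array([[3.0, 4.0],
                   [1.0, 0.0]])

# Default is norm='l2' along axis=1, i.e. one norm per row
X_unit = normalize(X_demo)

# Every row now has length 1
row_norms = np.linalg.norm(X_unit, axis=1)
print(X_unit)
print(row_norms)
```

Because each sample is scaled on its own, this step uses no information shared between rows.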

In [None]:
# Split into training and test sets (20% held out for testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=27)

## Create the logistic regression model

In [None]:
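# fit_intercept=False leaves out the bias term
# C=1e12 makes the regularization penalty negligible
logreg = LogisticRegression(fit_intercept = False, C = 1e12, solver='lbfgs', multi_class='auto')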
model_log = logreg.fit(X_train, y_train)
model_log

In [None]:
y_hat_test = logreg.predict(X_test)
y_hat_train = logreg.predict(X_train)
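The predicted labels come from probabilities, and we can look at those directly with `predict_proba`. The sketch below rebuilds the same pipeline so it runs on its own. As assumptions, the `multi_class` argument is omitted (recent scikit-learn releases deprecate it), `max_iter=1000` is added, and the short names (`X_tr`, `X_te`, `clf`) are chosen only for this example. The solver may still print a convergence warning with such a large `C`.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import normalize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Same preparation steps as above
iris = datasets.load_iris()
X_all = normalize(iris.data)
y_all = iris.target
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=27
)

# Same settings as the notebook, minus multi_class (assumption: see lead-in)
clf = LogisticRegression(fit_intercept=False, C=1e12, solver='lbfgs', max_iter=1000)
clf.fit(X_tr, y_tr)

# One row per test sample, one column per class
proba = clf.predict_proba(X_te)

# The predicted label is the class with the highest probability
labels_from_proba = clf.classes_[proba.argmax(axis=1)]

print(proba[:5].round(3))
print(labels_from_proba[:5])
```

Each row of `proba` sums to 1, and picking the most probable class gives the same labels as `predict`.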

## Evaluate the model

### Training Set

In [None]:
# Was our model correct?
# True where the prediction matches the label (a correctness mask, not regression residuals)
residuals = y_train == y_hat_train

print('Counts of correct (True) and incorrect (False) predictions:')
print(pd.Series(residuals).value_counts())

In [None]:
print('Fraction of correct (True) and incorrect (False) predictions:')
print(pd.Series(residuals).value_counts(normalize=True))

### Testing Set

In [None]:
# True where the prediction matches the label
residuals = y_test == y_hat_test

In [None]:
print('Counts of correct (True) and incorrect (False) predictions:')
print(pd.Series(residuals).value_counts())

In [None]:
print('Fraction of correct (True) and incorrect (False) predictions:')
print(pd.Series(residuals).value_counts(normalize=True))
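The fraction of `True` values is the model's accuracy, which scikit-learn can also compute directly. The sketch below is one possible way to do it with `accuracy_score` and `confusion_matrix` on the test set. It repeats the pipeline so it runs on its own; as assumptions, `multi_class` is omitted, `max_iter=1000` is added, and the variable names are chosen only for this example.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import normalize
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# Same preparation steps as above
iris = datasets.load_iris()
X_all = normalize(iris.data)
y_all = iris.target
X_tr, X_te, y_tr, y_te = train_test_split(
    X_all, y_all, test_size=0.2, random_state=27
)

clf = LogisticRegression(fit_intercept=False, C=1e12, solver='lbfgs', max_iter=1000)
clf.fit(X_tr, y_tr)
y_pred = clf.predict(X_te)

# Accuracy is the fraction of predictions that match the true labels
acc = accuracy_score(y_te, y_pred)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_te, y_pred)

print('Test accuracy:', acc)
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions per class, so its sum divided by the total matches the accuracy.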