<table align="center">
   <td align="center"><a target="_blank" href="https://colab.research.google.com/github/ds5110/summer-2021/blob/master/09c-logistic-regression-iris.ipynb">
<img src="https://github.com/ds5110/summer-2021/raw/master/colab.png"  style="padding-bottom:5px;" />Run in Google Colab</a></td>
</table>


# 09c -- Logistic regression with the iris dataset

* [Generalized linear model (GLM)](https://en.wikipedia.org/wiki/Generalized_linear_model) -- wikipedia
* [ISLR 1st Edition](https://www.statlearning.com/) -- statlearning.com
* [Logistic regression 3-class classifier](https://scikit-learn.org/stable/auto_examples/linear_model/plot_iris_logistic.html) (iris dataset) -- scikit-learn.org
* [iris dataset](https://en.wikipedia.org/wiki/Iris_flower_data_set) -- wikipedia

## Generalized linear model (GLM)

* GLMs are flexible generalizations of ordinary linear regression
* Ordinary linear regression:
  * $ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_p x_p + \epsilon$
  * Expected value of the response $y$ depends linearly on the predictors
  * Errors have a normal distribution
* GLM:
  * The expected response is related to the linear predictor through a (generally nonlinear) link function
  * Appropriate for models that predict probability of a yes/no choice
* Model of binary events
  * Response variable has a Bernoulli distribution (response takes value 1 with probability "p")
  * Odds vs probability...
    * Consider a model that predicts whether you'll go to the beach
    * Suppose the likelihood of going doubles with each 10-degree temperature rise
    * Say it's 75 degrees and the probability of going is 2/3 (odds of 2:1)
    * A temperature rise to 85 degrees doubles the odds, not the probability (doubling 2/3 would give a "probability" above 1)
    * Odds go from 2:1 to 4:1 at 85 degrees, then to 8:1 at 95 degrees
    * Probability goes from 2/3 to 4/5 to 8/9
  * For binary events, use the odds $\frac{p}{1-p}$
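
The odds sequence in the beach example can be checked with a few lines of arithmetic. This is a minimal sketch; the helper name `odds_to_probability` is made up for illustration, and it simply inverts odds $= \frac{p}{1-p}$.

```python
from fractions import Fraction

def odds_to_probability(odds):
    """Convert odds p/(1-p) back to the probability p."""
    return odds / (1 + odds)

# Odds of going to the beach double with each 10-degree rise
odds_sequence = [Fraction(2), Fraction(4), Fraction(8)]
probabilities = [odds_to_probability(odds) for odds in odds_sequence]

for odds, p in zip(odds_sequence, probabilities):
    print(f"odds {odds}:1 -> probability {p}")
```

Doubling the odds keeps the probability below 1, whereas doubling the probability itself would not.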




For linear regression
$$
y = \beta_0 + \sum_{i=1}^p \beta_i x_i
$$
Note that $y$ can take on any value from $- \infty $ to $+ \infty$.
With logistic regression, we model the log-odds as a linear function of the predictors; that is, we set the log-odds equal to $y$.
$$
\mathrm{log} \left( \frac{p}{1-p} \right) = \mathrm{logit}(p) = y
$$
This is where the term logistic regression comes from. We're "fitting" a line to the log-odds. With least-squares linear regression, we minimize MSE. With logistic regression, we maximize the log likelihood. If the errors in linear regression have a normal distribution, then the least-squares solution is also the maximum likelihood solution.
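
The last claim can be checked in a few lines. As a sketch, assume $n$ independent observations with responses $y_j$ and predictors $x_{ji}$, and errors $\epsilon_j \sim N(0, \sigma^2)$ (the symbols $n$, $j$ and $\sigma^2$ are introduced here for illustration). The log likelihood of the coefficients is then
$$
\log L(\beta) = -\frac{n}{2} \log\left(2 \pi \sigma^2\right) - \frac{1}{2 \sigma^2} \sum_{j=1}^n \left( y_j - \beta_0 - \sum_{i=1}^p \beta_i x_{ji} \right)^2
$$
Only the last term depends on the coefficients, so maximizing the log likelihood over $\beta$ is the same as minimizing the sum of squared residuals, which is $n$ times the MSE.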

Solving this equation for $p$ as a function of $y$ gives the sigmoid function:
$$
p(y) = \frac{1}{1 + e^{-y}} = \mathrm{sigmoid}(y)
$$
$$
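
The algebra behind that step, written out as a sketch (taking $\mathrm{log}$ to be the natural logarithm):
$$
\begin{aligned}
\mathrm{log} \left( \frac{p}{1-p} \right) &= y \\
\frac{p}{1-p} &= e^{y} \\
p &= e^{y} (1 - p) \\
p \left( 1 + e^{y} \right) &= e^{y} \\
p &= \frac{e^{y}}{1 + e^{y}} = \frac{1}{1 + e^{-y}}
\end{aligned}
$$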

Here's the rub: $p$ is a conditional probability. Specifically, $p(1|x)$ is the probability that the true class is 1 given the observation $x$.
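
To make "maximize the log likelihood" concrete, here is a minimal sketch for one feature. The function name `log_likelihood`, the variable `labels` (the observed 0/1 classes) and the toy data are all made up for illustration; each sample contributes $\log p(1|x)$ if its class is 1 and $\log(1 - p(1|x))$ if its class is 0.

```python
import numpy as np

def log_likelihood(beta_0, beta_1, x, labels):
    """Bernoulli log likelihood of 0/1 labels under a 1-D logistic model."""
    y = beta_0 + beta_1 * x        # linear predictor (the log-odds)
    p = 1 / (1 + np.exp(-y))       # p(1|x), the sigmoid of y
    # Each sample contributes log p if its label is 1, log(1 - p) if it is 0
    return np.sum(labels * np.log(p) + (1 - labels) * np.log(1 - p))

# Made-up toy data, for illustration only
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
labels = np.array([0, 0, 1, 0, 1, 1])

ll_flat = log_likelihood(0.0, 0.0, x, labels)    # p = 0.5 everywhere
ll_sloped = log_likelihood(0.0, 1.0, x, labels)  # p rises with x

print(f"log likelihood with beta_1 = 0: {ll_flat:.3f}")
print(f"log likelihood with beta_1 = 1: {ll_sloped:.3f}")
```

The sloped model assigns higher probability to the observed labels, so its log likelihood is larger. Fitting a logistic regression amounts to searching for the coefficients that make this quantity as large as possible.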

In [None]:
# Plotting sigmoid(y)
import matplotlib.pyplot as plt
import numpy as np

y = np.arange(-5,5,.01)
p = 1 / (1 + np.exp(-y))

plt.plot(y, p)
plt.plot([-5, 5], [0, 0], ":k")
plt.ylabel("p")
plt.xlabel("y");

## Iris dataset

Load the dataset with seaborn and take a quick look at it.

In [None]:
# Quickly load and visualize the data with seaborn
import seaborn as sns

df = sns.load_dataset("iris")

sns.pairplot(df, hue="species");

In [None]:
df

## 1-D Logistic Regression

* simplified logistic regression using 2 classes and 1 feature
* fit $y = \beta_0 + \beta_1 x$
* $y = \mathrm{logit}(p) = \mathrm{log}\left(\frac{p}{1-p}\right)$

In [None]:
# Extract data from the dataframe (classes are strings)
import pandas as pd

X = df.iloc[:, :2].values
Y = df['species'].values
Y = pd.factorize(Y)[0]
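
As a quick illustration of what `pd.factorize` returns, here is a sketch on a made-up list of species names (the variable names are hypothetical). Integer codes are assigned in order of first appearance.

```python
import pandas as pd

# Made-up labels, for illustration only
species = pd.Series(["setosa", "setosa", "versicolor", "virginica", "versicolor"])

# codes: one integer per row; uniques: the label that each integer stands for
codes, uniques = pd.factorize(species)

print(list(codes))
print(list(uniques))
```

Because the iris rows are sorted by species, the first 50 samples get code 0, the next 50 get code 1, and the last 50 get code 2.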

In [None]:
# Feature scaling -- with almost no regularization (large C below) it rescales
# the coefficients but not the fitted probabilities; it makes plotting easier
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(X)
X_std = sc.transform(X)
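
With its default settings, `StandardScaler` subtracts each column's mean and divides by each column's population standard deviation. The sketch below reproduces that arithmetic with NumPy on a small made-up array (the `X_demo` values are not taken from the dataset).

```python
import numpy as np

# Made-up measurements (rows are samples, columns are features)
X_demo = np.array([[5.1, 3.5],
                   [4.9, 3.0],
                   [6.2, 2.9],
                   [5.9, 3.0]])

# Column-wise mean and population standard deviation (ddof=0)
mean = X_demo.mean(axis=0)
std = X_demo.std(axis=0)

# Standardize: every column ends up with mean 0 and standard deviation 1
X_demo_std = (X_demo - mean) / std

print(X_demo_std.mean(axis=0).round(6))
print(X_demo_std.std(axis=0).round(6))
```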

In [None]:
# Pull out 1 feature
feature_index = 0
X_1D = np.expand_dims(X_std[:, feature_index], axis=1)
X_1D.shape

In [None]:
# Only the first 2 classes (data are sorted by class, 50 samples each)
X_1D = X_1D[:100, :]
y_1D = Y[:100]

In [None]:
# 1-D logistic regression with scikit-learn
from sklearn.linear_model import LogisticRegression

lr = LogisticRegression(C=1e5)
lr.fit(X_1D, y_1D)
y_pred = lr.predict(X_1D)

# Plot data values with filled red circles
plt.plot(X_1D, y_1D, 'ro', label='data')

# Extract the fitted coefficients from the model as scalars
beta_0 = lr.intercept_[0]
beta_1 = lr.coef_[0, 0]
# Cover the same x-range as the dotted horizontal line below
x = np.arange(-2, 2, .01)
y = beta_0 + beta_1 * x
p = 1 / (1 + np.exp(-y))

# Plot the probability of class 1
plt.plot(x, p, label='probability of class 1');

# Plot the predicted values from the data
plt.plot(X_1D, y_pred, 'xk', label='predictions')

# Plot the y-axis passing through the origin
plt.plot([0, 0], [0, 1], 'k')

# Plot the decision boundary (dotted vertical line)
# Note: this corresponds to p=.5, i.e., y=0, which is x= -beta_0/beta_1
x_0 = - beta_0 / beta_1
plt.plot([x_0, x_0], [0, 1], ':k')

# Plot the line p=.5 (dotted horizontal line)
plt.plot([-2, 2], [0.5, .5], ':k')
plt.legend()
plt.xlabel('x (standardized sepal length)')
plt.ylabel('p');
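
The curve in the plot is the sigmoid applied to the fitted line. As a sanity check, the sketch below rebuilds the class-1 probability by hand and compares it with `predict_proba`. It is self-contained, so it reloads the data from scikit-learn and standardizes over the 100 samples it uses, which differs slightly from the cells above; the names ending in `_demo` are made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# First two classes, first feature (sepal length), standardized by hand
iris = load_iris()
x_demo = iris.data[:100, 0]
x_demo = (x_demo - x_demo.mean()) / x_demo.std()
labels_demo = iris.target[:100]

model = LogisticRegression(C=1e5)
model.fit(x_demo.reshape(-1, 1), labels_demo)

# Rebuild the probability of class 1 from the fitted coefficients
beta_0 = model.intercept_[0]
beta_1 = model.coef_[0, 0]
y_lin = beta_0 + beta_1 * x_demo
p_manual = 1 / (1 + np.exp(-y_lin))

# Compare with what scikit-learn reports
p_sklearn = model.predict_proba(x_demo.reshape(-1, 1))[:, 1]
labels_manual = (y_lin > 0).astype(int)  # p > 0.5 exactly when y > 0

print(np.allclose(p_manual, p_sklearn))
print(np.array_equal(labels_manual, model.predict(x_demo.reshape(-1, 1))))
```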

## Multi-class Logistic Regression

* 2 features (so we can plot the 2-D decision region)
* 3 classes (all 3 iris species, as integers or strings)
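* Multi-class prediction picks the class with the maximum probability; scikit-learn's `LogisticRegression` supports both OVR (One-vs-Rest) and multinomial (softmax) formulations, and which one it uses by default depends on the version and solver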
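
However the per-class probabilities are produced, the predicted class is the one with the largest probability. The sketch below checks this on the two sepal features. It is self-contained, the `_demo` names are made up, and `max_iter=1000` is an assumption added only to give the solver room to converge.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()
X_demo = iris.data[:, :2]   # sepal length and sepal width
labels_demo = iris.target   # three classes: 0, 1, 2

# max_iter is raised here only to give the solver room to converge
model = LogisticRegression(C=1e5, max_iter=1000)
model.fit(X_demo, labels_demo)

# One row per sample, one column per class
proba = model.predict_proba(X_demo)

# Pick the class with the largest probability in each row
pred_from_proba = model.classes_[np.argmax(proba, axis=1)]

print(proba.shape)
print(np.allclose(proba.sum(axis=1), 1.0))
print(np.array_equal(pred_from_proba, model.predict(X_demo)))
```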

In [None]:
# Extract data from the dataframe (classes are strings)
import pandas as pd

Y = df['species'].values
X = df.iloc[:, :2].values

In [None]:
# Alternative: load and process the data from scikit-learn instead
# (this overwrites X and Y from the previous cell; targets are integers)
from sklearn import datasets

# Get the iris dataset from sklearn
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
Y = iris.target # targets are integers

In [None]:
# Create an instance of Logistic Regression Classifier and fit the data.
logreg = LogisticRegression(C=1e5)
logreg.fit(X, Y)

# Plot the decision boundary. For that, we will assign a color to each
# point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
h = .02  # step size in the mesh
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = logreg.predict(np.c_[xx.ravel(), yy.ravel()])

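# Plotting below requires integer target variables (for colors)
# However, the sklearn algorithm handles string class labels automagically
if isinstance(Y[0], str):
  print("Converting strings to integers for plotting")
  # Use the model's class order for both arrays so the colors agree
  # (factorizing Z and Y separately could assign different codes)
  Z = pd.Categorical(Z, categories=logreg.classes_).codes
  Y = pd.Categorical(Y, categories=logreg.classes_).codes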

# Put the result into a color plot
Z = Z.reshape(xx.shape)
plt.figure(1, figsize=(4, 3))
plt.pcolormesh(xx, yy, Z, cmap=plt.cm.Paired)

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, edgecolors='k', cmap=plt.cm.Paired)
plt.xlabel('Sepal length')
plt.ylabel('Sepal width')

plt.xlim(xx.min(), xx.max())
plt.ylim(yy.min(), yy.max())
plt.xticks(())
plt.yticks(());