![@mikegchambers](../../images/header.png)

# Logistic Regression

In this notebook, we explore Logistic Regression using scikit-learn.

![Binary](binary.png)

In [None]:
from sklearn.linear_model import LogisticRegression

import numpy as np
import matplotlib.pyplot as plt

# Use the ggplot look for every plot in this notebook
plt.style.use('ggplot')

# The Data

So that we can easily play with the data, let's define it inline like this:

In [None]:
X = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
y = [0,0,0,0,0,0,0,0,1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

X = np.reshape(X, (-1, 1))

And, of course, let's have a look at the data:

In [None]:
axes = plt.axes()

axes.scatter(X, y, color='gray', s=50)

plt.ylim(-0.1, 1.1)

plt.yticks([0, 0.5, 1])
plt.show()

# The Model

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

`C` : float, default=1.0
- _Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization._

`solver` : {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, default=’lbfgs’
- _Algorithm to use in the optimization problem._
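To get a feel for what `C` does, here is a minimal sketch that fits the same data twice, once with strong regularization and once with almost none. The value `C=0.01` and the names `strong` and `weak` are illustrative assumptions, not part of the notebook. We would expect the strongly regularized model to learn a smaller slope, which means a flatter sigmoid.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as the notebook: 20 points, one feature
X = np.arange(1, 21).reshape(-1, 1)
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]

# Small C = strong regularization (illustrative value)
strong = LogisticRegression(C=0.01, solver='liblinear').fit(X, y)
# Large C = very weak regularization, as used in this notebook
weak = LogisticRegression(C=1000000, solver='liblinear').fit(X, y)

# The learned slope is the single entry in coef_
print("slope with C=0.01:   ", strong.coef_[0][0])
print("slope with C=1000000:", weak.coef_[0][0])
```

A very large `C`, like the one used in the next cell, leaves the fit almost unregularized, so the curve follows the data as closely as the model allows.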

In [None]:
model = LogisticRegression(C=1000000, solver='liblinear')

In [None]:
model.fit(X, y)

Now we have the model. Let's classify a new point and see what we get:

In [None]:
test = 2.5
pred = model.predict([[test]])

print(pred)

We can also get the prediction probabilities. For each input, `predict_proba` returns the probability of class 0 and the probability of class 1. `predict` effectively applies a built-in threshold of 0.5 to those probabilities, which is why the previous code returned a simple classification.

In [None]:
test = 2.5
pred = model.predict_proba([[test]])

print(pred)
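To make the link between the two methods concrete, here is a small sketch that applies the 0.5 threshold by hand and compares the result with `predict`. The test values in `points` and the name `manual` are assumptions chosen for illustration, and the model is rebuilt so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data and model settings as the notebook
X = np.arange(1, 21).reshape(-1, 1)
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
model = LogisticRegression(C=1000000, solver='liblinear').fit(X, y)

# A few illustrative points to classify
points = np.array([[2.5], [8.0], [18.0]])

# Columns of predict_proba follow the order of model.classes_ (here [0, 1])
probs = model.predict_proba(points)

# Apply the 0.5 threshold to the class 1 probability ourselves
manual = (probs[:, 1] > 0.5).astype(int)

print(model.classes_)
print(model.predict(points))
print(manual)
```

If we wanted a stricter or looser classifier, we could swap 0.5 for another cut-off in the `manual` line; `predict` itself does not take a threshold argument.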

If we want to visualise the sigmoid function on a graph, one method is to get the prediction probabilities for a sequence of values and plot the results. It's a 'brute force' method, but it works.

So let's get some test values, as a list of lists that we can pass to the model for a batch prediction. We use the max of the original data `X` as the upper limit, as we only need the sigmoid plotted in relation to our data.

In [None]:
X_test = np.linspace(0, np.amax(X), 100)
X_test = np.reshape(X_test, (100, 1))

And get all the predictions for these points.

In [None]:
probs = model.predict_proba(X_test)

Let's take a look at the sigmoid:

In [None]:
axes = plt.axes()

axes.scatter(X, y, color='gray', s=50)
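# Probability of class 1 for each test value: this traces the sigmoid
axes.plot(X_test, probs[:,1], linewidth=2)
# The two class probabilities sum to 1, so their mean is a flat line at the 0.5 threshold
axes.plot(X_test, (probs[:,0] + probs[:,1]) / 2, linewidth=1, c="green")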

plt.ylim(-0.1, 1.1)

plt.yticks([0, 0.5, 1])
plt.show()
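As an alternative to the brute force approach, we can sketch the same curve directly from the fitted parameters. For a binary model like this one, the class 1 probability should be the sigmoid of `coef_ * x + intercept_`. The names `w`, `b`, `manual` and `boundary` are assumptions introduced for this example, and the model is rebuilt so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data and model settings as the notebook
X = np.arange(1, 21).reshape(-1, 1)
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1]
model = LogisticRegression(C=1000000, solver='liblinear').fit(X, y)

# Learned slope and intercept (one feature, so one coefficient)
w = model.coef_[0][0]
b = model.intercept_[0]

# Evaluate the sigmoid by hand on the same grid used for the plot
X_test = np.linspace(0, X.max(), 100).reshape(-1, 1)
manual = 1.0 / (1.0 + np.exp(-(w * X_test[:, 0] + b)))

# Compare with the probabilities that the model reports
print(np.allclose(manual, model.predict_proba(X_test)[:, 1]))

# The curve crosses 0.5 where w * x + b = 0
boundary = -b / w
print(boundary)
```

The value of `boundary` is the point on the x axis where the sigmoid meets the green 0.5 line in the plot: inputs to its left are classified as 0 and inputs to its right as 1.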