# Topics

1. Logistic regression

## 1. Logistic regression

In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event, such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each candidate class is assigned a probability between 0 and 1, and the probabilities across all classes sum to one.

https://en.wikipedia.org/wiki/Logistic_regression
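As a rough illustration of "probabilities between 0 and 1 that sum to one", the sketch below turns raw class scores into probabilities with the softmax function, a common choice for multi-class logistic models. The scores and the class names are made up for this example and do not come from a trained model.

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to one."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw scores for the classes cat, dog, lion
scores = np.array([2.0, 1.0, 0.1])
probs = softmax(scores)

print(probs)        # one probability per class
print(probs.sum())  # the probabilities add up to one
```

The class with the largest score also gets the largest probability, so picking the most probable class is the same as picking the highest score.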

#### What's the difference between linear regression and logistic regression?

Logistic regression analysis is used to examine the association of (categorical or continuous) independent variable(s) with one dichotomous dependent variable. This is in contrast to linear regression analysis in which the dependent variable is a continuous variable.

https://www.javatpoint.com/linear-regression-vs-logistic-regression-in-machine-learning
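To make the contrast concrete, here is a minimal sketch that fits both model types to the same made-up binary data (the same toy arrays used in the simple example later in this notebook). The variable names are chosen for this illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Toy data: one feature, binary outcome
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

linear = LinearRegression().fit(x, y)
logistic = LogisticRegression().fit(x, y)

linear_pred = linear.predict(x)                  # unbounded continuous values
logistic_prob = logistic.predict_proba(x)[:, 1]  # probabilities of class 1
logistic_label = logistic.predict(x)             # class labels, 0 or 1

print(linear_pred.min(), linear_pred.max())
print(logistic_prob.min(), logistic_prob.max())
print(logistic_label)
```

On this data the straight line dips below 0 at the low end and rises above 1 at the high end, so its output cannot be read as a probability, while the logistic model's probabilities always stay between 0 and 1.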

#### Logistic regression in Python

https://realpython.com/logistic-regression-python/

https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8

#### Logistic regression: simple example

Import packages

In [None]:
import numpy as np
from sklearn.linear_model import LogisticRegression
from scipy.special import expit
import matplotlib.pyplot as plt

Make up data

In [None]:
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

Note: we use `reshape()` on `x` because the `fit()` method of the `LogisticRegression` class expects the input array to be two-dimensional, with one row per observation and one column per feature.

Using `reshape()` with the arguments -1, 1 gives us as many rows as needed and one column.
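A quick way to see what `reshape(-1, 1)` does is to compare the array shapes before and after. The names `flat` and `column` are used only for this illustration.

```python
import numpy as np

flat = np.arange(10)           # one-dimensional array with 10 elements
column = flat.reshape(-1, 1)   # -1 lets NumPy infer the number of rows

print(flat.shape)    # (10,)
print(column.shape)  # (10, 1)
```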

Create the model


In [None]:
model = LogisticRegression(solver='liblinear')

Train the model

In [None]:
model.fit(x, y)

Alternatively, we can create and fit the model in just one step

In [None]:

model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

The class labels the model has learned, that is, the distinct values of `y` (here just 0 and 1)

In [None]:
model.classes_

The model's intercept

In [None]:
model.intercept_

The model's coefficient

In [None]:
model.coef_
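The intercept and coefficient are the two numbers that define the fitted curve. As a sketch, assuming the same toy data and the same `liblinear` model as above, the probability of class 1 can be reproduced by hand by passing `intercept + coefficient * x` through the logistic sigmoid (`expit`). The names `b0`, `b1`, `manual` and `library` are chosen for this example.

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

b0 = model.intercept_[0]   # intercept_ is a 1-D array with one entry
b1 = model.coef_[0, 0]     # coef_ has shape (1, n_features)

manual = expit(b0 + b1 * x.ravel())      # sigmoid of the linear term
library = model.predict_proba(x)[:, 1]   # probability of class 1 from scikit-learn

print(np.allclose(manual, library))
```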

Evaluate the model

In [None]:
model.predict_proba(x)

This returns the matrix of probabilities that the predicted output is equal to zero or one. The first column is the probability of the predicted output being zero, that is 1 - 𝑝(𝑥). The second column is the probability that the output is one, or 𝑝(𝑥). Each row therefore sums to one.

You can get the actual predictions, based on the probability matrix and the values of 𝑝(𝑥), with `.predict()`.
This method returns the predicted class labels as a one-dimensional array; for a binary model, an observation is assigned to class 1 when 𝑝(𝑥) is greater than 0.5.

In [None]:
model.predict(x)
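The link between the probability matrix and the predicted labels can be checked directly. The sketch below, which rebuilds the same toy model so that it runs on its own, thresholds the second column of `predict_proba()` at 0.5 and compares the result with `predict()`. The name `labels_from_proba` is introduced only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

proba = model.predict_proba(x)

# Class 1 whenever its probability exceeds 0.5, class 0 otherwise
labels_from_proba = (proba[:, 1] > 0.5).astype(int)

print(proba.sum(axis=1))  # every row sums to one
print(np.array_equal(labels_from_proba, model.predict(x)))
```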

Plot the results

In [None]:
plt.scatter(x, y, 
            color='black', 
            s = 100, 
            label = "actual data")
plt.legend()

In [None]:
plt.scatter(x, y,
           color='black', 
           s = 100, 
           label = "actual data")
plt.scatter(x, model.predict(x), 
            color='red', 
            label = "predicted")
plt.legend()

In [None]:
x_test = np.linspace(0, 10, 300).reshape(-1,1)
plt.scatter(x, y, 
            color='black', 
            s = 100, 
            label = "actual data")
plt.scatter(x_test, 
            model.predict(x_test), 
            color='red', 
            label = "predicted")
plt.legend()

In [None]:
sigmoid = expit(x_test * model.coef_ + model.intercept_).ravel()
plt.scatter(x, y, 
            color='black', 
            s = 100, 
            label = "actual data")
plt.scatter(x_test, 
            model.predict(x_test), 
            color='red', 
            label = "predicted")
plt.plot(x_test, 
         sigmoid, 
         color='green', 
         linewidth=3, 
         label = 'logistic sigmoid')
plt.axhline(y = 0.5, 
            color = 'black', 
            ls = '--', 
            label = 'y = 0.5')
plt.legend()
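In the last plot, the predicted label switches from 0 to 1 where the green sigmoid crosses the dashed line at 0.5. For a single feature, that crossing point is where the linear term is zero, i.e. at `x = -intercept / coefficient`. A minimal sketch, again rebuilding the toy model so the cell is self-contained (`boundary` is a name chosen for this example):

```python
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

b0 = model.intercept_[0]
b1 = model.coef_[0, 0]

# The sigmoid equals 0.5 exactly where b0 + b1 * x == 0
boundary = -b0 / b1

print(boundary)
print(expit(b0 + b1 * boundary))  # probability at the boundary
```

Points to the left of `boundary` are predicted as class 0 and points to the right as class 1, which is the jump visible in the red "predicted" markers.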

Get the model score

In [None]:

model.score(x, y)

`.score()` takes the input and output as arguments and returns the accuracy, that is, the ratio of the number of correct predictions to the number of observations.
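To see that the score really is this ratio, it can be recomputed by hand. The sketch below rebuilds the toy model and compares `score()` with the fraction of matching predictions; `manual_accuracy` is a name introduced for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

x = np.arange(10).reshape(-1, 1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
model = LogisticRegression(solver='liblinear', random_state=0).fit(x, y)

# Fraction of observations where the predicted label matches the true label
manual_accuracy = np.mean(model.predict(x) == y)

print(manual_accuracy)
print(model.score(x, y))
```

Note that this is accuracy on the same data the model was trained on, so it says nothing about how well the model would do on new observations.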

### We can also use the `statsmodels` package, which provides more statistical detail

In [None]:
# import packages
import statsmodels.api as sm

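# create data (note: y differs from the earlier example, so the two
# classes are no longer perfectly separated along x)
x = np.arange(10).reshape(-1, 1)
y = np.array([0, 1, 0, 0, 1, 1, 1, 1, 1, 1])
# add a column of ones so the model also estimates an intercept
x = sm.add_constant(x)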

# create model
model = sm.Logit(y, x)

# fit model
result = model.fit()

# get results
result.params

In [None]:
result.summary()