###### Content under Creative Commons Attribution license CC-BY 4.0, code under BSD 3-Clause License © 2021 Lorena A. Barba

# Logistic regression

In Lesson 1 of this module, you learned about fitting a line to data (linear regression) using the method of gradient descent to find the model parameters (slope and $y$-intercept). But what if the observational data look nothing like a linear relationship?

A major class of problems deals with binary classification: each data point belongs to one of two categories.
Typically, we code the two categories with $0$ and $1$. For example, is an email spam, or not spam? Is a credit-card transaction fraudulent, or legitimate?
In these settings, the data often correspond to numbers between $0$ and $1$ that represent some _probability_ (e.g., that the email is spam).

## Logistic function

We can build a model that outputs a value between zero and one by applying a non-linear transformation to the output of a linear model, like the one used in linear regression.
The transformation we use is the _logistic function_:

$$ \sigma(z) = \frac{1}{1+e^{-z}}$$

With data consisting of two arrays, $x$ and $y$, the model is a composition: first compute $z = wx + b$, then take $\sigma(z)$ as the output, so the model predicts $\sigma(wx+b)$.
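As a numerical counterpart to the formula, here is a minimal NumPy sketch of the composition $\sigma(wx+b)$. The function names `sigmoid` and `logistic_model`, and the sample values, are hypothetical choices for this illustration, not part of the lesson's code.

```python
import numpy

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z)), applied elementwise."""
    return 1 / (1 + numpy.exp(-z))

def logistic_model(x, w, b):
    """Hypothetical helper: linear part z = w*x + b, then the logistic function."""
    z = w * x + b
    return sigmoid(z)

# Evaluate the model at a few points, with w = 2 and b = 1 chosen for illustration
x_sample = numpy.array([-5.0, 0.0, 5.0])
y_sample = logistic_model(x_sample, w=2, b=1)
print(y_sample)  # every value lies strictly between 0 and 1
```

Because `numpy.exp` works elementwise, the same function accepts a scalar or a whole array such as `x_data`.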

Let's play with this function using SymPy. We'll need NumPy and Matplotlib later, so we might as well load all our libraries now.

In [None]:
import sympy
import numpy

from matplotlib import pyplot
%matplotlib inline

In [None]:
z = sympy.Symbol('z', real=True)

logistic = 1/(1+ sympy.exp(-z))
logistic

In [None]:
sympy.plotting.plot(logistic);

That's a well-groomed $S$-shaped function, known as a _sigmoid_ curve. Notice that it takes the value $0.5$ at $z=0$, approaches zero as $z \to -\infty$, and approaches one as $z \to +\infty$.
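These properties can also be checked exactly with SymPy. The short sketch below re-creates the symbol `z` and the expression `logistic` defined above (identically), so it runs on its own:

```python
import sympy

z = sympy.Symbol('z', real=True)
logistic = 1/(1 + sympy.exp(-z))

value_at_zero = logistic.subs(z, 0)                # exact value at z = 0
left_limit = sympy.limit(logistic, z, -sympy.oo)   # behavior far to the left
right_limit = sympy.limit(logistic, z, sympy.oo)   # behavior far to the right
print(value_at_zero, left_limit, right_limit)      # 1/2 0 1
```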

Let's generate some synthetic data to play with. (This example is taken from the SciPy 2019 tutorial by Eric Ma [1].)
Our goal is to use gradient descent to find the model parameters $w$ and $b$ that best fit the data, in a sense that we still need to define.

In [None]:
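# synthetic data: a linear function w*x + b plus Gaussian noise,
# passed through the logistic function so that y lies between 0 and 1
x_data = numpy.linspace(-5, 5, 100)
w = 2
b = 1
z_data = w * x_data + b + numpy.random.normal(size=len(x_data))
y_data = 1 / (1 + numpy.exp(-z_data))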

pyplot.scatter(x_data, y_data, alpha=0.4);

## Logistic loss function

To use gradient descent, we need a loss function that we can optimize with respect to the parameters.
If we used a mean-square-error loss function, as in linear regression, the chain rule would bring a factor of $\sigma^{\prime}(z)$ into its derivatives with respect to $w$ and $b$, so let's look at that derivative.
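Here is a sketch of that claim for a single data point $(x, y)$, assuming a squared-error loss $\frac{1}{2}\left(\sigma(wx+b) - y\right)^2$ (the factor $\frac{1}{2}$ is a convention that cancels the 2 from differentiating). SymPy confirms that its derivative with respect to $w$ equals $(\sigma - y)\,\sigma^{\prime}(z)\,x$. The Python names `x_s`, `y_s`, `w_s`, `b_s` are illustrative choices that avoid overwriting the numeric `w` and `b` of the synthetic data.

```python
import sympy

# Symbols for one data point (x, y) and the parameters w, b
x_s, y_s, w_s, b_s = sympy.symbols('x y w b', real=True)
z_expr = w_s*x_s + b_s
model = 1/(1 + sympy.exp(-z_expr))

# Squared-error loss for this single point (the 1/2 is a convention)
mse = (model - y_s)**2 / 2
grad_w = mse.diff(w_s)

# sigma'(z): differentiate the logistic function in a fresh variable, then plug in z = w x + b
t = sympy.Symbol('t', real=True)
sigma_prime = (1/(1 + sympy.exp(-t))).diff(t).subs(t, z_expr)

# The gradient should equal (sigma - y) * sigma'(z) * x
print(sympy.simplify(grad_w - (model - y_s)*sigma_prime*x_s))  # 0
```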

In [None]:
lprime = logistic.diff(z)
lprime

In [None]:
sympy.plotting.plot(lprime);

The derivative of the logistic function takes very small values in the long tails on each side of $z=0$.
If the gradient of our loss function has a $\sigma^{\prime}(z)$ factor (coming from the chain rule), the parameter updates become tiny whenever $z$ lands in those tails, even when the prediction is badly wrong: this leads to _slow learning_. Can we find a better loss function that avoids this problem? (For a more detailed discussion, we recommend Chapter 3 of Michael Nielsen's free ebook [2].)
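To put numbers on this, the sketch below evaluates $\sigma^{\prime}(z)$ and the factor $(\sigma(z) - y)\,\sigma^{\prime}(z)$ that the squared-error gradient with respect to $w$ carries (times $x$), assuming a target $y=1$. The helper names `sigmoid_prime` and `mse_grad_factor` are hypothetical. For a badly wrong prediction such as $z=-10$, the gradient factor comes out orders of magnitude smaller than at $z=0$.

```python
import numpy

def sigmoid_prime(z):
    """Derivative of the logistic function: exp(-z) / (1 + exp(-z))**2."""
    return numpy.exp(-z) / (1 + numpy.exp(-z))**2

def mse_grad_factor(z, y):
    """(sigma(z) - y) * sigma'(z): the squared-error gradient w.r.t. w, divided by x."""
    sigma = 1 / (1 + numpy.exp(-z))
    return (sigma - y) * sigmoid_prime(z)

# Target y = 1; z = -10 is a badly wrong prediction, z = 0 a less wrong one
for z_val in [-10.0, -5.0, 0.0]:
    print(f"z = {z_val:6.1f}   sigma'(z) = {sigmoid_prime(z_val):.2e}   "
          f"gradient factor = {mse_grad_factor(z_val, 1.0):.2e}")
```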

We note that $\sigma^{\prime}(z)$ has the same expression in the denominator as $\sigma(z)$, but squared. Let's play around with this...

In [None]:
lprime/logistic

OK, let's try to express this in terms of $\sigma(z)$: add and subtract $1$ in the numerator, then split the fraction:

$$\frac{e^{-z}}{1+e^{-z}} = \frac{1+e^{-z}-1}{1+e^{-z}} = 1 - \frac{1}{1+ e^{-z}} = 1 - \sigma(z)$$

We get this interesting property for the derivative of the logistic function:

$$\sigma^{\prime}(z) = \sigma(z) (1-\sigma(z))$$
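SymPy can confirm this identity directly. The check below re-creates `z` and `logistic` (identically to the cells above, so it is self-contained) and simplifies the difference between the two sides:

```python
import sympy

z = sympy.Symbol('z', real=True)
logistic = 1/(1 + sympy.exp(-z))

# Difference between SymPy's derivative and the formula sigma * (1 - sigma)
difference = logistic.diff(z) - logistic*(1 - logistic)
print(sympy.simplify(difference))  # 0
```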

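Now suppose we look for a loss function $L$ whose gradient does _not_ carry the factor $\sigma^{\prime}(z)$. Write $a = \sigma(z)$ for the model output. By the chain rule and the property above,

$$\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a}\, \sigma^{\prime}(z)\, x = \frac{\partial L}{\partial a}\, a(1-a)\, x$$

If we ask for $\frac{\partial L}{\partial w} = (a-y)\,x$, which has no $\sigma^{\prime}(z)$ factor, then we need

$$\frac{\partial L}{\partial a} = \frac{a-y}{a(1-a)}$$

Let's integrate this with respect to $a$ using SymPy.

In [None]:
# a: model output sigma(z); y: target value
a, y = sympy.symbols('a y', real=True)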

In [None]:
dLda = (a-y)/a/(1-a)
dLda

In [None]:
L = sympy.integrate(dLda, a)
L

In [None]:
sympy.simplify(L)
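Since $\frac{a-y}{a(1-a)} = -\frac{y}{a} + \frac{1-y}{1-a}$, the antiderivative should be, up to an additive constant, the _cross-entropy_ loss $L = -y\ln a - (1-y)\ln(1-a)$. (SymPy may display it in a different but equivalent form, for example with $\log(a-1)$ where we would write $\log(1-a)$; the two differ only by a constant.) The sketch below checks this by differentiating the cross-entropy, and then confirms that after substituting $a=\sigma(wx+b)$ the gradient with respect to $w$ is $(\sigma(wx+b)-y)\,x$, with no $\sigma^{\prime}(z)$ factor. The names `x_sym`, `w_sym`, `b_sym` are illustrative choices that avoid overwriting the numeric `w` and `b` of the synthetic data.

```python
import sympy

a, y = sympy.symbols('a y', real=True)
x_sym, w_sym, b_sym = sympy.symbols('x w b', real=True)

# Cross-entropy loss in terms of the model output a = sigma(z)
cross_entropy = -y*sympy.log(a) - (1 - y)*sympy.log(1 - a)

# Check 1: dL/da matches the expression we integrated
dLda = (a - y)/a/(1 - a)
check_1 = sympy.simplify(cross_entropy.diff(a) - dLda)

# Check 2: with a = sigma(w x + b), dL/dw should reduce to (sigma - y) * x
sigma = 1/(1 + sympy.exp(-(w_sym*x_sym + b_sym)))
grad_w = cross_entropy.subs(a, sigma).diff(w_sym)
check_2 = sympy.simplify(grad_w - (sigma - y)*x_sym)

print(check_1, check_2)  # 0 0
```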

## References

1. Eric Ma, "Deep Learning Fundamentals: Forward Model, Differentiable Loss Function & Optimization," SciPy 2019 tutorial. [video on YouTube](https://youtu.be/JPBz7-UCqRo) and [archive on GitHub](https://github.com/ericmjl/dl-workshop/releases/tag/scipy2019).
2. Michael A. Nielsen, "Neural Networks and Deep Learning," Determination Press (2015). [Online book](http://neuralnetworksanddeeplearning.com).

In [None]:
# Execute this cell to load the notebook's style sheet, then ignore it
from IPython.display import HTML
css_file = '../style/custom.css'
with open(css_file, "r") as f:
    css = f.read()
HTML(css)