## Background
Data on the heights of people in a population can be plotted as a histogram. Rather than holding all the data in the histogram, we can model it with a function such as a Gaussian, which is specified by just two parameters: the mean $\mu$ and the standard deviation $\sigma$.

The Gaussian function is given by
$$f(\mathbf{x};\mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(\mathbf{x} - \mu)^2}{2\sigma^2}\right)$$

$\chi^2$ is the sum of the squared differences between the data and the model, i.e., $\chi^2 = |\mathbf{y} - f(\mathbf{x};\mu, \sigma)|^2$. This is represented in the figure as the sum of the squares of the pink and orange bars.
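
As a minimal sketch of this definition, the snippet below evaluates $\chi^2$ for a trial Gaussian against a small synthetic data set. The names `chi_squared`, `x_demo`, and `y_demo` are illustrative assumptions rather than part of the original material, and the synthetic values only stand in for the real height data.

```python
import numpy as np

# Gaussian model, matching the formula above.
def gaussian(x, mu, sig):
    return np.exp(-(x - mu)**2 / (2 * sig**2)) / (sig * np.sqrt(2 * np.pi))

# chi^2: sum of squared residuals between the data y and the model.
# (Hypothetical helper, introduced here for illustration only.)
def chi_squared(x, y, mu, sig):
    r = y - gaussian(x, mu, sig)
    return r @ r

# Synthetic stand-in data: an exact Gaussian with mu = 170, sigma = 10.
x_demo = np.linspace(130.0, 210.0, 81)
y_demo = gaussian(x_demo, 170.0, 10.0)

# The generating parameters reproduce the data, so chi^2 vanishes.
chi2_exact = chi_squared(x_demo, y_demo, 170.0, 10.0)
# A poor trial guess leaves a positive chi^2.
chi2_trial = chi_squared(x_demo, y_demo, 155.0, 6.0)
print(chi2_exact, chi2_trial)
```

A smaller $\chi^2$ means the model sits closer to the data, which is why the fitting procedure tries to drive it down.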

To improve the fit, we will want to alter the parameters $\mu$ and $\sigma$, and ask how that changes the $\chi^2$. That is, we will need to calculate the Jacobian,
$$ \mathbf{J} = \left[ \frac{\partial ( \chi^2 ) }{\partial \mu} , \frac{\partial ( \chi^2 ) }{\partial \sigma} \right]\;. $$

Consider the first term, $\frac{\partial ( \chi^2 ) }{\partial \mu}$. Using the multivariate chain rule, it can be written as
$$ \frac{\partial ( \chi^2 ) }{\partial \mu} = -2 (\mathbf{y} - f(\mathbf{x};\mu, \sigma)) \cdot \frac{\partial f}{\partial \mu}(\mathbf{x};\mu, \sigma)$$

The second term follows in the same way:
$$ \frac{\partial ( \chi^2 ) }{\partial \sigma} = -2 (\mathbf{y} - f(\mathbf{x};\mu, \sigma)) \cdot \frac{\partial f}{\partial \sigma}(\mathbf{x};\mu, \sigma)$$
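
For reference, differentiating the Gaussian defined above with respect to each parameter gives the two partial derivatives that these expressions require:
$$ \frac{\partial f}{\partial \mu}(\mathbf{x};\mu, \sigma) = f(\mathbf{x};\mu, \sigma)\,\frac{\mathbf{x} - \mu}{\sigma^2}\;, \qquad \frac{\partial f}{\partial \sigma}(\mathbf{x};\mu, \sigma) = f(\mathbf{x};\mu, \sigma)\left(\frac{(\mathbf{x} - \mu)^2}{\sigma^3} - \frac{1}{\sigma}\right)\;. $$
The $-1/\sigma$ term comes from the normalisation factor $\frac{1}{\sigma\sqrt{2\pi}}$, and the remaining terms come from differentiating the exponent.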


In [None]:
import matplotlib.pyplot as plt
import numpy as np
from mfml.resources.data import x_heights, y_heights

In [None]:
# Gaussian function f(x; mu, sig).
def gaussian(x, mu, sig):
    return np.exp(-(x - mu)**2 / (2 * sig**2)) / (sig * np.sqrt(2 * np.pi))

# Partial derivative of the Gaussian with respect to mu.
def dfdmu(x, mu, sig):
    return gaussian(x, mu, sig) * (x - mu) / sig**2

# Partial derivative of the Gaussian with respect to sigma.
def dfdsig(x, mu, sig):
    return gaussian(x, mu, sig) * ((x - mu)**2 / sig**3 - 1 / sig)
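
One way to gain confidence in hand-derived derivatives like these is to compare them against central finite differences. The sketch below is self-contained, so it repeats the three functions. The evaluation points `x_check`, the parameters `mu0` and `sig0`, and the step `h` are arbitrary values chosen for illustration.

```python
import numpy as np

def gaussian(x, mu, sig):
    return np.exp(-(x - mu)**2 / (2 * sig**2)) / (sig * np.sqrt(2 * np.pi))

def dfdmu(x, mu, sig):
    return gaussian(x, mu, sig) * (x - mu) / sig**2

def dfdsig(x, mu, sig):
    return gaussian(x, mu, sig) * ((x - mu)**2 / sig**3 - 1 / sig)

# Arbitrary test points and parameters (assumed values, illustration only).
x_check = np.linspace(140.0, 200.0, 13)
mu0, sig0, h = 165.0, 8.0, 1e-5

# Central finite-difference estimates of the two partial derivatives.
fd_mu = (gaussian(x_check, mu0 + h, sig0) - gaussian(x_check, mu0 - h, sig0)) / (2 * h)
fd_sig = (gaussian(x_check, mu0, sig0 + h) - gaussian(x_check, mu0, sig0 - h)) / (2 * h)

# Largest absolute disagreement with the analytic expressions.
err_mu = np.max(np.abs(fd_mu - dfdmu(x_check, mu0, sig0)))
err_sig = np.max(np.abs(fd_sig - dfdsig(x_check, mu0, sig0)))
print(err_mu, err_sig)
```

If either error were not tiny compared with the derivative values themselves, that would point to a sign or algebra mistake in the analytic formula.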

Steepest descent takes steps in parameter space proportional to the negative of the Jacobian,
i.e., $\begin{bmatrix} \delta\mu \\ \delta\sigma \end{bmatrix} \propto -\mathbf{J}^T $, where the constant of proportionality is the *aggression* of the algorithm and sets the size of each step.

In [None]:
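def steepest_step(x, y, mu, sig, aggression):
    # Residuals between the data and the current model.
    r = y - gaussian(x, mu, sig)
    # Jacobian of chi^2 with respect to (mu, sig), from the chain rule.
    J = np.array([
        -2 * r @ dfdmu(x, mu, sig),
        -2 * r @ dfdsig(x, mu, sig)
    ])
    # Step downhill, scaled by the aggression.
    return -J * aggression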

In [None]:
# Trial parameters (initial guess).
mu = 155
sig = 6
# Record the path through parameter space, starting from the initial guess.
p = np.array([[mu, sig]])
# Run 50 rounds of steepest descent with an aggression of 2000.
for i in range(50):
    dmu, dsig = steepest_step(x_heights, y_heights, mu, sig, 2000)
    mu += dmu
    sig += dsig
    p = np.append(p, [[mu, sig]], axis=0)
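
To see whether the descent is actually improving the fit, it can help to record $\chi^2$ after every step. The sketch below does this on synthetic data, since `x_heights` and `y_heights` come from the course package. The helper `chi_squared`, the data `x_demo` and `y_demo`, and the aggression of 50 are assumptions chosen for this synthetic example, not values taken from the original material.

```python
import numpy as np

def gaussian(x, mu, sig):
    return np.exp(-(x - mu)**2 / (2 * sig**2)) / (sig * np.sqrt(2 * np.pi))

def dfdmu(x, mu, sig):
    return gaussian(x, mu, sig) * (x - mu) / sig**2

def dfdsig(x, mu, sig):
    return gaussian(x, mu, sig) * ((x - mu)**2 / sig**3 - 1 / sig)

# Hypothetical helper: sum of squared residuals.
def chi_squared(x, y, mu, sig):
    r = y - gaussian(x, mu, sig)
    return r @ r

def steepest_step(x, y, mu, sig, aggression):
    r = y - gaussian(x, mu, sig)
    J = np.array([-2 * r @ dfdmu(x, mu, sig), -2 * r @ dfdsig(x, mu, sig)])
    return -J * aggression

# Synthetic stand-in for the height data: an exact Gaussian (mu=170, sigma=10).
x_demo = np.arange(130.0, 211.0)
y_demo = gaussian(x_demo, 170.0, 10.0)

mu, sig = 155.0, 6.0
history = [chi_squared(x_demo, y_demo, mu, sig)]
for _ in range(50):
    # Aggression of 50 is an assumed value suited to this synthetic data.
    dmu, dsig = steepest_step(x_demo, y_demo, mu, sig, 50)
    mu += dmu
    sig += dsig
    history.append(chi_squared(x_demo, y_demo, mu, sig))

print(history[0], history[-1], mu, sig)
```

A `history` that rises or oscillates is a sign that the aggression is too large for the data at hand, while a very slow decline suggests it could be increased.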