# $\chi^2$ (chi-square)
## finding how well a model fits the data


### Prof. Robert Quimby
&copy; 2019 Robert Quimby

## In this tutorial you will...

- Learn how to quantify the likelihood that a set of data follow from a given model
- Find the set of model parameters that maximize this likelihood
- Learn about the $\chi^2$ statistic and how it relates to this likelihood

In [None]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
plt.rc('axes', labelsize=14)
plt.rc('axes', labelweight='bold')
plt.rc('axes', titlesize=16)
plt.rc('axes', titleweight='bold')
plt.rc('font', family='sans-serif')
plt.rcParams['figure.figsize'] = (10, 7)

## Let's start with some real data and a model

In [None]:
data = np.genfromtxt('media/data.dat', names='x, y, ey')
plt.errorbar(data['x'], data['y'], yerr=data['ey'], fmt='ro')

## How well does each data point agree with the model?

If we assume the data are normally (i.e., Gaussian) distributed about the model, then the (relative) probability of measuring a data point, ${\rm data}_i$, that deviates some amount from the model prediction, ${\rm model}_i$, is:

$$ P({\rm data}_i) = {1 \over \sqrt{2\pi\sigma_i^2}} e^{ -\Delta Y_i^2 / 2 \sigma_i^2 } $$

where:
- $\Delta Y_i$ is ${\rm data}_i - {\rm model}_i$
- $\sigma_i$ is the uncertainty in the $i^{\rm th}$ measurement
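
The per-point formula translates directly into code. This is a minimal sketch; the function name `point_prob` and the numbers are illustrative and do not come from the tutorial's data:

```python
import numpy as np

def point_prob(data_i, model_i, sigma_i):
    """Gaussian probability density of one measurement about the model prediction."""
    dY = data_i - model_i
    return np.exp(-dY**2 / (2 * sigma_i**2)) / np.sqrt(2 * np.pi * sigma_i**2)

# A point exactly on the model vs. one that is 2 sigma (= 1.0) away
p0 = point_prob(5.0, 5.0, 0.5)
p2 = point_prob(6.0, 5.0, 0.5)
print(p0, p2, p2 / p0)
```

A point $2\sigma$ from the model is $e^{-2} \approx 0.14$ times as probable as one sitting exactly on it.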


## Total probability

Assuming the measurements are independent, the total probability of getting a set of data, $I$, given a model, $\theta$, is the product of the individual probabilities of each data point:

$$P\left(I \,\middle| \, \theta \right) = \prod_i {1 \over \sqrt{2\pi\sigma_i^2}} e^{ -\Delta Y_i^2 / 2 \sigma_i^2 }$$

The technical term for this probability is the "likelihood", and it is often written with a calligraphic L instead of the P:

$${\cal L}\left(I \,\middle| \, \theta \right)$$
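
Multiplying many densities that are each below 1 quickly runs into floating-point underflow. A minimal sketch using synthetic Gaussian data in place of the tutorial's measurements (the function name `likelihood` is hypothetical):

```python
import numpy as np

def likelihood(y, model, sigma):
    """Product of per-point Gaussian densities (assumes independent errors)."""
    dY = y - model
    dens = np.exp(-dY**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    return np.prod(dens)

rng = np.random.default_rng(0)

# Five synthetic points scattered about a flat model with unit errors
sigma5 = np.ones(5)
y5 = rng.normal(0.0, sigma5)
L5 = likelihood(y5, np.zeros(5), sigma5)

# Same setup with 2000 points: every factor is at most 1/sqrt(2*pi) ~ 0.4,
# so the true product is below 1e-790 and underflows to 0.0 in float64
sigma2k = np.ones(2000)
y2k = rng.normal(0.0, sigma2k)
L2k = likelihood(y2k, np.zeros(2000), sigma2k)

# Summing logs of the same densities stays a perfectly ordinary number
lnL2k = np.sum(-0.5 * np.log(2 * np.pi * sigma2k**2) - 0.5 * (y2k / sigma2k)**2)
print(L5, L2k, lnL2k)
```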

## It is often easier to work with the natural log of the likelihood

$$\ln {\cal L}\left(I \,\middle| \, \theta \right) =
-{1 \over 2}\left[ \sum_i \ln 2\pi\sigma_i^2 + \sum_i{ \left( { \Delta Y_i \over \sigma_i} \right)^2 } \right] $$
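
As a quick numerical check of this expression, the sketch below uses synthetic data (random uncertainties and a made-up model) and compares the sum of the logs of the individual densities against the closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
sigma = rng.uniform(0.5, 2.0, size=10)   # synthetic per-point uncertainties
model = np.linspace(0, 9, 10)            # made-up model predictions
y = rng.normal(model, sigma)             # synthetic data scattered about the model
dY = y - model

# Sum of the logs of the individual Gaussian densities
dens = np.exp(-dY**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
lnL_direct = np.sum(np.log(dens))

# The closed form: -1/2 [ sum ln(2 pi sigma^2) + sum (dY/sigma)^2 ]
lnL_formula = -0.5 * (np.sum(np.log(2 * np.pi * sigma**2)) + np.sum((dY / sigma)**2))
print(lnL_direct, lnL_formula)
```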

The first term depends only on the uncertainties, not on the model, so it just adds a constant. Up to that constant:
$$\ln {\cal L}\left(I \,\middle| \, \theta \right) = -{1 \over 2} \sum_i{ \left( { \Delta Y_i \over \sigma_i} \right)^2 } + {\rm constant}$$

## The $\chi^2$ (chi-squared) value

$$\chi^2 = \sum_i{ \left( { \Delta Y_i \over \sigma_i} \right)^2 }$$
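
A worked example with three made-up points, each sitting exactly $1\sigma$ from its model value, so each contributes 1 to the sum:

```python
import numpy as np

data = np.array([1.0, 2.0, 3.0])
model = np.array([1.1, 1.9, 3.3])
sigma = np.array([0.1, 0.1, 0.3])

# (dY / sigma) is -1, +1, -1, so chi^2 = 1 + 1 + 1 (up to float rounding)
chisq = np.sum(((data - model) / sigma)**2)
print(chisq)
```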

## Relation between likelihood and $\chi^2$

The (relative) probability of randomly drawing a specific sample of data given a model is related to the $\chi^2$ of the data given the model:

$${\cal L}\left(I \,\middle| \, \theta \right) \propto e^{-\chi^2 / 2}$$


In other words, because of the minus sign in the exponent, the set of parameters that **minimizes** the $\chi^2$ defines the model that **maximizes** the likelihood that the data were drawn from that model.
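
Since $\ln{\cal L}$ and $-\chi^2/2$ differ only by a model-independent constant, comparing two models by either quantity gives the same ranking. A sketch on synthetic straight-line data; the helper names `chisq` and `ln_like` and both candidate models are hypothetical:

```python
import numpy as np

def chisq(y, model, sigma):
    return np.sum(((y - model) / sigma)**2)

def ln_like(y, model, sigma):
    # Full log-likelihood, including the model-independent sigma term
    return -0.5 * (np.sum(np.log(2 * np.pi * sigma**2)) + chisq(y, model, sigma))

# Synthetic data around y = 2x + 1
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 15)
sigma = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, sigma)

# Two candidate models
model_a = 2.0 * x + 1.0
model_b = 1.8 * x + 1.5
d_lnL = ln_like(y, model_a, sigma) - ln_like(y, model_b, sigma)
d_chisq = chisq(y, model_a, sigma) - chisq(y, model_b, sigma)
print(d_lnL, -0.5 * d_chisq)   # identical: the constant cancels
```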

## $\chi^2$ minimization (= likelihood maximization)

This is what we have been doing with our least-squares fitting, but there are other ways to look at this problem.

For example, we can write a function to return the $\chi^2$ value for a set of data given some model parameters, and then adjust the model parameters to minimize the $\chi^2$.

## Let's calculate the $\chi^2$ value for our data and model

In [None]:
# define the model (y = mx + b)
def line_model(params, x):
    y = ????
    return y
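
One way to fill in this cell, assuming `params` is ordered as `(m, b)` (the brute-force cells below index the slope first). This is a sketch, not the only valid choice:

```python
import numpy as np

def line_model(params, x):
    """Straight line y = m*x + b, assuming params is ordered (m, b)."""
    m, b = params
    return m * np.asarray(x) + b

print(line_model((2.0, 1.0), [0.0, 1.0, 2.5]))  # [1. 3. 6.]
```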

In [None]:
def get_chisq(params, data):
    model = ????
    dY = ????
    return ????
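
A possible completion of `get_chisq`, assuming the `(m, b)` parameter order and the `x`, `y`, `ey` field names used when loading `media/data.dat`. Since that file is not reproduced here, a small hand-made structured array with the same fields stands in for it:

```python
import numpy as np

def line_model(params, x):
    m, b = params
    return m * x + b

def get_chisq(params, data):
    """chi^2 of the data (fields 'x', 'y', 'ey') about the line model."""
    model = line_model(params, data['x'])
    dY = data['y'] - model
    return np.sum((dY / data['ey'])**2)

# Hand-made stand-in for media/data.dat with the same field names
data = np.zeros(4, dtype=[('x', float), ('y', float), ('ey', float)])
data['x'] = [0.0, 1.0, 2.0, 3.0]
data['y'] = [1.0, 3.5, 5.0, 6.0]
data['ey'] = [0.5, 0.5, 0.5, 1.0]
print(get_chisq((2.0, 1.0), data))  # 2.0
```

The model values at $x = 0, 1, 2, 3$ are 1, 3, 5, 7, so the residuals in units of $\sigma$ are 0, 1, 0, -1 and $\chi^2 = 2$.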

In [None]:
params = ????
modelx = ????
modely = ????
plt.errorbar(data['x'], data['y'], yerr=data['ey'], fmt='ro')
plt.plot(modelx, modely);
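
Any reasonable starting `params` works here. One option, shown on made-up numbers, is the line through the first and last data points; evaluating it on a fine grid gives `modelx` and `modely` ready for `plt.plot` (the plotting itself is omitted):

```python
import numpy as np

def line_model(params, x):
    m, b = params
    return m * x + b

# Made-up stand-in data (the real values live in media/data.dat)
x = np.array([0.0, 2.0, 4.0, 6.0, 8.0])
y = np.array([1.2, 4.8, 9.1, 13.0, 16.9])

# Rough guess: the line through the first and last points
m_guess = (y[-1] - y[0]) / (x[-1] - x[0])
b_guess = y[0] - m_guess * x[0]
params = (m_guess, b_guess)

# Evaluate the model on a fine grid spanning the data
modelx = np.linspace(x.min(), x.max(), 200)
modely = line_model(params, modelx)
```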

## Determine model parameters by brute force

In [None]:
# brute-force fitting
ntest = ????
m = np.linspace(????)
b = np.linspace(????)
chisq = np.zeros( (ntest, ntest) )
for i in range(ntest):
    for j in range(ntest):
        chisq[i, j] = ????
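
A sketch of the brute-force grid search on synthetic data drawn around $m=2$, $b=1$. The grid limits and `ntest` are assumptions chosen to bracket those values; for real data they should bracket the region where the $\chi^2$ is small. The double loop mirrors the cell above, and a NumPy-broadcasting version computes the same grid in one step:

```python
import numpy as np

def get_chisq(params, data):
    """chi^2 of a straight line y = m*x + b, params ordered (m, b)."""
    m, b = params
    return np.sum(((data['y'] - (m * data['x'] + b)) / data['ey'])**2)

# Synthetic stand-in for media/data.dat: 25 points around m=2, b=1
rng = np.random.default_rng(3)
data = np.zeros(25, dtype=[('x', float), ('y', float), ('ey', float)])
data['x'] = np.linspace(0, 10, 25)
data['ey'] = 0.5
data['y'] = 2.0 * data['x'] + 1.0 + rng.normal(0.0, data['ey'])

# Assumed grid limits and resolution
ntest = 101
m = np.linspace(1.5, 2.5, ntest)
b = np.linspace(-1.0, 3.0, ntest)
chisq = np.zeros((ntest, ntest))
for i in range(ntest):        # rows: slope
    for j in range(ntest):    # columns: offset
        chisq[i, j] = get_chisq((m[i], b[j]), data)

# Same grid without Python loops: axes are (slope, offset, data point)
model = m[:, None, None] * data['x'] + b[None, :, None]
chisq_vec = np.sum(((data['y'] - model) / data['ey'])**2, axis=-1)
```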

In [None]:
# plot the 2D probability distribution
prob = ????
plt.imshow(prob, extent=[b[0], b[-1], m[0], m[-1]], aspect='auto', origin='lower')
plt.xlabel('Offset ($b$)')
plt.ylabel('Slope ($m$)');
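
For the probability map, ${\cal L} \propto e^{-\chi^2/2}$ can be evaluated on the grid. A sketch (synthetic data, assumed grid limits) that subtracts the minimum $\chi^2$ before exponentiating; this changes only the overall normalization but keeps `np.exp` from underflowing to zero when the $\chi^2$ values are large:

```python
import numpy as np

# Synthetic straight-line data around m=2, b=1 (stand-in for media/data.dat)
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 25)
ey = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, ey)

# chi^2 on an assumed grid: rows are slopes m[i], columns are offsets b[j]
m = np.linspace(1.5, 2.5, 101)
b = np.linspace(-1.0, 3.0, 101)
model = m[:, None, None] * x + b[None, :, None]
chisq = np.sum(((y - model) / ey)**2, axis=-1)

# Relative likelihood exp(-chi^2/2), shifted by the minimum to avoid underflow
prob = np.exp(-(chisq - chisq.min()) / 2)
prob /= prob.sum()  # normalize so the grid sums to 1
```

With `origin='lower'` and the `extent` used in the cell above, `plt.imshow(prob, ...)` puts the slope (row index) on the vertical axis.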

In [None]:
# best fit values
i, j = ????
print(m[i], b[j])
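
To locate the grid cell with the smallest $\chi^2$, `np.argmin` returns an index into the flattened array and `np.unravel_index` converts it back to a `(row, column)` pair. Shown here on a tiny made-up grid:

```python
import numpy as np

# Tiny hand-made chi^2 grid standing in for the brute-force result
m = np.array([1.9, 2.0, 2.1])
b = np.array([0.5, 1.0, 1.5, 2.0])
chisq = np.array([[9.0, 7.0, 8.0, 12.0],
                  [6.0, 3.5, 4.0, 10.0],
                  [8.0, 5.0, 6.5, 11.0]])

# argmin works on the flattened array; unravel_index maps back to (row, col)
i, j = np.unravel_index(np.argmin(chisq), chisq.shape)
print(m[i], b[j])  # 2.0 1.0
```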

#### What is the uncertainty in these parameters?

## Marginalized probabilities

We can integrate (sum) over one dimension (e.g., the offset parameter $b$) to get the total probability for another dimension (e.g., the slope parameter $m$).

In [None]:
prob_m = ????
plt.plot(m, prob_m);
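
A sketch of marginalization on synthetic data: summing the normalized 2D probability over the offset axis gives $P(m)$, summing over the slope axis gives $P(b)$, and the mean and standard deviation of each marginal give one common estimate of the best value and its uncertainty. The grid limits are assumptions and must enclose essentially all of the probability for these numbers to be meaningful:

```python
import numpy as np

# Synthetic straight-line data around m=2, b=1
rng = np.random.default_rng(5)
x = np.linspace(0, 10, 25)
ey = np.full_like(x, 0.5)
y = 2.0 * x + 1.0 + rng.normal(0.0, ey)

# Normalized probability on an assumed grid (rows: m, columns: b)
m = np.linspace(1.5, 2.5, 201)
b = np.linspace(-1.0, 3.0, 201)
chisq = np.sum(((y - (m[:, None, None] * x + b[None, :, None])) / ey)**2, axis=-1)
prob = np.exp(-(chisq - chisq.min()) / 2)
prob /= prob.sum()

# Marginalize: sum over offsets (columns) for P(m), over slopes (rows) for P(b)
prob_m = prob.sum(axis=1)
prob_b = prob.sum(axis=0)

# Mean and standard deviation of each marginal distribution
m_mean = np.sum(m * prob_m)
m_std = np.sqrt(np.sum((m - m_mean)**2 * prob_m))
b_mean = np.sum(b * prob_b)
b_std = np.sqrt(np.sum((b - b_mean)**2 * prob_b))
print(f"m = {m_mean:.3f} +/- {m_std:.3f}, b = {b_mean:.3f} +/- {b_std:.3f}")
```

For a model that is linear in its parameters with Gaussian errors, these standard deviations should agree with the square roots of the diagonal of the weighted least-squares covariance matrix.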

## A faster way to find the $\chi^2$ minimum

In [None]:
# use scipy optimize
from scipy.optimize import fmin

guess = ????
fmin(????)
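
A sketch of the `fmin` approach on synthetic data. `fmin` (a Nelder-Mead simplex minimizer) calls the function with the parameter vector first and forwards anything in `args`, so the tutorial-style `get_chisq(params, data)` fits directly; the starting `guess` and the tighter `xtol`/`ftol` values are assumptions. As a cross-check, `np.polyfit` with weights `w = 1/ey` minimizes the same $\chi^2$ for a straight line:

```python
import numpy as np
from scipy.optimize import fmin

def get_chisq(params, data):
    """chi^2 of a straight line y = m*x + b, params ordered (m, b)."""
    m, b = params
    return np.sum(((data['y'] - (m * data['x'] + b)) / data['ey'])**2)

# Synthetic stand-in for media/data.dat
rng = np.random.default_rng(6)
data = np.zeros(25, dtype=[('x', float), ('y', float), ('ey', float)])
data['x'] = np.linspace(0, 10, 25)
data['ey'] = 0.5
data['y'] = 2.0 * data['x'] + 1.0 + rng.normal(0.0, data['ey'])

guess = [1.0, 0.5]   # starting point (m, b)
best = fmin(get_chisq, guess, args=(data,), xtol=1e-8, ftol=1e-8,
            maxiter=2000, maxfun=4000, disp=False)

# Weighted linear least squares returns [slope, intercept]
check = np.polyfit(data['x'], data['y'], 1, w=1 / data['ey'])
print(best, check)
```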