<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Guided Problem Set 5: Likelihoods and the $\chi^2$ Distribution</h1>


<a name='section_5_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P5.0 Overview</h2>


<h3>Navigation</h3>

<table style="width:100%">
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_5_1">P5.1 Warm-up Exercise</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_5_1">P5.1 Problems</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_5_2">P5.2 The Likelihood Function</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_5_2">P5.2 Problems</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_5_3">P5.3 Maximum Likelihood and Chi-Squared</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_5_3">P5.3 Problems</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_5_4">P5.4 A Fitting Example</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#problems_5_4">P5.4 Problems</a></td>
    </tr>
</table>



<h3>Learning Objectives</h3>

In this recitation we will explore the following objectives:

- Understand and construct likelihood and log-likelihood functions
- Examine the $\chi^2$ metric
- Perform a fit by maximizing a log-likelihood function using `lmfit`



<h3>Importing Data (Colab Only)</h3>

If you are in a Google Colab environment, run the cell below to import the data for this notebook. Otherwise, if you have downloaded the course repository, you do not have to run the cell below.

In [None]:
#>>>RUN: P5.0-runcell00

!git init
!git remote add -f origin https://github.com/mitx-8s50/nb_LEARNER/
!git config core.sparseCheckout true
!echo 'data/P05' >> .git/info/sparse-checkout
!git pull origin main


<h3>Importing Libraries</h3>

Before beginning, run the cell below to import the relevant libraries for this notebook.

In [None]:
#>>>RUN: P5.0-runcell01

#install lmfit if you have not done so yet
!pip install lmfit

In [None]:
#>>>RUN: P5.0-runcell02

import numpy as np                 #https://numpy.org/doc/stable/
import matplotlib.pyplot as plt    #https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.html
from mpl_toolkits import mplot3d   #https://matplotlib.org/2.0.2/mpl_toolkits/mplot3d/tutorial.html

from scipy.stats import chisquare  #https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.chisquare.html
from scipy.stats import poisson    #https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.poisson.html
from scipy.stats import norm       #https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html

import lmfit
from lmfit import Parameters, minimize, fit_report  #https://lmfit.github.io/lmfit-py/parameters.html
                                                    #https://lmfit-py.readthedocs.io/en/latest/fitting.html#the-minimize-function
                                                    #https://lmfit-py.readthedocs.io/en/latest/fitting.html#getting-and-printing-fit-reports

<h3>Setting Default Figure Parameters</h3>

The following code cell sets default values for figure parameters.

In [None]:
#>>>RUN: P5.0-runcell03

#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (9,6)

medium_size = 12
large_size = 15

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title

<a name='section_5_1'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P5.1 Warm-up Exercise</h2>    

| [Top](#section_5_0) | [Previous Section](#section_5_0) | [Problems](#problems_5_1) | [Next Section](#section_5_2) |


<h3>Overview</h3>

Let $x_i$ be the $x$ coordinates of the data points we observe, and let $f(x_i)$ be the model we are trying to fit. We assume that the observed data are generated from this model plus some random error, usually due to noise in the measurement:

$$y_i = f(x_i) + \epsilon_{i} $$

where the $\epsilon_{i}$ are random variables drawn from a Gaussian distribution with a mean of 0. In some physics experiments, we can quantify the width of the $\epsilon$ distribution by making repeated measurements and calculating the standard errors $\sigma_i$. These are usually plotted as error bars, as in the code below.

In [None]:
#>>>RUN: P5.1-runcell01

np.random.seed(9)

y_err = 1. #play with changing this, what does the graph look like?

x = np.arange(0,9, .5)
y_true = x 
y_data = np.add(np.random.normal(size=len(x),scale=y_err), y_true) #y=x+e where e is Gaussian noise
err = np.ones(len(x))*y_err #constant error

plt.errorbar(x,y_data,err, fmt = 'o', label='data') #plot the data
plt.plot(x, y_true, color='orange', label='f(x)') #plot the line y=x
plt.legend(loc=2)
plt.ylabel('y')
plt.xlabel('x')
plt.show()

<a name='problems_5_1'></a>     

| [Top](#section_5_0) | [Restart Section](#section_5_1) | [Next Section](#section_5_2) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.1.1</span>

We assume that $y_i$ is a random variable with mean $f(x_i)$ and variance $\sigma_i^2$. Given this, what is the probability distribution for $y_i$? Insert the formula in the function definition below. You can optionally plot this distribution for a given value of $f(x_i)$, called `mean` in the code below.

In [None]:
#>>>PROBLEM: P5.1.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

def P(yi, f_xi, sigma):
    return #INSERT CODE HERE


#Setting the mean and variance of the random variable (PLAY WITH THESE!)
mean = 0
variance = 1

#Creating PDF distribution
y_array = np.arange(-4, 4, .01)
y_dist = P(y_array, mean, np.sqrt(variance))

#Plotting
plt.xlabel("yi")
plt.ylabel("PDF")
plt.plot(y_array,y_dist)
plt.show()

>#### Follow-up 5.1.1a (ungraded)
>
>Try plotting a histogram of $y_i - f(x_i)$. What should the shape of this distribution be? Does it match your expectation? Try varying `y_err` and the number of data points that are generated.
>
>Also try plotting a histogram of $(y_i - f(x_i))/\sigma_{i}$. What is this distribution called?

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.1.2</span>

Given the answer to the previous problem, and assuming that the probability distributions for each of the $y_i$ are independent, what is the joint probability distribution of two random variables $y_1$ and $y_2$? Complete the code below and submit your answer. You can simplify your answer by using the function `P(yi, f_xi, sigma)` that you already defined.

Afterwards, run the completed code, which makes a 3D plot of the joint probability distribution with one axis as $y_1$ and the other $y_2$. Once you have a working solution, try playing around with the means, variances, and ranges to see how the plot changes.

You may also find it instructive to read the documentation:

- <a href="https://numpy.org/doc/stable/reference/generated/numpy.meshgrid.html" target="_blank">`np.meshgrid`</a>
- <a href="https://matplotlib.org/stable/api/_as_gen/matplotlib.pyplot.figure.html" target="_blank">`plt.figure`</a>
- <a href="https://matplotlib.org/2.0.2/mpl_toolkits/mplot3d/tutorial.html" target="_blank">`mplot3d`</a>

In [None]:
#>>>PROBLEM: P5.1.2
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

def P_2D(y1, f_x1, sigma1, y2, f_x2, sigma2):
    return #INSERT CODE HERE

#Setting the mean and variance of the random variables (PLAY WITH THESE!)
mean1 = 0
variance1 = 1
mean2 = 0
variance2 = 1

#Creating PDF distribution
y1 = np.arange(-4, 4, .01)
y2 = np.arange(-4, 4, .01)
Y1, Y2 = np.meshgrid(y1, y2)
Y = P_2D(Y1, mean1, np.sqrt(variance1), Y2, mean2, np.sqrt(variance2))

#Plotting
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(y1, y2, Y, 100)
ax.set_xlabel('y1')
ax.set_ylabel('y2')
ax.set_zlabel('PDF')
plt.show();


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.1.3</span>
 
Suppose we again combine two independent variables, $y_1$ and $y_2$. This time, however, $y_1$ and $y_2$ are not drawn from Gaussian distributions but from two different, more exotic distributions (i.e. the functional forms of $P(y_{1})$ and $P(y_{2})$ are not the same). Would it still necessarily be true that $P(y_1, y_2) = P(y_1) P(y_2)$?


<a name='section_5_2'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P5.2 The Likelihood Function</h2>    

| [Top](#section_5_0) | [Previous Section](#section_5_1) | [Problems](#problems_5_2) | [Next Section](#section_5_3) |


<h3>Overview</h3>


The previous exercise gives the probability distribution of observing the data $y$ given a particular model $f(x)$. This probability depends on the independent data points $x$ and on the function's parameters $\alpha_1, \alpha_2, \alpha_3, ..., \alpha_m$. When it is viewed as a function of the parameters $\alpha$ for fixed observed data, it is known as the <b>likelihood</b> function $P(y|\alpha)$. Note that the probability of observing the data given a particular model is *not* the same as the probability that a particular model is true given the observed data (although the two are related). To begin, however, we will assume that the model that maximizes the probability of the observed data is close to the true model. This technique is called <b>Maximum Likelihood Estimation</b> (MLE).

Suppose we take the errors $\epsilon_i$ in $y_i = f(x_i; \alpha) + \epsilon_i$ to be sampled from some general distribution $\rho(y_i - f(x_i; \alpha))$ instead of a Gaussian. We can then construct a likelihood function $P(y|\alpha)$ (and the corresponding log-likelihood):

$$ P(y|\alpha) = \prod_i \rho(y_i - f(x_i; \alpha))$$

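Maximizing this likelihood with respect to the set of parameters $\alpha$ gives the parameterization of the model that best fits the given data. However, it is often more convenient (and more common) to maximize $\log(P(y|\alpha))$ instead. The log turns the product over data points into a sum, which is easier to differentiate. It also avoids the numerical underflow that occurs when many small probabilities are multiplied together.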
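The cell below is a minimal sketch of why the log is convenient. It assumes Gaussian errors with a known $\sigma=1$, a hypothetical model $f(x;\alpha)=\alpha x$, and 2000 simulated points; none of these come from this notebook. The cell evaluates the likelihood both as a product of densities and as a sum of log-densities. With this many points, the product underflows to zero in double precision. The log-likelihood stays finite and still peaks near the true slope $\alpha=1$.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical toy data: f(x; alpha) = alpha * x with true alpha = 1 plus Gaussian noise
rng = np.random.default_rng(0)
x_toy = np.linspace(0, 10, 2000)
sigma_toy = 1.0
y_toy = x_toy + rng.normal(scale=sigma_toy, size=x_toy.size)

def likelihood(alpha):
    # Product over data points of rho(y_i - f(x_i; alpha)), here a Gaussian density
    return np.prod(norm.pdf(y_toy - alpha * x_toy, scale=sigma_toy))

def log_likelihood_toy(alpha):
    # Sum over data points of log rho(y_i - f(x_i; alpha))
    return np.sum(norm.logpdf(y_toy - alpha * x_toy, scale=sigma_toy))

print("Likelihood at alpha=1:    ", likelihood(1.0))          # underflows to 0.0
print("Log-likelihood at alpha=1:", log_likelihood_toy(1.0))  # finite

# Scan alpha and locate the maximum of the log-likelihood
alphas = np.linspace(0.9, 1.1, 201)
best_alpha = alphas[np.argmax([log_likelihood_toy(a) for a in alphas])]
print("Best-fit alpha:", best_alpha)
```

The function names and toy arrays are illustrative only. They are chosen so that they do not overwrite variables used elsewhere in the notebook.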

<a name='problems_5_2'></a>     

| [Top](#section_5_0) | [Restart Section](#section_5_2) | [Next Section](#section_5_3) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.2.1</span>


Why is it okay to maximize the log likelihood instead of the likelihood in order to find the fit function parameters that most closely describe the data? Choose the best answer from the options below.

Hint: What is the only purpose of the likelihood function that has been mentioned so far? How is it affected by taking the log?


A. The log of any general likelihood function is the same as the function itself

B. Maximizing the log of the likelihood is the same as maximizing the likelihood

C. Once we find the parameters that maximize the log of the likelihood function, we can take the inverse log (also known as exponentiating) of those parameters to get the parameters that maximize the likelihood function itself.

D. The log of the likelihood will be close enough to the likelihood that we can just approximate them to be the same and then run all of our analysis on the log of the likelihood


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.2.2</span>

To convince yourself that your previous answer is correct, take the log of the arbitrary function defined in the code cell below, and then find the $(x, y)$ position of its maximum. Compare this result to the $(x, y)$ position of the maximum of the original function.

Enter the $x$ value where the original function has its maximum, `x_orig`, and the $x$ value where the log-function has its maximum, `x_log`, as a list `[x_orig,x_log]`, with precision 1e-2.

In [None]:
#>>>PROBLEM: P5.2.2
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.


def find_maximum(x,y):
    #Function that takes in two arrays, representing the input of a function, x,
    #and the output of that function, y, and returns the (x,y) coordinate of
    #the maximum as a tuple
    max_y = max(y)
    max_x = #INSERT CODE HERE
    return (max_x,max_y)

def arbitrary_function(x):
    return 2*(2*np.sin(x) + 3*np.sin(2*x) + .5 * np.sin(10*x) + 3*np.sin(.3*x)) -.1*x**2 + x + 2

def log_arbitrary_function(x):
    return #INSERT CODE HERE

x = np.arange(0, 10, .01)

y = arbitrary_function(x)
print("Maximum of arbitrary function: ", find_maximum(x,y))
plt.plot(x, y)
plt.show()

###########################################################
###### INSERT CODE HERE TO PLOT THE LOG ARBITRARY   #######
###### FUNCTION AND PRINT ITS MAXIMUM X AND Y COORD #######
###########################################################



### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.2.3</span>

Is the $y$ coordinate of the log of the arbitrary function at its maximum point the same as the $y$ maximum of the original arbitrary function?


<a name='section_5_3'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P5.3 Maximum Likelihood and Chi-Squared</h2>    

| [Top](#section_5_0) | [Previous Section](#section_5_2) | [Problems](#problems_5_3) | [Next Section](#section_5_4) |


<h3>Maximum Likelihood Estimation for Gaussian Errors</h3>

Let's return to the case where the errors $\epsilon_i$ in $y_i = f(x_i) + \epsilon_i$ are Gaussian. We would like to find the parameters that maximize the likelihood function, i.e.

$$\alpha^{\text{best}}_j = \text{argmax}_{\alpha_j} \left( \frac{1}{(2\pi)^{k/2}\prod_i\sigma_i} \exp\left( -\frac{1}{2} \sum_i\frac{(y_i - f(x_i; \alpha_1, \alpha_2, ...))^2}{\sigma_i^2}\right)\right)$$

where $k$ is the number of data points. Since the logarithm is a monotonically increasing function, maximizing the logarithm of this function also maximizes the function itself. This means that

$$\alpha^{\text{best}}_j = \text{argmax}_{\alpha_j} \left( -\frac{1}{2} \sum_i\frac{(y_i - f(x_i; \alpha_1, \alpha_2, ...))^2}{\sigma_i^2} - \sum_i \log (\sigma_i\sqrt{2\pi}) \right)$$


The term $\sum_i \log (\sigma_i\sqrt{2\pi})$ does not depend on the parameters $\alpha_j$, so we can drop it from the maximization.


$$\alpha^{\text{best}}_j = \text{argmax}_{\alpha_j} \left( -\frac{1}{2} \sum_i\frac{(y_i - f(x_i; \alpha_1, \alpha_2, ...))^2}{\sigma_i^2}  \right)$$


Similarly, the factor of $-\frac{1}{2}$ in front is also a constant. However, we cannot simply ignore it, because it is negative. If we want to **maximize** a function $-\frac{1}{2}g(z)$, we need to **minimize** the function $g(z)$. This gives us the following:


$$\alpha^{\text{best}}_j = \text{argmin}_{\alpha_j} \left(\sum_i\frac{(y_i - f(x_i; \alpha_1, \alpha_2, ...))^2}{\sigma_i^2}  \right)$$



<h3>The Chi-squared Function</h3>

The maximum likelihood solution for the parameters can be found by minimizing the sum of squared residuals divided by the variances. This sum is commonly referred to as the $\chi^2$, defined as:


$$\chi^2(\alpha_1, \alpha_2 ... \alpha_m) = \sum_i\frac{(y_i - f(x_i; \alpha_1, \alpha_2, ...))^2}{\sigma_i^2}$$


**So, many fitting problems that you encounter come down to minimizing this sum!**
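Here is a quick numerical check of the derivation above. It is a sketch on hypothetical straight-line data with unequal, assumed-known uncertainties $\sigma_i$. The Gaussian log-likelihood and $-\chi^2/2$ differ only by the constant $-\sum_i \log(\sigma_i\sqrt{2\pi})$, so the same slope maximizes one and minimizes the other.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data: f(x; m) = m * x with true slope m = 2 and point-dependent errors
rng = np.random.default_rng(1)
x_chk = np.arange(1.0, 11.0)
sigma_chk = 0.5 + 0.1 * x_chk
y_chk = 2.0 * x_chk + rng.normal(scale=sigma_chk)

def chi_squared(m):
    # Sum of squared residuals, each divided by its variance
    return np.sum((y_chk - m * x_chk) ** 2 / sigma_chk ** 2)

def gaussian_log_likelihood(m):
    # Log of the product of independent Gaussian densities
    return np.sum(norm.logpdf(y_chk, loc=m * x_chk, scale=sigma_chk))

slopes = np.linspace(1.5, 2.5, 1001)
chi2_vals = np.array([chi_squared(m) for m in slopes])
logL_vals = np.array([gaussian_log_likelihood(m) for m in slopes])

# log L + chi^2 / 2 is the same constant for every slope
offset = logL_vals + 0.5 * chi2_vals
const = -np.sum(np.log(sigma_chk * np.sqrt(2 * np.pi)))
print("Offset range:", offset.min(), offset.max(), " expected constant:", const)

# The chi^2 minimum and the likelihood maximum occur at the same slope
print("argmin chi^2:", slopes[np.argmin(chi2_vals)], " argmax log L:", slopes[np.argmax(logL_vals)])
```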

In an earlier Lesson, we had a similar sum, but without the $\sigma_i^2$ in the denominator. If we assume that each point has the same uncertainty, then minimizing this $\chi^2$ is the same as minimizing the sum we saw earlier. Minimizing the quantity shown above is known as <b>weighted least squares</b>, as opposed to the <b>ordinary least squares</b> used in the earlier derivation.
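The sketch below compares the two methods using `np.polyfit`. Its `w` argument multiplies each residual before squaring, so `w = 1/sigma` gives weighted least squares for Gaussian errors. The data and uncertainties are hypothetical. With equal uncertainties, the weighted and ordinary fits agree. With unequal uncertainties they generally differ, because the more precise points pull harder on the fit.

```python
import numpy as np

rng = np.random.default_rng(2)
x_ls = np.linspace(0, 9, 10)
y_true_ls = 1.0 + 0.5 * x_ls  # hypothetical true line

# Case 1: every point has the same uncertainty
sigma_equal = np.full(x_ls.size, 1.0)
y_equal = y_true_ls + rng.normal(scale=sigma_equal)
ols = np.polyfit(x_ls, y_equal, 1)                     # ordinary least squares: [slope, intercept]
wls = np.polyfit(x_ls, y_equal, 1, w=1 / sigma_equal)  # weighted least squares
print("Equal errors   -> OLS:", ols, " WLS:", wls)

# Case 2: the uncertainty grows with x
sigma_unequal = 0.2 + 0.5 * x_ls
y_unequal = y_true_ls + rng.normal(scale=sigma_unequal)
ols2 = np.polyfit(x_ls, y_unequal, 1)
wls2 = np.polyfit(x_ls, y_unequal, 1, w=1 / sigma_unequal)
print("Unequal errors -> OLS:", ols2, " WLS:", wls2)
```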

<h3>Numerical Minimization</h3>

In an earlier Lesson, we saw how the minimization can be solved analytically, using linear algebra or vector calculus. However, this can get very complicated for anything other than very simple fit functions. As a more versatile alternative, we can use the computer to minimize the $\chi^2$ numerically. Many minimization techniques are built on gradient descent, which will be covered in one of the later Lessons. By default, the Python package `lmfit` uses the Levenberg-Marquardt algorithm (`method='leastsq'`). This is a gradient-based method that blends gradient descent with the Gauss-Newton method.
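The sketch below illustrates numerical $\chi^2$ minimization. It hands a $\chi^2$ function to `scipy.optimize.minimize`, used here as a stand-in for `lmfit`. For a problem without bounds or constraints, its default method is the quasi-Newton BFGS algorithm, which is also gradient-based. The result is compared with the exact weighted-least-squares answer from `np.polyfit`. The data are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize as scipy_minimize  # renamed to avoid shadowing lmfit's minimize

rng = np.random.default_rng(3)
x_num = np.linspace(0, 9, 15)
sigma_num = np.full(x_num.size, 0.8)
y_num = 3.0 - 0.7 * x_num + rng.normal(scale=sigma_num)  # hypothetical line plus noise

def chi2_line(params):
    # params = [slope, intercept]; returns the chi-squared of a straight-line model
    slope, intercept = params
    return np.sum((y_num - (slope * x_num + intercept)) ** 2 / sigma_num ** 2)

res = scipy_minimize(chi2_line, x0=[0.0, 0.0])          # numerical minimization
exact = np.polyfit(x_num, y_num, 1, w=1 / sigma_num)    # analytic weighted least squares
print("Numerical minimum:", res.x, " chi^2 =", res.fun)
print("Exact WLS answer: ", exact)
```

For a straight line, both approaches should land on the same parameters. The numerical route still works unchanged when $f(x)$ is nonlinear in its parameters, where no simple closed form exists.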

<h3>What does Chi-squared tell us?</h3>

For now we will not discuss the minimization itself. Instead, we will look more closely at the significance of the $\chi^{2}$, in particular how we can use it to assess the overall quality of a fit (often called the goodness of fit).

Let's see what the $\chi^2$ can tell us when comparing data to a model. First, run the code below to read model points from a file and plot them as a histogram.

In [None]:
#>>>RUN: P5.3-runcell01

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import chisquare

# model
pred = np.genfromtxt('data/P05/chisq_model.txt', delimiter=',')    # read in file
x = pred[:, 0]         # get x values (first column)
pred_y = pred[:, 1]    # get y values (second column)
plt.xlim(-1,1)
plt.xlabel(r'$x$')
plt.ylabel(r'Frequency')
plt.plot(x,pred_y,linewidth=1,drawstyle='steps')
plt.show()


**Data Sample 1**

Next, read the data points in the file "chisq_data1.txt" and plot them (with error bars) on top of the model histogram. Note that this data file does not contain uncertainties. As you can see in the function `plot_error`, we assume that the uncertainties are Poisson. Each uncertainty is therefore simply the square root of the number of counts in that bin.
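A Poisson distribution has a variance equal to its mean, so the standard deviation of a count with mean $\mu$ is $\sqrt{\mu}$. The short simulation below checks this, using an arbitrary hypothetical mean of 100 counts.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 100  # hypothetical expected number of counts in one bin
samples = rng.poisson(mu, size=200_000)

print("Sample mean:     ", samples.mean())
print("Sample std. dev.:", samples.std())
print("sqrt(mean):      ", np.sqrt(samples.mean()))
```

Using the observed count in place of the true mean is an approximation. It works well for large counts but poorly for bins with few entries; for example, a bin with 0 counts gets a zero-size error bar.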


In [None]:
#>>>RUN: P5.3-runcell02

def plot_error(y):
    plt.errorbar(x,y,yerr=np.sqrt(y),ecolor='k',elinewidth=1,capsize=4,linestyle='')

In [None]:
#>>>RUN: P5.3-runcell03

data1 = np.loadtxt('data/P05/chisq_data1.txt', delimiter=',')    # read in file
plt.xlim(-1,1)
plt.xlabel(r'$x$')
plt.ylabel(r'Frequency')
plt.plot(x,pred_y,linewidth=1,drawstyle='steps')
plt.scatter(x,data1,c='k')
plot_error(data1)

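Compute and print the $\chi^2$ statistic and its probability. Note that we first need to normalize the model to the data in order to calculate the correct $\chi^2$ value. The model only describes the shape of the distribution, so we scale it to contain the same total number of counts as the data.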

To find the $\chi^2$ statistic and its probability, we will use the `scipy.stats.chisquare` function. It takes the observed data and a prediction and computes the $\chi^{2}$. This computation assumes Poisson uncertainties set by the prediction, namely $\sigma_{i}=\sqrt{y_{\rm pred}}$.

In the code cell below, we first check that we understand what is going on by taking two data and prediction points and calculating their $\chi^{2}$. Given the assumption of Poisson predictions, this should give:

$$\chi^2 =\sum_{i=1}^2\frac{({\rm data}_i - {\rm pred}_i)^2}{\sigma_i^2}= \sum_{i=1}^2\frac{({\rm data}_i - {\rm pred}_i)^2}{{\rm pred}_i}$$

You can see that the $\chi^{2}$ returned by `chisquare` (the quantity labeled "statistic =") and the explicit calculation are identical.

The `chisquare` function also outputs the probability of randomly getting a $\chi^2$ greater than or equal to the observed value (the p-value). To find this probability, it assumes that the residuals are Gaussian. By default, it uses a $\chi^2$ distribution with $k-1$ degrees of freedom, where $k$ is the number of data points. We will discuss how this probability computation is done later.
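You can reproduce the p-value reported by `chisquare` by hand with the survival function of the $\chi^2$ distribution, `scipy.stats.chi2.sf`. The sketch below uses hypothetical counts and $k-1$ degrees of freedom for $k$ bins, which matches the default of `chisquare`. Passing `ddof=d` lowers the count to $k-1-d$. This is appropriate when $d$ model parameters were fitted to the same data.

```python
import numpy as np
from scipy.stats import chisquare
from scipy.stats import chi2 as chi2_dist  # renamed so it does not overwrite the chi2 variable used in this notebook

observed = np.array([12, 25, 41, 30, 14, 8])              # hypothetical data counts (total 130)
expected = np.array([10.0, 28.0, 38.0, 33.0, 13.0, 8.0])  # hypothetical model, same total (130)

stat_scipy, p_scipy = chisquare(observed, expected)

k = observed.size
stat_by_hand = np.sum((observed - expected) ** 2 / expected)
p_by_hand = chi2_dist.sf(stat_by_hand, df=k - 1)  # P(chi^2 >= observed value)

print("statistic:", stat_scipy, " by hand:", stat_by_hand)
print("p-value:  ", p_scipy, " by hand:", p_by_hand)

# With one fitted parameter, use ddof=1 (degrees of freedom = k - 2)
_, p_ddof1 = chisquare(observed, expected, ddof=1)
print("p-value with ddof=1:", p_ddof1, " by hand:", chi2_dist.sf(stat_by_hand, df=k - 2))
```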


In [None]:
#>>>RUN: P5.3-runcell04

#quick test whether we understand what is going on
print("Check:",chisquare([10,20],[9.9,20.1]))

print("This should be : ", (0.1**2/9.9 + 0.1**2/20.1))
print()

pred_y = pred_y/np.sum(pred_y) * np.sum(data1) #Normalize model to data
#print(pred_y)

chi2,p = chisquare(data1,pred_y)
print("Chi-squared:",chi2)
print("Probability: {:.4%}".format(p)) #a bit dodgy!




For this case, you can see in the plot that a very large fraction of the data points are 1$\sigma$ or more away from the prediction. So, it's not surprising that the $\chi^{2}$ is large and has a very low probability.
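The "how many points are more than $1\sigma$ away" reasoning can be made quantitative by computing the pull of each bin, $({\rm data}_i - {\rm pred}_i)/\sqrt{{\rm pred}_i}$. The $\chi^2$ is exactly the sum of the squared pulls. For Gaussian residuals, roughly 32% of pulls should exceed 1 in magnitude. The sketch below uses hypothetical counts generated from the model itself. The same `pulls` helper (a name introduced here for illustration) could be applied to `data1` and the normalized `pred_y` from the cell above.

```python
import numpy as np
from scipy.stats import chisquare

def pulls(data, pred):
    # Residual of each bin in units of its Poisson uncertainty sqrt(pred)
    return (data - pred) / np.sqrt(pred)

rng = np.random.default_rng(5)
pred_toy = np.full(400, 200.0)                          # hypothetical flat model, 200 expected counts per bin
data_toy = rng.poisson(pred_toy)                        # pseudo-data drawn from that model
pred_toy = pred_toy / pred_toy.sum() * data_toy.sum()   # normalize model to data

p_i = pulls(data_toy, pred_toy)
stat, _ = chisquare(data_toy, pred_toy)
print("Sum of squared pulls:", np.sum(p_i ** 2), " chisquare statistic:", stat)
print("Fraction of bins more than 1 sigma away:", np.mean(np.abs(p_i) > 1))
```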

**Data Sample 2**

Try again for a different set of data, called "chisq_data2.txt". Again, compute and print the $\chi^2$ statistic and its probability.

In [None]:
#>>>RUN: P5.3-runcell05

data2 = np.loadtxt('data/P05/chisq_data2.txt', delimiter=',')    # read in file
plt.xlim(-1,1)
plt.xlabel(r'$x$')
plt.ylabel(r'Frequency')
plt.plot(x,pred_y,linewidth=1,drawstyle='steps')
plt.scatter(x,data2,c='k')
plot_error(data2)
plt.show()

pred_y = pred_y/np.sum(pred_y) * np.sum(data2) #Normalize model to data
chi2,p = chisquare(data2,pred_y)
print("Chi-squared:",chi2)
print("Probability: {:.2%}".format(p)) #quite consistent!

Here, you see a much larger fraction of the data points falling less than 1$\sigma$ from the prediction, so the $\chi^{2}$ is smaller and has a larger probability.

**Data Sample 3**

Finally, see what happens with a third set of data, called "chisq_data3.txt". Again, compute and print the $\chi^2$ statistic and its probability.

In [None]:
#>>>RUN: P5.3-runcell06

data3 = np.loadtxt('data/P05/chisq_data3.txt', delimiter=',')    # read in file
plt.xlim(-1,1)
plt.xlabel(r'$x$')
plt.ylabel(r'Frequency')
plt.plot(x,pred_y,linewidth=1,drawstyle='steps')
plt.scatter(x,data3,c='k')
plot_error(data3)

pred_y = pred_y/np.sum(pred_y) * np.sum(data3) #Normalize model to data
chi2,p = chisquare(data3,pred_y)
print("Chi-squared:",chi2)
print("Probability: {:.2%}".format(p)) #too consistent...

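Here, the probability of obtaining a $\chi^{2}$ at least this large is very high. The data agree with the model more closely than typical statistical fluctuations would allow. This is itself suspicious: it can indicate overestimated uncertainties or a model that was tuned to these particular data.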
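The sketch below shows why a probability close to 1 is rare when nothing is wrong. Suppose the model is correct and the uncertainties are right. Then the $\chi^2$ from $k$ normalized bins approximately follows a $\chi^2$ distribution with $k-1$ degrees of freedom, whose mean is $k-1$, and the p-values are spread roughly uniformly between 0 and 1. The example uses a hypothetical 5-bin model and repeatedly draws pseudo-data from it. The resulting p-values exceed 0.99 only about 1% of the time.

```python
import numpy as np
from scipy.stats import chisquare

rng = np.random.default_rng(6)
expected = np.array([50.0, 80.0, 120.0, 80.0, 50.0])  # hypothetical model counts in k = 5 bins
n_trials = 2000

stats = np.empty(n_trials)
pvals = np.empty(n_trials)
for t in range(n_trials):
    observed = rng.poisson(expected)                     # pseudo-data drawn from the model itself
    scaled = expected / expected.sum() * observed.sum()  # normalize model to data, as in the cells above
    stats[t], pvals[t] = chisquare(observed, scaled)

print("Mean chi^2:", stats.mean(), " (k - 1 =", expected.size - 1, ")")
print("Fraction of p-values above 0.99:", np.mean(pvals > 0.99))
```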

<a name='problems_5_3'></a>     

| [Top](#section_5_0) | [Restart Section](#section_5_3) | [Next Section](#section_5_4) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.3.1</span>

In physics, the distributions that we measure are sometimes too complicated to be modeled with a straightforward analytical fit function. In such cases, our model (to which we compare the real data) will be a set of simulated data.

For this problem, you are provided the $y$ value arrays for a real dataset (`data`) and a simulated one (`MC`). Compute the $\chi^2$ value of the data as compared to the simulation and enter your answer as a number with precision 1e-3. *Remember that it is important to normalize the simulation to the data so that they have the same total number of events.*


In [None]:
#>>>PROBLEM: P5.3.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

bin_edges = np.array([  0.  ,   3.75,   7.5 ,  11.25,  15.  ,  18.75,  22.5 ,  26.25,
        30.  ,  33.75,  37.5 ,  41.25,  45.  ,  48.75,  52.5 ,  56.25,
        60.  ,  63.75,  67.5 ,  71.25,  75.  ,  78.75,  82.5 ,  86.25,
        90.  ,  93.75,  97.5 , 101.25, 105.  , 108.75, 112.5 , 116.25,
       120.  , 123.75, 127.5 , 131.25])

bin_centers = np.array([(bin_edges[i]+bin_edges[i+1])/2 for i in range(len(bin_edges)-1)])

data = np.array([ 403, 1114, 1345, 1379, 1236, 1056,  865,  659,  506,  380,  319,
        178,  163,  119,   89,   54,   57,   37,   24,   19,   19,   19,
          5,    9,    6,    4,    7,    3,    3,    2,    3,    0,    0,
          4,    2])

#Monte Carlo Simulation
MC = np.array([3057, 7241, 9282, 9404, 8559, 7564, 6259, 5124, 3976, 3095, 2445,
       1816, 1398, 1019,  763,  569,  399,  307,  194,  174,  117,   96,
         67,   60,   29,   20,   22,   13,    7,    6,    5,    5,    2,
          3,    3])


#YOUR CODE HERE
#Normalize model to data and print the chisquare result


#NOW PLOTTING
plt.bar(bin_centers, MC, width=3.75, color='red') #Plot MC
plt.errorbar(bin_centers,data,yerr=np.sqrt(data),marker='o',linestyle='none') #Plot data
plt.ylabel("Number of entries")
plt.show();

plt.plot(np.arange(0, 135, .01), np.ones(len(np.arange(0, 135, .01))), color='black')
plt.errorbar(bin_centers, data/MC,yerr=np.sqrt(data)/MC,marker='o',linestyle='none') #Plot ratio of data/MC (should be 1)
plt.ylabel("Data/Simulation")
plt.ylim(0,2)
plt.show();

>#### Follow-up 5.3.1a (ungraded)
>
>Does the p-value make sense for this data? What could be wrong?

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.3.2</span>

If you haven't done so, make the first plot in the preceding problem, which compares a histogram of the normalized prediction to data points with error bars.

Looking closely, you will notice that the peak in the data is a bit narrower than that for the simulation, resulting in very poor agreement between the two. Luckily, this dataset has a correction to account for this difference.

Use the new `data_corr` histogram in the code cell below and again perform a $\chi^2$ test. What is the p-value for this new set of data? Enter your answer as a number with precision 1e-3.



In [None]:
#>>>PROBLEM: P5.3.2
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

bin_edges = np.array([  0.  ,   3.75,   7.5 ,  11.25,  15.  ,  18.75,  22.5 ,  26.25,
        30.  ,  33.75,  37.5 ,  41.25,  45.  ,  48.75,  52.5 ,  56.25,
        60.  ,  63.75,  67.5 ,  71.25,  75.  ,  78.75,  82.5 ,  86.25,
        90.  ,  93.75,  97.5 , 101.25, 105.  , 108.75, 112.5 , 116.25,
       120.  , 123.75, 127.5 , 131.25])

bin_centers = np.array([(bin_edges[i]+bin_edges[i+1])/2 for i in range(len(bin_edges)-1)])

data_corr = np.array([405, 958, 1240, 1316, 1161, 1042, 881, 712, 528, 425, 319, 
                      250, 183, 142, 104, 77, 55, 42, 25, 23, 15, 12, 9, 4, 4, 1, 
                      4, 3, 0, 0, 1, 2, 1, 0, 3])

MC = np.array([3057, 7241, 9282, 9404, 8559, 7564, 6259, 5124, 3976, 3095, 2445,
       1816, 1398, 1019,  763,  569,  399,  307,  194,  174,  117,   96,
         67,   60,   29,   20,   22,   13,    7,    6,    5,    5,    2,
          3,    3])


#YOUR CODE HERE
#Normalize model to data and print the chisquare result


plt.bar(bin_centers, MC, width=3.75, color='red') #Plot MC
plt.errorbar(bin_centers,data_corr,yerr=np.sqrt(data_corr),marker='o',linestyle='none') #Plot data
plt.ylabel("Number of entries")
plt.show();

plt.plot(np.arange(0, 135, .01), np.ones(len(np.arange(0, 135, .01))), color='black')
plt.errorbar(bin_centers, data_corr/MC,yerr=np.sqrt(data_corr)/MC,marker='o',linestyle='none') #Plot ratio of data/MC (should be 1)
plt.ylabel("Data/Simulation")
plt.ylim(0,2)
plt.show();

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.3.3</span>

This second $\chi^2$ test outcome looks much better. However, maybe we can do even better.

Looking at the ratio plot, we see that most of the data points that have large deviations from the simulation are located in the last few bins. Furthermore, the bulk of the distribution (i.e. the area around the peak) is found in the first 10$-$15 bins.

Try cutting the data and simulation arrays to include only the first 10 bins and perform the $\chi^2$ test a third time. What is the new p-value? Enter your answer as a number with precision 1e-3.

In [None]:
#>>>PROBLEM: P5.3.3
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

bin_edges = np.array([  0.  ,   3.75,   7.5 ,  11.25,  15.  ,  18.75,  22.5 ,  26.25,
        30.  ,  33.75,  37.5 ,  41.25,  45.  ,  48.75,  52.5 ,  56.25,
        60.  ,  63.75,  67.5 ,  71.25,  75.  ,  78.75,  82.5 ,  86.25,
        90.  ,  93.75,  97.5 , 101.25, 105.  , 108.75, 112.5 , 116.25,
       120.  , 123.75, 127.5 , 131.25])

bin_centers = np.array([(bin_edges[i]+bin_edges[i+1])/2 for i in range(len(bin_edges)-1)])

data_corr = np.array([405, 958, 1240, 1316, 1161, 1042, 881, 712, 528, 425, 319, 
                      250, 183, 142, 104, 77, 55, 42, 25, 23, 15, 12, 9, 4, 4, 1, 
                      4, 3, 0, 0, 1, 2, 1, 0, 3])

MC = np.array([3057, 7241, 9282, 9404, 8559, 7564, 6259, 5124, 3976, 3095, 2445,
       1816, 1398, 1019,  763,  569,  399,  307,  194,  174,  117,   96,
         67,   60,   29,   20,   22,   13,    7,    6,    5,    5,    2,
          3,    3])



maxbin=10
bin_centers = #INSERT CODE HERE
MC = #INSERT CODE HERE
data_corr = #INSERT CODE HERE

#YOUR CODE HERE
#Normalize model to data and print the chisquare result

plt.bar(bin_centers, MC, width=3.75, color='red') #Plot MC
plt.errorbar(bin_centers,data_corr,yerr=np.sqrt(data_corr),marker='o',linestyle='none') #Plot data
plt.ylabel("Number of entries")
plt.show()

plt.plot(np.arange(0, 3.6*maxbin, .01), np.ones(len(np.arange(0, 3.6*maxbin, .01))), color='black')
plt.errorbar(bin_centers, data_corr/MC,yerr=np.sqrt(data_corr)/MC,marker='o',linestyle='none') #Plot ratio of data/MC (should be 1)
plt.ylabel("Data/Simulation")
plt.ylim(0,2)
plt.show()

>#### Follow-up 5.3.3a (ungraded)
>
>The cut improves the fit between the data and the simulation, but it also cuts out a significant amount of data. In particular, if we are interested in properties like the length of the tail on the high side of the peak, this version of the fit misses that entirely. Play around with the number of bins that are included. At what cut value are you happy with how much data was thrown out vs. fit quality? Can you write a process to automatically select an ideal number of bins based on some fit criteria?

<a name='section_5_4'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">P5.4 A Fitting Example</h2>   

| [Top](#section_5_0) | [Previous Section](#section_5_3) | [Problems](#problems_5_4) |


<h3>Overview</h3>

To see how a likelihood is built and then minimized in practice, let's look at a different example.

Suppose you're an astrophysicist looking at a distant star. Photons hit your telescope at random, independent intervals, so the number that you detect within your period of observation is Poisson distributed.

Also, this star is really important, and $N\gg 1$ telescopes are looking at it. Your data $D$ is therefore $\{n_1, n_2, \dots, n_N\}$, the array of counts observed by all $N$ telescopes during one day.

Let's generate some Poisson-distributed sample data for each telescope. We'll assume a parameter of $\lambda=5$ counts per day, with $N=100$ telescopes.

In [None]:
#>>>RUN: P5.4-runcell01

import numpy as np
np.random.seed(15)

LAMBDA = 5
N = 100

counts = np.random.poisson(LAMBDA, N);
#Optionally print the counts
#print(counts)

#Look at the plot compared to the true function
bins = np.arange(np.max(counts)+2)-0.5
xs = bins[:-1]+0.5
y_hist_vals, binEdges = np.histogram(counts, bins=bins)
plt.hist(counts, bins=bins)
plt.plot(xs, N * poisson.pmf(xs,LAMBDA), label="True distro", color="C1", linewidth=2)
plt.legend(loc=1)

Since each telescope's count $n_i$ is independent, the probability of observing the data set $D$ given some estimate of $\lambda$ is simply the product of the probabilities for each telescope to detect $n_i$ photons. Each of these probabilities is given by the Poisson distribution:

$$P(D|\lambda) = \prod_{i=1}^{N} \frac{\lambda^{n_i} e^{-\lambda}}{n_i!}$$
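Taking the log gives $\log P(D|\lambda) = \sum_i \left(n_i \log\lambda - \lambda - \log n_i!\right)$. Setting its derivative with respect to $\lambda$ to zero gives $\sum_i n_i/\lambda - N = 0$. The maximum-likelihood estimate is therefore the sample mean, $\hat\lambda = \frac{1}{N}\sum_i n_i$. The sketch below checks this with a brute-force scan on hypothetical counts (not the notebook's `counts`), so that it runs on its own.

```python
import numpy as np
from scipy.stats import poisson

rng = np.random.default_rng(7)
counts_toy = rng.poisson(5, 100)  # hypothetical data: N = 100 telescopes, true lambda = 5

def total_log_likelihood(lam, data):
    # log P(D | lambda) = sum of Poisson log-probabilities of the independent counts
    return np.sum(poisson.logpmf(data, lam))

lams = np.linspace(3, 7, 4001)  # grid spacing 0.001
logL = np.array([total_log_likelihood(lam, counts_toy) for lam in lams])

print("Grid maximum:", lams[np.argmax(logL)])
print("Sample mean: ", counts_toy.mean())
```

If you substitute the notebook's `counts`, the grid maximum should land on `counts.mean()`. An exact minimizer of the negative log-likelihood should converge to the same value.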

We would like to use the `lmfit` minimization function, which does not actually request the likelihood as an input. Instead, it asks for the logarithm of the likelihood of each data point, and assumes that each data point is independent. Internally, it adds all the likelihoods from all data points to get the log-likelihood of the data.

Run the cell below to define the `log_likelihood` function, which returns the log-probability of each individual count.

In [None]:
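#>>>RUN: P5.4-runcell02

import numpy as np
from scipy.stats import poisson

def log_likelihood(l, data):
    """Log of the Poisson probability of each count in `data` for mean `l` (one value per data point)."""
    # poisson.logpmf gives the same values as np.log(poisson.pmf(...)),
    # but avoids log(0) = -inf when the pmf underflows for very unlikely counts
    return poisson.logpmf(data, l)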

Use `lmfit`'s `minimize` function with the negative log-likelihood as the objective, and see whether it recovers the true value $\lambda=5$.
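For comparison with the `lmfit` fit in the next cell, you can also minimize the summed negative log-likelihood directly. The sketch below swaps in SciPy's `scipy.optimize.minimize_scalar` (bounded method) in place of `lmfit` and uses its own illustrative sample `data`. It estimates the uncertainty from a finite-difference curvature of $-\ln L$ at the minimum. For Poisson data, both numbers should agree closely with the closed-form results $\hat\lambda = \bar n$ (the sample mean) and $\sigma_{\hat\lambda} = \sqrt{\hat\lambda/N}$.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(2)
data = rng.poisson(5.0, size=100)   # illustrative sample, not the notebook's `counts`

def total_nll(lam, data):
    """Summed negative log-likelihood -ln L(lambda) for independent Poisson counts."""
    return -np.sum(poisson.logpmf(data, lam))

# Bounded scalar minimization of the *summed* NLL (the terms are not squared).
res = minimize_scalar(total_nll, bounds=(0.01, 20.0), args=(data,),
                      method="bounded", options={"xatol": 1e-8})
lam_hat = res.x

# Central finite difference for the curvature of -ln L at the minimum;
# sigma^2 = 1 / (d^2(-ln L) / d lambda^2).
h = 1e-3
curv = (total_nll(lam_hat + h, data) - 2 * total_nll(lam_hat, data)
        + total_nll(lam_hat - h, data)) / h**2
sigma_numeric = 1.0 / np.sqrt(curv)
sigma_analytic = np.sqrt(lam_hat / len(data))

print(f"lambda_hat = {lam_hat:.4f} (sample mean {data.mean():.4f})")
print(f"sigma: numeric {sigma_numeric:.4f}, analytic sqrt(lambda_hat/N) {sigma_analytic:.4f}")
```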

In [None]:
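#>>>RUN: P5.4-runcell03

from lmfit import Parameters, minimize, fit_report

def negative_log_likelihood(params, data):
    # lmfit passes the whole Parameters object; extract the current value of lambda
    l = params['l'].value
    # One entry per data point: -ln P(n_i | lambda)
    return -log_likelihood(l, data)

params = Parameters()
params.add('l', min=0, value=1)   # lambda must be non-negative; start the search at 1

result = minimize(negative_log_likelihood, params, args=(counts,))
print(fit_report(result))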

The minimization result is not exactly $\lambda=5$, but it is consistent with $5$ within the fit uncertainty. To further evaluate the fit quality, let's plot the data together with the true and best-fit distributions, and run a Pearson $\chi^2$ goodness-of-fit test on the best fit.
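The next cell measures fit quality with Pearson's statistic from `scipy.stats.chisquare`:

$$\chi^2 = \sum_k \frac{(O_k - E_k)^2}{E_k}$$

Here $O_k$ and $E_k$ are the observed and expected numbers of telescopes in histogram bin $k$. The sketch below uses its own illustrative sample, computes the statistic by hand, and checks it against SciPy. It makes two assumptions worth flagging:

- because $\lambda$ is fitted from the same data, one degree of freedom is usually subtracted for it, which `chisquare` applies through its `ddof` argument;
- the $\chi^2$ approximation is commonly considered unreliable when many bins have small expected counts (a rule of thumb is at least about 5 per bin).

```python
import numpy as np
from scipy.stats import chi2, chisquare, poisson

rng = np.random.default_rng(3)
data = rng.poisson(5.0, size=100)   # illustrative sample, not the notebook's `counts`
lam_hat = data.mean()               # Poisson maximum-likelihood estimate

# Observed counts in integer-centered bins, and expected counts from the fitted Poisson
bins = np.arange(data.max() + 2) - 0.5
observed, _ = np.histogram(data, bins=bins)
centers = 0.5 * (bins[1:] + bins[:-1])
expected = poisson.pmf(centers, lam_hat)
expected = expected / expected.sum() * observed.sum()   # match totals, as chisquare expects

# Pearson statistic by hand: sum over bins of (O - E)^2 / E
chi2_manual = np.sum((observed - expected) ** 2 / expected)

# Degrees of freedom: (number of bins - 1) minus one fitted parameter (lambda)
dof = len(observed) - 1 - 1
p_manual = chi2.sf(chi2_manual, dof)

stat, p_scipy = chisquare(observed, expected, ddof=1)
print(f"chi2 = {chi2_manual:.3f} (scipy {stat:.3f}), dof = {dof}")
print(f"p-value = {p_manual:.4f} (scipy {p_scipy:.4f})")
```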

In [None]:
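#>>>RUN: P5.4-runcell04

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson, chisquare

# Integer-centered bins, as before
bins = np.arange(np.max(counts)+2)-0.5
xs = bins[:-1]+0.5
plt.hist(counts, bins=bins, label="Photon data", fill=False, histtype="step", color='k', linewidth=2)

# Data points with Poisson (sqrt(n)) error bars at the bin centers
y, binEdges = np.histogram(counts, bins=bins)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
plt.errorbar(bincenters, y, yerr=np.sqrt(y), marker='o', color='k', linestyle='none')

# Expected number of telescopes per bin for the best-fit and the true lambda
plt.plot(xs, N * poisson.pmf(xs, result.params['l'].value), label="Best fit distro", color="C0", linewidth=2)
plt.plot(xs, N * poisson.pmf(xs, LAMBDA), label="True distro", color="C1", linewidth=2)

plt.xlabel("Number of photons observed")
plt.ylabel("Number of telescopes")
plt.legend()

# Expected counts per bin, rescaled so their total matches the observed total (as chisquare expects)
pred_y = N * poisson.pmf(bincenters, result.params['l'].value)
pred_y = pred_y/np.sum(pred_y) * np.sum(y)

# Pearson chi-squared test; by default the p-value uses (number of bins - 1) degrees of freedom.
# Passing ddof=1 would additionally account for the fitted parameter lambda.
chi2, p = chisquare(y, pred_y)
print("Chi-squared:", chi2)
print("Probability: {:.4%}".format(p))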

<a name='problems_5_4'></a>   

| [Top](#section_5_0) | [Restart Section](#section_5_4) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.4.1</span>

To represent the uncertainty on $\lambda$ graphically, we will draw an uncertainty band on our histogram. In particular, we will do the following:

- compute the Poisson distribution for 5 values of $\lambda$ in the range $[\lambda - 2\sigma_\lambda, \lambda + 2\sigma_\lambda]$ around the best-fit value. Here $\sigma_\lambda$ is the uncertainty on $\lambda$ reported by the fit, which you can get with `result.params['l'].stderr`
- for each bin, obtain the minimum and maximum predicted number of telescopes among the 5 distributions. The predicted number is $N$ times the Poisson probability, as in the plotted curves
- use `plt.fill_between` to shade the area between the minimum and maximum values in each bin

Ultimately, we want the lower edge of the error band in each bin to be the lowest predicted value among all 5 distributions computed above. Similarly, the upper edge should be the highest predicted value among the 5 distributions.

Having completed the task above, consider this question: **If 10 telescopes report observing 6 photons, is this consistent with the value of $\lambda$ that you found from your fit procedure, within $2\sigma$?**

In [None]:
#>>>PROBLEM: P5.4.1
# Use this cell for drafting your solution (if desired),
# then enter your solution in the interactive problem online to be graded.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

bins = np.arange(np.max(counts)+2)-0.5
xs = bins[:-1]+0.5
plt.hist(counts, bins=bins, label="Photon data", fill=False, histtype="step", color='k', linewidth=2)

y,binEdges = np.histogram(counts, bins=bins)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
plt.errorbar(bincenters,y,yerr=np.sqrt(y),marker='o',color='k', linestyle='none')

plt.plot(xs, N * poisson.pmf(xs, result.params['l'].value), label="Best fit distro", color="C0", linewidth=2)
plt.plot(xs, N * poisson.pmf(xs,LAMBDA), label="True distro", color="C1", linewidth=2)

####################
# Insert Code Here #
####################

minimum = None # Placeholder Value - Fill in the correct line
maximum = None # Placeholder Value - Fill in the correct line

####################

plt.xlabel("Number of photons observed")
plt.ylabel("Number of telescopes")
plt.legend()
plt.show();

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Problem 5.4.2</span>

Choose a number of photons in the range `[0,10]`. Then find a number of telescopes that, if reported for that photon count, would fall below the lower edge of the $2\sigma$ band from the previous problem. Report your numbers as integers in the following format: `[number of photons, number of telescopes]`.

>#### Follow-up 5.4.2a (ungraded)
>
>Suppose one of our telescopes also records information about the energy of each photon from the star. Over time, we could produce a histogram of these values representing the spectrum of the star's light output. Suppose we also have a model of the star that predicts a certain spectrum, given some parameters. How could we make a best estimate of these parameters (assuming systematic uncertainties are small relative to statistical uncertainties)?
