# Workshop 8

The purpose of this workshop is to introduce concepts related to maximum log likelihood estimation. It should also help you learn numpy and scipy methods. Going through these exercises will also help you with homework 4.


# 1. Maximum log likelihood estimator

In a single-bin counting experiment, we measure an observed outcome $k$ that follows a Poisson distribution with an expectation of $\lambda$. The likelihood function of this measurement is simply Poisson($k|\lambda$). The expectation is parameterized as $\lambda = \mu\cdot s + b$, where $s$ and $b$ are the expected numbers of signal and background events, respectively, and $\mu$ is the so-called signal strength parameter, which scales the expected signal contribution to $\lambda$ up or down. In the measurement, $\mu$ is the parameter of interest that we want to determine from the observation, and $s$ and $b$ are constants.

In a maximum log likelihood fit, we need to maximize the value of the log likelihood function of the measurement. Equivalently, we can minimize the **negative log likelihood (NLL)** function of the measurement, which is given as

$$
-\mathrm{log}L(k|\lambda) = \lambda - k\mathrm{log}(\lambda) + \mathrm{log}(k!),
$$
where $\lambda = \mu\cdot s + b$.
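
As a quick numerical check, the sketch below compares the negative logarithm of the Poisson probability written out term by term, $\lambda - k\,\mathrm{log}(\lambda) + \mathrm{log}(k!)$, with the value returned by `scipy.stats.poisson.logpmf`. The helper name `nll_explicit` and the test values are assumptions made for illustration only; `gammaln(k + 1)` is used to evaluate $\mathrm{log}(k!)$.

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import poisson

def nll_explicit(k, lam):
    # -log Poisson(k | lam) = lam - k*log(lam) + log(k!)
    # gammaln(k + 1) evaluates log(k!) without overflow
    return lam - k * np.log(lam) + gammaln(k + 1)

k, lam = 115, 110.0  # illustrative values only
print(nll_explicit(k, lam))
print(-poisson.logpmf(k, lam))  # should agree with the line above
```

For a fixed $k$, this quantity is smallest when $\lambda = k$, which is why the fit pulls the expectation towards the observation.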






In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from scipy.stats import poisson

Task 1: define a function that implements this single-bin negative log likelihood. The function should take four input arguments: $\mu$, $k$, $s$, and $b$. Hint: this is easily done with the `logpmf` method of `scipy.stats.poisson`.

You may name this function `NLP` for the sake of simplicity.

In [None]:
def NLP(mu, obs, sig, bkg):
    # develop your code here
    return 

Set $s$ = 5, $b$ = 100, and $k$ = 115. Plot `NLP` as a function of $\mu$. The sequence of $\mu$ values should be given by `mu = np.linspace(1,5,401)`. If your `NLP` is defined correctly, your plot should look like this one.

<img src='https://portal.nersc.gov/project/m3438/physics77/WS08/fig1.png' width=500>


The $\mu$ value that minimizes the negative log likelihood function is our measured $\mu$. What is it? 

Hint - you may use np.argmin() to find the position of the entry with the minimum value, and then use this position to determine the $\mu$ value that minimizes the negative log likelihood function.
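
To illustrate the hint, here is a minimal sketch of the `np.argmin()` pattern on a toy curve with a known minimum. The names `x_scan` and `y_scan` and the toy curve itself are assumptions for illustration; they are not the NLP of this exercise.

```python
import numpy as np

# toy curve with a known minimum at x = 2 (illustrative, not the NLP)
x_scan = np.linspace(0, 4, 401)
y_scan = (x_scan - 2.0) ** 2 + 1.0

i_min = np.argmin(y_scan)   # index of the smallest entry
x_at_min = x_scan[i_min]    # scanned value at that index
print(i_min, x_at_min, y_scan[i_min])
```

The precision of a scan like this is limited by the spacing of the scanned values.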

In [None]:
# Develop your code here

### scipy.optimize.minimize

Now we use the scipy.optimize.minimize method to determine the $\mu$ value. Let's look at the example below.

In [None]:
# define a function for minimization
def func(x,a,b,c):
    return (x[0]-a)**2 + (x[1]-b)**2 + c

# the minimum of this function should be c when x = [a,b] 

# x is a list or numpy array (preferred)
# if your function has N free parameters to be determined by the minimization
# then x should have a shape of (N,) or x should be a list with N entries

In [None]:
result = minimize(func,x0=[0,0], args=(3,4,20))

# This line minimizes the function "func"
# args sets the constant parameters of func;
# in this case we have a = 3, b = 4, c = 20
# the free parameters of func are given by x
# x0=[0,0] sets the initial guess for these values;
# their final values are determined by the minimization

In [None]:
# the result of the minimization is saved in the object called result
# Let's print it out
print(result)

# "fun" is the minimized value of the function
# x is the array storing the values that minimize the function
# is x matching x = [a,b]?
# is the minimized value the same as c?

In [None]:

# to retrieve values from the object returned by the minimize method
# simply do

print(result.fun)
print(result.x)
print(result.x[0])
print(result.x[1])

### Now, use scipy.optimize.minimize to determine the $\mu$ value that minimizes the negative log likelihood of this measurement

- recall that $k$, $s$, and $b$ should be constants in the NLP function
- $\mu$ is the single free parameter in the fit

How does this compare with the $\mu$ value you determined by scanning the NLP as a function of $\mu$?
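
With a single free parameter, `minimize` still passes the parameter to the function as an array of length 1. The sketch below shows that pattern on a toy parabola; the function name `parabola` and its constants are assumptions for illustration, not the NLP of this exercise.

```python
import numpy as np
from scipy.optimize import minimize

def parabola(x, center, offset):
    # x is an array of length 1 because there is a single free parameter
    return (x[0] - center) ** 2 + offset

# center and offset are held constant through args
res = minimize(parabola, x0=[0.0], args=(2.0, 1.0))
print(res.x[0], res.fun)
```

Here `res.x[0]` is the fitted parameter and `res.fun` is the function value at the minimum.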

## Functions with multiple minima

While minimize is a great tool, it is sometimes still more intuitive to scan the negative log likelihood directly.

Now, suppose the expectation is parameterized as $$ \lambda = \kappa^{2} s + b$$

where $\kappa$ is the parameter we want to measure. In this case, our negative log likelihood function will depend on $\kappa$ and it is given in the cell below.

In [None]:
# This cell requires that you have defined the NLP function earlier in this workshop

def NLP_vs_kappa( kappa, obs, sig, bkg ):
    return NLP( kappa**2, obs, sig, bkg)



Let's visualize the negative log likelihood function's dependence on $\kappa$

In [None]:
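# obs, sig, and bkg (observed count, expected signal, expected background)
# must already be defined earlier in the notebook
kappa = np.linspace(-3,3,6001)
NLP_kappascan = NLP_vs_kappa(kappa,obs,sig,bkg)
plt.plot(kappa,NLP_kappascan)
plt.xlabel(r'$\kappa$')  # raw string avoids the invalid '\k' escape
plt.ylabel('Negative Log Likelihood Function')
# label the plot with the constants actually used in the scan
plt.text(-2,5,'Obs = {}, s = {}, b = {}'.format(obs,sig,bkg))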

**Write your code below to determine the $\kappa$ values that minimize the negative log likelihood function. Note that there are two minima in this function.**
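
One possible way to pick out two minima from a scan is to split the scanned range into a negative and a positive half and apply `np.argmin()` to each. The sketch below does this for a toy symmetric curve; the names `x_scan`, `y_scan`, `neg`, and `pos` and the toy curve are assumptions for illustration.

```python
import numpy as np

# toy curve with two equally deep minima at x = -sqrt(2) and x = +sqrt(2)
x_scan = np.linspace(-3, 3, 6001)
y_scan = (x_scan ** 2 - 2.0) ** 2

neg = x_scan < 0   # boolean mask selecting the negative half of the scan
pos = ~neg         # the remaining points (zero and positive values)

x_min_neg = x_scan[neg][np.argmin(y_scan[neg])]
x_min_pos = x_scan[pos][np.argmin(y_scan[pos])]
print(x_min_neg, x_min_pos)
```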

In [None]:
# Your code here


#### Now let's see what this call to scipy.optimize.minimize finds

In [None]:
result_2 = minimize(NLP_vs_kappa, x0=np.ones(1), args=(obs,sig,bkg), method='Nelder-Mead' )
print(result_2)

#### How about this one?

*Are these fit results different? Why? What is different between this line and the one above?*

In [None]:
result_3 = minimize(NLP_vs_kappa, x0=np.ones(1)*(-1), args=(obs,sig,bkg), method='Nelder-Mead' )
print(result_3)

You may have also noticed that an option called `method` is specified in the call to minimize. You can see the full list of available methods here
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html
These are different algorithms for converging on the minimum.
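
As a rough illustration, the sketch below runs three of the available methods on the same toy function and prints how many function evaluations each one needed. The function `bowl` and the choice of methods are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def bowl(x):
    # simple function with a single minimum at (1, -2)
    return (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2

fits = {}
for method in ("Nelder-Mead", "BFGS", "Powell"):
    fits[method] = minimize(bowl, x0=[0.0, 0.0], method=method)
    # nfev counts how many times the function was evaluated
    print(method, fits[method].x, fits[method].nfev)
```

All three should land on the same minimum for a function this simple; they differ in how they get there and in how many evaluations they need.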

## Local and global minima

Now we switch to a slightly more complicated parameterization of $\lambda$. We have 
$$ \lambda = (-0.5\kappa+1)\kappa^{2}\cdot s + b 
$$
This parameterization gives two minima in the NLL curve. The one with the smaller value is called the `global minimum`, and the one with the larger value is called a `local minimum`.

In [None]:
# This cell requires that you have defined the NLP function earlier in this workshop
def NLP_vs_kappa_v2( kappa, obs, sig, bkg ):
    return NLP( (-0.5*kappa+1)*(kappa)**2 , obs, sig, bkg)

**Visualize the negative log likelihood function's dependence on $\kappa$. Your plot should look like**

<img src="https://portal.nersc.gov/project/m3438/physics77/WS08/fig2.png" width =500>

This should indicate that the global minimum is around $\kappa$ ~ -1.4, but there is a local minimum at around +1.4 in $\kappa$.

**Also determine the minimum of the negative log likelihood function in the range of -3 < $\kappa$ < +3**

In [None]:
# Write your code here

# plot



# minimum
# fill in the arguments of format() with your results
print('The kappa value that minimizes the negative log likelihood is {:4.4f}'.format())
print('The minimum of NLL is {:4.4f} '.format())



**Develop a minimization code to determine the $\kappa$ (kappa) value that minimizes the NLL**
- Try setting the initial value of $\kappa$ to +1 and -1 by changing the `x0=` option and see whether the minimization result is different

In [None]:
result_4 = minimize(NLP_vs_kappa_v2, x0=np.ones(1), args=(obs,sig,bkg), method='Nelder-Mead' )
print(result_4)

In [None]:
result_5 = minimize(NLP_vs_kappa_v2, x0=-np.ones(1), args=(obs,sig,bkg), method='Nelder-Mead' )
print(result_5)

You will see that if the initial value of $\kappa$ is positive, the minimization gets trapped in the local minimum. This shows the importance of choosing the initial value well, as well as the need for robust minimization methods.
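
The same behavior can be reproduced on a toy function that does not depend on the NLP. The sketch below uses a tilted double-well curve, an assumption chosen only for illustration, and starts Nelder-Mead from either side.

```python
import numpy as np
from scipy.optimize import minimize

def double_well(x):
    # the 0.5*x term tilts the curve so the left well is deeper
    return x[0] ** 4 - 2.0 * x[0] ** 2 + 0.5 * x[0]

res_pos = minimize(double_well, x0=[1.0], method="Nelder-Mead")
res_neg = minimize(double_well, x0=[-1.0], method="Nelder-Mead")
print(res_pos.x[0], res_pos.fun)  # stays in the shallower right-hand well
print(res_neg.x[0], res_neg.fun)  # reaches the deeper left-hand well
```

Both calls report success, so the optimizer output alone does not tell you whether the minimum found is the global one.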

### Basinhopping
Below is another minimization method provided in scipy.optimize, which has strategies to get out of the local minimum.

In [None]:
from scipy.optimize import basinhopping 


Duplicate the lines below into new cells, and try different values of stepsize (0.1, 1, 2) and niter (50, 500). **Are there any particular configurations that allow you to find the global minimum?**

Consult this page to understand the meaning of stepsize and niter
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html
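
Before trying it on the NLL, it may help to see basinhopping on a toy function. The sketch below uses a tilted double-well curve (an assumption for illustration) and starts in the shallower well. The `stepsize` and `niter` values here are arbitrary choices, and because the hops are random the path taken differs from run to run.

```python
import numpy as np
from scipy.optimize import basinhopping

def double_well(x):
    # local minimum near x = +0.93, global minimum near x = -1.06
    return x[0] ** 4 - 2.0 * x[0] ** 2 + 0.5 * x[0]

# start inside the shallower right-hand well
res_bh = basinhopping(double_well, x0=[1.0], stepsize=1.0, niter=200)
print(res_bh.x[0], res_bh.fun)
```

With a step size comparable to the distance between the two wells, the random hops can cross the barrier; a much smaller step size makes that less likely.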





In [None]:
result_basinhopping=basinhopping(NLP_vs_kappa_v2, x0=np.ones(1), stepsize=0.2, niter=120, minimizer_kwargs={"args":(obs,sig,bkg)})
print(result_basinhopping)

In [None]:
# Try a different parameter configuration here

In [None]:
# Try a different parameter configuration here

In [None]:
# Try a different parameter configuration here

In [None]:
# Try a different parameter configuration here

# 2. Fit

For a given function with a set of tunable parameters, we can perform a fit to data to determine the set of parameter values that makes the function best describe data. We will see a few different ways to fit data in this exercise.

## Generate a sample of normally distributed random numbers

**Use numpy to generate a sample of random numbers that follow a Gaussian distribution with a mean of 125 and a standard deviation of 2.**
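
As a reminder of the numpy random number interface, here is a minimal sketch that draws from a Gaussian with different parameters from those in the task. The seed, the sample size, and the name `sample` are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # the seed value is an arbitrary choice

# 10000 draws from a Gaussian with mean 10 and standard deviation 3
sample = rng.normal(loc=10.0, scale=3.0, size=10000)
print(sample.mean(), sample.std())
```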

In [None]:
# Write your code here

# declare a random number generator
rng = 

# generate the random set
data = 

## rv_continuous fit
In scipy.stats, every continuous distribution, meaning one whose random variable is continuous rather than discrete, has a `fit` method. The Gaussian (normal) distribution is one example. scipy.stats uses "rv_continuous" to refer to such distributions.

The default approach used by `fit` is maximum likelihood: a log likelihood function is constructed, and the parameter values that maximize it are returned. However, `fit` does not return the maximized log likelihood value itself. Nonetheless, one can easily construct it.
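
For a Gaussian, the maximum likelihood estimates have closed forms: the sample mean, and the standard deviation computed with `ddof=0`. The sketch below checks that `norm.fit` reproduces them on a toy sample; the seed, sample size, and variable names are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=2)  # arbitrary seed for reproducibility
sample = rng.normal(loc=0.0, scale=1.0, size=5000)

mu_fit, sigma_fit = norm.fit(sample)
# for a Gaussian, the maximum likelihood estimates have closed forms:
# the sample mean and the standard deviation computed with ddof=0
print(mu_fit, sample.mean())
print(sigma_fit, sample.std(ddof=0))
```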


In [None]:
# we need to import norm from scipy.stats first
# norm represents normal distribution 
from scipy.stats import norm

In [None]:
# the fit is done by a single line
# the returned values are the mean and the standard deviation of the fitted function
# see https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.norm.html for more
mean, std = norm.fit(data)

print(mean,std)



In [None]:
#To visualize the fit result

# Plot the random numbers
bincontent, binedge, others = plt.hist(data,bins=20,range=(120,130),density=True,label="Generated")


# Get the x values for pdf calculation
x = (binedge[0:-1]+binedge[1:])/2

# Take note how you draw the fitted pdf 
plt.plot(x, norm.pdf(x,mean,std), 'r-',lw=5, alpha=0.6, label='Fitted pdf')
plt.xlabel('x')
plt.ylabel('Fraction of entries')
plt.legend()

plt.text(mean,np.max(bincontent)*0.1,'Fitted mean = {:4.3f}'.format(mean))
plt.text(mean,np.max(bincontent)*0.05,'Fitted sigma = {:4.3f}'.format(std))

**Calculate the maximized log likelihood value for this given fit.**
- Hint: scipy.stats.norm has the logpdf method  


In [None]:
# Your code here
maxLL = 
print('The maximized log likelihood value is {:4.4f}'.format(maxLL))

## curve_fit
scipy.optimize provides a curve_fit method, which is based on the least-squares method. In a least-squares fit, the value of $\sum_{i}(f(x_i,\theta) - y_i)^2$ is minimized. Here $x_i, y_i$ are the coordinates of the data points, $f(x_i,\theta)$ is the fit function, and $\theta$ represents the set of parameters that are *variable* in the fit.

Consult 
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
for more information
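
A minimal sketch of a least-squares fit with `curve_fit`, using a straight line and synthetic noisy data. The function `line`, the true parameter values, the noise level, and the seed are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

def line(x, slope, intercept):
    # the first argument is the independent variable; the rest are fit parameters
    return slope * x + intercept

rng = np.random.default_rng(seed=3)  # arbitrary seed
x_data = np.linspace(0, 10, 50)
y_data = line(x_data, 2.0, 1.0) + rng.normal(0.0, 0.1, size=x_data.size)

popt, pcov = curve_fit(line, x_data, y_data)
sum_sq = np.sum((line(x_data, *popt) - y_data) ** 2)  # the minimized quantity
print(popt, np.sqrt(np.diag(pcov)), sum_sq)
```

The square roots of the diagonal elements of the covariance matrix are commonly used as the uncertainties on the fitted parameters.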

In [None]:
from scipy.optimize import curve_fit

In [None]:
# Define a gaussian function
def gaussian(x,mean,std):
    return norm.pdf(x,mean,std)

In [None]:
# curve_fit returns two arrays
# the first one, a, holds the fitted values of the free parameters
# the second one, b, is the covariance matrix of the fit parameters
a,b = curve_fit(gaussian,x,bincontent,bounds=([120,1.0],[130,5]))



Visualize the fit result. You will see that the fit results differ from those of rv_continuous.fit, because the figure of merit used in the minimization is different.

In [None]:
print(a,b)

plt.scatter(x,bincontent,label="Generated Data")
plt.plot(x,gaussian(x,*a),'g--', label="Scipy curve_fit result")
plt.text(123,np.max(bincontent)*0.1,'Fitted mean = {:4.3f}'.format(a[0]))
plt.text(123,np.max(bincontent)*0.05,'Fitted sigma = {:4.3f}'.format(a[1]))
plt.xlabel('x')
plt.ylabel('Fraction of entries')
plt.legend()

## Binned Log likelihood fit

Here we develop code to perform a binned log likelihood fit. 

Earlier we used this line to produce a histogram of the random distribution

`bincontent, binedge, others = plt.hist(data,bins=20,range=(120,130),density=True,label="Generated")`

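This histogram has 20 bins between 120 and 130, so each bin has a width of 0.5. The bin contents are stored in the numpy array `bincontent`, and the bin centers can be derived from the numpy array `binedge` (as done earlier in the notebook). Note that because the histogram was made with `density=True`, `bincontent` holds normalized densities rather than raw event counts; a Poisson likelihood needs the raw counts, which you get by rebuilding the histogram with `density=False`.

The expected content of a bin $i$ is $\lambda_i = \mathrm{normal}(x_i,\mu,\sigma)\cdot w_i$, where $\mathrm{normal}(x_i,\mu,\sigma)$ is the prediction of a normal function with a mean of $\mu$ and a standard deviation of $\sigma$ at the value of $x_i$, and $w_i$ is the bin width. This product is the expected fraction of events in the bin; multiply it by the total number of events to obtain the expected number of events.

The observed number of events is the raw count in each bin.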
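
The relation between observed and expected bin counts can be sketched as follows. The sample size, the seed, and the variable names are assumptions for illustration; the expected counts are evaluated at the true parameters rather than at fitted ones.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(seed=4)          # arbitrary seed
sample = rng.normal(125.0, 2.0, size=10000)  # hypothetical sample size

# raw counts per bin (np.histogram does not normalize by default)
counts, edges = np.histogram(sample, bins=20, range=(120, 130))
centers = (edges[:-1] + edges[1:]) / 2
width = edges[1] - edges[0]

# expected counts: total events * pdf at the bin center * bin width
expected = sample.size * norm.pdf(centers, 125.0, 2.0) * width
print(counts[:5])
print(np.round(expected[:5], 1))
```

The observed counts are integers that fluctuate around the expected values, which is what the Poisson term in each bin of the likelihood describes.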

In this exercise:
- Write a negative log likelihood function for this fit. This function should take
    - a numpy array named `par` whose elements are the mean and standard deviation of the normal distribution. These parameters float in the fit, i.e., the fit will tune them
    - a numpy array x that holds the bin centers where the expectation is evaluated
    - a numpy array y that holds the observed counts in the bins
    - a binwidth parameter, which is 0.5 for this problem




In [None]:
def optimizationfunction(par,x,y,binwidth):

    # Your code here
    
    return

- Use the scipy.optimize.minimize method to determine the mean and standard deviation
    - think about which function to optimize, what values to give as x0, and whether anything needs to be passed via args=

In [None]:
# minimize code
result = minimize()

Overlay the fitted PDF on the random data. Your output should look like this one
<img src="https://portal.nersc.gov/project/m3438/physics77/WS08/fig3.png" width =500>


In [None]:
# Your plotting code here

Congratulations on completing the workshop!