<hr style="height: 1px;">
<i>This notebook was authored by the 8.S50x Course Team, Copyright 2022 MIT All Rights Reserved.</i>
<hr style="height: 1px;">
<br>

<h1>Lesson 23: Markov Chain Monte Carlo - Part II</h1>



<a name='section_23_0'></a>
<hr style="height: 1px;">


## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L23.0 Overview</h2>


<h3>Navigation</h3>

<table style="width:100%">
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_23_4">L23.1 Quantum simulations</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_23_4">L23.1 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_23_5">L23.2 MCMC using a professional sampler</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_23_5">L23.2 Exercises</a></td>
    </tr>
    <tr>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#section_23_6">L23.3 Gravitational Waves</a></td>
        <td style="text-align: left; vertical-align: top; font-size: 10pt;"><a href="#exercises_23_6">L23.3 Exercises</a></td>
    </tr>
</table>

<h3>Learning Objectives</h3>

Here we continue our exploration of MCMC methods and consider some interesting physical situations where we can apply and extend what we've learned. In particular, we will look at examples from quantum mechanics and gravitational wave analysis (i.e., LIGO data), which are more professional contexts where MCMC is used.

<h3>Slides</h3>

You can access the slides related to this lecture at the following link: <a href="https://github.com/mitx-8s50/slides/raw/main/module3_slides/L23_slides.pdf" target="_blank">L23 Slides</a>

<h3>Installing Tools</h3>

Before we do anything, let's make sure we install the tools we need.

In [None]:
#>>>RUN: L23.0-runcell01

!pip install corner
!pip install lmfit
!pip install bilby
!pip install gwpy lalsuite

<h3>Importing Libraries</h3>

Before beginning, run the cell below to import the relevant libraries for this notebook. 

In [None]:
#>>>RUN: L23.0-runcell01

import imageio
from PIL import Image

import lmfit
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
import csv
import math
from scipy import optimize as opt
from scipy import stats
import matplotlib.cm as cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import corner

import bilby
import scipy.signal as sig
from bilby.gw.source import lal_binary_black_hole
from bilby.gw.conversion import convert_to_lal_binary_black_hole_parameters

<h3>Setting Default Figure Parameters</h3>

The following code cell sets default values for figure parameters.


In [None]:
#>>>RUN: L23.0-runcell02

#set plot resolution
%config InlineBackend.figure_format = 'retina'

#set default figure parameters
plt.rcParams['figure.figsize'] = (9,6)

medium_size = 12
large_size = 15

plt.rc('font', size=medium_size)          # default text sizes
plt.rc('xtick', labelsize=medium_size)    # xtick labels
plt.rc('ytick', labelsize=medium_size)    # ytick labels
plt.rc('legend', fontsize=medium_size)    # legend
plt.rc('axes', titlesize=large_size)      # axes title
plt.rc('axes', labelsize=large_size)      # x and y labels
plt.rc('figure', titlesize=large_size)    # figure title

<a name='section_23_4'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L23.1 Quantum Simulations </h2>  

| [Top](#section_23_0) | [Previous Section](#section_23_3) | [Exercises](#exercises_23_4) | [Next Section](#section_23_5) |

*The material in this section is discussed in the video **<a href="https://courses.mitxonline.mit.edu/learn/course/course-v1:MITxT+8.S50.3x+3T2023/block-v1:MITxT+8.S50.3x+3T2023+type@sequential+block@seq_LS23/block-v1:MITxT+8.S50.3x+3T2023+type@vertical+block@vert_LS23_vid1" target="_blank">HERE</a>.** You are encouraged to watch that video and use this notebook concurrently.*

<h3>Overview</h3>

To show the diversity of Markov Chain Monte Carlo (MCMC) applications, we now use it to augment simulations. The strategy relies on two facts:

 * We have a formula that describes the full behavior of our system and outputs the likelihood of a given configuration
 * We don't know the parameter values that make the system behave naturally

In this instance, the MCMC step can be used to find the right conditions for the system.

<h3>Solving Schrodinger's Equation for the Hydrogen Atom</h3>

To give you a concrete example, let's solve the Hydrogen atom with Markov Chain MC. What we are going to do is solve Schrodinger's equation for a wave function $\psi$. We can write the equation as:

$$
H\psi = E \psi \\
(H-E)\psi = 0 \\
H=\frac{p^2}{2m} - \frac{ke^2}{r} \\
-\frac{\hbar^2}{2m}\frac{\partial^{2} \psi}{\partial r^2} - \left(  \frac{ke^2}{r} -\frac{\hbar^2\ell(\ell+1)}{2mr^2}  \right)\psi  - E \psi = 0,
$$

where the last equation includes a centrifugal term due to the angular momentum $\ell$. Note that the momentum operator becomes a derivative acting on the wave function, so $p^{2}/2m$ turns into a second partial derivative.


If we use so-called "natural" units by setting $\hbar$, $m$, and $ke^2$ to 1, we can write:
$$
-\frac{1}{2}\frac{\partial^{2} \psi}{\partial r^2} - \left(  \frac{1}{r} -\frac{\ell(\ell+1)}{2r^2}  \right)\psi  - E \psi = 0
$$


We can solve this differential equation variationally: we propose a wave function whose form is characteristic of the final solution and then minimize the resulting energy. Given the above equation, we can write a proposed solution of:

$$
\psi(r) = \alpha r e^{-\alpha r}   \\
\frac{\partial \psi}{\partial r} = \alpha e^{-\alpha r} -\alpha^{2} r e^{-\alpha r} \\
\frac{\partial^{2} \psi}{\partial r^{2}} = -2\alpha^{2} e^{-\alpha r} + \alpha^{3} r e^{-\alpha r}
$$

We can then plug these terms into the Schrodinger equation:

$$
\alpha^{2} e^{-\alpha r} - \frac{1}{2}\alpha^{3} r e^{-\alpha r}  - \left(  \frac{1}{r} -\frac{\ell(\ell+1)}{2r^2}  \right)  \alpha r e^{-\alpha r} - E  \alpha r e^{-\alpha r}= 0 \\
\alpha^{2} -  \frac{1}{2} \alpha^{3} r   - \left(  \frac{1}{r} -\frac{\ell(\ell+1)}{2r^2}  \right)  \alpha r =  E  \alpha r
$$
Setting the angular momentum $\ell$ to 0 (as expected for the ground state) and dividing through by $\alpha r$ gives:
$$
\frac{\alpha}{r} -  \frac{\alpha^{2}}{2}    -   \frac{1}{r}    =  E_{0}   
$$

Now, we can solve for $\alpha$ by finding the minimum of $E_{0}$.

What we are going to do is compute the expectation of $E_{0}$ defined by

$$
\langle E_{0} \rangle = \frac{\langle \psi | E_{0} | \psi \rangle}{\langle \psi | \psi \rangle}\\
\langle E_{0} \rangle = \frac{\int_{0}^{\infty}\psi(r)^2 E_{0} dr}{\int_{0}^{\infty}\psi(r)^2  dr}
$$

Let's plot $\psi(r)$ and look at $\langle E_{0} \rangle (\alpha)$. The integral to find $\langle E_{0} \rangle$ will be done by simply generating a set of $r$ values and summing the results. The $\alpha$ dependence is found by finding $\langle E_{0} \rangle$ using a mesh of $r$ and $\alpha$ values.
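
As a cross-check on the grid sum described above, the expectation can also be worked out in closed form for this trial function. Weighting by $\psi^2 \propto r^2 e^{-2\alpha r}$ gives $\langle 1/r \rangle = \alpha$, so $\langle E_{0} \rangle = \alpha^{2}/2 - \alpha$, which has its minimum at $\alpha = 1$ with $\langle E_{0} \rangle = -1/2$. The sketch below compares this with a simple Riemann sum. The helper names (`trial_psi`, `local_energy`, `grid_expectation`) are illustrative only and are not the functions used in the rest of this notebook.

```python
import numpy as np

def trial_psi(alpha, r):
    # Trial wave function psi(r) = alpha * r * exp(-alpha * r)
    return alpha * r * np.exp(-alpha * r)

def local_energy(alpha, r):
    # E_0(r) = alpha/r - alpha^2/2 - 1/r, from the derivation above (l = 0)
    return alpha / r - alpha**2 / 2 - 1.0 / r

def grid_expectation(alpha, r):
    # <E_0> as a ratio of Riemann sums; the grid spacing cancels in the ratio
    weight = trial_psi(alpha, r)**2
    return np.sum(weight * local_energy(alpha, r)) / np.sum(weight)

r_grid = np.arange(0.01, 30, 0.01)
for alpha in (0.7, 1.0, 1.3):
    numeric = grid_expectation(alpha, r_grid)
    analytic = alpha**2 / 2 - alpha
    print(f"alpha={alpha:.1f}  grid sum={numeric:.4f}  closed form={analytic:.4f}")

# At alpha = 1 the local energy does not depend on r at all
print("spread of E_0 at alpha=1:", np.ptp(local_energy(1.0, r_grid)))
```

The vanishing spread at $\alpha = 1$ is the signature of an exact eigenstate: every value of $r$ returns the same energy.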

In [None]:
#>>>RUN: L23.1-runcell01

def psi(ialpha,iR):
    return np.where(iR > 0., ialpha*iR*np.exp(-ialpha*iR), 0.)#np.zeros(iR.shape))

def prob(ialpha,iR,iNorm=-1):
    if iR < 0:
        return 0
    if iNorm == -1:
        rvals=np.arange(0.01,30,0.01)
        iNorm = np.sum(psi(ialpha,rvals)**2*rvals)
    return psi(ialpha,iR)**2/iNorm
    
def minE0(ialpha,iR):
    return np.where(ialpha > 0, ialpha/iR -ialpha**2/2-1./iR, 0.)

def expect(ialpha,irvals=np.arange(0.01,30,0.01)):
    alphas, rvals = np.meshgrid(ialpha, irvals)
    E0=np.sum(psi(alphas,rvals)*minE0(alphas,rvals)*psi(alphas,rvals),axis=0)
    bot=np.sum(psi(alphas,rvals)*psi(alphas,rvals),axis=0)
    return E0/bot

rvals=np.arange(0.01,5,0.01)
plt.plot(rvals,psi(0.5,rvals),label='0.5')
plt.plot(rvals,psi(1.0,rvals),label='1.0')
plt.plot(rvals,psi(1.5,rvals),label='1.5')
plt.xlabel('r')
plt.ylabel(r'$\psi$')  # raw string avoids an invalid-escape warning for \p
plt.legend()
plt.show()

alphas=np.arange(0.5,1.5,0.01)
plt.plot(alphas,expect(alphas),label='alpha')
plt.xlabel('alpha')
plt.ylabel('E')
plt.show()



Note that we plotted only 3 examples of the wave function, but found the energies by scanning over $\alpha$ in steps of 0.01.

Now, we can define a variational approach by evolving the wave function over many steps and computing the best fit parameters. The way we evolve the wave function is to note that the probability is given by

$$
p(r) = \psi^{2}(r)
$$

which means that we can create a wave function by taking a variety of samples and evolve them as if they are in a Markov chain with a probability given by this $p(r)$.

To do that, we will follow the chain:
  1. Create some random samples
  
  
  2. Randomly sample a distribution to step
  
  
  3. Compute the change in probabilities
    * define by weight $w_{i+1} = p_{i+1}/p_{i}$
  
  
  4. Sample a flat distribution to get $p_{\rm samp}$
    * Accept the new step if $p_{\rm samp} < w_{i+1}$
  
  
  5. Compute observables like $E_{0}$  
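
The steps above can be sketched compactly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this notebook: the names `psi_squared` and `metropolis_r`, the walker count, and the seed are all hypothetical choices. As an observable we use the mean radius, which for $p(r) \propto r^{2} e^{-2\alpha r}$ should come out close to $3/(2\alpha)$.

```python
import numpy as np

def psi_squared(alpha, r):
    # Unnormalized p(r) = psi(r)^2 for psi = alpha*r*exp(-alpha*r); zero for r <= 0
    r_safe = np.where(r > 0, r, 0.0)
    return (alpha * r_safe * np.exp(-alpha * r_safe))**2

def metropolis_r(alpha, n_walkers=4000, n_steps=1000, step_size=0.5, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: create some random starting samples (all with p > 0)
    r_old = rng.uniform(0.1, 3.0, n_walkers)
    for _ in range(n_steps):
        # Step 2: propose a random step for every walker
        r_new = r_old + step_size * rng.uniform(-1, 1, n_walkers)
        # Step 3: ratio of probabilities, w = p_new / p_old
        weight = psi_squared(alpha, r_new) / psi_squared(alpha, r_old)
        # Step 4: accept the proposal where a flat random number is below w
        accept = rng.uniform(0, 1, n_walkers) < weight
        r_old = np.where(accept, r_new, r_old)
    # Return the accepted positions, not the last proposals
    return r_old

# Step 5: compute an observable from the populated distribution
samples_demo = metropolis_r(alpha=1.0)
print("mean r:", samples_demo.mean(), " expected about:", 3 / (2 * 1.0))
```

Proposals with $r \le 0$ have zero probability and are therefore always rejected, so the walkers stay in the physical region.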

Let's give this a try!

In [None]:
#>>>RUN: L23.1-runcell02

def variational(ialpha=1.0,iR=1.0,iNSteps=2000,iStepSize=0.5,iNSamps=1000):
    Rold=np.random.uniform(0,3,iNSamps)
    for step in range(iNSteps):
        rand=np.random.uniform(-1,1,iNSamps)
        Rnew=Rold+rand*iStepSize
        weight=(psi(ialpha,Rnew)/(psi(ialpha,Rold)+0.01))**2
        randpos=np.random.uniform(0.01,1,iNSamps)
        Rold = np.where(randpos < weight,Rnew,Rold)
    # return the accepted samples (Rold), not the last proposal (Rnew)
    return Rold,expect(ialpha,Rold)

def variational_MCint(ialpha=1.0,iR=1.0,iNSteps=2000,iStepSize=0.5,iNSamps=1000):
    Rold=np.random.uniform(0,3,iNSamps)
    for step in range(iNSteps):
        rand=np.random.uniform(-1,1,iNSamps)
        Rnew=Rold+rand*iStepSize
        weight=(psi(ialpha,Rnew)/(psi(ialpha,Rold)+0.01))**2
        randpos=np.random.uniform(0.01,1,iNSamps)
        Rold = np.where(randpos < weight,Rnew,Rold)
    # return the accepted samples (Rold), not the last proposal (Rnew)
    return Rold,expect(ialpha,Rold), np.mean(minE0(ialpha,Rold))

R07,E07,E07exp=variational_MCint(0.7)
R10,E10,E10exp=variational_MCint(1.0)
R13,E13,E13exp=variational_MCint(1.3)

_,bins,_ = plt.hist(R07,bins=20,alpha=0.5,label='0.7',density=True)
plt.hist(R10,bins=bins,alpha=0.5,label='1.0',density=True)
plt.hist(R13,bins=bins,alpha=0.5,label='1.3',density=True)
plt.legend()
plt.xlabel('r')
plt.ylabel('N')
plt.show()
print("E",E07,E10,E13)
print("E-MC Integral",E07exp,E10exp,E13exp)

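Now we vary both $\alpha$ and $r$ using this procedure, except that we also need to include the probability for the energy in the weight:

$$
p_{E} = e^{-\beta H} \rightarrow w_{i+1} = \frac{p_{i+1}}{p_{i}}\frac{e^{-\beta H_{i+1}}}{e^{-\beta H_{i}}}\\
 w_{i+1} = \frac{\psi^{2}_{i+1}}{\psi^{2}_{i}}e^{\beta\left(H_{i}-H_{i+1}\right)}
$$

In the code below, $\beta = 0.1$, and the two factors are applied as separate accept/reject tests: the $\psi^{2}$ ratio decides the step in $r$, and the energy factor decides the step in $\alpha$.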

In [None]:
#>>>RUN: L23.1-runcell03

def expectFlat(ialpha,irvals=np.arange(0.01,30,0.01)):
    E0=np.sum(psi(ialpha,irvals)*minE0(ialpha,irvals)*psi(ialpha,irvals),axis=0)
    bot=np.sum(psi(ialpha,irvals)*psi(ialpha,irvals),axis=0)
    return E0/bot

def split(iAVals,iEVals):
    arange = np.arange(0,3,0.2)
    aesplit = []
    for i0 in range(len(arange)-1):
        avalL  = arange[i0]
        avalM  = arange[i0+1]
        pEVals = iEVals[(iAVals > avalL) & (iAVals < avalM) ]
        aesplit.append(pEVals)
    return arange, aesplit
        
def variational(iNSteps=25000,iStepSize=0.01,iNSamps=1000):
    alphas=np.random.uniform(0.5,1.4,iNSamps)
    Rold=np.random.uniform(0,3,iNSamps)
    for step in range(iNSteps):
        rand1=np.random.uniform(-1,1,iNSamps)
        Rnew=Rold+rand1*iStepSize
        rand2=np.random.uniform(-1,1,iNSamps)
        alphasNew=alphas+rand2*iStepSize
        weight1=(psi(alphas,Rnew)/(psi(alphas,Rold)+0.01))**2
        #weight1=(psi(alphasNew,Rnew)/(psi(alphas,Rold)+0.01))**2
        deltaE=minE0(alphas,Rold)-minE0(alphasNew,Rnew)
        weight2=np.exp(0.1*deltaE)
        randpos1=np.random.uniform(0.01,1,iNSamps)
        randpos2=np.random.uniform(0.01,1,iNSamps)
        Rold   = np.where(randpos1 < weight1,Rnew,Rold)
        alphas = np.where(randpos2 < weight2,alphasNew,alphas)
    return Rold,minE0(alphas,Rold),alphas

R10,E10,A10=variational()

plt.hist(E10,bins=bins,alpha=0.5)
plt.xlabel('<E> (fitted)')
plt.ylabel('N')
plt.show()

plt.hist(R10,bins=bins,alpha=0.5)
plt.xlabel('r (fitted)')
plt.ylabel('N')
plt.show()

plt.hist(A10,bins=bins,alpha=0.5)
plt.xlabel('alpha (fitted)')
plt.ylabel('N')
plt.show()

plt.plot(A10,E10,'.')
plt.ylim(-10,5)
plt.xlim(0,4)
plt.ylabel('E$_{0}$')
plt.xlabel('alpha')
plt.show()


ar,ae = split(A10,E10)
ac    = 0.5*(ar[:-1] + ar[1:])
plt.violinplot(ae, ac, widths=0.1, showmeans=True,showextrema=True, showmedians=False, bw_method=0.5)
plt.ylabel('E$_{0}$')
plt.xlabel('alpha')
plt.show()

print("E",np.mean(E10))
print("alpha",np.mean(A10))
print("r",np.mean(R10))
for i0,evals in enumerate(ae):
    print("alpha:",ac[i0],"Emin",np.mean(evals),"std:",np.std(evals))

What we plot above are the results of our toys, where we vary $\alpha$ and $r$ following the MCMC chain. By scanning $\alpha$, we see that the variation along the y-axis ($E$) is large except at the pivot point of 1.0, where we have the exact solution. This is a clear point of stability. If we average over our points in bins of $\alpha$ and plot the mean and RMS (error bars), as we do in the violin plot, we see very clearly that the RMS drops to zero there.

While there is a clear point of stability in our dataset, this variation is biasing us: our $\alpha$ and energy values are not uniformly populated, and we would want to populate the full wave function. Not populating the full wave function at each point is a real issue with MCMC. So what's the solution?

Let's go back a step and perform a 1D variation in just $r$, so that we populate the wave function for a fixed $\alpha$. This is done by the function `variationalOne` below.


In [None]:
#>>>RUN: L23.1-runcell04

def variational(iNSteps=2000,iStepSize=0.5,iNSamps=10000):
    alpharange=np.arange(0.5,1.5,0.1)
    lNSamps = int(iNSamps/len(alpharange))    
    alphas = np.array([])
    for pAlpha in alpharange:
        alrange = pAlpha*np.ones(lNSamps)
        alphas = np.append(alphas,alrange)
    Rold  =np.random.uniform(0,3,iNSamps)
    for step in range(iNSteps):
        rand1=np.random.uniform(-1,1,iNSamps)
        Rnew=Rold+rand1*iStepSize
        weight1=(psi(alphas,Rnew)/(psi(alphas,Rold)+0.01))**2
        randpos1=np.random.uniform(0.01,1,iNSamps)
        Rold   = np.where(randpos1 < weight1,Rnew,Rold)
    return Rold,minE0(alphas,Rold),alphas


def variationalOne(ialpha=1.0,iR=1.0,iNSteps=2000,iStepSize=0.5,iNSamps=1000):
    Rold=np.random.uniform(0,3,iNSamps)
    for step in range(iNSteps):
        rand=np.random.uniform(-1,1,iNSamps)
        Rnew=Rold+rand*iStepSize
        weight=(psi(ialpha,Rnew)/(psi(ialpha,Rold)+0.01))**2
        randpos=np.random.uniform(0.01,1,iNSamps)
        Rold = np.where(randpos < weight,Rnew,Rold)
    # return the accepted samples (Rold), not the last proposal (Rnew)
    return Rold,expect(ialpha,Rold)


R10,E10,A10=variational()
R15,E15=variationalOne(1.5)
plt.hist(R10[-1000:],bins=bins,alpha=0.5,label='1.4')  # the last 1000 samples belong to the final alpha in the scan (1.4)
plt.hist(R15,bins=bins,alpha=0.5,label='1.5')
plt.xlabel('r')
plt.ylabel('N')
plt.legend()
plt.show()


ac = np.arange(0.5,1.5,0.1)
ae = np.reshape(E10,(10,1000))
ae1 = []
ae2 = []
for i0 in range(ae.shape[0]):
    pR,pE=variationalOne(ac[i0])
    ae1.append(pE)
    ae2.append(ae[i0])
plt.violinplot(ae1, ac, widths=0.1, showmeans=True,showextrema=True, showmedians=False, bw_method=0.5)
plt.violinplot(ae2, ac, widths=0.1, showmeans=True,showextrema=True, showmedians=False, bw_method=0.5)
plt.ylabel('E$_{0}$')
plt.xlabel('alpha')
plt.ylim(-0.75,-0.25)
plt.show()


Our conclusion from all of this is that MCMC is a good way to populate a wave function. What this really means is that it's a good way to compute an integral without going through the pain of analytically integrating. As we saw above, the key here is to be able to ensure that we populate the phase space with the right probability. Using Metropolis to populate, provided we have a good notion of the ratio of probabilities, is a great way to do this.

<h3>Solving Schrodinger's Equation for the Helium Atom</h3>

Now, in the above, we showed a solution to the Hydrogen atom. It shouldn't come as a surprise that the wave function solving the Hydrogen atom is an exponential of the above form. In light of this, we can now consider an approximate solution to the Helium atom Schroedinger equation, for which no exact closed form exists. Let's go ahead and construct the Schroedinger equation.


The Helium atom has a potential akin to the n-body equations that we were solving a few Lessons ago. The Hamiltonian is:

$$
r_{12} = \left|\vec{r_{1}} - \vec{r_{2}}\right| \\
\mathcal{H} = \frac{p_{1}^2}{2m} + \frac{p_{2}^2}{2m} - \frac{2ke^2}{r_{1}} - \frac{2ke^2}{r_{2}} + \frac{ke^2}{r_{12}}  \\
\hat{\mathcal{H}} = -\frac{\hbar^{2}}{2m}\frac{\partial^{2}}{\partial r_{1}^{2}} - \frac{\hbar^{2}}{2m}\frac{\partial^{2}}{\partial r_{2}^{2}}  - \frac{2ke^2}{r_{1}} - \frac{2ke^2}{r_{2}} + \frac{ke^2}{r_{12}} 
$$

where $\vec{r_{1}}$ and  $\vec{r_{2}}$ are the positions of the electrons.

The solution to this Schroedinger equation does not have an analytic form; instead, it has been approximated through various perturbative methods to get the form of the interactions. The key issue is the repulsive term between the electrons, $ke^2/r_{12}$, which makes the modelling particularly difficult.

To make a long story short, we can make an educated guess for the shape of the solution by assuming an approximate functional form given by:


$$
\psi\left(\vec{r_{1}},\vec{r_{2}}\right) = A \exp\left(-\alpha\left(r_{1} + r_{2}\right)  \right)
$$

As before, we apply the Hamiltonian to this proposed solution (again using natural units), and then use that to solve for the energy.

This gives us (with nuclear charge $Z=2$ for Helium):

$$
E_{1} \psi  = \left(\left(\alpha - Z\right)\left(\frac{1}{r_{1}} + \frac{1}{r_{2}} \right)  + \frac{1}{r_{12}}-\alpha^{2}\right)\psi
$$


This is an example based on the source below:

>source: http://compphysics.github.io/ComputationalPhysics2/doc/pub/vmc/html/vmc-bs.html<br>
>attribution: Morten Hjorth-Jensen <br>
>license: CC Attribution-NonCommercial 4.0 license

Now, let's put these results into code. Since we are now dealing with two particles, our problem goes from a 1 dimensional problem in $r$ to a 2 dimensional problem in the plane of $r_1$ and $r_2$.
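
Before running the full sampler, it can help to evaluate $E_{1}$ by hand for a couple of electron configurations. The sketch below does this for $\alpha = Z = 2$, where the electron-nucleus terms cancel and $E_{1} = 1/r_{12} - \alpha^{2}$. The function name `local_energy_e1` and the chosen positions are illustrative assumptions, not part of the notebook's own code.

```python
import numpy as np

def local_energy_e1(r1_vec, r2_vec, alpha, Z=2.0):
    # E_1 = (alpha - Z) * (1/r1 + 1/r2) + 1/r12 - alpha^2, in natural units
    r1_vec = np.asarray(r1_vec, dtype=float)
    r2_vec = np.asarray(r2_vec, dtype=float)
    r1 = np.linalg.norm(r1_vec)
    r2 = np.linalg.norm(r2_vec)
    r12 = np.linalg.norm(r1_vec - r2_vec)
    return (alpha - Z) * (1.0 / r1 + 1.0 / r2) + 1.0 / r12 - alpha**2

# Electrons on opposite sides of the nucleus: r12 = 2
e_opposite = local_energy_e1([1.0, 0.0], [-1.0, 0.0], alpha=2.0)
# Electrons almost on top of each other: r12 = 0.1
e_close = local_energy_e1([1.0, 0.0], [1.0, 0.1], alpha=2.0)
print("opposite sides:", e_opposite)
print("close together:", e_close)
```

The energy grows without bound as $r_{12} \to 0$, which is why the sampled energies have long tails for this simple trial function.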


In [None]:
#>>>RUN: L23.1-runcell05

#This is the simplified wavefunction without the beta term
def psi(r,alpha,beta): #input r[0,0],r[0,1] = part 1 x and y => r[1,0],r[1,1] = part 2 x and y
    r1 = np.sqrt(r[:,0,0]**2 + r[:,0,1]**2)
    r2 = np.sqrt(r[:,1,0]**2 + r[:,1,1]**2)
    return np.exp(-0.5*alpha*(r1+r2))

def minE0(r,alpha,beta):  
    r1 = np.sqrt(r[:,0,0]**2 + r[:,0,1]**2)
    r2 = np.sqrt(r[:,1,0]**2 + r[:,1,1]**2)
    r12 = np.sqrt((r[:,0,0]-r[:,1,0])**2 + (r[:,0,1]-r[:,1,1])**2)
    #deno = 1.0/(1+beta*r12)
    #deno2 = deno*deno
    return (alpha-2)*(1./r1 + 1./r2) + 1./r12 -alpha**2 #+ 1.0/r12+deno2*(alpha*r12-deno2+2*beta*deno-1.0/r12)


def variationalHe(ialpha=1.0,ibeta=1.0,iR=1.0,iNSteps=10000,iStepSize=0.5,iNSamps=1000):
    Rold = np.zeros((iNSamps,2,2), np.double)
    Rnew = np.zeros((iNSamps,2,2), np.double)
    Rold = np.random.uniform(-1,1,Rold.shape)*iStepSize
    for step in range(iNSteps):
        rand=np.random.uniform(-1,1,Rold.shape)
        Rnew=Rold+rand*iStepSize
        weight=(psi(Rnew,ialpha,ibeta)/(psi(Rold,ialpha,ibeta)+0.01))**2
        randpos=np.random.uniform(0.01,1,iNSamps)
        updaterows = np.where(randpos < weight)
        Rold[updaterows] = Rnew[updaterows]
    # return the accepted samples (Rold), not the last proposal (Rnew)
    return Rold,minE0(Rold,ialpha,ibeta)

alphascan = np.arange(0.5,4.0,0.25)
escan = []
for alpha in alphascan:
    print(alpha)
    pR,pE=variationalHe(alpha)
    escan.append(pE)

plt.violinplot(escan, alphascan, widths=0.1, showmeans=True,showextrema=True, showmedians=False, bw_method=0.5)
plt.ylabel('E$_{0}$')
plt.xlabel('alpha')
plt.ylim(-5,2)
plt.show()

pR,pE=variationalHe(alpha)  # note: alpha is the last value from the scan above
plt.hist( np.sqrt(pR[:,0,0]**2 + pR[:,0,1]**2) )  # radial distance of electron 1
plt.xlabel(r'$r_1 = \sqrt{x_1^2 + y_1^2}$')
plt.ylabel('N')
plt.show()

Ok, from this, we see that $\alpha \approx 2$ seems like the right solution, but it's not clear what the best solution really is. To do a better job, we need to add more parameters to the trial wave function and solve the Helium atom with these additional parameters.


Now, we can use the full "modern-ish" approximation of the wavefunction for the Helium atom:

$$
\psi\left(\vec{r_{1}},\vec{r_{2}}\right) = A \exp\left(-\alpha\left(r_{1} + r_{2}\right) + \frac{1}{2} \frac{r_{12}}{1+\beta r_{12}} \right)
$$


We can also write the energy for the updated system as:

$$
E_{2} \psi  = E_{1} \psi + \frac{1}{\left(1+\beta r_{12}\right)^{2}}\left( -\frac{1}{r_{12}} + \frac{\beta}{\left(1+\beta r_{12}\right)} - \frac{1}{4}\left(\frac{1}{1+\beta r_{12}}\right)^{2}  + \frac{\alpha}{2}\hat{r}_{12}\cdot\left(\hat{r_{1}}-\hat{r_{2}}\right) \right)\psi
$$
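
The directional term $\hat{r}_{12}\cdot\left(\hat{r_{1}}-\hat{r_{2}}\right)$ is the least familiar piece of this expression. It can be evaluated component by component (which is what `dr12hatx` and `dr12haty` do in the code below), and it is equivalent to $(r_{1}+r_{2})(1-\hat{r_{1}}\cdot\hat{r_{2}})/r_{12}$. The sketch below checks that identity numerically on a few random 2D configurations; the variable names and the random test points are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
r1_vec = rng.normal(size=(5, 2))   # five random positions for electron 1
r2_vec = rng.normal(size=(5, 2))   # five random positions for electron 2

# Magnitudes and separation
r1 = np.linalg.norm(r1_vec, axis=1)
r2 = np.linalg.norm(r2_vec, axis=1)
r12_vec = r1_vec - r2_vec
r12 = np.linalg.norm(r12_vec, axis=1)

# Unit vectors
r1_hat = r1_vec / r1[:, None]
r2_hat = r2_vec / r2[:, None]
r12_hat = r12_vec / r12[:, None]

# Left side: r12_hat . (r1_hat - r2_hat), summed over x and y components
lhs = np.sum(r12_hat * (r1_hat - r2_hat), axis=1)
# Right side: (r1 + r2) * (1 - cos(theta_12)) / r12
cos12 = np.sum(r1_hat * r2_hat, axis=1)
rhs = (r1 + r2) * (1.0 - cos12) / r12

print("identity holds:", np.allclose(lhs, rhs))
```

Since $1-\hat{r_{1}}\cdot\hat{r_{2}} \ge 0$, this term is never negative, and it vanishes when the two electrons lie in the same direction from the nucleus.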

Now we have the additional $\beta$ parameter, as well as $\alpha$, to optimize. So we will scan each parameter, populate the wave function, and finally compute our mean expected energy. Let's code this up. Ultimately, we are numerically solving the Schroedinger equation for the Helium atom.

In [None]:
#>>>RUN: L23.1-runcell06

#NOTE: running this cell will take several minutes (~15 min timed in Colab)

#This is the full wavefunction with the beta term
def psi(r,alpha,beta): #input r[0,0],r[0,1] = part 1 x and y => r[1,0],r[1,1] = part 2 x and y
    r1 = np.sqrt(r[:,0,0]**2 + r[:,0,1]**2)
    r2 = np.sqrt(r[:,1,0]**2 + r[:,1,1]**2)
    r12 = np.sqrt((r[:,0,0]-r[:,1,0])**2 + (r[:,0,1]-r[:,1,1])**2)
    d = r12/(1+beta*r12)
    return np.exp(-0.5*alpha*(r1+r2)+0.5*d)

def minE0(r,alpha,beta):  
    r1 = np.sqrt(r[:,0,0]**2 + r[:,0,1]**2)
    r2 = np.sqrt(r[:,1,0]**2 + r[:,1,1]**2)
    r12 = np.sqrt((r[:,0,0]-r[:,1,0])**2 + (r[:,0,1]-r[:,1,1])**2)
    dr12hatx = (r[:,0,0]/r1-r[:,1,0]/r2)*(r[:,0,0]-r[:,1,0])/r12#r12x*(r1x-r2x)
    dr12haty = (r[:,0,1]/r1-r[:,1,1]/r2)*(r[:,0,1]-r[:,1,1])/r12#r12y*(r1y-r2y)
    d    = 1.0/(1+beta*r12)
    E1   = (alpha-2)*(1./r1 + 1./r2) + 1./r12 -alpha**2
    E2   = (d**2)*(-1/r12 + beta*d - 0.25*d**2 + 0.5*alpha*(dr12hatx+dr12haty))
    return E1 + E2

def variationalHe(ialpha=1.0,ibeta=1.0,iR=1.0,iNSteps=10000,iStepSize=1.0,iNSamps=1000):
    Rold = np.zeros((iNSamps,2,2), np.double)
    Rnew = np.zeros((iNSamps,2,2), np.double)
    Rold = np.random.uniform(-2,2,Rold.shape)*iStepSize
    Eavg = np.zeros(iNSteps)
    for step in range(iNSteps):
        rand=np.random.uniform(-1,1,Rold.shape)
        Rnew=Rold+rand*iStepSize
        weight=(psi(Rnew,ialpha,ibeta)/(psi(Rold,ialpha,ibeta)))**2
        randpos=np.random.uniform(0.0,1,iNSamps)
        updaterows = np.where(randpos < weight)
        Rold[updaterows] = Rnew[updaterows]
        Eavg[step] = np.mean(minE0(Rold,ialpha,ibeta))
    return Rold,minE0(Rold,ialpha,ibeta),Eavg

alphascan = np.arange(0.5,4.0,0.25)
betascan  = np.arange(0.0,2.0,0.125)
#alphascan = np.arange(0.5,4.0,0.5)
#betascan  = np.arange(0.0,2.0,1.0)
coords = []
escan = []
X, Y = np.meshgrid(alphascan, betascan)
eavg = np.zeros(X.shape)
for i0,beta in enumerate(betascan):
    for i1,alpha in enumerate(alphascan):
        pR,pE,pEAvg=variationalHe(alpha,beta)
        escan.append(pE)
        eavg[i0,i1] = np.mean(pE[-2000:])
        coords.append(np.array([alpha,beta]))


Ok, here we go: we have populated many wave functions. Let's make some nice plots of our expected energy as a function of $\beta$ and $\alpha$. We can minimize the energy to find the ground state wave function.

In [None]:
#>>>RUN: L23.1-runcell07

#make the plots from the data generated above!

import matplotlib.cm as cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter

# Prepare for plots
fig = plt.figure()
ax = plt.axes(projection="3d")
# Plot the surface.
surf = ax.plot_surface(X, Y, eavg,cmap=cm.coolwarm,linewidth=0, antialiased=False)
# Customize the z axis.
zmin = eavg.min()  # np.matrix is deprecated; the array methods give the same result
zmax = eavg.max()
ax.set_zlim(zmin, zmax)
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$\beta$')
ax.set_zlabel(r'$\langle E \rangle$')
ax.zaxis.set_major_locator(LinearLocator(10))
ax.zaxis.set_major_formatter(FormatStrFormatter('%.02f'))
# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=5)
plt.show()

# Prepare for plots
fig = plt.figure()
ax = plt.axes()
# Plot the same energies as a 2D color map.
mesh = plt.pcolor(X, Y, eavg,cmap=cm.coolwarm)
# Label the axes.
ax.set_xlabel(r'$\alpha$')
ax.set_ylabel(r'$\beta$')
# Attach the color bar to this figure's own plot, not the earlier 3D surface.
fig.colorbar(mesh, shrink=0.5, aspect=5)
plt.show()

<a name='exercises_23_4'></a>     

| [Top](#section_23_0) | [Restart Section](#section_23_4) | [Next Section](#section_23_5) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.1.1</span>

Let's look at our populated wavefunctions for two choices of parameters. Let's fix $\beta$=1 and plot $p$ vs. $r_{12}$ for different values of $\alpha$ (recall the definition $r_{12} = |\vec{r_{1}} - \vec{r_{2}}|$).

Specifically, generate these plots for $\alpha$=2 (lowest energy) and for $\alpha$=4 (roughly the highest energy). What is the expected separation $r_{12}$ between the two electrons for each value of $\alpha$? Report both mean $r_{12}$ values as a list of two numbers `[rval_2, rval_4]` with precision 1e-1.

<br>

In [None]:
#>>>EXERCISE: L23.1.1

def plotProperties(ialpha=2,ibeta=1): 
    pR,pE,pEAvg=variationalHe(ialpha,ibeta)
    r1=#your code
    r2=#your code
    r12=#your code
    
    bins=np.arange(0,5,0.5)
    plt.hist(r12,bins=bins,alpha=0.5,density=True)
    plt.xlabel("r$_{12}$")
    plt.ylabel("p")
    plt.show()
    print("Corr:",np.corrcoef(r1,r2),"Mean: r12",np.mean(r12),"E-avg (at end):",pEAvg[-1])

    rvals=np.vstack((r1,r2)).T
    fig = corner.corner(rvals,show_titles=True,labels=['r$_1$','r$_2$'],plot_datapoints=True,quantiles=[0.16, 0.5, 0.84])
    plt.show()


plotProperties(2,1)
plotProperties(4,1)

<a name='section_23_5'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L23.2 MCMC Using a Professional Sampler</h2>  

| [Top](#section_23_0) | [Previous Section](#section_23_4) | [Exercises](#exercises_23_5) | [Next Section](#section_23_6) |

*The material in this section is discussed in the video **<a href="https://courses.mitxonline.mit.edu/learn/course/course-v1:MITxT+8.S50.3x+3T2023/block-v1:MITxT+8.S50.3x+3T2023+type@sequential+block@seq_LS23/block-v1:MITxT+8.S50.3x+3T2023+type@vertical+block@vert_LS23_vid2" target="_blank">HERE</a>.** You are encouraged to watch that video and use this notebook concurrently.*

<h3>Overview</h3>

In the interest of showing MCMC in the wild, we are going to show how to use the `bilby` package, which was originally written as part of the LIGO experiment to analyze gravitational wave data  (see more details <a href="https://arxiv.org/abs/1811.02042">here</a> and <a href="https://lscsoft.docs.ligo.org/bilby/">here</a>). While there are other packages that exist, we thought this one would be good considering the related Project from 8.S50.1x.

For this package, we are first going to do a very basic example of fitting a line. What we will do is define the likelihood of fitting a line with the formula

$$
f(x) = m x + c \\
\mathcal{L} \propto \exp\left(-\sum_{i} \frac{\left(f(x_{i})-y_{i}\right)^{2}}{2\sigma_{i}^{2}}\right)
$$

In `bilby`, this likelihood is provided by the standard class `GaussianLikelihood`.
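
To make the formula concrete, the sketch below evaluates the Gaussian log-likelihood of a straight line by hand with `numpy`. It assumes the standard normalized form, $\ln\mathcal{L} = -\sum_i \left[\frac{(f(x_i)-y_i)^2}{2\sigma^2} + \frac{1}{2}\ln(2\pi\sigma^2)\right]$, which we expect to correspond to what `bilby` reports, though that has not been verified here. The names `line`, `gaussian_log_likelihood`, `x_demo`, and `y_demo` are hypothetical and chosen so they do not clash with the notebook's own variables.

```python
import numpy as np

def line(x, m, c):
    return m * x + c

def gaussian_log_likelihood(x, y, m, c, sigma):
    # Sum of independent Gaussian log-densities, including the normalization
    residual = y - line(x, m, c)
    return np.sum(-0.5 * (residual / sigma)**2
                  - 0.5 * np.log(2 * np.pi * sigma**2))

# Toy data: true slope 1, true intercept 0, Gaussian noise with sigma = 0.1
rng = np.random.default_rng(0)
x_demo = np.linspace(0, 1, 100)
y_demo = line(x_demo, 1.0, 0.0) + rng.normal(0, 0.1, x_demo.size)

logl_true = gaussian_log_likelihood(x_demo, y_demo, 1.0, 0.0, 0.1)
logl_off = gaussian_log_likelihood(x_demo, y_demo, 2.0, 0.5, 0.1)
print("log L at the true parameters:", logl_true)
print("log L at (m=2, c=0.5):       ", logl_off)
```

The sampler only ever needs this one number per parameter choice, which is why the same machinery carries over to likelihoods that are much harder to differentiate.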

Let's go ahead and set up the problem.


In [None]:
#>>>RUN: L23.2-runcell01

def model(x, m, c):
    return m * x + c

#make some toy data mx+c (m=1,c=0) smear y with sigma=0.1
m = 1
c = 0
sigma = 0.1
N = 100
x = np.linspace(0, 1, N)
y = model(x, m, c) + np.random.normal(0, sigma, N)

#Now compute likelihood
likelihood = bilby.core.likelihood.GaussianLikelihood(x, y, model,sigma=0.1)
likelihood.parameters['m'] = 0.9
likelihood.parameters['c'] = 0.1
print(likelihood.log_likelihood())    

plt.errorbar(x,y,yerr=0.1*np.ones(len(y)),marker='o',linestyle='dotted')
plt.xlabel('x-value')
plt.ylabel('y-value')
plt.show()



Now, to do a linear fit with an MCMC, we are going to have to make proposals for what to do. To make some proposals, all we need to do is define a bunch of priors. In this case, we use uniform distributions from 0 to 5 and -2 to 2 for the slope and intercept, respectively. For a simple task like fitting a line, this is clearly overkill. However, we want to illustrate how this works in a simple case before we go to cases where the likelihood is intractable, and we really need to run a minimizer that doesn't compute derivatives. 

In [None]:
#>>>RUN: L23.2-runcell02

#NOTE: running this cell will take several minutes

priors = dict()
priors['m'] = bilby.core.prior.Uniform(0, 5, 'm')
priors['c'] = bilby.core.prior.Uniform(-2, 2, 'c')
priors['sigma'] = sigma

#uses dlogz=1
full_result = bilby.run_sampler(likelihood=likelihood, priors=priors, sampler='dynesty', npoints=5000,outdir='test',dlogz=1, label='full')
full_result.plot_corner()

From the above, we see clearly that we get the usual best fit slope of 1 and constant of 0. What we also see is that, since we have toys, we can plot the detailed correlation between c and m. This naturally extends to many more parameters, allowing us to build the full covariance matrix of a many-parameter fit without having to systematically scan an N-dimensional space as we would with a profile likelihood approach.
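
For a straight line with Gaussian errors and priors that are flat and much wider than the posterior, the covariance of $m$ and $c$ can also be written down directly from least squares, $\mathrm{Cov} = \sigma^{2}\left(A^{T}A\right)^{-1}$, where the rows of $A$ are $(x_i, 1)$. The sketch below evaluates this for the same $x$ grid and $\sigma$ as the toy data, giving a reference to compare against the corner plot. The variable names are illustrative, and the suggestion to compare with `np.cov` of the sampler's posterior samples (which we understand `bilby` exposes as `full_result.posterior`) is an assumption rather than something run here.

```python
import numpy as np

sigma_demo = 0.1
x_demo = np.linspace(0, 1, 100)

# Design matrix for f(x) = m*x + c: one column per parameter (m, then c)
design = np.column_stack([x_demo, np.ones_like(x_demo)])

# Least-squares covariance matrix of (m, c)
cov = sigma_demo**2 * np.linalg.inv(design.T @ design)

err_m, err_c = np.sqrt(np.diag(cov))
corr_mc = cov[0, 1] / (err_m * err_c)
print("sigma_m:", err_m)
print("sigma_c:", err_c)
print("corr(m, c):", corr_mc)
```

The strongly negative correlation reflects the fact that, for data on $x \in [0, 1]$, a steeper slope can be partly compensated by a lower intercept.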

Let's try to extend the fit to more parameters. What happens if we fit the data with a function that has a quadratic term, i.e., one that was not included in the random generation of the data? Let's see how our covariance matrix looks with this fit.

In [None]:
#>>>RUN: L23.2-runcell03

def model1(x,m2,m1,c):
    return m2*x**2+m1*x+c

likelihood = bilby.core.likelihood.GaussianLikelihood(x, y, model1,sigma=0.1)

priors = dict()
priors['c'] = bilby.core.prior.Uniform(-0.05, 0.05, 'c') #range change compared to video
priors['m1'] = bilby.core.prior.Uniform(0, 6, 'm1')
priors['m2'] = bilby.core.prior.Uniform(-2, 2, 'm2')
priors['sigma'] = 0.1
partial_result = bilby.run_sampler(likelihood=likelihood, priors=priors, sampler='dynesty', dlogz=1, npoints=500,outdir='tmp', label='tmpquad',check_point=False
)
partial_result.plot_corner()



Again, the fit results are what we expected, with a slope close to 1, an intercept close to 0, and the nonexistent quadratic dependence giving a value close to 0. Interestingly, adding this extra term to the fit changes the correlation behavior dramatically. Before, when the slope was shifted a little from the correct value, a reasonably close fit could be found by a corresponding change to the intercept. In this case, changes in the slope are instead compensated by a small quadratic term, and the correlation of the intercept with the other two parameters is very weak.

Now that we've demonstrated how to build such fits, you can clearly see that this is not the most efficient way to fit a line. However, it does the job, and it gives us MC samples that we can analyze further.

<a name='exercises_23_5'></a>     

| [Top](#section_23_0) | [Restart Section](#section_23_5) | [Next Section](#section_23_6) |


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.2.1</span>

Ok, so to get comfortable with advanced MCMC fitters, we are going to fit a dataset similar to the one we fit in the previous lesson, a top hat function. As we did before, let's highlight several steps.

<h3>Step 1: Generate Data</h3>

Generate two sets of random data, that will ultimately create a top-hat shape. Run the following code to visualize this shape (note, this plotting code is not included in the code checker, but it is in the related notebook cell):

<pre>
#generate data
np.random.seed(0)
vals=np.random.rand(1000)*10
vals=np.append(vals,np.random.rand(1000)*5 + 2.5)
hist,bin_edges=np.histogram(vals,bins=np.arange(0,10.25,0.25))
bin_centers=0.5*(bin_edges[:-1]+bin_edges[1:])
plt.errorbar(bin_centers,hist,np.sqrt(hist),fmt='o', color='k')
plt.xlabel("x")
plt.ylabel("events")
plt.show()
</pre>


<h3>Step 2: Define a Fit Function</h3>

**Here is where the question comes in.** Now define a fit function with 3 parameters (an amplitude $a_0$, a half-width $x_0$, and a constant $c$), like the following:

$$
f(x,a_0,x_0,c) = c + a_0 \theta(x > (5-x_0)) - a_0 \theta(x > (5+x_0))
$$

where $\theta$ is the Heaviside step function. Use the answer-checker below to submit your code.
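As a reminder of how a step function behaves numerically, here is a small sketch of a single step using `np.heaviside`. It is only an illustration of the building block (the step position 2.0 is arbitrary), not the fit function requested in the exercise.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# np.heaviside(z, h0) returns 0 where z < 0, 1 where z > 0, and h0 where z == 0.
step = np.heaviside(x - 2.0, 0.5)   # values: 0, 0, 0.5, 1, 1

# An equivalent construction from a boolean comparison (gives 0 exactly at the step).
step_bool = (x > 2.0).astype(float)  # values: 0, 0, 0, 1, 1

print(step)
print(step_bool)
```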

<br>

In [None]:
#>>>EXERCISE: L23.2.1

#Generate the data -> just run this code if 
#you want to see what it looks like first
#-----------------------------------------------
np.random.seed(0)
vals=np.random.rand(1000)*10
vals=np.append(vals,np.random.rand(1000)*5 + 2.5)
hist,bin_edges=np.histogram(vals,bins=np.arange(0,10.25,0.25))
bin_centers=0.5*(bin_edges[:-1]+bin_edges[1:])
plt.errorbar(bin_centers,hist,np.sqrt(hist),fmt='o', color='k')
plt.xlabel("x")
plt.ylabel("events")
plt.show()


#define the step-function used for fitting
#-----------------------------------------------
def model(x,a0,x0,c):
    val = ### Your formula
    return val


#optionally, plot the fit function for some random parameter choices:
#-----------------------------------------------

# Define the range of x values
x_values = np.linspace(1, 10, 1000)  # x goes from 1 to 10 with 1000 points

# Define the parameters for the model
a0 = 2    # Amplitude of the step
x0 = 1    # Offset from the center (x=5)
c = 0     # Constant offset

# Calculate the output of the function for these parameters
y_values = model(x_values, a0, x0, c)

# Plot the result
plt.figure(figsize=(10, 6))
plt.plot(x_values, y_values, label=f'a0={a0}, x0={x0}, c={c}')
plt.xlabel('x')
plt.ylabel('Function Value')
plt.title('Plot of the model function')
plt.legend()
plt.grid(True)
plt.show()

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.2.2</span>

<h3>Step 3: Perform the Fit</h3>

Now we will run the fit using `bilby`. Complete the code below to perform the fit. To do this you must:

- Define a set of uniform priors for the parameters as follows: `c` over the range `[0,100]`, `x0` over the range `[0,10]`, `a0` over the range `[0,100]`. One of these is done for you!

- Define the uncertainties on the parameters. Again, one of these is done for you.

Note, to get the fit to work, we set `dlogz=10`. This is the sampler's stopping threshold on the estimated remaining contribution to the log evidence, and ideally we want it to be below 1, but that can take quite a long time. Even so, the code will take about 10 min to run (timed in Colab).

After running the fit, report the uncertainty that you find on `x0` as a number with precision `1e-2`.  

<br>

In [None]:
#>>>EXERCISE: L23.2.2

likelihood = bilby.core.likelihood.GaussianLikelihood(bin_centers, hist, model,sigma=np.sqrt(hist))
priors = dict()  # start from a fresh dictionary so stale entries (e.g. 'm1', 'm2') are not sampled
priors['c']  = bilby.core.prior.Uniform(0, 100, 'c')
priors['x0'] = #YOUR CODE HERE
priors['a0'] = #YOUR CODE HERE

partial_result = bilby.run_sampler(likelihood=likelihood, priors=priors, sampler='dynesty', dlogz=10, npoints=5000,outdir='t0', label='base2',check_point=False)
partial_result.plot_corner()
#partial_result.plot_with_data(model, bin_centers, hist)

cbst  = partial_result.posterior["c"].mean()
a0bst = partial_result.posterior["a0"].mean()
x0bst = partial_result.posterior["x0"].mean()

cunc  = partial_result.posterior["c"].std()
a0unc = #YOUR CODE HERE
x0unc = #YOUR CODE HERE

print("c:",  cbst,"+/-",cunc)
print("a0:",a0bst,"+/-",a0unc)
print("x0:",x0bst,"+/-",x0unc)


yvals=model(bin_centers,a0bst,x0bst,cbst)
plt.errorbar(bin_centers,hist,np.sqrt(hist),fmt='o', color='k')
plt.plot(bin_centers,yvals,c='r')
plt.xlabel("x")
plt.ylabel("events")
plt.show()

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.2.3</span>

Alright, now let's generalize this to have more parameters and examine their correlations. Update the function to the form below:

$$
f(x,a_0,x_0,a_1,x_1,c) = c + a_0 \theta(x > x_0) + a_1 \theta(x > x_1)
$$

where $\theta$ is the Heaviside step function. Use the answer-checker below to submit your code.


<br>

In [None]:
#>>>EXERCISE: L23.2.3

#Generate the data -> just run this code if 
#you want to see what it looks like first
#-----------------------------------------------
np.random.seed(0)
vals=np.random.rand(1000)*10
vals=np.append(vals,np.random.rand(1000)*5 + 2.5)
#vals=np.append(vals,np.random.rand(1000)*2+4.)
hist,bin_edges=np.histogram(vals,bins=np.arange(0,10.25,0.25))
bin_centers=0.5*(bin_edges[:-1]+bin_edges[1:])
plt.errorbar(bin_centers,hist,np.sqrt(hist),fmt='o', color='k')
plt.xlabel("x")
plt.ylabel("events")
plt.show()


#define the step-function used for fitting
#-----------------------------------------------
def model(x,a0,x0,a1,x1,c):
    val= #YOUR CODE HERE
    return val


#optionally, plot the fit function for some random parameter choices:
#-----------------------------------------------

# Define the range of x values
x_values = np.linspace(1, 10, 1000)  # x goes from 1 to 10 with 1000 points

# Define the parameters for the model
a0 = 2    # Amplitude of the first step
x0 = 4    # Position of the first step
a1 = -3   # Amplitude of the second step
x1 = 7    # Position of the second step
c = 0.5   # Constant offset

# Calculate the output of the function for these parameters
y_values = model(x_values, a0, x0, a1, x1, c)

# Plot the result
plt.figure(figsize=(10, 6))
plt.plot(x_values, y_values, label=f'a0={a0}, x0={x0}, a1={a1}, x1={x1}, c={c}')
plt.xlabel('x')
plt.ylabel('Function Value')
plt.title('Plot of the new model function')
plt.legend()
plt.grid(True)
plt.show()


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.2.4</span>

Now we will run the fit using `bilby`. Complete the code below to perform the fit. To do this you must:

- Define a set of uniform priors for the parameters as follows: `c` over the range `[0, 100]`, `x0` over the range `[0, 10]`, `x1` over the range `[0, 10]`, `a0` over the range `[0, 100]`, and `a1` over the range `[-100, 0]`.

- Define the uncertainties on the parameters.

To get the fit to work, set `dlogz=10`. Additionally, we will add a constraint requiring `x1 > x0`. To do this, we create a new variable `x10=x1-x0` and put a constraint on it.
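Conceptually, a constraint on a derived quantity removes part of the prior volume. The sketch below illustrates this with plain rejection sampling in NumPy; it is an assumption-laden toy (uniform draws on `[0, 10]` for both positions, as in the exercise) and is not meant to reproduce how `bilby` applies its `Constraint` prior internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100000

# Draw both step positions from independent uniform priors on [0, 10].
x0 = rng.uniform(0.0, 10.0, n)
x1 = rng.uniform(0.0, 10.0, n)

# Derived variable and the constraint 0 <= x10 <= 10, i.e. x1 >= x0.
x10 = x1 - x0
keep = (x10 >= 0.0) & (x10 <= 10.0)
x0_kept, x1_kept = x0[keep], x1[keep]

print("fraction of prior draws kept:", keep.mean())
```

About half of the draws survive, since the constraint selects the triangle `x1 >= x0` of the square prior region. This also removes the ambiguity of which step is labeled first.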

After running the code, which variables appear correlated in the corner plot? Select ALL that apply.

A) `x0` and `c`\
B) `x1` and `c`\
C) `a0` and `c`\
D) `a1` and `c`\
E) `x0` and `x1`\
F) `x0` and `a0`\
G) `x0` and `a1`\
H) `x1` and `a0`\
I) `x1` and `a1`\
J) `a0` and `a1`



<br>

In [None]:
#>>>EXERCISE: L23.2.4

from bilby.core.prior import PriorDict, Uniform, Constraint

likelihood = bilby.core.likelihood.GaussianLikelihood(bin_centers, hist, model,sigma=np.sqrt(hist))
#likelihood = bilby.core.likelihood.PoissonLikelihood(bin_centers, hist, model,sigma=np.sqrt(hist))

def conv_constraint(parameters):
    parameters['x10'] = parameters['x1'] - parameters['x0']
    return parameters

priors = PriorDict(conversion_function=conv_constraint)
priors['c']  = #YOUR CODE HERE
priors['x0'] = #YOUR CODE HERE
priors['x1'] = #YOUR CODE HERE
priors['a0'] = #YOUR CODE HERE
priors['a1'] = #YOUR CODE HERE
priors['x10'] = Constraint(minimum=0, maximum=10)

partial_result = bilby.run_sampler(likelihood=likelihood, priors=priors, sampler='dynesty', dlogz=10, npoints=5000,outdir='t0', label='base_big1',check_point=False)
partial_result.plot_corner()


cbst  = partial_result.posterior["c"].mean()
a0bst = partial_result.posterior["a0"].mean()
x0bst = partial_result.posterior["x0"].mean()
a1bst = partial_result.posterior["a1"].mean()
x1bst = partial_result.posterior["x1"].mean()

cunc  = #YOUR CODE HERE
a0unc = #YOUR CODE HERE
x0unc = #YOUR CODE HERE
a1unc = #YOUR CODE HERE
x1unc = #YOUR CODE HERE

print("c:",  cbst,"+/-",cunc)
print("a0:",a0bst,"+/-",a0unc)
print("x0:",x0bst,"+/-",x0unc)
print("a1:",a1bst,"+/-",a1unc)
print("x1:",x1bst,"+/-",x1unc)

yvals=model(bin_centers,a0bst,x0bst,a1bst,x1bst,cbst)
plt.errorbar(bin_centers,hist,np.sqrt(hist),fmt='o', color='k')
plt.plot(bin_centers,yvals,c='r')
plt.xlabel("x")
plt.ylabel("events")
plt.show()

partial_result.plot_corner()

<a name='section_23_6'></a>
<hr style="height: 1px;">

## <h2 style="border:1px; border-style:solid; padding: 0.25em; color: #FFFFFF; background-color: #90409C">L23.3 Gravitational Waves </h2>  

| [Top](#section_23_0) | [Previous Section](#section_23_5) | [Exercises](#exercises_23_6) |

*The material in this section is discussed in the video **<a href="https://courses.mitxonline.mit.edu/learn/course/course-v1:MITxT+8.S50.3x+3T2023/block-v1:MITxT+8.S50.3x+3T2023+type@sequential+block@seq_LS23/block-v1:MITxT+8.S50.3x+3T2023+type@vertical+block@vert_LS23_vid3" target="_blank">HERE</a>.** You are encouraged to watch that video and use this notebook concurrently.*

<h3>Overview</h3>

If you took the first course in this sequence (8.S50.1x), you might remember that the fit in the gravitational wave Project was not particularly good. That is because a full fit requires real gravitational waveforms rather than a functional-form approximation. For LIGO, they have developed an MCMC sampler that is built on the emcee library we have used above. In practice, the procedure follows exactly what we have done above. However, we now need an approach to maximize the likelihood and perform parameter estimation.

To do this, we are going to perform a very basic MCMC fit of the black hole masses of the 8.S50.1 Project, just to see how this works.

Note that the `bilby` library includes the waveforms for gravitational waves along with the wrapper for performing MCMC fits.


Now before we do anything, let's learn how to generate our own waveform. With `bilby` it is relatively straightforward to generate waveforms. In fact, we can make a function to generate waveforms. For the masses of the two merging black holes, we use (50, 50), (10, 50) and (100, 50) in units of solar masses.


In [None]:
#>>>RUN: L23.3-runcell01

duration = 4.#4s
sampling_frequency = 2048.#Hz
waveform_arguments = {
    'waveform_approximant': 'IMRPhenomPv2',
    'reference_frequency': 50.,  # most sensitive frequency
    'minimum_frequency': 20.
}
waveform_generator = bilby.gw.WaveformGenerator(
    duration=duration, sampling_frequency=sampling_frequency,
    parameter_conversion=convert_to_lal_binary_black_hole_parameters,
    frequency_domain_source_model=lal_binary_black_hole,
    waveform_arguments=waveform_arguments)

def generateWaveForm(iM1,iM2):
    injection_parameters = dict(
    mass_1=iM1, mass_2=iM2, a_1=0., a_2=0., tilt_1=0., tilt_2=0.,
    phi_12=0., phi_jl=0., luminosity_distance=500, theta_jn=0., psi=0.,
    phase=0.2, geocent_time=1243309096, ra=0., dec=0.)
    polarizations_td = waveform_generator.time_domain_strain(injection_parameters)
    return polarizations_td

def shiftplus(iPolar,sampling,duration):
    plus_td  = np.roll(iPolar['plus'],  int(sampling * duration/2.))
    cross_td = np.roll(iPolar['cross'], int(sampling * duration/2.))
    return plus_td
    
polarizations_td0 = generateWaveForm(50,50)
polarizations_td1 = generateWaveForm(10,50)
polarizations_td2 = generateWaveForm(100,50)

plus_td0=shiftplus(polarizations_td0,sampling_frequency,duration)
plus_td1=shiftplus(polarizations_td1,sampling_frequency,duration)
plus_td2=shiftplus(polarizations_td2,sampling_frequency,duration)

time = np.linspace(0.,duration,len(plus_td0))

plt.plot(time, plus_td0, label='plus 50,50')
plt.plot(time, plus_td1, label='plus 10,50')
plt.plot(time, plus_td2, label='plus 100,50')
plt.legend()
plt.show()

plt.plot(time, plus_td0, label='plus 50,50')
plt.plot(time, plus_td1, label='plus 10,50')
plt.plot(time, plus_td2, label='plus 100,50')
plt.legend()
plt.xlim(1.7,2.1)
plt.show()

# And their ASD
def asd(its):
    NFFT = int(4 * sampling_frequency)
    freq, plus_psd = sig.welch(its, fs=sampling_frequency, nperseg=NFFT)
    plus_asd = np.sqrt(plus_psd)
    return freq, plus_asd

freq, plus_asd0 = asd(plus_td0)
freq, plus_asd1 = asd(plus_td1)
freq, plus_asd2 = asd(plus_td2)

fig = plt.figure(figsize=(8, 5))
plt.loglog(freq, plus_asd0, label='plus 50,50')
plt.loglog(freq, plus_asd1, label='plus 10,50')
plt.loglog(freq, plus_asd2, label='plus 100,50')
plt.xlim(10, 1024)
plt.xlabel('Frequency [Hz]')
plt.ylabel('ASD')
plt.legend()
plt.show()


The first two plots are the time series data (with the second plot zoomed in), while the third plot shows the amplitude spectral density (ASD) versus frequency. The low-frequency cutoff at 20 Hz comes from the `minimum_frequency` set in `waveform_arguments`, mirroring where the detector output gets filtered. Shortly before the merger, the frequency gets higher and higher (the so-called "chirp"); in the spectrum this appears as an amplitude that decreases with frequency, a trend which cuts off abruptly at the merger itself.

Now that we have created a functional form, we need a likelihood to do a fit. The best likelihood for gravitational wave data is effectively a $\chi^{2}$, but in the Fourier domain:

$$
\log\left(\mathcal{L}\right) = -\frac{1}{2}\sum_{i} \left[\left(\frac{d_{i}-\mu_{i}\left(\vec{\theta}\right)}{\sigma_{i}}\right)^{2} + \log\left(2\pi\sigma_{i}^2\right)\right]
$$

where $d_{i}$ is the amplitude in a specific Fourier bin, $\sigma_{i}$ is the uncertainty in that bin due to noise in the detector, and $\mu_{i}\left(\vec{\theta}\right)$ is our predicted waveform in that bin. The noise is found by estimating the ASD for a stretch of data close in time to the merger. This drives our whole fit, now using the full library for the waveform.
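To make the structure of this likelihood concrete, here is a minimal NumPy sketch of a per-bin Gaussian log-likelihood, with the normalization term included for every bin. The function name `gaussian_log_likelihood` and the toy numbers are assumptions for illustration; the sketch treats each bin as a real number, whereas the actual `bilby` gravitational-wave likelihood works with complex frequency-domain strain.

```python
import numpy as np

def gaussian_log_likelihood(d, mu, sigma):
    """Sum over bins of the Gaussian log-density of d given model mu and noise sigma."""
    d = np.asarray(d, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    resid = (d - mu) / sigma
    return -0.5 * np.sum(resid**2 + np.log(2.0 * np.pi * sigma**2))

# Toy data: three bins with unit noise.
data = [1.0, 2.0, 3.0]
noise = [1.0, 1.0, 1.0]

logl_perfect = gaussian_log_likelihood(data, [1.0, 2.0, 3.0], noise)  # model matches data
logl_shifted = gaussian_log_likelihood(data, [2.0, 3.0, 4.0], noise)  # model off by 1 sigma per bin

print(logl_perfect, logl_shifted)
```

A model that is off by one standard deviation in each of the three bins lowers the log-likelihood by 1.5 relative to a perfect match, which is the familiar $\Delta\chi^{2}/2$.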

First, we need to fetch the data and prepare it for the fit. More details of this procedure were given in the 8.S50.1 Project, but here we will go through this quickly.

In [None]:
#>>>RUN: L23.3-runcell02

from gwpy.timeseries import TimeSeries

time_of_event = 1126259462.413

H1 = bilby.gw.detector.get_empty_interferometer("H1")
L1 = bilby.gw.detector.get_empty_interferometer("L1")

# Define times in relation to the trigger time (time_of_event), duration and post_trigger_duration
post_trigger_duration = 2
duration = 4
analysis_start = time_of_event + post_trigger_duration - duration
#ok so get the data we are going to use for the fit
H1_analysis_data = TimeSeries.fetch_open_data("H1", analysis_start, analysis_start + duration, sample_rate=4096, cache=True)
L1_analysis_data = TimeSeries.fetch_open_data("L1", analysis_start, analysis_start + duration, sample_rate=4096, cache=True)

# Use gwpy to fetch the open data and get the data around the time to compute the psd
psd_duration = duration * 32
psd_start_time = analysis_start - psd_duration
H1_psd_data = TimeSeries.fetch_open_data( "H1", psd_start_time, psd_start_time + psd_duration, sample_rate=4096, cache=True)
L1_psd_data = TimeSeries.fetch_open_data( "L1", psd_start_time, psd_start_time + psd_duration, sample_rate=4096, cache=True)
plt.plot(H1_psd_data)
plt.plot(L1_psd_data)
plt.plot(H1_analysis_data)
plt.plot(L1_analysis_data)
plt.show()

#Set this to our model
H1.set_strain_data_from_gwpy_timeseries(H1_analysis_data)
L1.set_strain_data_from_gwpy_timeseries(L1_analysis_data)

#Now compute the PSDs and set this 
psd_alpha = 2 * H1.strain_data.roll_off / duration
H1_psd = H1_psd_data.psd(fftlength=duration, overlap=0, window=("tukey", psd_alpha), method="median")
L1_psd = L1_psd_data.psd(fftlength=duration, overlap=0, window=("tukey", psd_alpha), method="median")
H1.power_spectral_density = bilby.gw.detector.PowerSpectralDensity(frequency_array=H1_psd.frequencies.value, psd_array=H1_psd.value)
L1.power_spectral_density = bilby.gw.detector.PowerSpectralDensity(frequency_array=L1_psd.frequencies.value, psd_array=L1_psd.value)

fig, ax = plt.subplots()
idxs = H1.strain_data.frequency_mask  # This is a boolean mask of the frequencies which we'll use in the analysis
ax.loglog(H1.strain_data.frequency_array[idxs],np.abs(H1.strain_data.frequency_domain_strain[idxs]))
ax.loglog(H1.power_spectral_density.frequency_array[idxs], H1.power_spectral_density.asd_array[idxs])
ax.set_title("Hanford")
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel(r"Strain [strain/$\sqrt{Hz}$]")  # raw string avoids the invalid "\s" escape
plt.show()

H1.maximum_frequency = 1024
L1.maximum_frequency = 1024

fig, ax = plt.subplots()
idxs = L1.strain_data.frequency_mask  # This is a boolean mask of the frequencies which we'll use in the analysis
ax.loglog(L1.strain_data.frequency_array[idxs],np.abs(L1.strain_data.frequency_domain_strain[idxs]))
ax.loglog(L1.power_spectral_density.frequency_array[idxs], L1.power_spectral_density.asd_array[idxs])
ax.set_title("Livingston")
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel(r"Strain [strain/$\sqrt{Hz}$]")  # raw string avoids the invalid "\s" escape
plt.show()



As before, we plot the time series data for the detectors at Hanford and Livingston followed by the power spectra. In the time series plots, the red and green sections are the data for the 4 seconds centered at the merger time, and the earlier results are used to find the noise. For the frequency plots, the blue is for the data near the merger, while the orange is the noise spectrum.

Now, with all of that done, let's go ahead and create our likelihood object and our fit function. Because this code is tailor-made for LIGO, it will be relatively easy to put it all together. However, there is a lot under the hood that we are not covering here.


In [None]:
#>>>RUN: L23.3-runcell03

# First, put our "data" created above into a list of interferometers (the order is arbitrary)
interferometers = [H1, L1]

# Next create a dictionary of arguments which we pass into the LALSimulation waveform - we specify the waveform approximant here
waveform_arguments = dict(waveform_approximant='IMRPhenomPv2', 
                          reference_frequency=100., catch_waveform_errors=True)

# Next, create a waveform_generator object. This wraps up some of the jobs of converting between parameters etc
waveform_generator = bilby.gw.WaveformGenerator(
    frequency_domain_source_model=bilby.gw.source.lal_binary_black_hole,
    waveform_arguments=waveform_arguments,
    parameter_conversion=convert_to_lal_binary_black_hole_parameters)


Finally, let's add all of our priors and run the fit! Notice that this first fit samples only 3 parameters: the chirp mass, the mass ratio, and the event time. The chirp mass and mass ratio are a re-parameterization of the masses of the two merging black holes, and the event time is well defined. All of the other parameters are set to fixed values; these include details about the black hole properties, as well as the sky localization (RA/DEC) and the distance. Even with this limited variation in parameter space, the fit takes some time. Details about the various parameters can be found <a href="https://academic.oup.com/mnras/article/499/3/3295/5909620" target="_blank">here.</a>
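The re-parameterization between component masses and (chirp mass, mass ratio) can be written out explicitly using the standard definitions $\mathcal{M} = (m_1 m_2)^{3/5}/(m_1+m_2)^{1/5}$ and $q = m_2/m_1 \le 1$. The helper names below are hypothetical and written only for illustration; `bilby.gw.conversion` provides its own conversion utilities.

```python
def component_to_chirp(m1, m2):
    """Return (chirp_mass, mass_ratio) for two component masses, with mass_ratio <= 1."""
    chirp_mass = (m1 * m2) ** 0.6 / (m1 + m2) ** 0.2
    mass_ratio = min(m1, m2) / max(m1, m2)
    return chirp_mass, mass_ratio

def chirp_to_component(chirp_mass, mass_ratio):
    """Invert the mapping: return (m1, m2) with m1 >= m2."""
    m1 = chirp_mass * (1.0 + mass_ratio) ** 0.2 / mass_ratio ** 0.6
    m2 = mass_ratio * m1
    return m1, m2

# Round trip for an illustrative pair of masses (in solar masses).
mc, q = component_to_chirp(36.0, 29.0)
m1_back, m2_back = chirp_to_component(mc, q)
print(mc, q, m1_back, m2_back)
```

The chirp mass is the combination that most directly controls how quickly the frequency sweeps upward, which is why it is a natural variable to sample in place of the individual masses.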

Note that if the fit takes very long, you can increase `dlogz`, which sets the stopping threshold: the sampler stops once the estimated remaining contribution to the log evidence falls below this value.

In [None]:
#>>>RUN: L23.3-runcell04

#NOTE: running this cell will take several minutes (~30 min timed in Colab)

from bilby.core.prior import Uniform

prior = bilby.core.prior.PriorDict()
prior['chirp_mass'] = Uniform(name='chirp_mass', minimum=25.0,maximum=35.5)
prior['mass_ratio'] = Uniform(name='mass_ratio', minimum=0.5, maximum=1)
prior['phase']        = 1.3#Uniform(name="phase", minimum=0, maximum=2*np.pi)
prior['geocent_time'] = Uniform(name="geocent_time", minimum=time_of_event-0.1, maximum=time_of_event+0.1)
prior['a_1'] =  0.0
prior['a_2'] =  0.0
prior['tilt_1'] =  0.0
prior['tilt_2'] =  0.0
prior['phi_12'] =  0.0
prior['phi_jl'] =  0.0
prior['dec'] =  -1.2232
prior['ra'] =  2.19432
prior['theta_jn'] =  1.89694
prior['psi'] =  0.532268
prior['luminosity_distance'] = 412.066

# Finally, create our likelihood, passing in what is needed to get going
likelihood = bilby.gw.likelihood.GravitationalWaveTransient(interferometers, waveform_generator, priors=prior,
    time_marginalization=True, phase_marginalization=False, distance_marginalization=False)

result_short = bilby.run_sampler(
    likelihood, prior, sampler='dynesty', outdir='short3', label="GW150914",
    conversion_function=bilby.gw.conversion.generate_all_bbh_parameters,
    sample="unif", nlive=500, dlogz=3  # <- Arguments are used to make things fast - not recommended for general use
)

Now, we can look at the parameters and start to learn some details about the best fit gravitational wave, first by plotting the posteriors of our fit.

In [None]:
#>>>RUN: L23.3-runcell05

result_short.plot_corner(parameters=["chirp_mass", "mass_ratio", "geocent_time"], prior=True)
#plt.show()
#print(result_short.posterior)

Notice now that our chirp mass is spot on! When we ran a functional form fit in past classes, we weren't getting the right chirp mass. Why is this so much better than an approximate functional form? This is because we are finally using the actual gravitational waveform in our fit, which captures all the general relativistic effects that happen close to the merger. Let's go ahead and look at the best fit.


Alright, now let's look at this in terms of gravitational wave variables. We will show the signal in Fourier space by computing the ASD of the data. Additionally, in blue we will show the signal so you can see how the sensitivity compares. We can also show the fit in the time domain, where you can see that we capture many of the salient features.


In [None]:
#>>>RUN: L23.3-runcell06

from bilby.gw.result import CBCResult

cbc_result = CBCResult.from_json("short3/GW150914_result.json")
for ifo in interferometers:
    cbc_result.plot_interferometer_waveform_posterior(
        interferometer=ifo, n_samples=500, save=False
    )
    plt.show()
    plt.close()
    
print(result_short)
#result_short.plot_marginals()


Now that we have a fit that works, we can try probing other parameters to get more information. Let's see how far away this gravitational wave event is. Gravitational wave amplitudes fall off as $1/r$, so by noting the amplitude along with the masses of the objects, and using our knowledge of GR, we can get a measure of the distance, often referred to as the luminosity distance to reflect the fact that the universe is expanding.

What about trying to fit for the luminosity distance?

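We know that the probability of finding something further away will be given by a power law: if sources are spread uniformly through space, the number in a thin shell at distance $x$ grows with the area of that shell. In this instance, our function will be defined by

$$
p(x) = A x^{2}
$$

which corresponds to `alpha=2` in the `PowerLaw` prior used below.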

We can inject this sampling prior to get the distance of the merger. Furthermore, we sample the relative phase of the merger to improve the overall quality of the fit.
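To see what a bounded power-law prior looks like in practice, the sketch below draws samples from a density proportional to $x^{\alpha}$ by inverting its CDF. The function `sample_power_law` is a hypothetical helper, and the usage assumes the convention $p(x) \propto x^{\alpha}$ with the same `alpha`, `minimum`, and `maximum` values that appear in the `PowerLaw` prior in the next cell.

```python
import numpy as np

def sample_power_law(alpha, minimum, maximum, size, rng):
    """Draw from p(x) proportional to x**alpha on [minimum, maximum] (alpha != -1)."""
    u = rng.uniform(0.0, 1.0, size)
    lo = minimum ** (alpha + 1.0)
    hi = maximum ** (alpha + 1.0)
    # Inverse of the CDF F(x) = (x**(alpha+1) - lo) / (hi - lo).
    return (lo + u * (hi - lo)) ** (1.0 / (alpha + 1.0))

rng = np.random.default_rng(2)
d = sample_power_law(2.0, 50.0, 800.0, 200000, rng)

print("min, max:", d.min(), d.max())
print("mean distance:", d.mean())
```

With `alpha=2` most of the prior mass sits at large distances, so the sample mean lands well above the midpoint of the interval.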

In [None]:
#>>>RUN: L23.3-runcell07

#NOTE: running this cell will take several minutes (~30 min timed in Colab)

prior = bilby.core.prior.PriorDict()
prior['chirp_mass']   = Uniform(name='chirp_mass', minimum=25.0,maximum=35.5)
prior['mass_ratio']   = Uniform(name='mass_ratio', minimum=0.5, maximum=1)
prior['phase']        = Uniform(name="phase", minimum=0, maximum=2*np.pi)
prior['geocent_time'] = Uniform(name="geocent_time", minimum=time_of_event-0.1, maximum=time_of_event+0.1)
prior['a_1'] =  0.0
prior['a_2'] =  0.0
prior['tilt_1'] =  0.0
prior['tilt_2'] =  0.0
prior['phi_12'] =  0.0
prior['phi_jl'] =  0.0
prior['dec'] =  -1.2232
prior['ra'] =  2.19432
prior['theta_jn'] =  1.89694
prior['psi'] =  0.532268
prior['luminosity_distance'] = bilby.core.prior.PowerLaw(alpha=2., minimum=50., maximum=800., name='luminosity_distance')

# Finally, create our likelihood, passing in what is needed to get going
likelihood = bilby.gw.likelihood.GravitationalWaveTransient(interferometers, waveform_generator, priors=prior,
    time_marginalization=True, phase_marginalization=True, distance_marginalization=True)

result_short = bilby.run_sampler(
    likelihood, prior, sampler='dynesty', outdir='short_dist', label="GW150914",
    conversion_function=bilby.gw.conversion.generate_all_bbh_parameters,
    sample="unif", nlive=500, dlogz=3  # <- Arguments are used to make things fast - not recommended for general use
)

In [None]:
#>>>RUN: L23.3-runcell08

result_short.plot_corner(parameters=["chirp_mass", "mass_ratio", "geocent_time","luminosity_distance","phase"], prior=True)


The bottom row of plots shows an example where the correlations and the 1D probability distributions are critical. As you can see, the listed final value of 2.11 is misleading because there are actually two peaks. This occurs because of a phase ambiguity, where two values separated by $\pi$ cannot be distinguished. It's also interesting that this expanded fit shows much weaker correlations between the 3 parameters used previously. The apparent "cut-off" in the plots for the mass ratio $q$ occurs because there is no way to tell which black hole is which, and so mass ratios larger than 1 are excluded.
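The danger of summarizing a two-peaked posterior with a single number can be shown with a toy example. The samples below are synthetic (two narrow Gaussian modes separated by $\pi$, with an assumed peak position and width), not the posterior from the fit above.

```python
import numpy as np

rng = np.random.default_rng(3)
peak = 0.5    # assumed position of the first mode (radians)
width = 0.1   # assumed width of each mode
n = 50000

# Half of the samples near `peak`, half near `peak + pi`.
phase = np.concatenate([rng.normal(peak, width, n),
                        rng.normal(peak + np.pi, width, n)])

mean = phase.mean()
near_mean = np.mean(np.abs(phase - mean) < 0.3)

print("mean phase:", mean)
print("fraction of samples within 0.3 rad of the mean:", near_mean)
```

The mean falls midway between the two modes, in a region that contains essentially no samples, which is why the 1D distribution should always be inspected alongside the quoted value.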

OK, let's visualize the final fit. You can see that it looks very similar to the previous fit.

In [None]:
#>>>RUN: L23.3-runcell09

from bilby.gw.result import CBCResult

cbc_result = CBCResult.from_json("short_dist/GW150914_result.json")
for ifo in interferometers:
    cbc_result.plot_interferometer_waveform_posterior(
        interferometer=ifo, n_samples=500, save=False
    )
    plt.show()
    plt.close()
    
print(cbc_result.log_likelihood_evaluations)
#result_short.plot_marginals()

In [None]:
#>>>RUN: L23.3-runcell10

plt.hist(cbc_result.log_likelihood_evaluations)
plt.show()
print(cbc_result.log_bayes_factor)

<a name='exercises_23_6'></a>     

| [Top](#section_23_0) | [Restart Section](#section_23_6) | 


### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.3.1</span>

Now let's fix the mass ratio to 0.5 instead of floating it. Run the code below to find the chirp mass and report your answer as a number with a precision of one solar mass unit.

<br>

In [None]:
#>>>EXERCISE: L23.3.1

from bilby.core.prior import Uniform

prior = bilby.core.prior.PriorDict()
prior['chirp_mass'] = Uniform(name='chirp_mass', minimum=25.0,maximum=35.5)
prior['mass_ratio'] = #YOUR CODE HERE
prior['phase']        = 1.3#Uniform(name="phase", minimum=0, maximum=2*np.pi)
prior['geocent_time'] = Uniform(name="geocent_time", minimum=time_of_event-0.1, maximum=time_of_event+0.1)
prior['a_1'] =  0.0
prior['a_2'] =  0.0
prior['tilt_1'] =  0.0
prior['tilt_2'] =  0.0
prior['phi_12'] =  0.0
prior['phi_jl'] =  0.0
prior['dec'] =  -1.2232
prior['ra'] =  2.19432
prior['theta_jn'] =  1.89694
prior['psi'] =  0.532268
prior['luminosity_distance'] = 412.066

# Finally, create our likelihood, passing in what is needed to get going
likelihood = bilby.gw.likelihood.GravitationalWaveTransient(interferometers, waveform_generator, priors=prior,
    time_marginalization=True, phase_marginalization=False, distance_marginalization=False)

result_short = bilby.run_sampler(
    likelihood, prior, sampler='dynesty', outdir='short4', label="GW150914",
    conversion_function=bilby.gw.conversion.generate_all_bbh_parameters,
    sample="unif", nlive=500, dlogz=3  #Arguments are used to make things fast - not recommended for general use
)

result_short.plot_corner(parameters=["chirp_mass", "geocent_time"], prior=True)

### <span style="border:3px; border-style:solid; padding: 0.15em; border-color: #90409C; color: #90409C;">Exercise 23.3.2</span>

The choice of priors yields different interpretations of gravitational wave events. Let's re-run the fit that we performed earlier in this section, using a flat prior for the luminosity distance. Recall our previous result for the luminosity distance was about 299 Mpc, using a power law.

How does a flat prior change our result for the luminosity distance that is found? Report your answer with a precision of one Mpc.
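Before running the full fit, it can help to see how a prior reshapes a posterior in a toy setting. The sketch below multiplies a made-up Gaussian likelihood in distance (the center and width are arbitrary assumptions, unrelated to the real event) by two different priors on a grid and compares the posterior means.

```python
import numpy as np

# Uniform grid of distances over the same range as the priors in this section.
d = np.linspace(50.0, 800.0, 2001)

# Toy likelihood: Gaussian in distance with made-up center and width.
like = np.exp(-0.5 * ((d - 400.0) / 100.0) ** 2)

def posterior_mean(prior):
    """Posterior mean on the grid for the toy likelihood and a given prior array."""
    post = like * prior
    post = post / post.sum()  # uniform grid, so normalizing by the sum suffices
    return np.sum(d * post)

mean_flat = posterior_mean(np.ones_like(d))  # flat prior
mean_d2 = posterior_mean(d ** 2)             # prior proportional to d**2

print("flat prior:", mean_flat)
print("d^2 prior: ", mean_d2)
```

When the likelihood is broad, the prior that favors larger distances pulls the posterior mean upward; when the data are very constraining, the two priors give nearly the same answer.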

<br>

In [None]:
#>>>EXERCISE: L23.3.2

#NOTE: running this cell will take several minutes

prior = bilby.core.prior.PriorDict()
prior['chirp_mass']   = Uniform(name='chirp_mass', minimum=25.0,maximum=35.5)
prior['mass_ratio']   = Uniform(name='mass_ratio', minimum=0.5, maximum=1)
prior['phase']        = Uniform(name="phase", minimum=0, maximum=2*np.pi)
prior['geocent_time'] = Uniform(name="geocent_time", minimum=time_of_event-0.1, maximum=time_of_event+0.1)
prior['a_1'] =  0.0
prior['a_2'] =  0.0
prior['tilt_1'] =  0.0
prior['tilt_2'] =  0.0
prior['phi_12'] =  0.0
prior['phi_jl'] =  0.0
prior['dec'] =  -1.2232
prior['ra'] =  2.19432
prior['theta_jn'] =  1.89694
prior['psi'] =  0.532268
prior['luminosity_distance'] = Uniform(minimum=50., maximum=800., name='luminosity_distance')

# Finally, create our likelihood, passing in what is needed to get going
likelihood = bilby.gw.likelihood.GravitationalWaveTransient(interferometers, waveform_generator, priors=prior,
    time_marginalization=True, phase_marginalization=True, distance_marginalization=True)

result_short = bilby.run_sampler(
    likelihood, prior, sampler='dynesty', outdir='short_dist_flat', label="GW150914",
    conversion_function=bilby.gw.conversion.generate_all_bbh_parameters,
    sample="unif", nlive=500, dlogz=3  # <- Arguments are used to make things fast - not recommended for general use
)

result_short.plot_corner(parameters=["chirp_mass", "mass_ratio", "geocent_time","luminosity_distance","phase"], prior=True)
