**Tutorial 6b - Microlensing event statistics: selection**

In this tutorial, we will look at some gravitational microlensing data. Microlensing occurs when a star is magnified by a compact massive object lying very close to the line of sight to the star. Each microlensing event is detected as a brightening of a star in the Galactic bulge, LMC or SMC. From the light curve of the star during the event, three things can be measured: the time at which it occurs, the duration of the event, $\Delta t$, and the Einstein crossing time, $t_E$. $\Delta t$ is defined as the length of time the star is magnified by more than a factor of 1.34, which is the magnification of the source when its angular separation from the lens equals one Einstein radius, $R_E$; the magnification exceeds 1.34 whenever the separation is smaller than $R_E$.
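The 1.34 threshold can be checked against the standard point-source, point-lens magnification formula, $A(u) = (u^2+2)/(u\sqrt{u^2+4})$, where $u$ is the lens-source separation in units of $R_E$. This formula is not given in the tutorial and is quoted here only as background; the function name is illustrative.

```python
import math

def magnification(u):
    """Point-source, point-lens magnification at separation u (in Einstein radii)."""
    return (u**2 + 2.0) / (u * math.sqrt(u**2 + 4.0))

# The magnification grows as the separation shrinks; u = 1 marks the 1.34 threshold.
for u in (0.1, 0.5, 1.0, 2.0):
    print(f"u = {u:3.1f}  ->  A = {magnification(u):.3f}")
```

At $u = 1$ the formula gives $3/\sqrt{5} \approx 1.34$, and any smaller separation gives a larger magnification.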

The Einstein radius in angular units is $R_E = \sqrt{ \frac{4 G m}{c^2} \frac{D_{ls}}{D_l D_s} } = \sqrt{ 2 R_{sh} \frac{D_{ls}}{D_l D_s} }$, where $m$ is the mass of the lens and $R_{sh}$ is the Schwarzschild radius of the lens. $D_{ls}$ is the radial distance between the lens and the source star, $D_l$ is the distance from us to the lens, and $D_s$ is the distance from us to the source.

The Einstein crossing time is 

$t_E = \frac{2R_E}{v_\perp}$ where $v_\perp$ is the velocity of the lens transverse to our line of sight relative to the source.
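To get a feel for the numbers, the two formulas above can be evaluated for one illustrative lens. All input values below (a 0.3 solar-mass lens at 6 kpc, a source at 8 kpc, a transverse velocity of 150 km/s) are assumptions chosen for the example, and the angular Einstein radius is converted to a physical length in the lens plane by multiplying by $D_l$ before dividing by the velocity.

```python
import math

# Physical constants in SI units (rounded values)
G = 6.674e-11        # m^3 kg^-1 s^-2
C = 2.998e8          # m / s
M_SUN = 1.989e30     # kg
KPC = 3.0857e19      # m
DAY = 86400.0        # s
RAD_TO_MAS = 180.0 / math.pi * 3600.0 * 1000.0

def einstein_angle(mass_kg, d_l, d_s):
    """Angular Einstein radius in radians; distances in metres."""
    d_ls = d_s - d_l
    return math.sqrt(4.0 * G * mass_kg / C**2 * d_ls / (d_l * d_s))

# Assumed, illustrative inputs (not taken from the tutorial data)
mass = 0.3 * M_SUN
d_l = 6.0 * KPC
d_s = 8.0 * KPC
v_perp = 150e3       # m / s

theta_E = einstein_angle(mass, d_l, d_s)
r_E = theta_E * d_l              # physical Einstein radius in the lens plane
t_E = 2.0 * r_E / v_perp         # Einstein crossing time

print(f"theta_E = {theta_E * RAD_TO_MAS:.2f} mas")
print(f"t_E     = {t_E / DAY:.1f} days")
```

The result is a sub-milliarcsecond Einstein radius and a crossing time of a few tens of days, which is why bulge events are typically followed over weeks to months.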

In [None]:
import os
import numpy as np
import matplotlib.pyplot as plt
import pandas as pa
from math import erf
from scipy.integrate import quad

Under the simplifications that :

* the density of lenses is constant between the observer and sources
* the lenses' velocities are Gaussian distributed with zero mean
* all the lenses have the same mass

the event rate, $\Gamma$, as a function of Einstein times can be found to be

$\frac{d\Gamma}{dt_E} = \frac{\sigma^2 D_s \eta}{4 t_o} \left( \frac{t_o}{t_E} \right)^4 \left[  \pi \left( \left( \frac{t_o}{t_E} \right)^4  + 8\left( \frac{t_o}{t_E} \right)^2 + 3 \right) \exp\left[ - \frac{1}{8}  \left( \frac{t_o}{t_E} \right)^2 \right] {\rm erf}\left(  \left( \frac{t_o}{3t_E} \right)^{3/2} \right) - \frac{ \left( \frac{t_o}{t_E} \right)^2 +12 }{ \left( \frac{t_o}{t_E} \right)^4} \right]$



The characteristic timescale is $t_o = \sqrt{\frac{2 R_{sh} D_s}{\sigma^2}}$, where $\sigma$ is the velocity dispersion of the lenses and $\eta$ is their number density. We will not be concerned with measuring the normalization here; we will only seek to recover $t_o$. Since $D_s$ and $\sigma$ can be estimated from other sources, measuring $t_o$ amounts to measuring the lenses' mass.

1) Code up the Einstein time distribution. The normalization needs to be calculated numerically. Calculate it by integrating over $t_E$ with `quad()`. Keep the normalization constant out of the function definition.
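The pattern of normalizing numerically with `quad()` can be sketched on a toy function. The shape below is an arbitrary stand-in, not the microlensing rate, and the names `shape`, `pdf` and `scale` are made up for the example.

```python
import numpy as np
from scipy.integrate import quad

def shape(x, scale):
    """Toy unnormalized distribution (NOT the microlensing rate)."""
    if x < 0:
        return 0.0
    return (x / scale)**2 * np.exp(-x / scale)

scale = 1.0
# quad returns (integral, error estimate); keep only the integral
norm, abserr = quad(shape, 0, np.inf, args=(scale,))

def pdf(x, scale):
    """Normalized version; the constant lives outside shape()."""
    return shape(x, scale) / norm

pdf_v = np.vectorize(pdf)      # allows array input
xs = np.linspace(0, 10, 5)
print("norm =", norm)
print(pdf_v(xs, scale))
```

Note that the early-exit branch returns `0.0` rather than `0`: `np.vectorize` infers its output type from the first value it computes, so an integer there would truncate every later result.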

In [None]:

norm = 1
def dGammdt(te,to) :
    if te<0.1*to :  # this avoids some numerical problems
        return 0.0  # float, so that np.vectorize infers a float output type
    t = to/te
    return  ...

tes = np.linspace(0.01,1,100)

norm = ...

dGammdt_v = np.vectorize(dGammdt)  # this allows a vector input

2) Plot the normalized $t_E$ distribution for $t_o=50$ and $t_o=80$ on the same plot with labels.

3) By integrating your probability distribution, calculate the cumulative distribution function $F(t_E|t_o)$ for $t_o=60$ and plot it.
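As a sketch of how a CDF can be tabulated by integration, the example below uses a toy pdf (a Gamma distribution with shape 3, chosen only because its CDF is known in closed form) in place of $p(t_E|t_o)$.

```python
import numpy as np
from scipy.integrate import quad

def pdf(x):
    """Toy normalized pdf standing in for p(t_E | t_o)."""
    return 0.5 * x**2 * np.exp(-x) if x >= 0 else 0.0

xs = np.linspace(0.0, 12.0, 25)
cdf = np.empty_like(xs)
for i, x in enumerate(xs):
    cdf[i] = quad(pdf, 0.0, x)[0]   # integrate the pdf from 0 up to x

# Closed-form CDF of the toy distribution, used only as a cross-check
exact = 1.0 - np.exp(-xs) * (1.0 + xs + 0.5 * xs**2)
print("largest deviation from the exact CDF:", np.max(np.abs(cdf - exact)))
```

Two quick sanity checks carry over to the real distribution: the tabulated CDF should never decrease, and it should approach 1 at large arguments.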

In [None]:
to = 60  # t_o for this exercise
cdf = np.empty_like(tes)
for i,t in enumerate(tes) :
    cdf[i] = ....

plt.plot(tes*to,cdf)
plt.xlabel(r'$t_E$')
plt.ylabel(r'CDF($t_E | t_o$)')
plt.ylim(0,1)
plt.show()

4) Calculate approximately what the value of $t_o$ should be for events in the galactic bulge.  Use $D_s=8$ kpc, $\sigma = 200$ km/s and lens mass 1 $M_{sun}$.  To do this, you might find the following modules in astropy useful.

In [None]:

import astropy.constants as const
from astropy import units 

# some examples
print(const.M_sun)
print(const.M_sun/const.c**2)
print(const.M_sun*const.G/const.c**2)
print('10 days in seconds = ',10*units.day.to('s')*units.s)


to = ...

print(to.value,'days')

5) Read in the data and make a histogram of both the $t_E$'s and the $\Delta t$'s on the same plot.  Make the histograms transparent so you can see the overlapping regions (hint: use the alpha keyword).

In [None]:
df = pa.read_csv('microlensing_events.csv')
te_ob = ...
t_ob = ...

.
.
.

5b) Write a negative log-likelihood function for the $t_E$ data. Use it to plot the likelihood as a function of $t_o$.

In [None]:


def loglike(to) :
    return ...

tos = np.linspace(50,100,100)
ll_noselection = np.empty_like(tos)
for i,t in enumerate(tos) :
    ll_noselection[i] = ...

plt.plot(...)
plt.xlabel(r'$t_o$')
plt.ylabel(r'$L(t_E | t_o)$')
plt.show()


6) Find the maximum likelihood solution for $t_o$ using scipy.optimize.minimize(). 
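The mechanics of `minimize()` on a negative log-likelihood can be tried first on a problem with a known answer. The sketch below fits the scale of an exponential distribution to synthetic data; the distribution, the data and the function name are all illustrative and unrelated to the microlensing sample.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.exponential(scale=5.0, size=500)   # synthetic toy data

def neg_log_like(params, x):
    """Negative log-likelihood of an exponential with scale tau."""
    tau = params[0]
    if tau <= 0:
        return np.inf                 # keep the optimizer in the valid region
    return np.sum(np.log(tau) + x / tau)

best = minimize(neg_log_like, x0=[1.0], args=(data,), method="Nelder-Mead")
print("maximum likelihood tau =", best.x[0])
print("sample mean            =", data.mean())
```

For this toy model the maximum likelihood scale is the sample mean, so the two printed numbers should agree closely. `best.x` is an array even for a single parameter, hence the `[0]`.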

In [None]:
from scipy.optimize import minimize 

...

print('maximum likelihood t_o = ',best.x)

The Einstein time, $t_E$, is measurable from the shape of the light curve for each event, but it is not the actual length of the microlensing event. It is the time it would take for the source to travel $2 R_E$. The actual duration of the event is the time the source spends *within* one $R_E$ of the lens. This diagram might make it clearer.

<img src="einstein_time.png" alt="t_e" class="bg-primary" width="400px">

The two times are equal only if the path of the source passes directly through the lens.

Since the impact parameter is uniformly distributed, the distribution of event durations at fixed $t_E$ is:

$p(\Delta t|t_E) =  \left\{\begin{array}{cc}
\frac{1}{t_E}\frac{\Delta t}{\sqrt{t_E^2 - \Delta t^2}} & \Delta t< t_E \\
0 & \Delta t > t_E
\end{array} \right.
$
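This pdf can be checked with a short Monte Carlo experiment. For an impact parameter $b$ in units of $R_E$, the source crosses the Einstein circle along a chord, so $\Delta t = t_E\sqrt{1-b^2}$. The sketch below draws $b$ uniformly and compares the fraction of short events with the integral of the pdf above; the value $t_E = 40$ and the cut at $t_E/2$ are arbitrary choices.

```python
import numpy as np
from scipy.integrate import quad

def p_dt_given_te(dt, te):
    """The pdf p(dt | t_E) quoted in the text."""
    if dt < 0 or dt >= te:
        return 0.0
    return dt / (te * np.sqrt(te**2 - dt**2))

te = 40.0                                   # arbitrary Einstein time
rng = np.random.default_rng(0)
b = rng.uniform(0.0, 1.0, size=200_000)     # impact parameter in units of R_E
dt_samples = te * np.sqrt(1.0 - b**2)       # chord crossing time

cut = 0.5 * te
frac_mc = np.mean(dt_samples < cut)                      # from the samples
frac_pdf = quad(p_dt_given_te, 0.0, cut, args=(te,))[0]  # from the pdf
print("fraction with dt < t_E/2:", frac_mc, "(Monte Carlo)", frac_pdf, "(pdf)")
```

The two fractions should agree to within the sampling noise. Most events have durations close to $t_E$, because only grazing trajectories give short events.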

7) Write functions that give the pdf $p(\Delta t | t_E)$, the cdf $F(\Delta t | t_E)$ and the quantile function $Q(u | t_E)$. These can all be found analytically.

In [None]:
def pt_te(t,te) :
    if(t>=te or t<0) :
        return 0
    return ...

## cumulative F(t|te)
def cdft_te(t,te) :
    if(t > te) :
        return ...
    return ...

## quantile function Q(u|te)
def quantile_te(u,te) :
    return ...

8) Find $p(\Delta t|t_o)$ by integrating over $t_E$ for $t_o=60$. Remember the product rule! The integral over $t_E$ must be evaluated separately for each value of $\Delta t$.

Plot both $p(t_E|t_o)$ again and $p(\Delta t|t_o)$ in the same plot.
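The product rule and the marginalization, $p(y) = \int p(y|x)\,p(x)\,dx$, can be practised on a toy pair of distributions whose marginal is known exactly. Here $x$ plays the role of $t_E$ and $y$ the role of $\Delta t$; both toy distributions are assumptions made for the example and are not the microlensing ones.

```python
import numpy as np
from scipy.integrate import quad

def p_x(x):
    """Toy distribution of x: a Gamma distribution with shape 2."""
    return x * np.exp(-x) if x > 0 else 0.0

def p_y_given_x(y, x):
    """Toy conditional: y is uniform on [0, x]."""
    return 1.0 / x if 0.0 <= y <= x else 0.0

def joint(x, y):
    """Product rule: p(x, y) = p(y | x) p(x)."""
    return p_y_given_x(y, x) * p_x(x)

def p_y(y):
    # The conditional vanishes for x < y, so the integral can start at x = y.
    return quad(joint, y, np.inf, args=(y,))[0]

ys = np.linspace(0.1, 5.0, 6)
marg = np.array([p_y(y) for y in ys])   # one integral per value of y
print("largest deviation from exp(-y):", np.max(np.abs(marg - np.exp(-ys))))
```

For this toy pair the marginal is exactly $e^{-y}$, which makes the result easy to verify. Starting the integral where the conditional becomes non-zero also helps `quad()` with the real problem.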

In [None]:
to=60
## make the function p(te,t|to)
def pte(te,t,to) :
    return ...


p = np.empty_like(tes)
for i,t in enumerate(tes) :
    p[i] = ... # integrate over te

...
plt.plot(tes*to, ..., label=r'$p(\Delta t|t_o)$')
plt.plot(tes*to, ..., label=r'$p(t_E|t_o)$')

plt.legend()
plt.xlabel(r'$t_E ~, ~t$')
plt.show()



Events that are too long or short cannot be detected because of the observation cadence and the duration of the monitoring campaign.  We will simplify these restrictions by saying that events with $\Delta t < t_{min}$ or  $\Delta t > t_{max}$ are not observable in our data set.

9) Now construct the likelihood that takes into account the *selection*.  The likelihood for one event with the selection function $S(\Delta t)$ is:

$L(t,t_E | t_o) = \frac{p(t|t_E,t_o)\,p(t_E|t_o)\,S(t)}{\int_0^\infty dt_E \, p(t_E|t_o) \int_0^\infty dt \, p(t|t_E) S(t) }
= \frac{p(t|t_E)\,p(t_E|t_o)}{\int_0^\infty dt_E \, p(t_E|t_o) \left[  F(t_{max}|t_E) - F(t_{min}|t_E) \right]}$

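Here $t$ denotes $\Delta t$. The second equality holds for a detected event, for which $S(t) = 1$ when $t_{min} < t < t_{max}$ and $S(t) = 0$ otherwise.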
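The effect of the selection term is easiest to see in a one-variable toy problem. In the sketch below, exponentially distributed durations are kept only inside a window, and the scale is then fitted with and without dividing by the probability of falling in the window. This is a simplified analogue of the likelihood above, not the tutorial's two-variable case, and all numbers and names are assumptions.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
tau_true, tmin, tmax = 30.0, 5.0, 40.0
x_all = rng.exponential(scale=tau_true, size=20_000)
x_obs = x_all[(x_all > tmin) & (x_all < tmax)]    # only these are "detected"

def nll_naive(tau):
    """Negative log-likelihood that ignores the selection."""
    return np.sum(np.log(tau) + x_obs / tau)

def nll_selection(tau):
    """Same, but each event's pdf is divided by F(tmax) - F(tmin)."""
    p_detect = np.exp(-tmin / tau) - np.exp(-tmax / tau)
    return nll_naive(tau) + x_obs.size * np.log(p_detect)

fit_naive = minimize_scalar(nll_naive, bounds=(1.0, 200.0), method="bounded")
fit_sel = minimize_scalar(nll_selection, bounds=(1.0, 200.0), method="bounded")
print("true tau          =", tau_true)
print("ignoring selection:", fit_naive.x)
print("with selection    :", fit_sel.x)
```

Ignoring the selection pulls the estimate well below the true scale, because the long events were never observed. Including the normalization brings it back, at the cost of a broader likelihood.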

In [None]:
# denominator of above
def dnormdte(te,to,tmin,tmax) :
    return ...

def lnL(to,tmin,tmax) :
    norm = quad( ...  )
    output = ...
    for i,te in enumerate(te_ob) :
        ...
        output += np.log( ... )
    ...
    return output
    


10) Using $t_{min} = 10$ days and $t_{max} = 60$ days, loop through tos to make a vector of the likelihood as a function of $t_o$.  This may take a while to run.

In [None]:
tmin=10
tmax=60
ll = np.empty_like(tos)

for i,to in enumerate(tos) :
    ll[i] = ...

11) Plot the posterior for $t_o$ found earlier from the $t_E$'s alone, without selection, together with the posterior that takes selection into account. With a flat prior on $t_o$, each posterior is simply the normalized likelihood. Normalize these numerically using scipy.integrate.trapz()

In [None]:
from scipy.integrate import trapezoid as trapz  # trapz is the older name, removed in recent SciPy releases

.
.
.

plt.legend()
plt.xlabel(r'$t_o$')
plt.ylabel(r'$L(t_E | t_o)$')
plt.show()

12)  Find the maximum likelihood value for $t_o$ with the selection effects.  Find this by finding the location of the maximum in the plot you just made.

In [None]:
print("ML to = ",...)
    

13) Find the mean and variance of the posterior using numerical integration of the table already created  ( use scipy.integrate.trapz() ).
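The steps of normalizing a tabulated posterior and taking its moments can be sketched on a toy log-likelihood with a known answer. The Gaussian shape, its centre of 70 and its width of 5 are assumptions made for the example; `trapezoid` is the current SciPy name for the trapezoidal rule.

```python
import numpy as np
from scipy.integrate import trapezoid

tos = np.linspace(40.0, 100.0, 601)
lnL = -0.5 * ((tos - 70.0) / 5.0)**2        # toy log-likelihood on the grid

# Subtract the maximum before exponentiating to avoid under/overflow
post = np.exp(lnL - lnL.max())
post /= trapezoid(post, tos)                # normalize to unit area

to_ml = tos[np.argmax(post)]                # location of the maximum
mean = trapezoid(tos * post, tos)
var = trapezoid((tos - mean)**2 * post, tos)
print("ML to =", to_ml)
print("<to>  =", mean, "+/-", np.sqrt(var))
```

If the table holds a *negative* log-likelihood, the sign flips: use `np.exp(-(nll - nll.min()))`. The quoted uncertainty is the square root of the variance.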

In [None]:
.
.
.
print('<to> = ', ... ,' +/- ', ... )

14) Calculate the mass of the lenses for the maximum likelihood solution for $t_o$ taking $D_s=8$ kpc, $\sigma = 200$ km/s.

In [None]:

print('lens mass = ', ... ,' +/- ',...,' Msun' )

The actual case is a bit more complicated. The figure below shows a calculation of the OGLE microlensing survey's true selection function as a function of $t_E$ and the closest approach in units of $R_E$ (Peel & Dennison, 2006).

<img src="OGLE-detection-efficiency.png" alt="t_e" class="bg-primary" width="400px">

It is calculated by simulating fake events, running the event detection algorithm on the simulated data, and measuring the fraction of events that are recovered.
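The fake-event idea can be imitated with the simplified selection used in this tutorial. In the sketch below the "detection algorithm" is just the cut $t_{min} < \Delta t < t_{max}$ with the values from step 10, and fake events are generated with a uniform impact parameter. A real survey pipeline is far more involved, so this is only a toy stand-in.

```python
import numpy as np

rng = np.random.default_rng(3)
tmin, tmax = 10.0, 60.0       # days, as in the simplified selection above
n_fake = 100_000

def efficiency(te):
    """Fraction of fake events with Einstein time te that pass the cut."""
    b = rng.uniform(0.0, 1.0, size=n_fake)    # impact parameter in units of R_E
    dt = te * np.sqrt(1.0 - b**2)             # duration of each fake event
    detected = (dt > tmin) & (dt < tmax)      # toy detection criterion
    return detected.mean()

for te in (5.0, 30.0, 60.0, 120.0):
    print(f"t_E = {te:6.1f} days  ->  efficiency = {efficiency(te):.3f}")
```

Events with $t_E < t_{min}$ are never detected, since $\Delta t$ cannot exceed $t_E$. Long-$t_E$ events are detected only when the trajectory is grazing enough to bring $\Delta t$ below $t_{max}$.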