In [1]:
import itertools
import warnings

# Our numerical workhorses
import numpy as np
import pandas as pd
import scipy.stats as st
import scipy.special

# The MCMC Hammer
import emcee

#import numba

# BE/Bi 103 utilities
import bebi103

# Import plotting tools
import matplotlib.pyplot as plt
import seaborn as sns
import corner

# Magic function to make matplotlib inline; other style specs must come AFTER
%matplotlib inline

# This enables high res graphics inline (only use with static plots (non-Bokeh))
# SVG is preferred, but there is a bug in Jupyter with vertical lines
%config InlineBackend.figure_formats = {'png', 'retina'}

# JB's favorite Seaborn settings for notebooks
rc = {'lines.linewidth': 2, 
      'axes.labelsize': 18, 
      'axes.titlesize': 18, 
      'axes.facecolor': 'DFDFE5'}
sns.set_context('notebook', rc=rc)
sns.set_style('darkgrid', rc=rc)

# Suppress future warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

In [2]:
df = pd.read_csv("./data/gardner_hw6/gardner_mt_catastrophe_only_tubulin.csv", comment="#")

# Part A

The probability distribution for catastrophe times for a three-step process ($m=3$) has been defined as:


\begin{align}
P(t\mid \tau_1, \tau_2, \tau_3, 3, I) = \frac{1}{\tau_1\tau_2\tau_3}\int_0^t\mathrm{d}t_1 \int_{t_1}^t\mathrm{d}t_2\, \mathrm{e}^{-t_1/\tau_1}\,\mathrm{e}^{-(t_2-t_1)/\tau_2}\,\mathrm{e}^{-(t-t_2)/\tau_3}.
\end{align}

This form follows from the assumption that each subprocess contributing to a catastrophe event is a Poisson process, with subprocess $j$ occurring independently of the others at a defined rate $1/\tau_j$. For a three-step process there are two intermediate completion times, $t_1$ and $t_2$, that are not observed, so we integrate over both: $t_1$ from 0 to $t$, and $t_2$ from $t_1$ to $t$.

We can simplify the above expression by pairing $\tau_1$, $\tau_2$, and $\tau_3$ with their respective exponential terms to get:

\begin{align}
P(t\mid \tau_1, \tau_2, \tau_3, 3, I) &= \int_0^t\mathrm{d}t_1 \int_{t_1}^t\mathrm{d}t_2\, \big[ R_1\,\mathrm{e}^{-R_1 \Delta t_1}\,R_2\,\mathrm{e}^{-R_2 \Delta t_2}\,R_3\,\mathrm{e}^{-R_3 \Delta t_3}\big], \\
\text{where}\quad & R_1 = \frac{1}{\tau_1}, \,\,R_2 = \frac{1}{\tau_2},\,\, R_3 = \frac{1}{\tau_3}, \\
\text{and}\quad & \Delta t_1 = t_1,\,\, \Delta t_2 = t_2-t_1, \,\, \Delta t_3 = t - t_2.
\end{align}

If the three subprocesses complete at times $t_1$, $t_2$, and $t$, then the waiting times for the first, second, and third subprocesses are $t_1$, $t_2-t_1$, and $t-t_2$, respectively. Each waiting time is exponentially distributed with parameter $\lambda_i = 1/\tau_i$, where $i \in \{1, 2, \dots, m\}$. Hence, the integrand in the expression for the distribution of catastrophe times is simply the product of three exponential probability densities, one for each waiting time of the three independent Poisson processes, with the parameters described above. Integrating over the unobserved times $t_1$ and $t_2$ then gives the distribution of the catastrophe time $t$.
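
The picture of three successive exponential waiting times can be illustrated with a short simulation. This is a minimal sketch, not part of the original analysis: the values in `tau`, the sample count, and the seed are hypothetical choices made only for illustration. If the model is right, the mean of the simulated catastrophe times should be close to $\tau_1 + \tau_2 + \tau_3$.

```python
import numpy as np

# Hypothetical time constants (arbitrary units), chosen only for illustration
tau = np.array([1.0, 2.0, 3.0])
n_samples = 100_000

rng = np.random.default_rng(seed=0)

# One exponential waiting time per subprocess; column i has mean tau[i]
waiting_times = rng.exponential(scale=tau, size=(n_samples, 3))

# A catastrophe occurs once all three subprocesses have completed in sequence
catastrophe_times = waiting_times.sum(axis=1)

print("sample mean:", catastrophe_times.mean())
print("tau_1 + tau_2 + tau_3:", tau.sum())
```

A histogram of `catastrophe_times` could then be compared against the density defined by the double integral.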

Note also that the inner integral (over $t_2$) must be evaluated before the outer one (over $t_1$), because the lower limit of the inner integral is $t_1$.
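
The order of integration can be made concrete by evaluating the double integral numerically. The following is a rough sketch using `scipy.integrate.dblquad`, which integrates the inner variable first between limits that may depend on the outer variable. The function name `catastrophe_pdf_numeric` and the parameter values are hypothetical and used only for illustration.

```python
import numpy as np
from scipy.integrate import dblquad


def catastrophe_pdf_numeric(t, tau1, tau2, tau3):
    """Numerically evaluate the three-step catastrophe time density at time t."""

    # dblquad expects the inner variable (t2) as the first argument
    def integrand(t2, t1):
        return (
            np.exp(-t1 / tau1)
            * np.exp(-(t2 - t1) / tau2)
            * np.exp(-(t - t2) / tau3)
            / (tau1 * tau2 * tau3)
        )

    # Outer integral: t1 from 0 to t. Inner integral: t2 from t1 to t.
    value, _ = dblquad(integrand, 0.0, t, lambda t1: t1, lambda t1: t)
    return value


# Hypothetical parameter values, chosen only for illustration
p = catastrophe_pdf_numeric(2.5, 1.0, 2.0, 3.0)
print(p)
```

Here the lower limit of the inner integral is passed as a function of $t_1$, which mirrors why the $t_2$ integral has to be done first.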