In [1]:
import os
import glob
import pandas as pd
import numpy as np

import bokeh.io
import bokeh.plotting

bokeh.io.output_notebook()

%load_ext blackcellmagic

In [15]:
def ecdf_vals(data):
    """Computes the ECDF for a one-dimensional NumPy array of data.
    Returns the x and y values for plotting a 'dot-style' ECDF."""
    
    # Sort the data array
    x = np.sort(data)
    
    # Compute ECDF
    y = (1 + np.arange(len(data))) / len(data)
    
    return x, y

# Exercise 7.5: Monte Carlo simulation of transcriptional pausing

In this exercise, we will put random number generation to use and do a Monte Carlo simulation. Monte Carlo simulation is a broad term for techniques in which a large number of random numbers are generated to (approximately) calculate properties of probability distributions. In many cases the analytical form of these distributions is not known, so Monte Carlo methods are a great way to learn about them.

We seek the probability distribution of backtrack times, $P(t_{bt})$, where $t_{bt}$ is the time spent in the backtrack. We could solve this analytically, which requires some sophisticated mathematics. But, because we know how to draw random numbers, we can just compute this distribution directly using Monte Carlo simulation!

We start at $x = 0$ at time $t = 0$. We “flip a coin,” or choose a random number to decide whether we step left or right. We do this again and again, keeping track of how many steps we take and what the $x$ position is. As soon as $x$ becomes positive, we have exited the backtrack. The total time for a backtrack is then $\tau n_{\textrm{steps}}$, where $\tau$ is the time it takes to make a step. Depken et al. report that $\tau \approx 0.5$ seconds.

a) Write a function, `backtrack_steps()`, that computes the number of steps it takes for a random walker (i.e., polymerase) starting at position $x=0$ to get to position $x=+1$. It should return the number of steps the walk took.

In [53]:
import numba

@numba.njit
def backtrack_steps():
    """Computes the number of steps it takes for a 1D random walker starting
    at position x = 0 to get to position x = +1.

    Returns the number of steps the walk took."""
    
    x = 0
    steps = 0
    
    while x != 1:
        x += 2 * np.random.randint(0, 2) - 1
        steps += 1
        
    return steps
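A quick sanity check on the walker is to compare simulated frequencies against the two shortest walks, whose probabilities follow from enumerating paths: a single step right has probability $1/2$, and left-right-right has probability $1/8$. The sketch below is only an illustration, not the notebook's method. It uses a plain-Python copy of the walk rather than the Numba-compiled `backtrack_steps()`, and the helper `backtrack_steps_capped()`, its `max_steps` cap, and the seed are all assumptions introduced here so that the check finishes quickly even when a walk wanders far to the left.

```python
import numpy as np


def backtrack_steps_capped(max_steps=1000):
    """Plain-Python walk from x = 0 to x = +1, abandoned after max_steps.

    Hypothetical helper for illustration; returns -1 if the cap is hit.
    """
    x = 0
    for steps in range(1, max_steps + 1):
        # Step right (+1) or left (-1) with equal probability
        x += 2 * np.random.randint(0, 2) - 1
        if x == 1:
            return steps
    return -1


np.random.seed(3252)  # arbitrary seed, for reproducibility only
samples = np.array([backtrack_steps_capped() for _ in range(2000)])

# Exact values from enumerating paths: R -> 1/2, LRR -> 1/8
frac_one = np.mean(samples == 1)
frac_three = np.mean(samples == 3)
print(f"P(1 step)  ~ {frac_one:.3f} (exact 0.5)")
print(f"P(3 steps) ~ {frac_three:.3f} (exact 0.125)")
```

Every completed walk should also take an odd number of steps, since the walker must make one more step right than left to reach $x = +1$.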

b) Generate 10,000 of these backtracks to get enough samples out of $P(t_{\textrm{bt}})$. Some of these samples may take a very long time to acquire. (If you are interested in a way to really speed up this calculation, ask me about Numba. If you do use Numba, note that you must use the standard Mersenne Twister RNG; that is, use `np.random.....`)

In [60]:
import tqdm

num_samps = 10000

# Collect the samples in a NumPy array, as ecdf_vals() expects
array_backtracks = np.array([backtrack_steps() for _ in tqdm.tqdm(range(num_samps))])

100%|██████████| 10000/10000 [00:00<00:00, 11486.64it/s]


c) Generate an ECDF of your samples and plot the ECDF with the $x$ axis on a logarithmic scale.

In [61]:
bt, ECDF = ecdf_vals(array_backtracks)

In [62]:
p = bokeh.plotting.figure(
    frame_height=300,
    frame_width=400,
    x_axis_type="log",
    x_axis_label="number of steps",
    y_axis_label="ECDF",
)

p.circle(bt, ECDF)

bokeh.io.show(p)

d) Complementary cumulative distribution (CCDF)

In [72]:
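tau = 0.5  # time per step in seconds (Depken et al.)

# Complementary CDF; the final point is exactly zero and has no position on a log axis
CCDF = 1 - ECDF
tbt = tau * bt

p = bokeh.plotting.figure(
    frame_height=300,
    x_axis_type="log",
    y_axis_type="log",
    x_axis_label="t_bt (s)",
    y_axis_label="CCDF",
)

p.circle(tbt, CCDF)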

bokeh.io.show(p)

e) By doing some mathematical heavy lifting, we know that, in the limit of large $t_{bt}$,

$$P(t_{bt}) \propto t_{bt}^{-3/2},$$

so the plot you made in part (d) should have a slope of $-1/2$ on a log-log plot. Is this what you see?
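The step from the $-3/2$ power law to the $-1/2$ slope can be sketched as follows, treating $t_{bt}$ as continuous and assuming the power law holds over the whole tail being integrated (both are approximations that are only reasonable for large $t_{bt}$). The CCDF is the probability of a backtrack lasting longer than $t_{bt}$, so

$$\mathrm{CCDF}(t_{bt}) = \int_{t_{bt}}^{\infty} P(t')\,\mathrm{d}t' \propto \int_{t_{bt}}^{\infty} t'^{-3/2}\,\mathrm{d}t' = 2\,t_{bt}^{-1/2}.$$

Taking logarithms gives

$$\log \mathrm{CCDF}(t_{bt}) = -\frac{1}{2}\log t_{bt} + \text{constant},$$

which is a straight line of slope $-1/2$ on log-log axes.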

In [73]:
# Reference line with slope -1/2 on the log-log axes: y = x**(-1/2)
x = [1, 10**8]
y = [1, 10**-4]

p.line(x, y)

bokeh.io.show(p)
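Besides comparing against a reference line by eye, the tail slope can be estimated roughly with a straight-line fit in log-log space. The sketch below is one possible way to do that, not part of the original exercise: `loglog_slope()` and its `x_min` cutoff are hypothetical names introduced here, and an unweighted least-squares fit is only a crude estimator of a power-law exponent. It is demonstrated on synthetic data that follows an exact $-1/2$ power law so that it runs on its own.

```python
import numpy as np


def loglog_slope(x, y, x_min):
    """Least-squares slope of log10(y) versus log10(x) for x >= x_min.

    Hypothetical helper; points with y <= 0 are dropped because they
    have no logarithm (the last CCDF point is exactly zero).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x >= x_min) & (y > 0)
    slope, intercept = np.polyfit(np.log10(x[keep]), np.log10(y[keep]), 1)
    return slope


# Synthetic check on an exact power law with exponent -1/2
t_demo = np.logspace(0, 6, 50)
slope_demo = loglog_slope(t_demo, t_demo ** -0.5, x_min=10)
print(slope_demo)
```

In this notebook the call would look like `loglog_slope(tbt, CCDF, x_min=10)`, where the cutoff of 10 seconds is an arbitrary choice meant to exclude the short-time region in which the power law is not expected to hold.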

In [74]:
%load_ext watermark
%watermark -v -p numpy,pandas,bokeh,colorcet,black,numba,tqdm,jupyterlab

CPython 3.7.7
IPython 7.13.0

numpy 1.18.1
pandas 0.24.2
bokeh 2.0.2
colorcet 2.0.2
black 19.10b0
numba 0.49.1
tqdm 4.46.1
jupyterlab 1.2.6
