# Tunable Ribosome Occupancy Model
_Author: Raghav Chanchani_

_For the Zid Lab at UC San Diego_

Date last modified: 3/1/2019
#### A discrete-time Markov chain with a single movement probability that simulates the distribution of ribosomes along a given gene under the assumptions listed below. There is no absorbing state: ribosomes that move off the simulated mRNA are recycled to index -1. This model does not consider ribosome binding energies for sequences in the gene of interest or tRNA binding affinity. Time is simulated at 0.01 sec. resolution.

#### Note: some data structures are inefficient and some methods are redundant; they were written to fit specifications given by a supervisor, for future use and for backward compatibility with existing programs
__References:__
1. https://book.bionumbers.org/what-is-faster-transcription-or-translation
2. https://www.columbia.edu/~ks20/stochastic-I/stochastic-I-MCI.pdf
3. https://github.com/gvanderheide/discreteMarkovChain

__Assumptions:__
1. Initiation occurs when a ribosome reads the first codon _(AUG in the given .txt file)_
2. Elongation occurs in between initiation and termination
3. A ribosome moves one triplet (codon) with each elongation step
4. Ribosomes are recycled once they move off of the gene
5. If two ribosomes are closer together than the distance threshold (the ribosome size, in codons), the ribosome closer to AUG is only allowed to move once the ribosome ahead of it is more than the threshold away, so neighboring ribosomes always stay $\geq$ the threshold apart
6. Termination occurs when a ribosome reads UAG, UAA, or UGA _(in next update)_
7. Gene length in the graph and in calculations is measured in codons, not nucleotides
8. Position refers to the center position of the ribosome
9. Each second is divided into hundredths (timesteps of 0.01 sec.) so that kI and kE values are reflected more accurately
10. Calculations of $\frac{\text{time res.}}{kI\text{ or }kE}$ are floored (e.g. math.floor($\frac{100}{kI}$)) to determine when to attempt initiation and elongation
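
Assumptions 9 and 10 turn a rate k (attempts per sec.) into one attempt every floor(res / k) timesteps. The sketch below assumes res = 100 as in the notebook; `attempt_period` and `is_attempt_tick` are hypothetical helper names, and the clamp to at least one timestep is an added assumption (it only matters for rates above res). It requires a rate greater than 0.

```python
import math

RES = 100  # timesteps per second (res = 100 in the notebook)


def attempt_period(rate, res=RES):
    """Timesteps between attempts for a rate in attempts/sec., floored as in assumption 10.

    The clamp to at least 1 timestep is an assumption so that rates above res
    cannot give a period of 0 (and a modulo-by-zero error).
    """
    return max(1, math.floor(res / rate))


def is_attempt_tick(tick, rate, res=RES):
    """True when `tick` falls on an attempt boundary for the given rate."""
    return tick % attempt_period(rate, res) == 0


# Flooring makes the effective rate res / period slightly higher than the requested rate.
for rate in (1, 3, 7, 20):
    period = attempt_period(rate)
    print(rate, period, RES / period)
```

For example, kI = 3 tries/sec. gives floor(100 / 3) = 33 timesteps, so the effective rate is 100 / 33, about 3.03 tries/sec.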

__Tunable Parameters:__
1. Initiation rate
2. Elongation rate
3. Number of ribosomes on a single mRNA
4. Number of mRNAs
5. Probability of a ribosome moving forward from its current position (currently a single probability for all positions)
6. Size of a ribosome (in nucleotides)
7. Length of the simulated run (sec.)
8. Time until initiation rate becomes 0 _(occurs stepwise linearly)_
9. Time until elongation rate becomes 0 _(occurs stepwise linearly)_
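
Several of these parameters can be gathered into one object. A minimal sketch using Python's `dataclasses`; `SimulationParams` and its field names are hypothetical (the notebook passes the values as separate arguments), and the defaults copy the slider defaults in the widget cell. The two properties reproduce conversions that `simulate` performs: ribosome size from nucleotides to codons, and run length from seconds to 0.01 sec. timesteps.

```python
import math
from dataclasses import dataclass


@dataclass
class SimulationParams:
    """Hypothetical container for some tunable parameters; defaults copy the slider defaults."""
    kI: float = 1.0          # initiation attempts per sec.
    kE: float = 20.0         # elongation attempts per sec.
    n_ribosomes: int = 25    # ribosomes per mRNA
    n_mRNA: int = 10         # number of simulated mRNAs
    prob: float = 0.5        # P(forward) for each attempt
    ribo_size_nt: int = 30   # ribosome size in nucleotides
    time_sec: int = 100      # length of the simulated run
    res: int = 100           # timesteps per second
    codon_size: int = 3

    @property
    def footprint_codons(self):
        # same conversion as simulate(): math.floor(ribo_size / codon_size)
        return math.floor(self.ribo_size_nt / self.codon_size)

    @property
    def n_timesteps(self):
        # simulate() loops over range(time * res)
        return self.time_sec * self.res


params = SimulationParams()
print(params.footprint_codons, params.n_timesteps)  # 10 10000
```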

In [1]:
%matplotlib notebook
import numpy as np
import math
import sys
import os
import argparse
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact_manual
from IPython.display import display
plt.style.use('ggplot')

In [2]:
"""
Parse the filename of the gene of interest, check that it is a ".txt" file, and initialize the mRNA, occupancy,
and ribosome_list lists.
"""
parser = argparse.ArgumentParser()
parser.add_argument('filename')
args = parser.parse_args('Pab1.txt'.split())
ribosome_list = [] # storage for all ribosomes in a single mRNA
gene_length = 0
codon_size = 3
res = 100 # time resolution (the number of divisions made of a second)
mRNA = []
occupancy = [] # storage for all ribosomes in all mRNAs
fname = args.filename
if not fname.lower().endswith(('.txt')):
    parser.error("Input file must be a .txt file")
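
The filename check above can be isolated and tested on its own. A sketch assuming only the standard library; `parse_gene_filename` is a hypothetical helper. Note that `('.txt')` in the cell is just the string `'.txt'` (parentheses without a comma do not make a tuple); `str.endswith` also accepts a real tuple such as `('.txt', '.fa')` if more extensions are ever allowed.

```python
import argparse


def parse_gene_filename(argv):
    """Parse a gene filename from argv and require a .txt extension (hypothetical helper).

    As in the notebook cell, parser.error() prints a usage message and raises SystemExit(2).
    """
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    args = parser.parse_args(argv)
    if not args.filename.lower().endswith('.txt'):
        parser.error("Input file must be a .txt file")
    return args.filename


print(parse_gene_filename(['Pab1.txt']))  # Pab1.txt
```

Passing an explicit list to `parse_args`, as the notebook does, keeps argparse from reading the Jupyter kernel's own command-line arguments.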

In [3]:
"""
Description: Creates the ribosome object which stores the ribosome's position and how long
    it has been at its current position. The counter is initially set to zero.
"""
class ribosome:
    """
    Description: Creates a new ribosome object with initial position -1 and counter 0.
    Inputs: None
    Return: None
    """
    def __init__(self):
        self.position = -1
        self.counter = 0

In [4]:
"""
Description: Reads the gene of interest and creates an "mRNA" list of its nucleotides that will be used to note
    ribosome positions on. A header line starting with ">" supplies the gene name.
Inputs: gene_file - a .txt file that contains the mRNA bases of a given gene.
Return: contents - the raw text of gene_file
        gene_length - the length of the gene passed in, in codons
        mRNA - a list of the gene's nucleotides (header excluded)
        target - the gene name from the header (the file name if there is no header)
"""
def read_gene(gene_file):
    mRNA = []
    # fall back to the file name so that target is always defined
    target = os.path.splitext(os.path.basename(gene_file))[0]

    try:
        with open(gene_file) as inputFileHandle:
            contents = inputFileHandle.read()
    except IOError:
        sys.stderr.write("read_gene - Error: Could not open {}\n".format(gene_file))
        sys.exit(-1)

    for line in contents.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith('>'):
            # header line: keep the first word after '>' as the gene name, not as sequence
            target = tokens[0].strip('>') or target
            continue
        mRNA.extend(''.join(tokens)) # one list element per nucleotide
    gene_length = len(mRNA) // codon_size
    return contents, gene_length, mRNA, target
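
Reading a gene amounts to separating a ">" header from the sequence lines and then grouping nucleotides into triplets. A sketch over an in-memory string, assuming FASTA-like input; `parse_fasta_text` and `to_codons` are hypothetical helpers, and the toy sequence is made up for illustration.

```python
def parse_fasta_text(text):
    """Return (name, sequence) from FASTA-like text: an optional '>' header, then sequence lines.

    Whitespace is dropped and the sequence is upper-cased (assumptions of this sketch).
    """
    name = None
    seq_parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            # first word after '>' is taken as the gene name
            name = line[1:].split()[0] if len(line) > 1 else ''
        else:
            seq_parts.append(''.join(line.split()).upper())
    return name, ''.join(seq_parts)


def to_codons(sequence, codon_size=3):
    """Split a sequence into complete codons; a trailing partial codon is dropped,
    in line with gene_length = len(mRNA) // codon_size."""
    n = len(sequence) // codon_size
    return [sequence[i * codon_size:(i + 1) * codon_size] for i in range(n)]


name, seq = parse_fasta_text(">toy_gene example\nAUGGCU\nAAUGA\n")
print(name, seq, to_codons(seq))  # toy_gene AUGGCUAAUGA ['AUG', 'GCU', 'AAU']
```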
    

In [5]:
"""
Description: Given the probability that the ribosome will move, the ribosome is evaluated to move forward
    one codon or remain in its current position.
Inputs: ribosome - ribosome object trying to move
        gene_length - length of the gene of interest (codons)
        probability - the probability that the ribosome moves forward on this attempt
Return: ribosome - same ribosome object with updated position and counter (reset to 1 after a move,
        incremented otherwise)
"""
def move(ribosome, gene_length, probability):
    if ribosome.position <= gene_length:
        # np.random.random() is uniform on [0, 1), so "<" gives P(move) == probability exactly
        moves = np.random.random() < probability
        if moves:
            ribosome.position += 1
            ribosome.counter = 1
        else:
            ribosome.counter += 1
    return ribosome
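
Each call to `move` is a Bernoulli trial, so, ignoring blocking by other ribosomes, the number of elongation attempts a ribosome spends at one codon is geometric with mean 1/prob. At kE attempts per sec. that is a mean dwell of roughly 1/(prob · kE) sec. per codon (before the flooring of the attempt period). A sketch using NumPy's `Generator`; `attempts_until_move` is a hypothetical helper.

```python
import numpy as np


def attempts_until_move(prob, rng):
    """Count Bernoulli attempts until the first forward move (a geometric random variable)."""
    if prob <= 0:
        raise ValueError("prob must be > 0, otherwise the ribosome never moves")
    attempts = 1
    while rng.random() >= prob:  # rng.random() is uniform on [0, 1), so P(move) == prob
        attempts += 1
    return attempts


rng = np.random.default_rng(0)
prob, kE = 0.5, 20  # slider defaults: P(forward) and elongation attempts per sec.
samples = [attempts_until_move(prob, rng) for _ in range(20000)]
mean_attempts = float(np.mean(samples))
print(mean_attempts)       # close to 1 / prob = 2
print(mean_attempts / kE)  # rough mean dwell per codon in sec., close to 1 / (prob * kE) = 0.1
```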

In [6]:
"""
Description: Graphs the ribosome occupancy of codons as a histogram with x-axis being the position (in codons)
    and the y-axis being occupancy as a fraction of the total number of ribosomes simulated across all mRNA
Inputs: ribos - the list containing all ribosomes
Return: None
"""
def create_histogram(all_cules, fname, gene_length, res, target):
    pos_array = [ribo.position for sim in all_cules for ribo in sim]
    ax = plt.gca()
    ax.set_title('Ribosome Frequency ' + target)
    ax.set_yscale('log')
    hist = plt.hist(pos_array, bins=range(0,gene_length + 1,5),density=False)
    plt.xlabel('Distance from AUG (codons)')
    plt.ylabel('Ribosome Frequency')
    plt.show()
    
    return
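
`plt.hist(..., density=False)` plots raw counts; occupancy as a fraction of ribosomes needs an explicit normalization (matplotlib's `density=True` divides by bin width as well, giving a density rather than a per-bin fraction). A sketch with `np.histogram`, assuming the positions have already been pooled across mRNAs the way `create_histogram` pools them; `occupancy_fraction` is a hypothetical helper and the positions are toy values.

```python
import numpy as np


def occupancy_fraction(all_positions, gene_length, bin_width=5):
    """Bin ribosome positions (codons) and return (bin_edges, fraction of on-mRNA ribosomes per bin).

    Positions of -1 (recycled, off the mRNA) fall outside the bins and are ignored.
    The last edge is extended past gene_length so the final codons are not dropped.
    """
    positions = np.asarray(all_positions)
    edges = np.arange(0, gene_length + bin_width + 1, bin_width)
    counts, edges = np.histogram(positions, bins=edges)
    total = counts.sum()
    fractions = counts / total if total else counts.astype(float)
    return edges, fractions


# toy final positions from two simulated mRNAs, pooled like create_histogram does
sims = [[-1, 0, 3, 7], [12, -1, 4]]
pooled = [p for sim in sims for p in sim]
edges, frac = occupancy_fraction(pooled, gene_length=12)
print(edges, frac)  # [ 0  5 10 15] [0.6 0.2 0.2]
```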

In [2]:
"""
Description: Increases a rate linearly from "effective 0" at timestep start to the slider value at timestep end
Inputs: start - timestep at which the increase begins
        end - timestep at which the slider value is reached
        curr - current timestep (expected to lie within [start, end])
        sldr - value set in slider by user
        default - value used instead of 0 to prevent a divide by zero error
Return: curr - rate at the current timestep (curr is returned unchanged if it lies outside [start, end])
"""
def increase(start, end, curr, sldr, default):
    run = abs(start - end)
    
    if run == 0:
        if curr == start:
            curr = sldr
    else:
        slp = sldr / run
        # increase from "0" to slider
        if curr >= start and curr <= end:
            if curr == end:
                curr = sldr
            else:
                curr = ((slp * curr) + (-slp * start))
                if curr == 0:
                    curr = default

    return curr

In [1]:
"""
Description: Decreases a rate linearly from the slider value at timestep start to "effective 0" at timestep end
Inputs: start - timestep at which the decrease begins
        end - timestep at which the rate reaches "effective 0"
        curr - current timestep (expected to lie within [start, end])
        sldr - negative of the value set in slider by user (simulate passes -kI or -kE)
        default - value used instead of 0 to prevent a divide by zero error
Return: curr - rate at the current timestep (curr is returned unchanged if it lies outside [start, end])
"""
def decrease(start, end, curr, sldr, default):
    run = abs(start - end)
    
    if run == 0:
        if curr == start:
            curr = default
    else:
        slp = sldr / run
        # decrease from slider to "effective 0"; with sldr < 0 this is |sldr| * (end - curr) / run
        if curr >= start and curr <= end:
            if curr == end:
                curr = default
            else:
                curr = ((slp * curr) + (-slp * end))
                if curr == 0:
                    curr = default
    
    return curr
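
The stepwise linear ramps that `increase` and `decrease` implement can also be written as one trapezoid-shaped function of time. This is an alternative formulation, not the notebook's functions: a sketch assuming the four times are already in timesteps and ordered up_start ≤ up_end ≤ down_start ≤ down_end; `ramped_rate` is a hypothetical name, and `floor_rate = 100 ** -3` stands in for the notebook's `dead = res ** -codon_size`.

```python
def ramped_rate(t, rate, up_start, up_end, down_start, down_end, floor_rate=100 ** -3):
    """Rate at timestep t: floor, linear rise to `rate`, plateau, linear fall back to floor."""
    if not up_start <= up_end <= down_start <= down_end:
        raise ValueError("expected up_start <= up_end <= down_start <= down_end")
    if t < up_start or t >= down_end:
        return floor_rate
    if t < up_end:
        # rising edge: 0 at up_start, `rate` at up_end
        value = rate * (t - up_start) / (up_end - up_start)
    elif t < down_start:
        value = rate
    else:
        # falling edge: `rate` at down_start, 0 at down_end
        value = rate * (down_end - t) / (down_end - down_start)
    # never return exactly 0, so res / rate stays finite
    return max(value, floor_rate)


for t in (0, 50, 150, 250, 300):
    print(t, ramped_rate(t, rate=10, up_start=0, up_end=100, down_start=200, down_end=300))
```

With the floor rate of 1e-6, `math.floor(100 / 1e-6)` is 100,000,000 timesteps, so a "dead" rate effectively never produces an attempt.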

In [None]:
"""
Description: Simulates ribosome movement in the initiation and elongation timestep condition: any ribosome,
    recycled or on the mRNA, may attempt to move
Inputs: n_ribosomes - number of ribosomes on the mRNA (len(ribosome_list))
        ribosome_list - ribosomes to be run on a given mRNA
        complex - ribosome index variable
        n_ribo - secondary ribosome index variable, used as a flag to determine valid transition
        gene_length - length of the gene of interest (codons)
        prob - the probability that a ribosome will move forward from its current position
        ribo_size - the size of the ribosome object (codons; simulate converts it from nucleotides)
Return: ribosome_list - list containing all ribosome objects corresponding to a given mRNA
"""
def init_and_elo(n_ribosomes, ribosome_list, complex, n_ribo, gene_length, prob, ribo_size):
    for ribo in ribosome_list:
        if ribosome_list.index(ribo) == complex:
            n_ribo += 1
        elif ribo.position > ribosome_list[complex].position:
            if ribo.position - ribosome_list[complex].position > ribo_size: # can move
                n_ribo += 1
            else:
                break
        elif ribo.position < ribosome_list[complex].position:
            n_ribo += 1
        elif ribo.position == -1:
            n_ribo += 1
        else:
            n_ribo += 1
    if n_ribo == n_ribosomes: # no ribosome ahead is within ribo_size, so this one may move
        ribosome_list[complex] = move(ribosome_list[complex], gene_length, prob)
        if ribosome_list[complex].position > gene_length:
            ribosome_list[complex].position = -1
            ribosome_list[complex].counter = 1
    else: ribosome_list[complex].counter += 1 # blocked: wait one more timestep

    return ribosome_list
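
The scan in `init_and_elo` (repeated in `init_only` and `no_init`) is a simple exclusion rule: a ribosome is blocked only if another ribosome sits between 1 and `ribo_size` codons ahead of it; ribosomes behind it, at the same position, or recycled at -1 never block. The same check can be written as a single `any()` over plain integer positions. A sketch; `can_move` is a hypothetical name, and `footprint` is the ribosome size in codons (10 for 30 nt).

```python
def can_move(positions, index, footprint):
    """True if no other ribosome is 1..footprint codons ahead of ribosome `index`."""
    current = positions[index]
    return not any(
        0 < other - current <= footprint
        for i, other in enumerate(positions)
        if i != index
    )


positions = [-1, 5, 20]
print([can_move(positions, i, footprint=10) for i in range(3)])  # [False, True, True]
```

Here the recycled ribosome at -1 cannot initiate because the ribosome at codon 5 is within 10 codons of it.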

In [None]:
"""
Description: Simulates ribosome movement in the initiation only timestep condition: only recycled ribosomes
    (position -1) may attempt to move
Inputs: n_ribosomes - number of ribosomes on the mRNA (len(ribosome_list))
        ribosome_list - ribosomes to be run on a given mRNA
        complex - ribosome index variable
        n_ribo - secondary ribosome index variable, used as a flag to determine valid transition
        gene_length - length of the gene of interest (codons)
        prob - the probability that a ribosome will move forward from its current position
        ribo_size - the size of the ribosome object (codons; simulate converts it from nucleotides)
Return: ribosome_list - list containing all ribosome objects corresponding to a given mRNA
"""
def init_only(n_ribosomes, ribosome_list, complex, n_ribo, gene_length, prob, ribo_size):
    if ribosome_list[complex].position == -1:
        for ribo in ribosome_list:
            if ribosome_list.index(ribo) == complex:
                n_ribo += 1
            elif ribo.position > ribosome_list[complex].position:
                if ribo.position - ribosome_list[complex].position > ribo_size: # can move
                    n_ribo += 1
                else:
                    break
            elif ribo.position < ribosome_list[complex].position:
                n_ribo += 1
            elif ribo.position == -1:
                n_ribo += 1
            else:
                n_ribo += 1
        if n_ribo == n_ribosomes:
            ribosome_list[complex] = move(ribosome_list[complex], gene_length, prob)
            if ribosome_list[complex].position > gene_length:
                ribosome_list[complex].position = -1
                ribosome_list[complex].counter = 1
        else: ribosome_list[complex].counter += 1 # blocked: wait one more timestep
    else: ribosome_list[complex].counter += 1 # already on the mRNA: no attempt this timestep

    return ribosome_list

In [None]:
"""
Description: Simulates ribosome movement in the no initiation timestep condition: only ribosomes already on the
    mRNA may attempt to move
Inputs: n_ribosomes - number of ribosomes on the mRNA (len(ribosome_list))
        ribosome_list - ribosomes to be run on a given mRNA
        complex - ribosome index variable
        n_ribo - secondary ribosome index variable, used as a flag to determine valid transition
        gene_length - length of the gene of interest (codons)
        prob - the probability that a ribosome will move forward from its current position
        ribo_size - the size of the ribosome object (codons; simulate converts it from nucleotides)
Return: ribosome_list - list containing all ribosome objects corresponding to a given mRNA
"""
def no_init(n_ribosomes, ribosome_list, complex, n_ribo, gene_length, prob, ribo_size):
    if ribosome_list[complex].position == -1:
        ribosome_list[complex].counter += 1 # recycled: cannot initiate this timestep
    else:
        for ribo in ribosome_list:
            if ribosome_list.index(ribo) == complex:
                n_ribo += 1
            elif ribo.position > ribosome_list[complex].position:
                if ribo.position - ribosome_list[complex].position > ribo_size: # can move
                    n_ribo += 1
                else:
                    break
            elif ribo.position < ribosome_list[complex].position:
                n_ribo += 1
            elif ribo.position == -1:
                n_ribo += 1
            else:
                n_ribo += 1
        if n_ribo == n_ribosomes:
            ribosome_list[complex] = move(ribosome_list[complex], gene_length, prob)
            if ribosome_list[complex].position > gene_length:
                ribosome_list[complex].position = -1
                ribosome_list[complex].counter = 1
        else: ribosome_list[complex].counter += 1 # blocked: wait one more timestep

    return ribosome_list
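
Read together, the three timestep conditions reduce to one rule: a recycled ribosome (position -1) attempts to move on initiation timesteps, and a ribosome already on the mRNA attempts to move on its elongation timesteps. Either way, the attempt is still subject to the exclusion check and to P(forward). A sketch of that rule; `attempts_move` is a hypothetical name.

```python
def attempts_move(position, init_tick, elo_tick):
    """Whether a ribosome gets a movement attempt this timestep.

    init_and_elo (both ticks): every ribosome may attempt.
    init_only (initiation tick only): only recycled ribosomes (position -1) may attempt.
    no_init (elongation tick only): only ribosomes already on the mRNA may attempt.
    """
    if position == -1:
        return init_tick
    return elo_tick


for position in (-1, 7):
    for init_tick in (True, False):
        for elo_tick in (True, False):
            print(position, init_tick, elo_tick, attempts_move(position, init_tick, elo_tick))
```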

In [9]:
"""
Description: Determines whether or not the ribosomes on the simulated mRNA will collide, which ribosome is allowed to
    move, and updates the positions and number of timesteps spent at the current location of all ribosomes
    corresponding to a simulated mRNA.
Inputs: ribosomes - the number of ribosomes to be run on a given mRNA
        time - the length of the simulation (sec.)
        ribo_size - the size of the ribosome object (nucleotides)
        prob - the probability that a ribosome will move forward from its current position
        kI - initiation rate (tries/sec.)
        kE - elongation rate (tries/sec.)
        t_st_init - time at which the initiation rate starts to increase (sec.)
        t_st_elo - time at which the elongation rate starts to increase (sec.)
        t_crit_st_init - time at which the initiation rate reaches the slider value (sec.)
        t_crit_st_elo - time at which the elongation rate reaches the slider value (sec.)
        t_stp_init - time at which the initiation rate starts to decrease (sec.)
        t_stp_elo - time at which the elongation rate starts to decrease (sec.)
        t_crit_stp_init - time at which the initiation rate reaches effective 0 (sec.)
        t_crit_stp_elo - time at which the elongation rate reaches effective 0 (sec.)
        gene_length - length of gene of interest (replaced by the value read from the gene file)
        res - time resolution (1/x sec.)
        codon_size - size of codon in nt.
Return: ribosome_list - list containing all ribosome objects corresponding to a given mRNA
        target - name of the gene of interest
"""
def simulate(ribosomes, time, ribo_size, prob, kI, kE, t_st_init, t_st_elo,
            t_crit_st_init, t_crit_st_elo, t_stp_init, t_stp_elo, t_crit_stp_init,
            t_crit_stp_elo, gene_length, res, codon_size):

    dead = res ** -codon_size # "effective 0" rate
    ribo_size = math.floor(ribo_size / codon_size) # ribosome size in codons
    
    if kE == 0:
        slider_kE = dead
    else:
        slider_kE = kE
    kE = dead
    if kI == 0:
        slider_kI = dead
    else:
        slider_kI = kI
    kI = dead
    
    init_fall = -slider_kI
    elo_fall = -slider_kE
    # the timing sliders are in sec.; convert them to timesteps of 1/res sec.
    t_st_init, t_crit_st_init = t_st_init * res, t_crit_st_init * res
    t_stp_init, t_crit_stp_init = t_stp_init * res, t_crit_stp_init * res
    t_st_elo, t_crit_st_elo = t_st_elo * res, t_crit_st_elo * res
    t_stp_elo, t_crit_stp_elo = t_stp_elo * res, t_crit_stp_elo * res

    # list of user-defined number of ribosomes
    ribosome_list = [ribosome() for complex in range(ribosomes)]
    flag, gene_length, mRNA, target = read_gene(args.filename)
    # splits the simulation into 0.01 sec. timesteps
    for t_step in range(time * res):
        # Increase initiation rate from effective 0 at t_st_init to kI at t_crit_st_init
        if t_st_init <= t_step <= t_crit_st_init:
            kI = increase(t_st_init, t_crit_st_init, t_step, slider_kI, dead)
        # Decrease initiation rate from kI at t_stp_init to effective 0 at t_crit_stp_init
        if t_stp_init <= t_step <= t_crit_stp_init:
            kI = decrease(t_stp_init, t_crit_stp_init, t_step, init_fall, dead)
        # Increase elongation rate from effective 0 at t_st_elo to kE at t_crit_st_elo
        if t_st_elo <= t_step <= t_crit_st_elo:
            kE = increase(t_st_elo, t_crit_st_elo, t_step, slider_kE, dead)
        # Decrease elongation rate from kE at t_stp_elo to effective 0 at t_crit_stp_elo
        if t_stp_elo <= t_step <= t_crit_stp_elo:
            kE = decrease(t_stp_elo, t_crit_stp_elo, t_step, elo_fall, dead)
        # timesteps between attempts, floored and at least 1
        init_period = max(1, math.floor(res / kI))
        elo_period = max(1, math.floor(res / kE))

        # each ribosome...
        for complex in range(len(ribosome_list)):
            # compared ribosome counter variable
            n_ribo = 0
            step = ribosome_list[complex].counter
            if t_step % init_period == 0: # initiation timestep
                if step % elo_period == 0: # elongation timestep
                    ribosome_list = init_and_elo(len(ribosome_list), ribosome_list, complex, n_ribo, gene_length, prob, ribo_size)
                # if only initiation timestep
                else:
                    ribosome_list = init_only(len(ribosome_list), ribosome_list, complex, n_ribo, gene_length, prob, ribo_size)
            # if not initiation timestep
            else:
                # if elongation timestep but not initiation timestep
                if step % elo_period == 0:
                    ribosome_list = no_init(len(ribosome_list), ribosome_list, complex, n_ribo, gene_length, prob, ribo_size)
                # if not initiation nor elongation timestep
                else:
                    ribosome_list[complex].counter += 1
    return ribosome_list, target
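
To see the whole update rule at a glance, here is a simplified, self-contained re-implementation of the per-timestep loop, under stated assumptions: constant kI and kE (no ramps), plain integer lists instead of `ribosome` objects, and NumPy's `Generator` instead of `np.random.random`. `toy_simulation` is a hypothetical name; it is a sketch for checking intuition about the rules above, not a replacement for `simulate`.

```python
import math
import numpy as np


def toy_simulation(n_ribosomes, gene_length, n_ticks, kI, kE, prob, footprint, res=100, seed=0):
    """Final ribosome positions (codons, -1 = recycled) after n_ticks timesteps of 1/res sec."""
    rng = np.random.default_rng(seed)
    init_period = max(1, math.floor(res / kI))
    elo_period = max(1, math.floor(res / kE))
    positions = [-1] * n_ribosomes
    counters = [0] * n_ribosomes
    for t_step in range(n_ticks):
        for i in range(n_ribosomes):
            # recycled ribosomes try on initiation timesteps, bound ones on their own elongation timesteps
            if positions[i] == -1:
                attempt = t_step % init_period == 0
            else:
                attempt = counters[i] % elo_period == 0
            # exclusion: blocked if another ribosome is 1..footprint codons ahead
            blocked = any(0 < positions[j] - positions[i] <= footprint
                          for j in range(n_ribosomes) if j != i)
            if attempt and not blocked and rng.random() < prob:
                positions[i] += 1
                counters[i] = 1
                if positions[i] > gene_length:
                    positions[i] = -1  # ran off the end of the mRNA: recycle
            else:
                counters[i] += 1
    return positions


print(toy_simulation(n_ribosomes=5, gene_length=50, n_ticks=2000, kI=1, kE=20, prob=0.5, footprint=10))
```

With prob = 1 and kI = kE = res, the rule becomes deterministic: one ribosome on a 5-codon gene reaches codon 5 after 6 timesteps and is recycled on the 7th, and a second ribosome can only initiate once the first is more than 10 codons ahead of -1.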

In [10]:
"""
Description: Simulates n_mRNA mRNAs, collects the final ribosome states of each mRNA into a list of lists, and
    plots them as a histogram
Inputs: ribosomes - the number of ribosomes to be run on a given mRNA
        time - the length of the simulation (sec.)
        ribo_size - the size of the ribosome object (nucleotides)
        n_mRNA - the number of mRNAs to simulate
        prob - the probability that a ribosome will move forward from its current position
        kI - initiation rate (tries/sec.)
        kE - elongation rate (tries/sec.)
        t_st_init - time at which the initiation rate starts to increase (sec.)
        t_st_elo - time at which the elongation rate starts to increase (sec.)
        t_crit_st_init - time at which the initiation rate reaches the slider value (sec.)
        t_crit_st_elo - time at which the elongation rate reaches the slider value (sec.)
        t_stp_init - time at which the initiation rate starts to decrease (sec.)
        t_stp_elo - time at which the elongation rate starts to decrease (sec.)
        t_crit_stp_init - time at which the initiation rate reaches effective 0 (sec.)
        t_crit_stp_elo - time at which the elongation rate reaches effective 0 (sec.)
Return: None
"""
def concatenate(ribosomes, time, ribo_size, n_mRNA, prob, kI, kE, t_st_init, t_st_elo,
               t_crit_st_init, t_crit_st_elo, t_stp_init, t_stp_elo, t_crit_stp_init, t_crit_stp_elo):
    occupancy = []
    # read the gene here so the histogram uses its real length (the global gene_length is only a placeholder of 0)
    flag, gene_length, mRNA, target = read_gene(fname)
    
    for mRNAs in range(n_mRNA):
        rib_list, target = simulate(ribosomes, time,
                                  ribo_size, prob, kI, kE,
                                  t_st_init, t_st_elo,
                                  t_crit_st_init, t_crit_st_elo,
                                  t_stp_init, t_stp_elo,
                                  t_crit_stp_init, t_crit_stp_elo,
                                  gene_length, res, codon_size)
        occupancy.append(rib_list) # one entry per simulated mRNA
    create_histogram(occupancy, fname, gene_length, res, target)
    print("gene length: {} codons".format(gene_length))
    
    return

In [11]:
"""
Initialize the sliders that control the length of the simulation, number of ribosomes, number of mRNAs, ribosome size,
    kI, kE, the timing of the kI and kE ramps, and the probability of moving to the next codon. Update the histogram
    when the Run Interact button is pressed.
"""
initiation = widgets.IntSlider(min = 0, max = 10, value = 1, step = 1, description = 'kI - try/sec.');
init_stop_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. to kI = 0');
elongation = widgets.IntSlider(min = 0, max = 20, value = 20, step = 1, description = 'kE - try/sec.');
elo_stop_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. to kE = 0');
init_start_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. delay kI');
elo_start_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. delay kE');
init_crit_inc_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. until kI');
elo_crit_inc_time = widgets.IntSlider(min = 0, max = 90001, value = 50, step = 1, description = 'sec. until kE');
prob_move = widgets.FloatSlider(min = 0.00, max = 1.00, value = 0.50, step = 0.01, description = 'P(forward)');
sizes = widgets.IntSlider(min = 25, max = 35, value = 30, step = 1, description = 'Ribo size (nt.)');
secs = widgets.IntSlider(min = 0, max = 90000, value = 100, step = 10, description = 'time (sec.)');
ribo = widgets.IntSlider(min = 0, max = 25, value = 25, step = 1, description = 'n(ribosomes)');
num_cules = widgets.IntSlider(min = 0, max = 5000, value = 10, step = 1, description = 'n(mRNA)');
init_decay_time = widgets.IntSlider(min = 0, max = 90001, value = 20, step = 1, description = 'kI start decay');
elo_decay_time = widgets.IntSlider(min = 0, max = 90001, value = 20, step = 1, description = 'kE start decay');
interact_manual(concatenate,
                ribosomes = ribo,
                time = secs,
                ribo_size = sizes,
                n_mRNA = num_cules,
                prob = prob_move,
                kI = initiation,
                kE = elongation,
                t_st_init = init_start_time,
                t_st_elo = elo_start_time,
                t_crit_st_init = init_crit_inc_time,
                t_crit_st_elo = elo_crit_inc_time,
                t_stp_init = init_decay_time,
                t_stp_elo = elo_decay_time,
                t_crit_stp_init = init_stop_time,
                t_crit_stp_elo = elo_stop_time);
