Skip to content
Simulation and analysis code for manuscript
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
analysis
data
modeling
README.org

README.org

Inverted Translational Control of Eukaryotic Gene Expression by Ribosome Collisions

Heungwon Park & Arvind R. Subramaniam

http://rasilab.fredhutch.org/

This repository contains raw experimental data, code, and instructions for:

  • running simulations
  • analyzing high-throughput sequencing data and flow cytometry data
  • quantification of western blots
  • generating figures in the manuscript

Contents

Modeling

Default Run

To run the simulations, install our lab’s customized versions of:

The instructions for installing the above software are provided in the respective links.

Our kinetic model for quality control during eukaryotic translation is defined in modeling/tasep.py. This model is defined using the PySB syntax. To simulate this model with its default parameters, run:

cd modeling
python tasep.py

The above run displays the following output:

BioNetGen version 2.4.0
Reading from file ./tasep.bngl (level 0)
Read 31 parameters.
Read 5 molecule types.
Read 7 observable(s).
Read 2 species.
Read 8402 reaction rule(s).
WARNING: writeFile(): Overwriting existing file ./tasep.xml.
Wrote model in xml format to ./tasep.xml.
Finished processing file ./tasep.bngl.
CPU TIME: total 74.72 s.
NFsim -xml ./tasep.xml -sim 100000 -oSteps 10 -seed 111 -o ./tasep.gdat -rxnlog ./tasep.rxns.tsv -utl 3 -gml 1000000 -maxcputime 6000 -connect
# starting NFsim v1.11...
# seeding random number generator with: 111
# reading xml file (./tasep.xml)
-------]
# preparing simulation...
Connectivity inferred for 1000 reactions.
Connectivity inferred for 2000 reactions.
Connectivity inferred for 3000 reactions.
Connectivity inferred for 4000 reactions.
Connectivity inferred for 5000 reactions.
Connectivity inferred for 6000 reactions.
Connectivity inferred for 7000 reactions.
Connectivity inferred for 8000 reactions.
# equilibrating for :0s.
# simulating system for: 1.000000e+05 second(s).

Sim time: 0.000000e+00	CPU time (total): 3.330000e-04s	 events (step): 0
Sim time: 1.000000e+04	CPU time (total): 1.194367e+02s	 events (step): 976356
Sim time: 2.000000e+04	CPU time (total): 2.274126e+02s	 events (step): 787224
Sim time: 3.000000e+04	CPU time (total): 2.795730e+02s	 events (step): 429620
Sim time: 4.000000e+04	CPU time (total): 3.434378e+02s	 events (step): 446252
Sim time: 5.000000e+04	CPU time (total): 4.125380e+02s	 events (step): 552178
Sim time: 6.000000e+04	CPU time (total): 5.400679e+02s	 events (step): 928650
Sim time: 7.000000e+04	CPU time (total): 6.394025e+02s	 events (step): 763216
Sim time: 8.000000e+04	CPU time (total): 7.081266e+02s	 events (step): 527655
Sim time: 9.000000e+04	CPU time (total): 8.037636e+02s	 events (step): 734622
Sim time: 1.000000e+05	CPU time (total): 8.943492e+02s	 events (step): 682986

# simulated 6828760 reactions in 8.943500e+02s
# 7.635445e+03 reactions/sec, 1.309681e-04 CPU seconds/event
# null events: 0 1.309681e-04 CPU seconds/non-null event
# done.  Total CPU time: 1012.13s

CPU times will be a bit different depending on the machine.

At the end of the run, tasep.params.tsv.gz, tasep.gdat, and tasep.rxns.tsv files should be present in the modeling/ folder.

Parameter Sweep

Simulations with systematic variation of parameters are run from the 9 sub-directories in modeling/simulation_runs/. Each of these sub-directories contains a Snakemake workflow that chooses the parameters, runs the simulations, tabulates the summary data, and generates figures. Below, we describe this workflow using a specific example in the modeling/simulation_runs/endocleave_rate_vary/ sub-directory. All other sub-directories contain a very similar workflow.

For the set of 60 simulations in modeling/simulation_runs/endocleave_rate_vary/, the endonucleolytic mRNA cleavage rate in the collision-stimulated endonucleolytic cleavage model is systematically varied. The parameters that are varied from their default values are chosen in modeling/simulation_runs/endocleave_rate_vary/choose_simulation_parameters.py and written as a tab-separated file modeling/simulation_runs/endocleave_rate_vary/sim.params.tsv in the same directory. The script modeling/simulation_runs/endocleave_rate_vary/run_simulation.py runs the simulation with a single parameter set. This parameter set is decided by the single argument to this script which specifies the row number in modeling/simulation_runs/endocleave_rate_vary/sim.params.tsv. The script modeling/simulation_runs/endocleave_rate_vary/run_simulation.py invokes modeling/get_mrna_lifetime_and_psr.R to parse the raw reaction firing data and calculates the mean and standard deviation of four observables: protein synthesis rate, mRNA lifetime, ribosome collision frequency, and abortive termination frequency for each mRNA during its lifetime. These summary statistics are tabulated for all parameter combinations using the script modeling/combine_lifetime_and_psr_data.R which generates the tsv files in modeling/simulation_runs/endocleave_rate_vary/tables/. The tabulated summary statistics are analyzed and plotted in the RMarkdown script modeling/simulation_runs/endocleave_rate_vary/analyze_results.Rmd, which when knitted, results in the Github-flavored Markdown file modeling/simulation_runs/endocleave_rate_vary/analyze_results.md and the figures in modeling/simulation_runs/endocleave_rate_vary/figures/.

modeling/simulation_runs/endocleave_rate_vary/Snakefile implements the above described workflow. Simulations are often run on a cluster using the cluster configuration modeling/simulation_runs/endocleave_rate_vary/cluster.yaml.

To invoke the above workflow, run:

cd modeling/simulation_runs/endocleave_rate_vary
# check what will be run using a dry run
snakemake -np
# run everything locally; can take a very long time!!
snakemake
# use a SLURM cluster for running simulations
sh submit_cluster.sh > submit.log 2> submit.log &

All the simulations in this work can be run in a single workflow using modeling/Snakefile, but this is not typically recommended unless you are re-running only a few simulations.

Data Analysis

High-Throughput Sequencing

data/htseq/ contains the annotations for the reporter and Illumina multiplexing barcodes used for measuring mRNA levels:

Raw sequencing data in .fastq format must be downloaded to the data/htseq/ folder.

The number of Illumina sequencing reads aligning to each barcode in each sample is counted using analysis/htseq/count_barcodes.py. These counts are available as .tsv files in analysis/htseq/tables/.

The tabulated counts are processed and plotted in analysis/htseq/analyze_barcode_counts.Rmd to generate Fig. 2B, 2C, and 5C in the manuscript. The knitted code and figures from this analysis can be browsed at analysis/htseq/analyze_barcode_counts.md.

The above steps are implemented as a Snakemake workflow in analysis/htseq/Snakefile. The workflow can be run locally or on a SLURM cluster by:

cd analysis/htseq
# local run
snakemake
# cluster run
sh submit_cluster.sh > submit.log 2> submit.log &

This workflow can be visualized by:

snakemake --forceall -dag | dot -Tpng -o dag.png

which produces the following graph: analysis/htseq/dag.png

This workflow generates Fig. 2B, 2C, 5B, and S4B.

Flow Cytometry

data/flow/ contains the annotations for the 8 flow cytometry experiments in our work.

analysis/flow/ contains the RMarkdown scripts for generating figures from the raw data and annotations.

The RMarkdown scripts can be knitted to generate the figures by:

cd analysis/flow
for file in *.Rmd; do R -e "rmarkdown::render('$file')"; done

Western Blots

Un-cropped western blot images corresponding to Fig. 1D, 5C are provided as .png images in data/western/. The region in each image cropped for inclusion in the manuscript is shown as a rectangle.

The lanes are quantified using ImageJ (Rectangle Select → Analyze → Measure) and pasted as tab-delimited rows. This quantification for all lanes in the manuscript is in data/western/quantification.tsv.

Normalization of the lanes for display in figures is carried out in analysis/western/western_analysis.md.

You can’t perform that action at this time.