Inverted Translational Control of Eukaryotic Gene Expression by Ribosome Collisions
Heungwon Park & Arvind R. Subramaniam
This repository contains raw experimental data, code, and instructions for:
- running simulations
- analyzing high-throughput sequencing data and flow cytometry data
- quantification of western blots
- generating figures in the manuscript
- Data Analysis
To run the simulations, install our lab’s customized versions of:
- PySB: https://github.com/rasilab/PySB
- BioNetGen: https://github.com/rasilab/BioNetGen
- NFsim: https://github.com/rasilab/NFsim
The instructions for installing the above software are provided in the respective links.
Our kinetic model for quality control during eukaryotic translation is defined in modeling/tasep.py. This model is defined using the PySB syntax. To simulate this model with its default parameters, run:
cd modeling python tasep.py
The above run displays the following output:
BioNetGen version 2.4.0 Reading from file ./tasep.bngl (level 0) Read 31 parameters. Read 5 molecule types. Read 7 observable(s). Read 2 species. Read 8402 reaction rule(s). WARNING: writeFile(): Overwriting existing file ./tasep.xml. Wrote model in xml format to ./tasep.xml. Finished processing file ./tasep.bngl. CPU TIME: total 74.72 s. NFsim -xml ./tasep.xml -sim 100000 -oSteps 10 -seed 111 -o ./tasep.gdat -rxnlog ./tasep.rxns.tsv -utl 3 -gml 1000000 -maxcputime 6000 -connect # starting NFsim v1.11... # seeding random number generator with: 111 # reading xml file (./tasep.xml) -------] # preparing simulation... Connectivity inferred for 1000 reactions. Connectivity inferred for 2000 reactions. Connectivity inferred for 3000 reactions. Connectivity inferred for 4000 reactions. Connectivity inferred for 5000 reactions. Connectivity inferred for 6000 reactions. Connectivity inferred for 7000 reactions. Connectivity inferred for 8000 reactions. # equilibrating for :0s. # simulating system for: 1.000000e+05 second(s). Sim time: 0.000000e+00 CPU time (total): 3.330000e-04s events (step): 0 Sim time: 1.000000e+04 CPU time (total): 1.194367e+02s events (step): 976356 Sim time: 2.000000e+04 CPU time (total): 2.274126e+02s events (step): 787224 Sim time: 3.000000e+04 CPU time (total): 2.795730e+02s events (step): 429620 Sim time: 4.000000e+04 CPU time (total): 3.434378e+02s events (step): 446252 Sim time: 5.000000e+04 CPU time (total): 4.125380e+02s events (step): 552178 Sim time: 6.000000e+04 CPU time (total): 5.400679e+02s events (step): 928650 Sim time: 7.000000e+04 CPU time (total): 6.394025e+02s events (step): 763216 Sim time: 8.000000e+04 CPU time (total): 7.081266e+02s events (step): 527655 Sim time: 9.000000e+04 CPU time (total): 8.037636e+02s events (step): 734622 Sim time: 1.000000e+05 CPU time (total): 8.943492e+02s events (step): 682986 # simulated 6828760 reactions in 8.943500e+02s # 7.635445e+03 reactions/sec, 1.309681e-04 CPU seconds/event # null events: 0 1.309681e-04 CPU seconds/non-null event # done. Total CPU time: 1012.13s
CPU times will be a bit different depending on the machine.
At the end of the run,
tasep.rxns.tsv files should be present in the modeling/ folder.
Simulations with systematic variation of parameters are run from the 9 sub-directories in modeling/simulation_runs/. Each of these sub-directories contains a Snakemake workflow that chooses the parameters, runs the simulations, tabulates the summary data, and generates figures. Below, we describe this workflow using a specific example in the modeling/simulation_runs/endocleave_rate_vary/ sub-directory. All other sub-directories contain a very similar workflow.
For the set of 60 simulations in modeling/simulation_runs/endocleave_rate_vary/, the endonucleolytic mRNA cleavage rate in the collision-stimulated endonucleolytic cleavage model is systematically varied.
The parameters that are varied from their default values are chosen in modeling/simulation_runs/endocleave_rate_vary/choose_simulation_parameters.py and written as a tab-separated file modeling/simulation_runs/endocleave_rate_vary/sim.params.tsv in the same directory.
The script modeling/simulation_runs/endocleave_rate_vary/run_simulation.py runs the simulation with a single parameter set.
This parameter set is decided by the single argument to this script which specifies the row number in modeling/simulation_runs/endocleave_rate_vary/sim.params.tsv.
The script modeling/simulation_runs/endocleave_rate_vary/run_simulation.py invokes modeling/get_mrna_lifetime_and_psr.R to parse the raw reaction firing data and calculates the mean and standard deviation of four observables: protein synthesis rate, mRNA lifetime, ribosome collision frequency, and abortive termination frequency for each mRNA during its lifetime.
These summary statistics are tabulated for all parameter combinations using the script modeling/combine_lifetime_and_psr_data.R which generates the
tsv files in modeling/simulation_runs/endocleave_rate_vary/tables/.
The tabulated summary statistics are analyzed and plotted in the RMarkdown script modeling/simulation_runs/endocleave_rate_vary/analyze_results.Rmd, which when knitted, results in the Github-flavored Markdown file modeling/simulation_runs/endocleave_rate_vary/analyze_results.md and the figures in modeling/simulation_runs/endocleave_rate_vary/figures/.
modeling/simulation_runs/endocleave_rate_vary/Snakefile implements the above described workflow. Simulations are often run on a cluster using the cluster configuration modeling/simulation_runs/endocleave_rate_vary/cluster.yaml.
To invoke the above workflow, run:
cd modeling/simulation_runs/endocleave_rate_vary # check what will be run using a dry run snakemake -np # run everything locally; can take a very long time!! snakemake # use a SLURM cluster for running simulations sh submit_cluster.sh > submit.log 2> submit.log &
All the simulations in this work can be run in a single workflow using modeling/Snakefile, but this is not typically recommended unless you are re-running only a few simulations.
- modeling/simulation_runs/preterm_compare_models/Snakefile workflow generates Fig. 3B, S2A, S2B, S2C, S2D.
- modeling/simulation_runs/csat_model_vary_num_stalls/Snakefile workflow generates Fig. 3C.
- modeling/simulation_runs/mrna_endocleave_compare_models/Snakefile workflow generates Fig. 4B, 4C, S3A.
- modeling/simulation_runs/csec_model_vary_num_stalls/Snakefile workflow generates Fig. S3B.
data/htseq/ contains the annotations for the reporter and Illumina multiplexing barcodes used for measuring mRNA levels:
- data/htseq/barcode_annotations.tsv contains the 8nt barcodes inserted into the 3′UTR along with a unique plate and well number for each barcode.
- data/htseq/strain_barcode_annotations.tsv contains the plate + well number of the 8nt barcode and the corresponding reporter plasmid listed in Table S1 of the manuscript.
- data/htseq/strain_annotations.tsv contains the initiation and codon mutations in each reporter plasmid that barcoded, and is similar to Table S1 of the manuscript.
- data/htseq/r2_barcode_annotations.tsv contains the Illumina multiplexing barcodes and the corresponding the strain background and whether the library is prepared from cDNA or gDNA.
Raw sequencing data in
.fastq format must be downloaded to the data/htseq/ folder.
The tabulated counts are processed and plotted in analysis/htseq/analyze_barcode_counts.Rmd to generate Fig. 2B, 2C, and 5C in the manuscript. The knitted code and figures from this analysis can be browsed at analysis/htseq/analyze_barcode_counts.md.
The above steps are implemented as a
Snakemake workflow in analysis/htseq/Snakefile.
The workflow can be run locally or on a SLURM cluster by:
cd analysis/htseq # local run snakemake # cluster run sh submit_cluster.sh > submit.log 2> submit.log &
This workflow can be visualized by:
snakemake --forceall -dag | dot -Tpng -o dag.png
data/flow/ contains the annotations for the 8 flow cytometry experiments in our work.
analysis/flow/ contains the RMarkdown scripts for generating figures from the raw data and annotations.
The RMarkdown scripts can be knitted to generate the figures by:
cd analysis/flow for file in *.Rmd; do R -e "rmarkdown::render('$file')"; done
- analysis/flow/no_insert.md generates Fig. 1B.
- analysis/flow/10xaag_wt.md, analysis/flow/8xccg_wt.md, and analysis/flow/cgg_position_number.md generate Fig. 1C left panels, 1C middle panels, and 1C right panels respectively.
- analysis/flow/cgg_position_number.md generates Fig. S1B.
- analysis/flow/lowmedhigh_8xcgg_4ko.md generates Fig. 5A.
- analysis/flow/hel2_asc1_mutants.md generates Fig. 5C top panels and 5C bottom panels. The P-values indicated in Fig. 5C in the manuscript are also calculated and displayed in this page. Note: The mKate2 channel measurement did not work properly in this experiment. Hence the YFP fluorescence is not normalized by mKate2 fluorescence in these figures.
- analysis/flow/5xcgg_3ko.md and analysis/flow/5xcgg_asc1ko.md generate Fig. S4A left two panels and S4A right panels. Note: The measurement in the ΔASC1 strain background was very noisy due to poor growth in the first experiment. So this measurement was repeated with longer growth times and inoculation with larger S. cerevisiae colonies.
Un-cropped western blot images corresponding to Fig. 1D, 5C are provided as
.png images in data/western/.
The region in each image cropped for inclusion in the manuscript is shown as a rectangle.
The lanes are quantified using ImageJ (Rectangle Select → Analyze → Measure) and pasted as tab-delimited rows. This quantification for all lanes in the manuscript is in data/western/quantification.tsv.
Normalization of the lanes for display in figures is carried out in analysis/western/western_analysis.md.