Inverted Translational Control of Eukaryotic Gene Expression by Ribosome Collisions

Park H, Subramaniam AR. PLoS Biol 17(9): e3000396 (2019)

https://doi.org/10.1371/journal.pbio.3000396

http://rasilab.fredhutch.org/

This repository contains raw experimental data, code, and instructions for:

  • running simulations
  • analyzing high-throughput sequencing data and flow cytometry data generated for this study
  • analysis of publicly available datasets
  • quantification of western blots
  • generating figures in the manuscript

Contents

Source Data for Figures

Modeling

Default Run

To run the simulations, install our lab’s customized versions of:

The instructions for installing the above software are provided in the respective links.

Our kinetic model for quality control during eukaryotic translation is defined in modeling/tasep.py. This model is defined using the PySB syntax. To simulate this model with its default parameters, run:

cd modeling
python tasep.py

The above run displays the following output:

BioNetGen version 2.4.0
Reading from file ./tasep.bngl (level 0)
Read 32 parameters.
Read 5 molecule types.
Read 7 observable(s).
Read 2 species.
Read 9401 reaction rule(s).
WARNING: writeFile(): Overwriting existing file ./tasep.xml.
Wrote model in xml format to ./tasep.xml.
Finished processing file ./tasep.bngl.
CPU TIME: total 96.71 s.

NFsim -xml ./tasep.xml -sim 100000 -oSteps 10 -seed 111 -o ./tasep.gdat -rxnlog ./tasep.rxns.tsv -utl 3 -gml 1000000 -maxcputime 6000 -connect

# starting NFsim v1.11...
# seeding random number generator with: 111
# reading xml file (./tasep.xml)
-------]
# preparing simulation...
Connectivity inferred for 1000 reactions.
Connectivity inferred for 2000 reactions.
Connectivity inferred for 3000 reactions.
Connectivity inferred for 4000 reactions.
Connectivity inferred for 5000 reactions.
Connectivity inferred for 6000 reactions.
Connectivity inferred for 7000 reactions.
Connectivity inferred for 8000 reactions.
# equilibrating for :0s.
# simulating system for: 1.000000e+05 second(s).

Sim time: 0.000000e+00	CPU time (total): 6.590000e-04s	 events (step): 0
Sim time: 1.000000e+04	CPU time (total): 1.406101e+02s	 events (step): 976356
Sim time: 2.000000e+04	CPU time (total): 2.650203e+02s	 events (step): 787224
Sim time: 3.000000e+04	CPU time (total): 3.262947e+02s	 events (step): 429620
Sim time: 4.000000e+04	CPU time (total): 4.007395e+02s	 events (step): 446252
Sim time: 5.000000e+04	CPU time (total): 4.829204e+02s	 events (step): 552178
Sim time: 6.000000e+04	CPU time (total): 6.370497e+02s	 events (step): 928650
Sim time: 7.000000e+04	CPU time (total): 7.528471e+02s	 events (step): 763216
Sim time: 8.000000e+04	CPU time (total): 8.328361e+02s	 events (step): 527655
Sim time: 9.000000e+04	CPU time (total): 9.455520e+02s	 events (step): 734622
Sim time: 1.000000e+05	CPU time (total): 1.052860e+03s	 events (step): 682986

# simulated 6828760 reactions in 1.052872e+03s
# 6.485838e+03 reactions/sec, 1.541821e-04 CPU seconds/event
# null events: 0 1.541821e-04 CPU seconds/non-null event
# done.  Total CPU time: 1195.79s

CPU times will vary from machine to machine.

At the end of the run, tasep.params.tsv.gz, tasep.gdat, and tasep.rxns.tsv files should be present in the modeling/ folder.
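To inspect the observables written by the default run, a minimal parsing sketch is given below. It assumes that tasep.gdat follows the usual .gdat layout (a header line starting with `#` that lists the column names, followed by numeric rows); the function name `read_gdat` and the observable names in the example are hypothetical and not part of this repository.

```python
import io


def read_gdat(handle):
    """Parse a .gdat-style table into (column_names, rows_of_floats)."""
    header = None
    rows = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Assumed layout: the first '#' line carries the column names.
            if header is None:
                header = line.lstrip("#").replace(",", " ").split()
            continue
        # Accept whitespace- or comma-delimited numeric rows.
        rows.append([float(x) for x in line.replace(",", " ").split()])
    return header, rows


# Hypothetical file contents; the observable names are made up for illustration.
example = io.StringIO(
    "# time obs_a obs_b\n"
    "0.0 0 1\n"
    "10000.0 12 1\n"
    "20000.0 25 0\n"
)
columns, rows = read_gdat(example)
final = dict(zip(columns, rows[-1]))  # observables at the last time point
print(final)
```

For a real run, the in-memory example would be replaced by a file handle opened on modeling/tasep.gdat.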

Parameter Sweep

Simulations with systematic variation of parameters are run from the 9 sub-directories in modeling/simulation_runs/. Each of these sub-directories contains a Snakemake workflow that chooses the parameters, runs the simulations, tabulates the summary data, and generates figures. Below, we describe this workflow using a specific example in the modeling/simulation_runs/csat_model_vary_num_stalls sub-directory that generated Fig. 3C in our paper. All other sub-directories contain a very similar workflow.

For the set of 130 simulations in modeling/simulation_runs/csat_model_vary_num_stalls, the number of consecutive stall-encoding codons in the collision-stimulated abortive termination (CSAT) model is systematically varied. The parameters that are varied from their default values are chosen in modeling/simulation_runs/csat_model_vary_num_stalls/choose_simulation_parameters.py and written to the tab-separated file modeling/simulation_runs/csat_model_vary_num_stalls/sim.params.tsv in the same directory.

The script modeling/simulation_runs/csat_model_vary_num_stalls/run_simulation.py runs the simulation with a single parameter set. This parameter set is selected by the script's single argument, which specifies the row number in modeling/simulation_runs/csat_model_vary_num_stalls/sim.params.tsv. The same script invokes modeling/get_mrna_lifetime_and_psr.R to parse the raw reaction firing data and calculate the mean and standard deviation of four observables: protein synthesis rate, mRNA lifetime, ribosome collision frequency, and abortive termination frequency for each mRNA during its lifetime.

These summary statistics are tabulated for all parameter combinations by the script modeling/combine_lifetime_and_psr_data.R, which generates the .tsv files in modeling/simulation_runs/csat_model_vary_num_stalls/tables/. The tabulated summary statistics are analyzed and plotted in the RMarkdown script modeling/simulation_runs/csat_model_vary_num_stalls/analyze_results.Rmd, which, when knitted, produces the GitHub-flavored Markdown file modeling/simulation_runs/csat_model_vary_num_stalls/analyze_results.md and the figures in modeling/simulation_runs/csat_model_vary_num_stalls/figures/.
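The "choose parameters, then select one row by number" pattern described above can be sketched as follows. This is an illustration only, not the contents of choose_simulation_parameters.py or run_simulation.py: the parameter names (`n_stalls`, `k_init`), their values, the `row` column, and both helper functions are hypothetical.

```python
import csv
import io
import itertools

# Hypothetical parameter names and values, for illustration only.
grid = {
    "n_stalls": [1, 2, 4],
    "k_init": [0.01, 0.1],
}


def write_params(handle, grid):
    """Write one row per parameter combination as tab-separated text."""
    names = list(grid)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["row"] + names)
    combos = itertools.product(*(grid[name] for name in names))
    for i, combo in enumerate(combos):
        writer.writerow([i] + list(combo))


def read_param_row(handle, row_number):
    """Return the parameter set on the given 0-based row as a dict of strings."""
    for record in csv.DictReader(handle, delimiter="\t"):
        if int(record["row"]) == row_number:
            return record
    raise IndexError(f"no parameter row {row_number}")


# Use an in-memory buffer in place of sim.params.tsv.
buffer = io.StringIO()
write_params(buffer, grid)
buffer.seek(0)
params = read_param_row(buffer, 3)
print(params)
```

Keeping one parameter set per row lets a workflow manager or cluster scheduler launch each simulation independently, passing only the row number on the command line.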

modeling/simulation_runs/csat_model_vary_num_stalls/Snakefile implements the workflow described above. Simulations are often run on a cluster using the cluster configuration modeling/simulation_runs/csat_model_vary_num_stalls/cluster.yaml.

To invoke the above workflow, run:

cd modeling/simulation_runs/csat_model_vary_num_stalls
# check what will be run using a dry run
snakemake -np
# use a SLURM cluster for running simulations
sh submit_cluster.sh > submit.log 2>&1 &
# uncomment line below to run everything locally; can take a very long time!!
# snakemake

All the simulations in this work can be run in a single workflow using modeling/Snakefile, but this is not typically recommended unless you are re-running only a few simulations.

Data Analysis

High-Throughput Sequencing

data/htseq/ contains the annotations for the reporter and Illumina multiplexing barcodes used for measuring mRNA levels.

Raw sequencing data in .fastq format must be downloaded to the data/htseq/ folder.

The number of Illumina sequencing reads aligning to each barcode in each sample is counted using analysis/htseq/count_barcodes.py. These counts are available as .tsv files in analysis/htseq/tables/.
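The counting step can be illustrated with a minimal sketch. It is not the code in analysis/htseq/count_barcodes.py: the barcode sequences, their names, the assumed barcode position within the read, and the exact-match rule are all hypothetical choices made for the example.

```python
import io
from collections import Counter

# Hypothetical barcodes and read layout: the barcode is assumed to be the
# first 4 nt of each read. The real positions and lengths may differ.
barcodes = {"ACGT": "reporter_1", "TTAG": "reporter_2"}
BARCODE_START, BARCODE_LEN = 0, 4


def count_barcodes(fastq_handle, barcodes):
    """Count exact barcode matches; other reads are tallied as 'unassigned'."""
    counts = Counter()
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        tag = line.strip()[BARCODE_START:BARCODE_START + BARCODE_LEN]
        counts[barcodes.get(tag, "unassigned")] += 1
    return counts


# Four made-up reads standing in for a .fastq file.
fastq = io.StringIO(
    "@r1\nACGTGGGG\n+\nIIIIIIII\n"
    "@r2\nTTAGCCCC\n+\nIIIIIIII\n"
    "@r3\nACGTAAAA\n+\nIIIIIIII\n"
    "@r4\nGGGGAAAA\n+\nIIIIIIII\n"
)
counts = count_barcodes(fastq, barcodes)
for name, n in sorted(counts.items()):
    print(f"{name}\t{n}")
```

Exact matching is the simplest rule; a real pipeline might instead tolerate mismatches or use an aligner, which this sketch does not attempt.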

The tabulated counts are processed and plotted in analysis/htseq/analyze_barcode_counts.Rmd to generate Fig. 2B, 2C, and 5C in the manuscript. The knitted code and figures from this analysis can be browsed at analysis/htseq/analyze_barcode_counts.md.

The above steps are implemented as a Snakemake workflow in analysis/htseq/Snakefile. The workflow can be run locally or on a SLURM cluster by:

cd analysis/htseq
# local run
snakemake
# cluster run
sh submit_cluster.sh > submit.log 2>&1 &

This workflow can be visualized by:

snakemake --forceall --dag | dot -Tpng -o dag.png

which produces the following graph: analysis/htseq/dag.png

This workflow generates Fig. 2B, 2C, 5B, and S4B.

Flow Cytometry

data/flow/ contains the annotations for the 9 flow cytometry experiments in our work.

analysis/flow/ contains the RMarkdown scripts for generating figures from the raw data and annotations.

The RMarkdown scripts can be knitted to generate the figures by:

cd analysis/flow
for file in *.Rmd; do R -e "rmarkdown::render('$file')"; done

Western Blots

Un-cropped western blot images corresponding to Fig. 1D, 5B, S4C are provided as .png images in data/western/. The region in each image cropped for inclusion in the manuscript is shown as a rectangle.

The lanes are quantified using ImageJ (Rectangle Select → Analyze → Measure) and pasted as tab-delimited rows. This quantification for all lanes in the manuscript is in data/western/quantification.tsv.

Normalization of the lanes for display in figures is carried out in analysis/western/western_analysis.md.

The LTN1Δ western blot gel for Fig. 5B had a splotch near the truncated band region (see here), so we repeated this western blot for Fig. S4C (see here) in response to a reviewer’s comment.

Identification of putative RQC stalls

To identify putative RQC stalls used in Fig. 6, the gene-level annotations in GFF3 format were downloaded for the sacCer3 genome assembly: https://downloads.yeastgenome.org/sequence/S288C_reference/genome_releases/S288C_reference_genome_R64-1-1_20110203.tgz.

These were analyzed using analysis/public_datasets/rqc_stalls_in_yeast_orfs/scripts/analyze_rqc_stalls_in_genome.md and analysis/public_datasets/rqc_stalls_in_yeast_orfs/scripts/count_rqc_residues.py to generate the putative RQC stalls/controls and their locations in yeast ORFs: analysis/public_datasets/rqc_stalls_in_yeast_orfs/tables/ngrams_annotated.tsv and analysis/public_datasets/rqc_stalls_in_yeast_orfs/tables/ngram_control_annotated.tsv.

Analysis of Dvir et al. 2013 data

Supplementary table S1 was downloaded from http://www.pnas.org/lookup/suppl/doi:10.1073/pnas.1222534110/-/DCSupplemental/sd01.xlsx.

This data is analyzed in analysis/public_datasets/dvir_2013_kozak_library/scripts/plot_kozak_strength.md to generate Fig. S1C.

Analysis of Weinberg et al. 2016 data

The annotations for the SRA experiment were downloaded using the script: analysis/public_datasets/weinberg_2016_riboseq/scripts/downloadannotations.py.

The URLs in the annotations were used to download the .sra files and convert them to .fastq.gz files using the script: analysis/public_datasets/weinberg_2016_riboseq/scripts/downloaddata.py.

The raw reads were trimmed, aligned to the transcriptome, and used for calculating transcriptomic coverage using the workflow: analysis/public_datasets/weinberg_2016_riboseq/Makefile.

The transcriptomic coverage was used to calculate the ribosome density profile around RQC stalls and controls in the script: analysis/public_datasets/weinberg_2016_riboseq/scripts/plot_ribo_density_around_rqc_stalls.md. This generates Fig. 6A and S5A.
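One generic way to turn per-transcript coverage into an average profile around a set of sites is sketched below. It is an assumption-laden illustration, not the method in plot_ribo_density_around_rqc_stalls.md: the function name, the window size, the per-window mean normalization, and the toy coverage vectors are all hypothetical.

```python
def mean_profile_around_sites(coverage, sites, flank):
    """Average coverage in [-flank, +flank] around each site.

    Each window is divided by its own mean before averaging, so that highly
    expressed transcripts do not dominate the profile.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for transcript, pos in sites:
        cov = coverage[transcript]
        if pos - flank < 0 or pos + flank >= len(cov):
            continue  # window runs off the end of the transcript
        window = cov[pos - flank:pos + flank + 1]
        mean = sum(window) / width
        if mean == 0:
            continue  # no reads in this window
        for i, value in enumerate(window):
            totals[i] += value / mean
        used += 1
    return [t / used for t in totals] if used else []


# Toy coverage vectors and site positions (0-based), made up for illustration.
coverage = {"tx1": [1, 1, 4, 1, 1, 0, 0], "tx2": [2, 2, 8, 2, 2]}
sites = [("tx1", 2), ("tx2", 2), ("tx1", 6)]
profile = mean_profile_around_sites(coverage, sites, flank=2)
print(profile)
```

The third site is skipped because its window extends past the end of the transcript; the same function could be applied to a matched set of control sites for comparison.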


To generate Fig. 6C, the RPKM values for RNA-seq and Ribo-seq were downloaded from GEO:

wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE75nnn/GSE75897/suppl/GSE75897_RPF_RPKMs.txt.gz
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE75nnn/GSE75897/suppl/GSE75897_RiboZero_RPKMs.txt.gz
# this is the original data from which the above samples were reanalyzed.
wget ftp://ftp.ncbi.nlm.nih.gov/geo/series/GSE53nnn/GSE53313/suppl/GSE53313_Cerevisiae_RNA_RPF.txt.gz

These are analyzed in the script: analysis/public_datasets/weinberg_2016_riboseq/scripts/analyze_te_genes.md to generate Fig. 6C. The P-values for this figure panel are also calculated in this script.
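As background for the RPKM tables above, translation efficiency is commonly defined as the ratio of ribosome footprint (RPF) density to mRNA density for each gene. The sketch below assumes that standard definition; whether analyze_te_genes.md uses exactly this formula or any expression cutoff is not stated here, and the function name, the `min_rpkm` threshold, and the gene values are hypothetical.

```python
import math


def translation_efficiency(rpf_rpkm, rna_rpkm, min_rpkm=1.0):
    """Return log2(RPF RPKM / RNA RPKM) per gene.

    Genes missing from either table, or below min_rpkm in either one,
    are skipped (a hypothetical filter to avoid unstable ratios).
    """
    te = {}
    for gene, rpf in rpf_rpkm.items():
        rna = rna_rpkm.get(gene)
        if rna is None or rna < min_rpkm or rpf < min_rpkm:
            continue
        te[gene] = math.log2(rpf / rna)
    return te


# Made-up RPKM values for three hypothetical genes.
rpf = {"geneA": 80.0, "geneB": 10.0, "geneC": 0.2}
rna = {"geneA": 20.0, "geneB": 40.0, "geneC": 5.0}
te = translation_efficiency(rpf, rna)
for gene in sorted(te):
    print(f"{gene}\t{te[gene]:.2f}")
```

Working on a log2 scale makes increases and decreases in translation efficiency symmetric around zero.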


To generate Fig. S5B, the transcriptome-aligned reads from above were analyzed in the script: analysis/public_datasets/weinberg_2016_riboseq/scripts/plot_te_for_only_preceding_stall_region.md. The P-values for this figure panel are also calculated in this script.

Analysis of Sitron et al. 2017 data

The raw .fastq files were obtained from Dr. Onn Brandman.

The raw reads were trimmed, aligned to the transcriptome, and used for calculating total read counts for each ORF using the workflow: analysis/public_datasets/sitron_2017_rqc_riboseq/Snakefile. The workflow was run on a cluster using the submission script: analysis/public_datasets/sitron_2017_rqc_riboseq/submit_cluster.sh.

The total read counts and their fold change between HEL2Δ + ASC1Δ strains and WT strains were calculated in the script: analysis/public_datasets/sitron_2017_rqc_riboseq/scripts/analyze_gene_fold_change.md to generate Fig. 6D. The P-values for this figure panel are also calculated in this script.
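A generic fold-change calculation between two strains can be sketched as follows. This is not the code in analyze_gene_fold_change.md: the library-size normalization, the pseudocount, the function name, and the read counts are assumptions chosen for illustration.

```python
import math


def log2_fold_change(mutant_counts, wt_counts, pseudocount=1.0):
    """Return log2(mutant / WT) per ORF after scaling by total reads.

    A pseudocount is added to each ORF so that ORFs with zero reads in one
    sample do not produce infinite values (a hypothetical choice).
    """
    mut_total = sum(mutant_counts.values())
    wt_total = sum(wt_counts.values())
    result = {}
    for orf in mutant_counts.keys() & wt_counts.keys():
        mut = (mutant_counts[orf] + pseudocount) / mut_total
        wt = (wt_counts[orf] + pseudocount) / wt_total
        result[orf] = math.log2(mut / wt)
    return result


# Made-up read counts for three hypothetical ORFs.
mutant = {"orf1": 199, "orf2": 99, "orf3": 702}
wild_type = {"orf1": 99, "orf2": 99, "orf3": 802}
lfc = log2_fold_change(mutant, wild_type)
for orf in sorted(lfc):
    print(f"{orf}\t{lfc[orf]:.3f}")
```

Only ORFs present in both samples are compared; dedicated differential-expression tools model count variability more carefully than this ratio does.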
