Code for data analysis and reproduction of figures
- `scripts` contains all data analysis code
- `tables` contains all summary tables generated from the analysis
- `figures` contains all main and supplemental figures generated from the analysis
Analysis Steps
All `.Rmd` scripts below should be rendered from within R, or from the command line with the `R -e` switch.
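To render an `.Rmd` script from the command line, `R -e` is given an `rmarkdown::render()` call as its expression. The minimal sketch below only builds that command for a given file name and prints a shell-quoted form. The helper name `render_command` is hypothetical and not part of this repository.

```python
from __future__ import annotations

import shlex


def render_command(rmd_path: str) -> list[str]:
    """Build the argument list that renders one .Rmd file with `R -e`."""
    # Double quotes inside the R expression keep the shell quoting simple.
    r_expr = f'rmarkdown::render("{rmd_path}")'
    return ["R", "-e", r_expr]


if __name__ == "__main__":
    # shlex.join quotes the R expression so the line can be pasted into a shell.
    print(shlex.join(render_command("plot_alignment_stats.Rmd")))
```

The resulting list can also be passed directly to `subprocess.run(..., check=True)` to render a script without going through a shell.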
Trim reads, discard rRNA-aligning reads, and align the remaining reads to the human and flu genomes
An example for a single sample is shown. Run it in a loop over all samples, or submit each sample as a separate job on a cluster.
```
make -f Makefile SAMPLE=ltm_untr
```
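Looping over samples can be sketched as follows, assuming the Makefile is run with GNU `make` and reads the sample name from the `SAMPLE` variable as in the example. Only `ltm_untr`, `ltm_vir`, and `mrna_untr` appear in this README, so `SAMPLES` is a hypothetical partial list that must be replaced with the full set of 12 sample names. With `dry_run=True` the commands are only printed, which is also a convenient way to generate one command per cluster job.

```python
from __future__ import annotations

import subprocess

# Hypothetical partial list: only these sample names appear in this README.
SAMPLES = ["ltm_untr", "ltm_vir", "mrna_untr"]


def make_command(sample: str) -> list[str]:
    """Argument list that runs the preprocessing Makefile for one sample."""
    return ["make", "-f", "Makefile", f"SAMPLE={sample}"]


def run_all(samples: list[str], dry_run: bool = True) -> list[list[str]]:
    """Print (dry_run=True) or execute the Makefile command for every sample."""
    commands = [make_command(s) for s in samples]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first sample whose pipeline fails.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all(SAMPLES)
```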
Calculate coverage at each genomic position
A single example is shown. Run this with the sample name of one of the 12 samples in this study, the reference genome as either `gencode` or `flu`, and the strand as either `plus` or `minus`. The `genome` argument is an extra variable that can also be set to `transcript` to get coverage along transcripts (not used here).
```
# for Ribo-seq or Ribo-seq + LTM samples
Rscript calculate_coverage.R ltm_untr gencode genome plus
# for mRNA samples
Rscript calculate_coverage_mrna.R mrna_untr gencode genome plus
```
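Covering every reference and strand for one sample means four runs. The sketch below enumerates them, taking the argument order (sample, reference, coordinate type, strand) from the two examples. It assumes, based only on the example name `mrna_untr`, that mRNA samples carry an `mrna_` prefix and therefore use `calculate_coverage_mrna.R`; the function name `coverage_commands` is hypothetical.

```python
from __future__ import annotations

from itertools import product

REFERENCES = ["gencode", "flu"]
STRANDS = ["plus", "minus"]


def coverage_commands(sample: str, coordinates: str = "genome") -> list[list[str]]:
    """One Rscript command per (reference, strand) pair for a single sample."""
    # Assumption: mRNA samples are recognisable by an "mrna_" prefix.
    if sample.startswith("mrna_"):
        script = "calculate_coverage_mrna.R"
    else:
        script = "calculate_coverage.R"
    # Argument order follows the README examples: sample, reference, coordinates, strand.
    return [
        ["Rscript", script, sample, ref, coordinates, strand]
        for ref, strand in product(REFERENCES, STRANDS)
    ]


if __name__ == "__main__":
    for cmd in coverage_commands("ltm_untr"):
        print(" ".join(cmd))
```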
Calculate metagene profile at start and stop codons
```
rmarkdown::render('analyze_flu_called_start_sites.Rmd')
rmarkdown::render('analyze_host_called_start_sites.Rmd')
```
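A metagene profile aggregates read coverage at fixed offsets around an anchor such as a start or stop codon, summed over many transcripts. The following is a generic, minimal sketch of that idea on toy in-memory data; it is not the implementation used in the `.Rmd` scripts, and the names `metagene_profile`, `upstream`, and `downstream` are hypothetical.

```python
from __future__ import annotations


def metagene_profile(
    coverage: dict[str, list[float]],
    anchors: dict[str, int],
    upstream: int = 3,
    downstream: int = 3,
) -> list[float]:
    """Sum coverage at each offset (-upstream..+downstream) around each anchor."""
    profile = [0.0] * (upstream + downstream + 1)
    for tx, anchor in anchors.items():
        cov = coverage[tx]
        for i, offset in enumerate(range(-upstream, downstream + 1)):
            pos = anchor + offset
            # Offsets that fall outside the transcript contribute nothing.
            if 0 <= pos < len(cov):
                profile[i] += cov[pos]
    return profile


if __name__ == "__main__":
    toy_cov = {"tx1": [0, 1, 5, 2, 0, 0, 1, 0], "tx2": [3, 0, 0, 4, 1, 0]}
    toy_starts = {"tx1": 2, "tx2": 3}
    print(metagene_profile(toy_cov, toy_starts, upstream=1, downstream=1))
```

Real analyses often normalize each transcript's window (for example by its total counts) before summing, so that a few highly expressed transcripts do not dominate the profile.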
Plot statistics of various read pre-processing steps until alignment
```
rmarkdown::render('plot_read_preprocessing_stats.Rmd')
```
Plot statistics of aligned reads
```
rmarkdown::render('plot_alignment_stats.Rmd')
```
Find alignments to different NP variants
An example for a single sample is shown.
```
Rscript find_np_alignments.R ltm_vir
```
Plot NP alignment statistics
```
rmarkdown::render('analyze_np_ctg.Rmd')
```
Pool counts at neighboring sites for calling start sites
An example for a single sample, strand, and reference is shown. Do this spe
Run this on the coverage file for each combination of reference genome (`gencode` or `flu`), strand (`plus` or `minus`), and Ribo-seq or Ribo-seq + LTM sample in this study. The inputs are the coverage files calculated above.
```
Rscript pool_neighbors.R coverage/gencode/ltm_vir.gencode.plus.genome.tsv.gz
```
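The sketch below illustrates one simple form of neighbor pooling: each covered position receives the summed counts of all positions within a fixed window around it. The actual pooling rule in `pool_neighbors.R` may differ, and `window=1` is a hypothetical choice. The file-name pattern in `coverage_path` is inferred from the single example path above and is likewise an assumption. Positions are plain integers, so counts should be pooled separately per chromosome or transcript and strand.

```python
from __future__ import annotations


def coverage_path(sample: str, reference: str, strand: str, coordinates: str = "genome") -> str:
    """Coverage file path, following the pattern of the example in this README."""
    return f"coverage/{reference}/{sample}.{reference}.{strand}.{coordinates}.tsv.gz"


def pool_neighbors(counts: dict[int, int], window: int = 1) -> dict[int, int]:
    """For each covered position, sum the counts within +/- `window` positions."""
    return {
        pos: sum(counts.get(p, 0) for p in range(pos - window, pos + window + 1))
        for pos in counts
    }


if __name__ == "__main__":
    print(coverage_path("ltm_vir", "gencode", "plus"))
    print(pool_neighbors({10: 2, 11: 3, 15: 1}))
```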
Call start sites on flu and host transcripts
```
rmarkdown::render('call_flu_start_sites.Rmd')
rmarkdown::render('call_host_start_sites.Rmd')
```
Annotate the called start sites on transcripts and plot basic summary statistics
```
rmarkdown::render('analyze_flu_called_start_sites.Rmd')
rmarkdown::render('analyze_host_called_start_sites.Rmd')
```
Plot statistics of the called start sites on flu and host transcripts
```
rmarkdown::render('plot_flu_called_start_stats.Rmd')
rmarkdown::render('plot_host_called_start_stats.Rmd')
```
Plot NP322 supplemental figures
`NP_highCTG_supp_figs.ipynb`
Plot NA43 codon conservation, activity assays, and viral titers
`NA43_activity_expression_codoncons_and_viraltiters.ipynb`
Analyze and plot NA43 viral competition assays
`NA43_competition.ipynb`
Influenza CTG evolution
Parse and align influenza sequences
```
python get_human_seqs.py
python get_humanH5N1_seqs.py
python get_classical_swine_seqs.py
python get_avian_seqs.py
```
Plot influenza CTG evolution
`Influenza_CTG_evolution.ipynb`
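One quantity that can be tracked across influenza sequences is the number of CTG trinucleotides in each reading frame of a coding sequence. The minimal sketch below counts them, with frame 0 being the coding frame; it is an illustration only, and the notebook's own metric may differ. The function name `ctg_counts_by_frame` is hypothetical.

```python
from __future__ import annotations


def ctg_counts_by_frame(seq: str) -> list[int]:
    """Count CTG trinucleotides starting in frame 0, 1 and 2 of a coding sequence."""
    # Accept lowercase input and RNA (U) as well as DNA (T).
    seq = seq.upper().replace("U", "T")
    counts = [0, 0, 0]
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "CTG":
            counts[i % 3] += 1
    return counts


if __name__ == "__main__":
    print(ctg_counts_by_frame("ATGCTGAAACTGTAA"))
```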
Generate low and high CTG PR8 NP sequences
```
python redesign_sequences.py
```
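A low- or high-CTG variant can be sketched as synonymous recoding of in-frame leucine codons: in the standard genetic code, TTA, TTG, CTT, CTC, CTA, and CTG all encode leucine, so swapping among them leaves the protein unchanged. The sketch below is a hypothetical illustration, not the design rule used in `redesign_sequences.py`. The replacement codon `CTC` for the low-CTG mode is an arbitrary assumption, and because only in-frame codons are changed, CTG trinucleotides in the other two frames are not controlled.

```python
from __future__ import annotations

# Leucine codons in the standard genetic code.
LEU_CODONS = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}


def recode_leucines(cds: str, mode: str, low_codon: str = "CTC") -> str:
    """Synonymously recode in-frame leucine codons.

    mode="high": every in-frame leucine codon becomes CTG.
    mode="low":  every in-frame CTG becomes `low_codon` (another leucine codon).
    """
    if mode not in {"high", "low"}:
        raise ValueError("mode must be 'high' or 'low'")
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    if low_codon not in LEU_CODONS - {"CTG"}:
        raise ValueError("low_codon must be a non-CTG leucine codon")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3].upper()
        if mode == "high" and codon in LEU_CODONS:
            out.append("CTG")
        elif mode == "low" and codon == "CTG":
            out.append(low_codon)
        else:
            out.append(codon)
    return "".join(out)


if __name__ == "__main__":
    print(recode_leucines("ATGCTGAAACTGTAA", "low"))
    print(recode_leucines("ATGTTAAAACTTTAA", "high"))
```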