Code for data analysis and reproduction of figures

  • scripts contains all the data analysis code
  • tables contains all summary tables generated from analysis
  • figures contains all main and supplemental figures generated from analysis

Analysis Steps

All .Rmd scripts below should be rendered either from within an R session or from the command line using the R -e switch.
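
For the command-line route, here is a minimal Python sketch of how a render call can be wrapped with R -e. The helper names `render_command` and `render` are hypothetical and not part of this repository. The sketch assumes that `R` is on the PATH, that the rmarkdown package is installed, and that the command is run from the directory containing the .Rmd file.

```python
import shlex
import subprocess


def render_command(rmd_file):
    """Build the argument list that renders one .Rmd file through `R -e`."""
    return ["R", "-e", f"rmarkdown::render('{rmd_file}')"]


def render(rmd_file, dry_run=True):
    """Print the shell-quoted command (dry run) or execute it."""
    cmd = render_command(rmd_file)
    if dry_run:
        # shlex.join quotes the R expression so it can be pasted into a shell
        print(shlex.join(cmd))
    else:
        subprocess.run(cmd, check=True)
    return cmd


render("plot_alignment_stats.Rmd")
```

Calling `render(..., dry_run=False)` runs the command and raises an error if R exits with a non-zero status.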

Trim reads, discard rRNA-aligning reads and align rest to human and flu genomes

An example for a single sample is shown. Run it in a loop to process all samples, or submit each sample as a separate job on a cluster.

make -f Makefile SAMPLE=ltm_untr
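
The loop over samples mentioned above could be sketched in Python as follows. This is only a sketch: it assumes the Makefile is run through `make` from the directory that contains it, the helper names are hypothetical, and `SAMPLES` lists only the sample names that appear in this README, so it should be extended to the full set used in the study.

```python
import subprocess

# Assumption: only the sample names mentioned in this README are listed here.
SAMPLES = ["ltm_untr", "ltm_vir", "mrna_untr"]


def make_command(sample):
    """Build the make invocation for one sample."""
    return ["make", f"SAMPLE={sample}"]


def run_all(samples, dry_run=True):
    """Print (dry run) or execute the make command for every sample in turn."""
    commands = [make_command(sample) for sample in samples]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops the loop as soon as one sample fails
            subprocess.run(cmd, check=True)
    return commands


run_all(SAMPLES)
```

On a cluster, the same command list can instead be written out one line per job and handed to the scheduler.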

Calculate coverage at each genomic position

A single example is shown. Run this with the reference genome set to either gencode or flu, the strand set to either plus or minus, and the sample name set to one of the 12 samples in this study. The third argument, genome, is an extra variable that can also be set to transcript to get coverage along transcripts (not used).

# for Ribo-seq or Ribo-seq + LTM samples
Rscript calculate_coverage.R ltm_untr gencode genome plus
# for mRNA samples
Rscript calculate_coverage_mrna.R mrna_untr gencode genome plus
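
To cover every reference and strand for one sample, the four command lines can be generated rather than typed by hand. The sketch below is illustrative only: `coverage_commands` is a hypothetical helper, and choosing the script from an `mrna_` prefix in the sample name is an assumption based on the two examples above.

```python
from itertools import product

GENOMES = ("gencode", "flu")
STRANDS = ("plus", "minus")


def coverage_commands(sample):
    """Return one Rscript command per (reference genome, strand) pair."""
    # Assumption: mRNA samples are recognised by the "mrna_" name prefix.
    if sample.startswith("mrna_"):
        script = "calculate_coverage_mrna.R"
    else:
        script = "calculate_coverage.R"
    return [
        ["Rscript", script, sample, genome, "genome", strand]
        for genome, strand in product(GENOMES, STRANDS)
    ]


for cmd in coverage_commands("ltm_untr"):
    print(" ".join(cmd))
```

The third positional argument is kept fixed at genome, matching the examples above.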

Calculate metagene profile at start and stop codons

rmarkdown::render('analyze_flu_called_start_sites.Rmd')
rmarkdown::render('analyze_host_called_start_sites.Rmd')

Plot statistics of various read pre-processing steps until alignment

rmarkdown::render('plot_read_preprocessing_stats.Rmd')

Plot statistics of aligned reads

rmarkdown::render('plot_alignment_stats.Rmd')

Find alignment to different NP variants

An example for a single sample is shown.

Rscript find_np_alignments.R ltm_vir

Plot NP alignment statistics

rmarkdown::render('analyze_np_ctg.Rmd')

Pool counts at neighboring sites for calling start sites

An example for a single sample, strand, and reference is shown. Run this with the reference genome as either gencode or flu, the strand as either plus or minus, and the sample name as one of the Ribo-seq or Ribo-seq + LTM samples in this study. This script uses the coverage files calculated above as input.

Rscript pool_neighbors.R coverage/gencode/ltm_vir.gencode.plus.genome.tsv.gz
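
The input path in the example appears to follow the pattern coverage/<reference>/<sample>.<reference>.<strand>.genome.tsv.gz. The Python sketch below assumes that pattern holds for every sample, reference, and strand, which is inferred from this single example and not confirmed by the scripts. `coverage_path` and `pool_commands` are hypothetical helpers.

```python
def coverage_path(sample, genome, strand):
    """Build a coverage file path following the pattern of the example above."""
    # Assumption: the layout is coverage/<genome>/<sample>.<genome>.<strand>.genome.tsv.gz
    return f"coverage/{genome}/{sample}.{genome}.{strand}.genome.tsv.gz"


def pool_commands(sample):
    """Return one pool_neighbors.R command per reference genome and strand."""
    return [
        ["Rscript", "pool_neighbors.R", coverage_path(sample, genome, strand)]
        for genome in ("gencode", "flu")
        for strand in ("plus", "minus")
    ]


for cmd in pool_commands("ltm_vir"):
    print(" ".join(cmd))
```

It is worth checking that each generated path exists before launching the jobs, since the naming of the flu coverage files is not shown in this README.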

Call start sites on flu and host transcripts

rmarkdown::render('call_flu_start_sites.Rmd')
rmarkdown::render('call_host_start_sites.Rmd')

Annotate the called start sites on transcripts and plot basic summary statistics

rmarkdown::render('analyze_flu_called_start_sites.Rmd')
rmarkdown::render('analyze_host_called_start_sites.Rmd')

Plot statistics of the called start sites on flu and host transcripts

rmarkdown::render('plot_flu_called_start_stats.Rmd')
rmarkdown::render('plot_host_called_start_stats.Rmd')

Plot NP322 supplemental figures

NP_highCTG_supp_figs.ipynb

Plot NA43 codon conservation, activity assays, and viral titers

NA43_activity_expression_codoncons_and_viraltiters.ipynb

Analyze and plot NA43 viral competition assays

NA43_competition.ipynb

Influenza CTG evolution

Parse and align influenza sequences

python get_human_seqs.py 
python get_humanH5N1_seqs.py
python get_classical_swine_seqs.py
python get_avian_seqs.py

Plot influenza CTG evolution

Influenza_CTG_evolution.ipynb

Generate low and high CTG PR8 NP sequences

python redesign_sequences.py