Skip to content

scripts for analyzing data and generating figures in the paper

Notifications You must be signed in to change notification settings

rasilab/adarnell_2018

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

24 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code for analysis : Darnell, Subramaniam, and O'Shea 2018

This github repository includes jupyter notebooks (and associated static HTML files) for reproducing all plots in the paper, and R scripts for sequencing data analysis, in scripts/

Table of Contents

Annotation files

The following bowtie indices / annotation files should be generated in new folders called bowtie_indices and sequence_annotation_files before running the script below. These files are not provided in the repository due to their large size, but instructions for generating them are included below.

  • bowtie index for subtractive alignment of rRNA contaminants: hg38.rrna
  • transcriptome rsem reference bowtie index for alignment: genome_rsem/hg38.gencode.v24.rsem/bowtie
  • annotations in fai format for preparing transcriptome rsem reference bowtie index: hg38.fa.masked
  • canonical CCDS subset annotations for assigning reads after alignment: gencode.v24.canonical_ccds_transcripts.20170315.gff3
  • mouse gencode annotations for matching ATF4/CHOP targets to human homologs: gencode.vM9.annotation.gff3.gz

Generating bowtie indices and annotation files

Required software

  • bowtie/1.1.1
  • rsem/1.3.0
  • samtools/1.4.0
  • gzip/1.3.12
  • R/3.3.3
  • python/2.7.13

Download, unzip, and index hg38 genome annotations (hg38.fa.masked)

rsync -avzP rsync://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.masked.gz .
gunzip hg38.fa.masked.gz
samtools faidx hg38.fa.masked

Download Gencode annotations and extract unique CCDS annotations to a gff3 file (gencode.v24.canonical_ccds_transcripts.20170315.gff3)

wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_24/gencode.v24.annotation.gff3.gz
gunzip gencode.v24.annotation.gff3.gz
Rscript make_unique_ccds_transcripts.R

Download mouse Gencode annotations

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M9/gencode.vM9.annotation.gff3.gz

Download rRNA sequences from NCBI, write to fasta, and create human rRNA bowtie index

from Bio import Entrez
from Bio import SeqIO

Entrez.email = 'YOUR_EMAIL_ADDRESS@XYZ.EDU'

refseqids = {'28S' : 'NR_003287.2',
             '18S' : 'NR_003286.2',
             '5.8S' : 'NR_003285.2',
             '5S' : 'NR_023363.1',
            }
outputlist = []
for (name,ID) in refseqids.items():
    # identify ids by searching
    rec = Entrez.read(Entrez.esearch(db="nucleotide", term=ID))
    # retrieve full record for the found id
    handle = Entrez.efetch(db="nucleotide", id=rec["IdList"][0], rettype="fasta")
    output = SeqIO.read(handle, 'fasta')
    # clean up record to be minimalistic
    output.id = ID
    output.description = output.name = ''
    outputlist.append(output)
    
SeqIO.write(outputlist, 'sequence_annotation_files/rrna.fa', 'fasta')
/app/bowtie/1.1.1/bowtie-build sequence_annotation_files/rrna.fa hg38.rrna

Prepare gencode transcriptome index (hg38.gencode.v24.rsem) with rsem for alignment

rsem-prepare-reference --bowtie-path /app/bowtie/1.1.1 --bowtie --gff3 sequence_annotation_files/gencode.v24.annotation.gff3 sequence_annotation_files/hg38.fa.masked bowtie

Pipeline for processing ribosome profiling sequencing data

Starting with .fastq files.

Required software

  • bowtie/1.1.1
  • cutadapt/1.4.1
  • R/3.3.3
  • rsem/1.3.0

Make a directory for each sample

for SAMPLE in `ls rawdata/*.fq.gz | cut -f1 -d. | cut -f2 -d/`;
do
    mkdir processeddata/$SAMPLE; 
    echo $SAMPLE; 
done

Trim reads

for SAMPLE in `ls rawdata/*.fq.gz | cut -f1 -d. | cut -f2 -d/`; 
do 
    cutadapt --adapter=AAAAAAAAAA --minimum-length=13 --discard-untrimmed --output processeddata/$SAMPLE/$SAMPLE.trim.fq rawdata/$SAMPLE.R1.fq.gz; 
done

Subtractive alignment to rRNA

for SAMPLE in `ls processeddata/*/*.trim.fq | cut -f1 -d. | cut -f2 -d/`; 
do 
    bowtie --seedlen=23 --threads=8 --un processeddata/$SAMPLE/norrna.fq --sam bowtie_indices/hg38.rrna processeddata/$SAMPLE/$SAMPLE.trim.fq; 
done

Align to transcriptome

for SAMPLE in `ls processeddata/*/*.norrna.fq | cut -f1 -d. | cut -f2 -d/`; 
do
    rsem-calculate-expression --num-threads 8 --output-genome-bam --sort-bam-by-coordinate  processeddata/$SAMPLE/$SAMPLE.norrna.fq /n/osheafs1/LAB/alicia/bioinformatics/bowtie_indices/genome_rsem/hg38.gencode.v24.rsem/bowtie processeddata/$SAMPLE/gencode; 
done

Calculate codon-specific ribosome density

for SAMPLE in `ls processeddata/*/*norrna* | cut -f1 -d. | cut -f2 -d/`; 
do 
    Rscript scripts/calculate_codon_ribo_density_150overhang_20170807.R $SAMPLE; echo $SAMPLE; 
done

Run R script to calculate gene fold changes

for SAMPLE in `ls processeddata/*/*norrna* | cut -f1 -d. | cut -f2 -d/`; 
do 
    Rscript scripts/calculate_gene_fold_change_20170615.R $SAMPLE; echo $SAMPLE; 
done

About

scripts for analyzing data and generating figures in the paper

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published