G418_readthrough

All code used in analysis of G418 data is available here.

The exact code for generating each figure is in G418_readthrough/figures/figscripts; example figures are in G418_readthrough/figures/output_figures.

Cloning this repository and running Wangen_G418_workflow.py will regenerate the figures from Wangen and Green, 2019 (https://elifesciences.org/articles/52611).

In Wangen_G418_workflow.py, change threadNumb to the desired number of threads (default: 40).
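Below is a minimal sketch of one way to pick a safe value before editing threadNumb. It assumes you do not want to request more threads than the machine has cores. The helper choose_thread_number is hypothetical and is not part of Wangen_G418_workflow.py. Only the variable name threadNumb and the default of 40 come from this README.

```python
import multiprocessing

DEFAULT_THREADS = 40  # default threadNumb value stated in this README


def choose_thread_number(requested=DEFAULT_THREADS, available=None):
    """Hypothetical helper: cap the requested thread count at the CPUs available."""
    if requested < 1:
        raise ValueError("thread count must be at least 1")
    if available is None:
        available = multiprocessing.cpu_count()
    return min(requested, available)


# Value you would then copy into threadNumb in Wangen_G418_workflow.py
threadNumb = choose_thread_number()
```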

* WARNING *

Running this workflow requires at least 1.1 terabytes of free storage space to generate all necessary files.
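Because the workflow can run for a long time before it runs out of disk, it may be worth checking free space first. The sketch below is a hypothetical pre-flight check and not part of the workflow. It assumes a POSIX system (os.statvfs is not available on Windows) and reads "1.1 terabytes" as decimal terabytes (1.1 x 10^12 bytes).

```python
import os

# Assumption: 1.1 TB interpreted as decimal terabytes (1.1 x 10^12 bytes).
REQUIRED_BYTES = 1100 * 10**9


def free_bytes(path):
    """Bytes available to an unprivileged user on the filesystem holding path (POSIX only)."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def has_enough_space(path, required=REQUIRED_BYTES):
    """True if the filesystem holding path has at least `required` free bytes."""
    return free_bytes(path) >= required
```

Run has_enough_space(".") from the cloned repository, since Data/ and genomes/ are where the large files are written.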

The pipeline is designed to be run on a computational server with at least 40 threads, and may take an extremely long time to complete if run locally. STAR also requires a considerable amount of memory to run this workflow with default settings. If memory is limiting, increase the --genomeSAsparseD value (STAR's suffix-array sparsity; larger values reduce RAM use at the cost of slower mapping).
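As an illustration of where --genomeSAsparseD goes, here is a sketch that builds (but does not run) a STAR genome-generation command. The helper name and file paths are hypothetical. Wangen_G418_workflow.py builds its own STAR commands, so check which options it passes before changing anything. The sparsity is fixed when the index is generated, so changing it means rebuilding the STAR index.

```python
def star_genome_generate_cmd(genome_dir, fasta, gtf, threads=40, sa_sparse=1):
    """Build (but do not run) a STAR genomeGenerate command line.

    --genomeSAsparseD defaults to 1 in STAR; larger values shrink the suffix
    array held in RAM at the cost of slower mapping.
    """
    return [
        "STAR",
        "--runMode", "genomeGenerate",
        "--runThreadN", str(threads),
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--sjdbGTFfile", gtf,
        "--genomeSAsparseD", str(sa_sparse),
    ]


# Hypothetical paths; the workflow chooses its own file names under genomes/.
cmd = star_genome_generate_cmd("genomes/star_index", "genomes/hg38.fa",
                               "genomes/annotation.gtf", sa_sparse=2)
```

The resulting list can be passed to subprocess.check_call(cmd) once STAR is on $PATH.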

Overview of Wangen_G418_workflow.py

Generate Genomes
- Download hg38 and Gencode annotations
- Parse the GTF file and choose a single isoform for each gene
- Build all other necessary annotation files from parsed GTF
- Parse the GTF file and create a second GTF file that contains all valid transcripts
- Build non-coding RNA depletion annotation files
- Build indexes for STAR alignments
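The "choose a single isoform for each gene" step can be illustrated with a simple rule. The sketch below keeps the transcript with the greatest summed exon length per gene. That criterion is an assumption made for illustration: the actual selection rules live in the utils scripts and may differ (for example, they may use Gencode tags).

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "G1"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Turn the 9th GTF column into a dict of quoted key/value pairs."""
    return dict(ATTR_RE.findall(field))


def longest_isoform_per_gene(gtf_lines):
    """Return {gene_id: transcript_id}, keeping the transcript with the most exonic bases."""
    exon_len = defaultdict(int)  # transcript_id -> summed exon length
    gene_of = {}                 # transcript_id -> gene_id
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = parse_attributes(cols[8])
        tid, gid = attrs["transcript_id"], attrs["gene_id"]
        # GTF coordinates are 1-based and inclusive
        exon_len[tid] += int(cols[4]) - int(cols[3]) + 1
        gene_of[tid] = gid
    best = {}
    for tid in sorted(exon_len):  # sorted so ties resolve deterministically
        gid = gene_of[tid]
        if gid not in best or exon_len[tid] > exon_len[best[gid]]:
            best[gid] = tid
    return best
```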
Raw Sequencing Data
- Download FASTQ files from SRA (pending release of data)
- Create merged FASTQ files of replicates for select ribosome profiling experiments
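Merging replicates can be as simple as concatenating gzipped FASTQ files. A concatenation of gzip files is itself a valid multi-member gzip file, so the replicates do not need to be decompressed first. The sketch below shows that approach plus a read count for sanity-checking the merge. It is not necessarily how the workflow merges files, and the function names are hypothetical.

```python
import gzip
import shutil


def merge_fastq_gz(replicate_paths, merged_path):
    """Concatenate gzipped FASTQ replicates byte-for-byte into one gzip file."""
    with open(merged_path, "wb") as out:
        for path in replicate_paths:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def count_fastq_records(path):
    """Count reads in a gzipped FASTQ file (4 lines per record)."""
    with gzip.open(path, "rb") as fh:
        return sum(1 for _ in fh) // 4
```

The merged read count should equal the sum of the replicate counts.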
Ribosome Profiling data processing
- Run the main analysis pipeline on ribosome profiling data, building all files required for generation of figures
- Mapping to all possible transcripts is only performed for select datasets, as this takes a long time
RNAseq data processing
- Run the main analysis pipeline on RNAseq data, building all files required for generation of figures
Plot Figures
- Run individual scripts that generate all figures in the manuscript
- Figures can be compared to examples in output_figures to validate successful completion of analysis

Description of subfolders:

Data contains raw luciferase measurements. All raw and processed data files will be saved here
figures contains scripts for generating all figures in figures/figscripts and example output figures in figures/output_figures
genomes contains curated RefSeq annotations of rRNA in FASTA format. All annotation files will be created in this directory
GFF contains a Python module that must be copied to python2.7/site-packages so it can be imported. It can also be found here: https://github.com/chapmanb/bcbb/tree/master/gff/BCBio/GFF
riboseq contains scripts for processing ribosome profiling data
RNAseq contains scripts for processing RNAseq data
utils contains scripts for generating annotation files
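Installing the bundled GFF module amounts to copying a folder into a site-packages directory. The sketch below does that with a hypothetical helper. By default it targets site.getusersitepackages(), which on Linux is usually ~/.local/lib/python2.7/site-packages rather than the ~/lib/python2.7/site-packages path given in this README. Pass the directory you actually use. Some older virtualenv setups lack site.getusersitepackages, in which case the target must be given explicitly.

```python
import os
import shutil
import site


def install_local_module(src_dir, site_packages=None):
    """Copy a module folder (e.g. the repository's GFF/) into a site-packages directory.

    Hypothetical helper; raises an error if the destination folder already exists.
    """
    if site_packages is None:
        site_packages = site.getusersitepackages()
    if not os.path.isdir(site_packages):
        os.makedirs(site_packages)
    dest = os.path.join(site_packages, os.path.basename(os.path.normpath(src_dir)))
    shutil.copytree(src_dir, dest)
    return dest
```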

To run the analysis pipeline, clone the repository and run the Wangen_G418_workflow.py script

All command line utilities must be downloaded and added to $PATH

REQUIREMENTS:

Command Line Utilities, added to $PATH:

tally, 15-065: (https://www.ebi.ac.uk/research/enright/software/kraken)
seqtk, 1.0-r31: (https://github.com/lh3/seqtk)
skewer, 0.2.2: (https://github.com/relipmoc/skewer)
STAR, STAR_2.5.3a_modified: (https://github.com/alexdobin/STAR)
pigz, 2.3.1: (https://zlib.net/pigz/)
samtools, 0.1.19-96b5f2294a: (https://github.com/samtools/samtools)
kpLogo (http://kplogo.wi.mit.edu/)
kentUtils (https://github.com/ENCODE-DCC/kentUtils)
sra-tools (https://github.com/ncbi/sra-tools)
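A quick way to confirm the utilities above are reachable is to search each directory on $PATH for an executable with the expected name. The sketch below assumes the executable names match the tool names listed here. kentUtils and sra-tools each ship many separate binaries, and this README does not say which ones the workflow calls, so they are left out of the default list; add the binaries you need.

```python
import os

# Assumed executable names; kentUtils and sra-tools binaries must be added by hand.
REQUIRED_TOOLS = ["tally", "seqtk", "skewer", "STAR", "pigz", "samtools", "kpLogo"]


def find_on_path(name, path=None):
    """Return the full path of an executable named `name` on PATH, or None."""
    if path is None:
        path = os.environ.get("PATH", "")
    for directory in path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(tools=REQUIRED_TOOLS, path=None):
    """List the tools that could not be found on PATH."""
    return [t for t in tools if find_on_path(t, path) is None]
```

An empty result from missing_tools() means every listed tool was found.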

Python 2.7 packages, install using pip:

Built-in modules do not have a version number listed.

os
sys
ftplib
subprocess
glob
struct
ast
time
datetime
collections
importlib
math
multiprocessing
urllib
argparse 1.1
pandas 0.22.0
numpy 1.14.0
pysam 0.13
scipy 1.0.1
statsmodels 0.9.0
twobitreader 3.1.5
Bio 1.58
pathos 0.2.1
matplotlib 2.2.2
seaborn 0.9.0
csv 1.0
lifelines 0.19.5
logomaker 0.8
scikit-learn 0.20.3
GFF: copy the GFF folder into ~/lib/python2.7/site-packages/
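Since the workflow was developed against the specific versions above, a version report can help explain differences from the example figures. The sketch below compares each module's __version__ to the listed version. It uses import names rather than pip names (Bio for Biopython, sklearn for scikit-learn). Modules that do not define __version__ are reported as "unknown". The helper name is hypothetical.

```python
import importlib

# Import name -> version listed in this README
REQUIRED_VERSIONS = {
    "pandas": "0.22.0", "numpy": "1.14.0", "pysam": "0.13", "scipy": "1.0.1",
    "statsmodels": "0.9.0", "twobitreader": "3.1.5", "Bio": "1.58",
    "pathos": "0.2.1", "matplotlib": "2.2.2", "seaborn": "0.9.0",
    "lifelines": "0.19.5", "logomaker": "0.8", "sklearn": "0.20.3",
}


def version_report(required=REQUIRED_VERSIONS):
    """Return {module: (wanted, found)} for modules that are missing or differ.

    found is None when the module cannot be imported.
    """
    report = {}
    for module_name, wanted in sorted(required.items()):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            report[module_name] = (wanted, None)
            continue
        found = getattr(module, "__version__", "unknown")
        if found != wanted:
            report[module_name] = (wanted, found)
    return report
```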

R 3.4.3:

ggplot2 2.2.1
plyr 1.8.4
reshape2 1.4.3
scales 0.5.0
xtail 1.1.5
DESeq2 1.18.1
glue 1.3.1
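The installed R package versions can be compared with the list above in the same way. The sketch below builds a one-line R expression that prints each package's version with packageVersion() and wraps it in an Rscript command. It does not run R itself; the helper names are hypothetical.

```python
# Package -> version listed in this README
R_PACKAGES = {
    "ggplot2": "2.2.1", "plyr": "1.8.4", "reshape2": "1.4.3", "scales": "0.5.0",
    "xtail": "1.1.5", "DESeq2": "1.18.1", "glue": "1.3.1",
}


def r_version_expression(packages):
    """R code that prints '<package> <installed version>' for each package."""
    calls = ['cat("{0}", as.character(packageVersion("{0}")), "\\n")'.format(p)
             for p in sorted(packages)]
    return "; ".join(calls)


def rscript_command(packages=R_PACKAGES):
    """Command list suitable for subprocess.check_output (requires Rscript on PATH)."""
    return ["Rscript", "-e", r_version_expression(packages)]
```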
