Ribosome footprinting pipeline

This is a simple based implementation of a Ribosome profiling pipeline

Installation

Requirements

snakemake 3.7.1 (3.8.0 is bad, later maybe ok)
R > 3.3.1 (In older version wig generation does not work)
- GenomicAlignments
- GenomicFeatures
- RiboProfiling
- rtracklayer
tophat
bowtie
FastX
samtools
cutadapt

Run

The pipeline is used as

snakemake [options] [target]

The possible targets are:

bams Generate BAM files with prefiltering of the ribosomal fragments
bamsnf without prefiltering
randomizedbams Generates BAMS where reads with double mapping are randomly selected. Be careful, it is a bit of a hack.
csv Generate WIGS, bigWigs, CSV with counts per gene (bams are, of course, also generated)

Run snakemake --lt to see all possible target rules, or look into Snakefile for details.

The suggested way to run is something like

snakemake -np csv

To check what is it going to do, and then start as

snakemake -p csv

run.sh contains an example setup to run on a LSF based cluster.

Input files

The samples in FASTQ format (preferrably compressed with gzip, bzip, or xz) should be in input directory.

data should contain yast.fa with the reference genome FASTA, yeast.gff with the gene annotations, and Yeast-Noncoding-extended.fa with the ribosomal RNA contaminating sequences to filter at the first stage. See next section on the naming for these files.

Configuration

The configuration is in the file config.json which looks like

{
"INDEX": "data/yeast",
"INDEX_NC": "data/Yeast-Noncoding-extended",
"GENOME_GFF": "data/yeast.gff",
"ADAPTER": "CTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCAC",
"MANUAL_SHIFTS_FILE": "shifts.csv"
}

The entries mean:

INDEX basename of the FASTA file with the reference sequence. In the example above the reference should be named data/yeast.fa
GENOME_GFF name of the genome annotation in gff format
ADAPTER Adapter to strip of the end of the sequence
MANUAL_SHIFTS_FILE Name of the csv with the manual selection of the reads and position of the P-sites. See P-site WIGs

P-site WIG generation

The WIG (or, currently, BedGraph) files are being generated by selecting only the reads with length that is close to the correct one, and then selecting the fixed offset into the read, which is individual for each read length. By default, the algorithm selects the length corresponding to the maximum of the length histogram +- one. The histogram with color-marked selected lengths is saved in wig/{sample}_Length.pdf. Then, for each of the lenght individually the metagene distribution of the starts of the reads relative to the start of the nearest OFR is plotted in wig/{sample}_Asiteshifts.pdf, that is P-site codon. Then, the corrsponding offset into the read is taken, which leads to the resulting bedgraph files.

However, if this fails, it is possible to create the MANUAL_SHIFTS_FILE with the sete of the read lengths and shifts for each sample. It has three columns: file name, length list (space separated), and shift list. the pipeline creates the CSV files with the values it found automatically in wigs/{sample}.shifts.csv, which can be used as the template to create the MANUAL_SHIFTS_FILE.

Notes on linkers/adapters in Ribosome Footprinting

Detailed sequence breakout for Ribosome footprint library

                                                                                      / This one (rApp) seems to not be present!
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCTATGCATGCATGCATGCATGCATGCATGCaCTGTAGGCACCATCAATAGATCGGAAGAGCACACGTCTGAACTCCAGTCACTGACCAATCTCGTATGCCGTCTTCTGCTTG
                                                          ^^^^Sequence^^^^^^^^^^^^^^^^-CTGTAGGCACCATCAATAGATCGGAAGAGC                     TGACCA
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT                                               GATCGGAAGAGCACACGTCTGAACTCCAGTCACNNNNNNATCTCGTATGCCGTCTTCTGCTTG
TruSeq Universal Adapter                                                                                 TruSeq Indexed Adapter           Index
AATGATACGGCGACCACCGA
Portion bind to flow cell
                                                                                                                                                    CGTATGCCGTCTTCTGCTTG
                                                                                                                                                    Binds to flow cell for paired end reads
                                                                                                         GATCGGAAGAGCACACGTCTGAACTCCAGTCAC -> Index read proceeds
                                                                                                         Index read primer
                         ACACTCTTTCCCTACACGACGCTCTTCCGATCT -> Read proceeds
                         Read 1 primer

Linkers (some references)

Adapter

5'rAppCTGTAGGCACCATCAAT/3ddC/3'
cloning linkers

AATGATACGGCGACCACCGAGATCTACAC
CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTCAGACGTGTGCTCTTCCG: IDX4

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
scripts		scripts
.gitignore		.gitignore
README.md		README.md
Snakefile		Snakefile
cluster.json		cluster.json
config.json		config.json
run.sh		run.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

scripts

scripts

.gitignore

.gitignore

README.md

README.md

Snakefile

Snakefile

cluster.json

cluster.json

config.json

config.json

run.sh

run.sh

Repository files navigation

Ribosome footprinting pipeline

Installation

Requirements

Run

Input files

Configuration

P-site WIG generation

Notes on linkers/adapters in Ribosome Footprinting

Detailed sequence breakout for Ribosome footprint library

Linkers (some references)

About

Releases

Packages

Languages

fedxa/RPF_pipeline

Folders and files

Latest commit

History

Repository files navigation

Ribosome footprinting pipeline

Installation

Requirements

Run

Input files

Configuration

P-site WIG generation

Notes on linkers/adapters in Ribosome Footprinting

Detailed sequence breakout for Ribosome footprint library

Linkers (some references)

About

Resources

Stars

Watchers

Forks

Languages