================================================
Gorap 2  -  Genomewide ncRNA Annotation Pipeline
================================================

Gorap screens genomic sequences for all non-coding RNA families present in the Rfam database, using either
- a generalized strategy that applies multiple filters,
- or specialized software.
Gorap provides ncRNA-based reconstruction of phylogenetic trees and can perform de novo predictions, including TPM calculations from RNA-Seq experiments. RNA family specific screening options, thresholds and constraints can easily be amended and extended by custom queries.

Gorap's setup installs Bio::Gorap, a distribution of Perl modules that provides software wrappers and functions for efficient manipulation of FASTA, GFF and Stockholm alignment files and of taxonomic data.

The installation process includes all necessary software and databases to run the Gorap pipeline.
- Infernal, Blast, RNAmmer, tRNAscan, Bcheck, CRT, RAxML, Mafft, Samtools
- Rfam, NCBI Taxonomy, Silva

Requirements: Linux/Unix or macOS with internet access and the following installed:
- Perl, git, gcc, make, wget, curl

1   INSTALLATION
2   RUN GORAP
3   UPDATE GORAP
3.1 UPDATE LIBRARY
3.2 UPDATE DATABASES
4   CONFIGURATION FILES
4.1 ADD CUSTOM TOOLS
4.2 ADD CUSTOM CONSTRAINTS
4.3 ADD CUSTOM FAMILY
5   RESULTS
6   MISSING STUFF
7   CONTACT


--------------
1 INSTALLATION
--------------

1) Set the GORAP variable to the installation directory
export GORAP=/path/to/install/dir
2) Store the GORAP variable permanently so that Gorap can always find the required tools
echo "export GORAP=$GORAP" >> ~/.bashrc
3) Download Gorap
git clone --recursive https://github.com/koriege/gorap.git
4) Enter the source directory
cd gorap
5) Check out the latest stable version (the most recent tag)
git checkout $(git describe --tags --abbrev=0)
6) Install Gorap and all necessary tools
./setup -i all
7) Download latest NCBI Taxonomy
$GORAP/bin/gorap/Gorap.pl -update ncbi
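Step 2 appends a new export line every time it is run, so repeating the installation leaves duplicate lines in ~/.bashrc. A minimal sketch of an idempotent variant, assuming bash; persist_gorap_var is a hypothetical helper, not part of Gorap's setup:

```shell
# Requires bash. persist_gorap_var is a hypothetical helper, not part of Gorap.
# Appends "export GORAP=<dir>" to the given startup file only if that exact
# line is not already present.
persist_gorap_var() {
    local rcfile=$1 dir=$2
    local line="export GORAP=$dir"
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if [ -f "$rcfile" ] && grep -qxF -- "$line" "$rcfile"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$rcfile"
}

# Usage, replacing steps 1 and 2:
# export GORAP=/path/to/install/dir
# persist_gorap_var ~/.bashrc "$GORAP"
```

If the directory changes later, a second export line is added; bash uses the last one when it reads the file.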


-----------
2 RUN GORAP
-----------

- Get help
$GORAP/bin/gorap/Gorap.pl -h


- Screen the example data for 6S RNA (Rfam entry 13) on 8 threads, providing taxonomic information for E. coli
Gorap.pl -q 13 -c 8 -k bac -s 562 -r 543 -i $GORAP/bin/gorap/example/ecoli.fa -b $GORAP/bin/gorap/example/ecoli.bam -g $GORAP/bin/gorap/example/ecoli.gff

- Run Gorap using a parameter file (adapt the existing parameter.txt to your needs)
Gorap.pl -file parameter.txt

- Hint 1: Multiple runs can be saved into the same output directory
- Hint 2: To restart a run with the same parameters, add the -skip option; annotation is then skipped and only additional computations such as TPM calculation or phylogeny reconstruction are performed
- Hint 3: Command line parameters take precedence over parameter file settings
-> e.g. Gorap.pl -file parameter.txt -skip -sort -l newlabel


- Example: annotation and SSU/RNome-based phylogeny reconstruction (via -og) in one step
- The outgroup genome is screened only for the ncRNAs predicted in the genome files given by -i
Gorap.pl -i <FASTA>,<FASTA>,<FASTA>,<FASTA> -k bac -og <FASTA>
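The -i option takes a comma-separated list of FASTA files. When many genomes sit in one directory, the list can be built in the shell; a sketch assuming bash and file names without commas, with join_by_comma as a hypothetical helper:

```shell
# Requires bash. join_by_comma is a hypothetical helper, not part of Gorap.
# Joins all arguments with commas, e.g. to build the value for -i.
join_by_comma() {
    local IFS=,        # "$*" joins arguments with the first character of IFS
    printf '%s\n' "$*"
}

# Usage sketch (assumes genomes/*.fa exist and outgroup.fa is the outgroup):
# Gorap.pl -i "$(join_by_comma genomes/*.fa)" -k bac -og outgroup.fa
```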

- Three-step phylogeny reconstruction on hand-curated alignments
- Using labels is highly recommended
Gorap.pl -i <FASTA>,<FASTA>,<FASTA>,<FASTA> -k bac -l mylabel
- Curate the alignments by removing sequences from, or adding sequences (see meta) to, the STK files, then update the annotation files
Gorap.pl -i <FASTA>,<FASTA>,<FASTA>,<FASTA> -k bac -l mylabel -refresh
- Skip the annotation process, but do the downstream analysis for phylogeny reconstruction
Gorap.pl -i <FASTA>,<FASTA>,<FASTA> -og <FASTA> -k bac -l mylabel -skip
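Hint 2 asks for restarts with the same parameters, so the three steps should share the input list and the label. A sketch that only prints the commands (it runs nothing), assuming all steps use the same genome files; gorap_phylo_steps is a hypothetical helper, and the manual curation between steps 1 and 2 is still up to you:

```shell
# Requires bash. gorap_phylo_steps is a hypothetical helper; it only prints
# the commands of the three steps, it does not run Gorap.
# Arguments: label, outgroup FASTA, kingdom, then the input FASTA files.
gorap_phylo_steps() {
    local label=$1 outgroup=$2 kingdom=$3
    shift 3
    local IFS=,
    local inputs="$*"   # input files joined by commas
    echo "Gorap.pl -i $inputs -k $kingdom -l $label"
    # curate the STK files by hand between step 1 and step 2
    echo "Gorap.pl -i $inputs -k $kingdom -l $label -refresh"
    echo "Gorap.pl -i $inputs -og $outgroup -k $kingdom -l $label -skip"
}

# Usage: gorap_phylo_steps mylabel outgroup.fa bac g1.fa g2.fa g3.fa g4.fa
```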


- Example with de novo prediction and TPM calculation (via -b) from RNA-Seq experiments enabled
- See the options for calling piles: -minl, -minh and -strand
Gorap.pl -i <FASTA>,<FASTA>,<FASTA> -k bac -b <BAM>,<BAM> -l mylabel -skip
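TPM normalizes read counts by feature length and then scales the per-base rates so that they sum to one million. The sketch below implements that standard definition with awk on a hypothetical tab-separated table (id, length, reads); Gorap's own implementation, read counting and handling of multi-mapped reads may differ:

```shell
# Requires bash and awk. tpm is a hypothetical helper, not Gorap's code.
# Standard TPM definition:
#   rate_i = reads_i / length_i
#   TPM_i  = rate_i / sum_j(rate_j) * 1e6
# Input on stdin: "id<TAB>length<TAB>reads" (length must be > 0).
# Output: "id<TAB>TPM" with two decimals.
tpm() {
    awk -F'\t' '
        { id[NR] = $1; rate[NR] = $3 / $2; total += rate[NR] }
        END {
            for (i = 1; i <= NR; i++)
                printf "%s\t%.2f\n", id[i], (total > 0 ? rate[i] / total * 1e6 : 0)
        }'
}
```

For two features of 1000 and 2000 nt with 100 reads each, the sketch yields 666666.67 and 333333.33.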


--------------
3 UPDATE GORAP
--------------

3.1 UPDATE LIBRARY

1) Follow the INSTALLATION steps 3 to 5
2) Update Gorap
./setup -i update

3.2 UPDATE DATABASES

When updating to a newer Rfam release, you need to correct the predefined constraints manually.
To do so, check the ERROR output and shift the pre-annotated box constraints of C/D-box snoRNAs so that they match:
- the C-box motif UGAUGA
- the D-box motif CUGA
Example:
conserved=uuGCAAUGAUGuUAagAAUUUCUUcacCUGAAuuaaaCcuUGAaGuucAAAaauCGAGCUUUUUAACaCUGAGCaaa
constrain=.....|..1...|......................................................|.1..|....
constrain=......|0.|...........................................................|0.|....
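To check whether a shifted box constraint matches, the bracketed region of the conserved line can be compared to the expected motif. In the example, the first constrain line brackets UGAUGu, which differs from UGAUGA at one position; the digit 1 in that region appears to be the number of allowed mismatches (our reading of the example). A sketch of the comparison, assuming bash; box_mismatches is a hypothetical helper:

```shell
# Requires bash. box_mismatches is a hypothetical helper, not part of Gorap.
# Counts position-wise mismatches between a box region and a motif,
# case-insensitively and treating T as U.
box_mismatches() {
    local region motif i n=0
    region=$(printf '%s' "$1" | tr 'acgtuT' 'ACGUUU')
    motif=$(printf '%s' "$2" | tr 'acgtuT' 'ACGUUU')
    if [ ${#region} -ne ${#motif} ]; then
        echo "length mismatch: ${#region} vs ${#motif}" >&2
        return 1
    fi
    for ((i = 0; i < ${#region}; i++)); do
        [ "${region:i:1}" = "${motif:i:1}" ] || n=$((n + 1))
    done
    echo "$n"
}

# Usage: box_mismatches UGAUGu UGAUGA   (C-box region from the example)
#        box_mismatches CUGA CUGA       (D-box region from the example)
```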

- Update all databases
Gorap.pl -update all

- Or update single databases
$GORAP/bin/gorap/Gorap.pl -update rfam
$GORAP/bin/gorap/Gorap.pl -update ncbi
$GORAP/bin/gorap/Gorap.pl -update silva


---------------------
4 CONFIGURATION FILES
---------------------

Configuration files in the $GORAP/gorap/config/ directory define family-specific thresholds, prediction tools and sequence mismatch constraints; edit them there to change these settings.

4.1 ADD CUSTOM TOOLS

Requirement: the output of a new tool must be in GFF3 format

The [cmd] section stores the command line parameters of each applied tool.
Available placeholders:
- $genome (replaced by a fasta file location)
- $kingdom (replaced by bac, arc, euk, fungi or virus)
- $cpus (replaced by number of threads to use)
- $output (replaced by a temporary file location; by default, STDOUT is parsed)

Example:
tool=barrnap
parameter=--threads $cpus --kingdom $kingdom $genome
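A sketch of how the placeholders expand for the barrnap example, plus a rough check of the GFF3 requirement (nine tab-separated columns per feature line). Gorap performs the substitution internally; expand_cmd and check_gff3 are hypothetical bash helpers for testing a new entry by hand, and check_gff3 does not handle a ##FASTA section:

```shell
# Requires bash and awk. expand_cmd and check_gff3 are hypothetical helpers.
# Replace the placeholders of a [cmd] parameter template with concrete values.
# Arguments: template, genome, kingdom, cpus, output.
expand_cmd() {
    local tmpl=$1 genome=$2 kingdom=$3 cpus=$4 output=$5
    # \$ makes the dollar sign literal in the search pattern;
    # quoting the replacement keeps characters like & literal
    tmpl=${tmpl//\$genome/"$genome"}
    tmpl=${tmpl//\$kingdom/"$kingdom"}
    tmpl=${tmpl//\$cpus/"$cpus"}
    tmpl=${tmpl//\$output/"$output"}
    printf '%s\n' "$tmpl"
}

# Return 0 if every non-empty, non-comment line on stdin has 9 tab-separated columns.
check_gff3() {
    awk -F'\t' '!/^#/ && NF > 0 && NF != 9 { bad++ } END { exit (bad > 0) }'
}
```

For example, expand_cmd '--threads $cpus --kingdom $kingdom $genome' ecoli.fa bac 8 /tmp/out prints --threads 8 --kingdom bac ecoli.fa, and the resulting barrnap output could be piped into check_gff3.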

4.2 ADD CUSTOM CONSTRAINTS

Use constraints to set the number of allowed mismatches in a specific region.
The given STK consensus can be modified according to IUPAC codes, but its length must not change.

Example:
conserved=uuGCAAUGAUGuUAagAAUUUCUUcacCUGAAuuaaaCcuUGAaGuucAAAaauCGAGCUUUUUAACaCUGAGCaaa
constrain=.....|..0..|.......................................................|.2..|....
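Because the constrain line must stay aligned with the consensus, a quick check lists each bracketed region with its consensus substring. This sketch assumes that each region lies between a pair of '|' characters and that the digit inside it is the number of allowed mismatches, as the examples suggest; positions are 0-based, a region without a digit is shown as '-', and list_constraints is a hypothetical bash helper:

```shell
# Requires bash. list_constraints is a hypothetical helper, not part of Gorap.
# Arguments: the conserved= line and the constrain= line (prefixes optional).
# Prints "start-end<TAB>consensus substring<TAB>allowed mismatches".
list_constraints() {
    local cons=${1#conserved=} con=${2#constrain=}
    if [ ${#cons} -ne ${#con} ]; then
        echo "length differs: ${#cons} vs ${#con}" >&2
        return 1
    fi
    local i c start=-1 allowed
    for ((i = 0; i < ${#con}; i++)); do
        c=${con:i:1}
        if [ "$c" = "|" ]; then
            if [ "$start" -lt 0 ]; then
                # opening bar: the region starts at the next position
                start=$((i + 1)); allowed='-'
            else
                # closing bar: report the region that just ended
                printf '%d-%d\t%s\t%s\n' "$start" "$((i - 1))" "${cons:start:i-start}" "$allowed"
                start=-1
            fi
        elif [ "$start" -ge 0 ] && [[ $c == [0-9] ]]; then
            allowed=$c
        fi
    done
}
```

Pass the conserved= and constrain= lines exactly as they appear in the configuration file; the prefixes are stripped before the lengths are compared.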

4.3 ADD CUSTOM FAMILY

Create a new configuration file in the $GORAP/gorap/config/ directory, following a similar naming scheme (RFxxxxx_description.cfg), that contains the required paths to a Stockholm alignment file and its covariance model.
To use the default screening methods Infernal and/or Blast, add the following lines to the [cmd] section:
tool=Infernal
tool=Blast
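The Stockholm file referenced by a new family configuration must be well formed. A minimal check, assuming bash; check_stk is a hypothetical helper that tests for the '# STOCKHOLM 1.0' header, a '#=GC SS_cons' consensus structure line (required by Infernal's cmbuild to build a covariance model) and the '//' terminator:

```shell
# Requires bash. check_stk is a hypothetical helper, not part of Gorap.
# Returns 0 if the file looks like a Stockholm alignment usable by cmbuild.
check_stk() {
    local f=$1
    head -n 1 "$f" | grep -q '^# STOCKHOLM 1\.0' \
        || { echo "$f: missing '# STOCKHOLM 1.0' header" >&2; return 1; }
    grep -q '^#=GC SS_cons' "$f" \
        || { echo "$f: no '#=GC SS_cons' line" >&2; return 1; }
    grep -q '^//' "$f" \
        || { echo "$f: missing '//' terminator" >&2; return 1; }
}

# Usage: check_stk /path/to/RFxxxxx.stk
```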


---------
5 RESULTS
---------

Browse the results by opening index.html in a web browser, or inspect the following directories:

1) annotations
*orig - files with original sequence ids
*passed - files with predictions that passed all filters

GFF filter tags:
L - Length cutoff, if the predicted gene is shorter than 40% of the consensus reference
B - Bitscore cutoff, given by the minimum score of taxonomically related species or by the Rfam noise cutoff
S - Structure based rejection, if the predicted gene holds less than 50% of the conserved hairpins
P - Primary sequence conservation based rejection, if the predicted gene has less than 70% of the strongly (>=90%) conserved nucleotides in Rfam
	Exception: snoRNAs, which are screened for special properties and proper box motifs
C - Copy number cutoff, as defined in the configuration file
O - Overlap based rejection, if a higher scoring annotation exists due to multi-tool or multi-kingdom screening
X - Tag for manually deleted sequences, set after a Gorap refresh
! - Passed tag - all filters passed
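As a worked illustration of the L and S thresholds only, the following sketch takes precomputed values and prints the resulting tags in integer arithmetic (plen/clen < 0.4 becomes plen*100 < clen*40). It is not Gorap's filter code: the B, P, C and O filters and the snoRNA exception are left out, and ls_tags is a hypothetical bash helper:

```shell
# Requires bash. ls_tags is a hypothetical helper illustrating two thresholds.
# Arguments: predicted length, consensus length, conserved hairpins kept,
# conserved hairpins in the consensus.
ls_tags() {
    local plen=$1 clen=$2 kept=$3 total=$4 tags=''
    # L: predicted gene shorter than 40% of the consensus length
    if (( plen * 100 < clen * 40 )); then tags+=L; fi
    # S: fewer than 50% of the conserved hairpins retained
    if (( total > 0 && kept * 2 < total )); then tags+=S; fi
    # '!' marks a prediction that passed both checks
    if [ -z "$tags" ]; then tags='!'; fi
    printf '%s\n' "$tags"
}
```

For example, a 30 nt hit on a 100 nt consensus gets L, while a 40 nt hit passes the length filter because it is not shorter than 40%.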

2) alignments - Stockholm alignments of the predictions that passed all filters

3) phylogeny - trees in Newick and PDF format
RNome - all ncRNA predictions (except rRNAs and tRNAs), using a super-matrix approach
core50RNome - ncRNA predictions (except rRNAs and tRNAs), present in >=50% of given input FASTA files, using a super-matrix approach
coreRNomeSTK - ncRNAs present in all species, built from concatenated Stockholm alignments
coreRNomeMAFFT - ncRNAs present in all species, built by Mafft from concatenated FASTA sequences
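The difference between the core and core50 sets is a simple count over genomes. A sketch, assuming bash and a hypothetical family<TAB>genome presence table (one line per prediction); core_families does not exclude rRNAs and tRNAs, which the RNome sets above leave out:

```shell
# Requires bash, sort and awk. core_families is a hypothetical helper.
# Arguments: number of input genomes, mode ("core" or "core50").
# Input on stdin: "family<TAB>genome" lines; duplicates (multiple copies) are ignored.
core_families() {
    local n=$1 mode=$2
    sort -u | awk -F'\t' -v n="$n" -v mode="$mode" '
        { count[$1]++ }
        END {
            for (f in count)
                if ((mode == "core" && count[f] == n) ||
                    (mode == "core50" && count[f] * 2 >= n))
                    print f
        }' | sort
}

# Usage: cut -f1,2 presence.tsv | core_families 4 core50
```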

4) meta - intermediate results for each filter applied


---------------
6 MISSING STUFF
---------------

Suggestions? Bugs? Troubles? Please let me know!


---------
7 CONTACT
---------

Konstantin Riege
konstantin{.}riege{a}uni-jena{.}de
konstantin{a}bioinf{.}uni-leipzig{.}de
konstantin.riege{a}leibniz-fli{.}de