Discrover

Discover discriminative sequence motifs

Copyright 2011, Jonas Maaskola. This is free software under the GPL version 3, or later. See the file COPYING for detailed conditions of distribution.

This software package also contains code coming from an R library:
Mathlib : A C Library of Special Functions
Copyright (C) 2005-6 Morten Welinder terra@gnome.org
Copyright (C) 2005-10 The R Foundation
Copyright (C) 2006-10 The R Core Development Team
Mathlib is free software and distributed under the GNU GPL version 2.
This software package uses routines from Mathlib to compute Chi-Square distribution probabilities by means of the incomplete Gamma function.

Publication

The tool is described in the following open-access article:
Jonas Maaskola and Nikolaus Rajewsky. Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models.
Nucleic Acids Research, 42(21):12995-13011, Dec 2014. doi:10.1093/nar/gku1083

Documentation

The package includes UNIX man pages for the programs.

The sub-directory doc contains a manual for this package, written in LaTeX. A PDF version of the manual will be generated during the build process of this package.

Galaxy front-end

There's a module in development to use Discrover inside the bioinformatics web framework Galaxy. You can find it here.

Obtaining Discrover

Binary packages of Discrover are available for select Linux distributions. For Ubuntu, a PPA has also been set up that can be used to install Discrover. Instructions on how to install the packages (and how to use the PPA), or how to build Discrover manually, are available in separate files.

Sequence data

The synthetic sequence data used in the publication to evaluate motif discovery performance are available here.

Usage

Below is a minimal description of how to use this package. Please refer to the UNIX man pages, the manual, and the command line help for more information.

The package contains two main programs: plasma and discrover.

plasma finds motifs in the form of IUPAC regular expressions, and discrover learns hidden Markov models (HMMs). Both use discriminative objective functions. If no seeds are specified for discrover, plasma will be used to find seeds automatically.
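
To illustrate what an IUPAC regular expression motif is, the following Python sketch expands a motif written in IUPAC nucleotide codes into an ordinary regular expression and counts how many sequences in a set contain it. This is not how plasma is implemented; the function names `iupac_to_regex` and `count_sequences_with_motif`, the example motif, and the example sequences are assumptions made only for illustration.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
# U is treated as T so that RNA and DNA sequences can be handled alike.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif into a compiled regular expression."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        # Single bases stay literal; ambiguity codes become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def count_sequences_with_motif(motif, sequences):
    """Count the sequences that contain at least one motif occurrence."""
    pattern = iupac_to_regex(motif)
    return sum(
        1 for seq in sequences
        if pattern.search(seq.upper().replace("U", "T"))
    )


if __name__ == "__main__":
    # Hypothetical toy data: a signal set and a control set.
    signal = ["AATGTAAATACC", "GGTGTACATAGG", "CCCCCCCC"]
    control = ["ACGTACGT", "TGTAAATA"]
    motif = "TGTANATA"
    print(iupac_to_regex(motif).pattern)
    print(count_sequences_with_motif(motif, signal),
          count_sequences_with_motif(motif, control))
```

A discriminative objective function compares occurrence counts like these between signal and control sequences; which objective functions are available is described in the manual and the command line help.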

Command line help is available with discrover -h or discrover --help and, similarly, plasma -h or plasma --help.

Note that some infrequently used options are hidden by default, and may be shown with the verbose switch: discrover -hv

Even more obscure options are available by adding the very verbose switch: discrover -hV

Sequence logos

Both plasma and discrover can generate sequence logos (and discrover does so by default). The same sequence logo creation routines are also available in the separate program discrover-logo.

Sequence shuffling

When plasma and discrover are given just a single FASTA file for analysis, they will automatically shuffle the sequences to create control sequences. You can use the same sequence shuffling routines via the separate program discrover-shuffle.
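
The idea of generating control sequences by shuffling can be sketched in Python as follows. This is only a conceptual illustration under stated assumptions: it shuffles the letters of each sequence independently, which preserves length and single-letter composition, whereas the routines in discrover-shuffle may preserve more structure (for example dinucleotide frequencies); see the manual for what they actually do. The helper names `read_fasta`, `shuffle_sequence`, and `make_controls` are hypothetical.

```python
import random


def read_fasta(lines):
    """Parse FASTA-formatted lines into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def shuffle_sequence(seq, rng):
    """Return a random permutation of the letters of seq."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def make_controls(records, seed=0):
    """Create one shuffled control sequence per input sequence."""
    rng = random.Random(seed)  # fixed seed for reproducible controls
    return [(name + " shuffled", shuffle_sequence(seq, rng))
            for name, seq in records]


if __name__ == "__main__":
    # Hypothetical toy input in FASTA format.
    fasta = [">seq1", "ACGTACGTTT", "GGA", ">seq2", "TTTTGCA"]
    for name, seq in make_controls(read_fasta(fasta)):
        print(">" + name)
        print(seq)
```

Each control sequence has the same length and the same letter counts as the sequence it was derived from, so differences in motif occurrence between the original and the shuffled set reflect the arrangement of the letters rather than their overall composition.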