Copyright 2011, Jonas Maaskola. This is free software under the GPL version 3, or later. See the file COPYING for detailed conditions of distribution.
This software package also contains code coming from an R library:
Mathlib : A C Library of Special Functions
Copyright (C) 2005-6 Morten Welinder terra@gnome.org
Copyright (C) 2005-10 The R Foundation
Copyright (C) 2006-10 The R Core Development Team
Mathlib is free software and distributed under the GNU GPL version 2.
This software package uses routines from Mathlib to compute Chi-Square distribution probabilities by means of the incomplete Gamma function.
The tool is described in the following open-access article:
Jonas Maaskola and Nikolaus Rajewsky.
Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models
Nucleic Acids Research, 42(21):12995-13011, Dec 2014. doi:10.1093/nar/gku1083
The package includes UNIX man pages for the programs.
The sub-directory doc contains a manual for this package, written in LaTeX. A PDF version of the manual will be generated during the build process of this package.
There's a module in development to use Discrover inside the bioinformatics web framework Galaxy. You can find it here.
Binary packages of Discrover are available for select Linux distributions. For Ubuntu, a PPA has also been set up that can be used to install Discrover. Instructions on how to install the packages (and how to use the PPA) or how to manually build Discrover are available in separate files.
The synthetic sequence data used in the publication for motif discovery performance evaluation are available here.
Below is a minimal description of how to use this package. Please refer to the UNIX man pages, the manual, and the command line help for more information.
The package contains two main programs: plasma and discrover.
plasma is used to find IUPAC regular expression type motifs, and discrover learns hidden Markov models (HMMs).
Both use discriminative objective functions.
If no seeds are specified for discrover, plasma will be used to find seeds automatically.
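To illustrate what an IUPAC regular expression type motif means, here is a minimal Python sketch that translates an IUPAC motif into an ordinary regular expression and counts the sequences that contain it. It is not part of this package and does not reproduce how plasma searches for or scores motifs; the function names iupac_to_regex and count_sequences_with_motif, the example motif, and the example sequences are all hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
# U is treated as T so that RNA and DNA sequences can be handled alike.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'TGTANATA' into a compiled regex."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]  # raises KeyError for non-IUPAC symbols
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


def count_sequences_with_motif(motif, sequences):
    """Count how many sequences contain at least one occurrence of the motif."""
    pattern = iupac_to_regex(motif)
    return sum(
        1 for seq in sequences
        if pattern.search(seq.upper().replace("U", "T"))
    )


# Hypothetical toy data: a motif that is more frequent in the signal
# sequences than in the control sequences is discriminative.
signal = ["ACTGTAAATAGG", "TTGTACATACC"]
control = ["ACGTACGTACGT"]
signal_hits = count_sequences_with_motif("TGTANATA", signal)
control_hits = count_sequences_with_motif("TGTANATA", control)
```

Comparing signal_hits with control_hits is the simplest form of a discriminative comparison; the objective functions actually offered by plasma and discrover are described in the manual and the article cited above.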
Command line help is available with discrover -h or discrover --help and, similarly, plasma -h or plasma --help.
Note that some infrequently used options are hidden by default, and may be shown with the verbose switch: discrover -hv
Even more obscure options are available by adding the very verbose switch: discrover -hV
Both plasma and discrover can generate sequence logos (and discrover does so by default).
The same sequence logo creation routines are also available in the separate program discrover-logo.
When plasma and discrover are given just a single FASTA file for analysis, they will automatically shuffle the sequences to create control sequences.
You can use the same sequence shuffling routines via the separate program discrover-shuffle.
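The idea of deriving control sequences by shuffling can be sketched in a few lines of Python. This is only an illustration under simple assumptions: it permutes the letters of each sequence independently, which preserves single-nucleotide composition, and it may well differ from what discrover-shuffle actually does. The function names read_fasta and shuffle_controls and the "shuffled" header suffix are hypothetical.

```python
import random


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def shuffle_controls(records, seed=0):
    """Return one shuffled control sequence per input sequence.

    Assumption: a plain permutation of the letters of each sequence.
    Length and letter counts are preserved; letter order is not.
    """
    rng = random.Random(seed)  # seeded so that results are reproducible
    controls = []
    for header, seq in records:
        letters = list(seq)
        rng.shuffle(letters)
        controls.append((header + " shuffled", "".join(letters)))
    return controls


# Hypothetical two-record FASTA input.
fasta_text = ">seq1\nACGTAC\nGGTT\n>seq2\nTTGGCCAA\n"
signal_records = read_fasta(fasta_text)
control_records = shuffle_controls(signal_records, seed=1)
```

Each control sequence has the same length and letter counts as the signal sequence it was derived from, so differences in motif frequency between the two sets cannot be explained by base composition alone.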