Skip to content
Permalink
Browse files

Renamed flag to KeepEmpty

Add variable KeepFlank, and associated behavior
updated documentation

git-svn-id: svn://scit.us/dawg/stable@325 7e5b8c5c-87e2-0310-a459-8d91125b37d5
  • Loading branch information...
reedacartwright committed Apr 3, 2009
1 parent 74b5a15 commit 76e9b672de6aef32bf562ab27ab7cbe3ff13a0d3
Showing with 143 additions and 69 deletions.
  1. +51 −39 dawg.txt
  2. +56 −10 readme.txt
  3. +7 −5 src/dawg.cpp
  4. +1 −1 src/dawg.h
  5. +1 −1 src/output.cpp
  6. +24 −12 src/tree.cpp
  7. +3 −1 src/tree.h
@@ -1,54 +1,66 @@
COMMAND LINE USAGE
dawg -[scubvh?] file1 [file2...]
dawg -[scubvhqew?] [-o outputfile] file1 [file2...]
-s: process files serially [default]
-c: process files combined together
-u: unbuffered output
-b: buffered output [default]
-q: disable error and warning reports (quiet)
-e: enable error reports [default]
-w: enable warning reports [default]
-v: display version information
-h: display help information
-?: same as -h
-o outputfile: override ouput filename in the configuration file

Dawg will read stdin if filename is "-".

FILE FORMAT
The file format takes a series of statements in the form of "name = value,"
where "name" is alphanumeric and value can be a string, number, boolean,
tree, or vector of values. A single variable is equivalent to a vector of
a single entry.
The file format takes a series of statements in the form of
"name = value," where "name" is alphanumeric and value can be a string,
number, boolean, tree, or vector of values. A single variable is
equivalent to a vector of a single entry.

string: "[char-sequence]"
<<EOF [several lines] EOF
number: [sign]digits[.digits][(e|E)[sign]digits]
boolean: true|false
tree: Newick Format
vector: { value, value, ...}
string: "[char-sequence]"
'[char-sequence]'
"""[multi-line char-sequence]""" (rm initial and final newlines)
'''[multi-line char-sequence]''' (kp initial and final newlines)
number: [sign]digits[.digits][(e|E)[sign]digits]
boolean: true|false
tree: Newick Format
vector: { value, value, ...}

OPTIONS
Name Type Description

Name Type Description
--------------------------------------------------------------------------
Tree VT phylogeny
TreeScale N coefficient to scale branch lengths by
Sequence VS root sequences
Length VN length of generated root sequences
Rates VVN rate of evolution of each root nucleotide
Model S model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs VN nucleotide (ACGT) frequencies
Params VN parameters for the model of evolution
Width N block width for indels and recombination
Scale VN block position scales
Gamma VN coefficients of variance for rate heterogenity
Alpha VN shape parameters
Iota VN proportions of invariant sites
GapModel VS models of indel formation: NB|PL|US
Lambda VN rates of indel formation
GapParams VVN parameter for the indel model
Reps N number of data sets to output
File S output file
Format S output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar B output gaps as a single character
GapPlus B distinguish insertions from deletions in alignment
EmptyCol B preserve empty columns in final alignment
LowerCase B output sequences in lowercase
Translate B translate outputed sequences to amino acids
NexusCode S text or file to include between datasets in Nexus format
Seed VN PRNG seed (integers)
Tree VT phylogeny
TreeScale N coefficient to scale branch lengths by
Sequence VS root sequences
Length VN length of generated root sequences
Rates VVN rate of evolution of each root nucleotide
Model S model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs VN nucleotide (ACGT) frequencies
Params VN parameters for the model of evolution
Width N block width for indels and recombination
Scale VN block position scales
Gamma VN coefficients of variance for rate heterogenity
Alpha VN shape parameters
Iota VN proportions of invariant sites
GapModel VS models of indel formation: NB|PL|US
Lambda VN rates of indel formation
GapParams VVN parameter for the indel model
Reps N number of data sets to output
File S output file
Format S output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar B output gaps as a single character
GapPlus B distinguish insertions from deletions in alignment
KeepFlank N undeletable flanking regions N nucs from sequence
KeepEmpty B preserve empty columns in final alignment
LowerCase B output sequences in lowercase
Translate B translate outputed sequences to amino acids
Seed VN pseudo-random-number-generator seed (integers)
Out.Block.Head S string to insert at the start of the output
Out.Block.Tail S string to insert at the end of the output
Out.Block.Before S string to insert before a sequence set in the output
Out.Block.After S string to insert after a sequence set in the output
Out.Subst B do variable subsitution in Out.Block.*

@@ -1,36 +1,80 @@
DAWG VERSION 1.1-RELEASE
DAWG VERSION 1-STABLE

Copyright (C) (2005-2006) Reed A. Cartwright - All rights reserved.
Copyright (c) (2005-2009) Reed A. Cartwright - All rights reserved.

DESCRIPTION

Dawg is an application that will simulate nucleotide evolution with gaps.
Dawg is an application that will simulate nucleotide evolution with
gaps.

ABSTRACT

DNA Assembly with Gaps (Dawg) is an application designed to simulate the evolution of recombinant DNA sequences in continuous time based on the robust general time reversible model with gamma and invariant rate heterogeneity and a novel length-dependent model of gap formation. The application accepts phylogenies in Newick format and can return the sequence of any node, allowing for the exact evolutionary history to be recorded at the discretion of users. Dawg records the gap history of every lineage to produce the true alignment in the output. Many options are available to allow users to customize their simulations and results.

Many tools and procedures exist for reconstructing alignments and phylogenies and estimating evolutionary parameters from extant data. True phylogenies and alignments are known in very rare instances. In the absence of known data with true phylogenies, we are left with using simulations to test the accuracy of such procedures. Proper simulation of sequence evolution should involve both nucleotide substitution and indel formation. However, existing tools for simulating sequence evolution either do not include indels, like Seq-gen or evolver, or include a rather inexact model of indel formation, like Rose. I developed Dawg to fill in these gaps.
DNA Assembly with Gaps (Dawg) is an application designed to simulate the
evolution of recombinant DNA sequences in continuous time based on the
robust general time reversible model with gamma and invariant rate
heterogeneity and a novel length-dependent model of gap formation. The
application accepts phylogenies in Newick format and can return the
sequence of any node, allowing for the exact evolutionary history to be
recorded at the discretion of users. Dawg records the gap history of
every lineage to produce the true alignment in the output. Many options
are available to allow users to customize their simulations and results.

Many tools and procedures exist for reconstructing alignments and
phylogenies and estimating evolutionary parameters from extant data.
True phylogenies and alignments are known in very rare instances. In the
absence of known data with true phylogenies, we are left with using
simulations to test the accuracy of such procedures. Proper simulation
of sequence evolution should involve both nucleotide substitution and
indel formation. However, existing tools for simulating sequence
evolution either do not include indels, like Seq-gen or evolver, or
include a rather inexact model of indel formation, like Rose. I
developed Dawg to fill in these gaps.

CONTACT

racartwr@ncsu.edu or reed@scit.us

REFERENCE

Cartwright, R.A. (2005) DNA Assembly With Gaps (Dawg): Simulating Sequence Evolution. Bioinformatics 21 (Suppl. 3): iii31-iii38
Cartwright, R.A. (2005) DNA Assembly With Gaps (Dawg): Simulating
Sequence Evolution. Bioinformatics 21 (Suppl. 3): iii31-iii38

LICENSE

See COPYING for license information.

DOWNLOAD

Dawg can be downloaded from <http://scit.us/projects/dawg/>.

INSTALLATION

See INSTALL for installation instructions.
See Dawg's website for binary packages for Windows, Mac OSX, and other
systems. Alternatively, you can compile Dawg from the source. Dawg
requires CMake 2.6 (http://www.cmake.org/) to build it from sources. Many
Unix-like operating systems can install CMake through their package
systems. Extract the Dawg source code and issue the following commands in
the extracted directory:

DOWNLOAD
cmake .
make
make install

The '-G' option to cmake is used to specify different build systems, e.g. Unix
Makefiles versus KDevelop3 project. The '-D' option to cmake can be used to
set different cmake variables from the command line:

cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/usr .
make
make install

This will build an optimized version of Dawg and install it to '/usr/bin'.
To specify your own build flags you need to set the environment variables
CFLAGS and LDFLAGS as neccessary. Then specify

cmake -DCMAKE_BUILD_TYPE= .

Dawg 1.1 can be downloaded from <http://scit.us/projects/dawg/>.
See CMake's manual for additional information.

EXAMPLES

@@ -129,6 +173,8 @@ OPTIONS
Format S output format: Fasta|Nexus|Phylip|Clustal
GapSingleChar B output gaps as a single character
GapPlus B distinguish insertions from deletions in alignment
KeepFlank N undeletable flanking regions N nucs from sequence
KeepEmpty B preserve empty columns in final alignment
LowerCase B output sequences in lowercase
Translate B translate outputed sequences to amino acids
Seed VN pseudo-random-number-generator seed (integers)
@@ -215,6 +215,7 @@ bool Execute()
vector<double> vdGapModel[2];
vdGapModel[0].push_back(1.0);
vdGapModel[1].push_back(1.0);
unsigned int uKeepFlank = 0;

vector<NewickNode*> vtTrees;
double dTreeScale = 1.0;
@@ -230,7 +231,7 @@ bool Execute()
bool bOutSubst = true;

bool bGapSingle = false, bGapPlus = false, bLowerCase = false, bTranslate = false;
bool bEmptyCol = false;
bool bKeepEmpty = false;
unsigned int uWidth = 1;
DawgVar::Vec::size_type nRes;

@@ -301,7 +302,7 @@ bool Execute()
DawgVar::Get("GapPlus", bGapPlus);
DawgVar::Get("LowerCase", bLowerCase);
DawgVar::Get("Translate", bTranslate);
DawgVar::Get("EmptyCol", bEmptyCol);
DawgVar::Get("KeepEmpty", bKeepEmpty);

nRes = DawgVar::GetArray("Lambda", dLambda, 2, false);
if(nRes)
@@ -312,6 +313,7 @@ bool Execute()
if(nRes == 0)
return DawgError("\"GapParams\" must be specified if \"Lambda\" is.");
}
DawgVar::Get("KeepFlank", uKeepFlank);

DawgVar::Get("File", ssFile);
DawgVar::Get("Format", ssFormat);
@@ -413,7 +415,7 @@ bool Execute()

// Initialize Evolution
if(!myTree.SetupEvolution(dNucFreq, dRevParams, paramsIns, paramsDel,
uWidth, vdGamma, vdIota, vdScale, dTreeScale ))
uWidth, vdGamma, vdIota, vdScale, dTreeScale, uKeepFlank ))
return DawgError("Bad evolution parameters");

// Initialize Root
@@ -504,8 +506,8 @@ bool Execute()
uOutFlags |= FlagOutLowerCase;
if(bTranslate)
uOutFlags |= FlagOutTranslate;
if(bEmptyCol)
uOutFlags |= FlagOutEmptyCol;
if(bKeepEmpty)
uOutFlags |= FlagOutKeepEmpty;

// setup output location
ostream* pOut;
@@ -70,7 +70,7 @@ const unsigned int FlagOutLowerCase = 1;
const unsigned int FlagOutGapPlus = 2;
const unsigned int FlagOutGapSingleChar = 4;
const unsigned int FlagOutTranslate = 8;
const unsigned int FlagOutEmptyCol = 16;
const unsigned int FlagOutKeepEmpty = 16;

// Nucleotide Numbers
const int NumAdenine = 0;
@@ -309,7 +309,7 @@ void FilterSequence(const string& ssSrc, string& ssDest, unsigned int uFlags)
void FilterAlignment(const Tree::Alignment& alnSrc, Tree::Alignment& alnDest, unsigned int uFlags)
{
alnDest = alnSrc;
if((uFlags & FlagOutGapPlus) || (uFlags & FlagOutEmptyCol))
if((uFlags & FlagOutGapPlus) || (uFlags & FlagOutKeepEmpty))
return;
// remove columns that contain nothing but gaps
Tree::Alignment::iterator it = alnDest.begin();
@@ -366,6 +366,7 @@ void Tree::Evolve(Node &rNode, double dTime)
Sequence::size_type uLength = rNode.SeqLength()/m_uWidth;
double dLength = (double)uLength;
double dW = 1.0/m_funcRateSum(dLength);

// Do indels
for(double dt = rand_exp(dW); dt <= dTime; dt += rand_exp(dW))
{
@@ -406,17 +407,24 @@ void Tree::Evolve(Node &rNode, double dTime)
Sequence::size_type uPos = rand_uint((uint32_t)(uLength+ul-2));
Sequence::size_type uB = max(ul-1, uPos);
Sequence::size_type uSize = min(ul-1+uLength, uPos+ul)-uB;
uB -= (ul-1);
uB *= m_uWidth;
uSize *= m_uWidth;
// Find deletion point
Node::iterator itPos = rNode.SeqPos(uB);
Sequence::size_type uTemp = uSize;
uTemp -= itPos.first->Deletion(itPos.second, uTemp);
// Delete uSize nucleotides begin sensitive to gaps that overlap sections
for(++itPos.first; uSize && itPos.first != rNode.m_vSections.end(); ++itPos.first)
uTemp -= itPos.first->Deletion(itPos.first->begin(), uTemp);
uLength -= (uSize-uTemp)/m_uWidth;

// If GapLimits are on, only process deletion if it is completely inside the acceptance
// region as defined by the GapLimit. Check points are at sequence positions 0-uKeepFlank and
// uLength-1+uKeepFlank. These become 0-uKeepFlank+ul-1 and uLength-1+uKeepFlank+ul-1,
// when shifted to "deletion space".
if(m_uKeepFlank == 0 || ( (ul-1) < uPos+m_uKeepFlank && uPos < uLength-1+m_uKeepFlank ) ) {
uB -= (ul-1);
uB *= m_uWidth;
uSize *= m_uWidth;
// Find deletion point
Node::iterator itPos = rNode.SeqPos(uB);
Sequence::size_type uTemp = uSize;
uTemp -= itPos.first->Deletion(itPos.second, uTemp);
// Delete uSize nucleotides begin sensitive to gaps that overlap sections
for(++itPos.first; uSize && itPos.first != rNode.m_vSections.end(); ++itPos.first)
uTemp -= itPos.first->Deletion(itPos.first->begin(), uTemp);
uLength -= (uSize-uTemp)/m_uWidth;
}
}
// update length
dLength = (double)uLength;
@@ -433,7 +441,8 @@ bool Tree::SetupEvolution(double pFreqs[], double pSubs[],
const std::vector<double> &vdGamma,
const std::vector<double> &vdIota,
const std::vector<double> &vdScale,
double dTreeScale)
double dTreeScale,
int uKeepFlank)
{
// Verifiy Parameters
if(pFreqs[0] < 0.0 || pFreqs[1] < 0.0 || pFreqs[2] < 0.0 || pFreqs[3] < 0.0)
@@ -488,6 +497,9 @@ bool Tree::SetupEvolution(double pFreqs[], double pSubs[],
// Setup TreeScale
m_dTreeScale = dTreeScale;

// Setup GapLimit
m_uKeepFlank = uKeepFlank;

// Setup Cumulative Frequencies
m_dNucCumFreqs[0] = pFreqs[0];
m_dNucCumFreqs[1] = m_dNucCumFreqs[0]+pFreqs[1];
@@ -131,7 +131,8 @@ class Tree
bool SetupEvolution(double pFreqs[], double pSubs[],
const IndelModel::Params& rIns, const IndelModel::Params& rDel,
unsigned int uWidth, const std::vector<double> &vdGamma,
const std::vector<double> &vdIota, const std::vector<double> &vdScale, double dTreeScale);
const std::vector<double> &vdIota, const std::vector<double> &vdScale, double dTreeScale,
int uKeepFlank);

// Setup the root node
bool SetupRoot(const std::vector<std::string> &vSeqs, const std::vector<unsigned int> &vData,
@@ -204,6 +205,7 @@ class Tree
double m_dLambdaDel;
LinearFunc m_funcRateIns;
LinearFunc m_funcRateSum;
int m_uKeepFlank;
};

bool SaveAlignment(std::ostream &rFile, const Tree::Alignment& aln, unsigned int uFlags);

0 comments on commit 76e9b67

Please sign in to comment.
You can’t perform that action at this time.