Skip to content
Permalink
Browse files

git-svn-id: svn://scit.us/dawg/trunk@174 7e5b8c5c-87e2-0310-a459-8d91…

…125b37d5
  • Loading branch information...
reedacartwright committed Jan 27, 2005
1 parent d37d8aa commit 0f4b03fe30f86cf1add5b8568cd9d096401e8295
Showing with 154 additions and 59 deletions.
  1. +1 −1 Makefile.am
  2. +24 −14 README
  3. +1 −1 configure.ac
  4. +14 −11 dawg.txt
  5. +3 −4 examples/example1.fud
  6. +1 −1 examples/example2.fud
  7. +6 −3 lambda.pl
  8. +34 −0 nexus2fasta.pl
  9. +4 −1 outsplit.pl
  10. +4 −2 src/dawg.cc
  11. +14 −11 src/dawgtxt.h
  12. +23 −10 src/tree.cc
  13. +1 −0 src/tree.h
  14. +24 −0 varrep.pl
@@ -1,3 +1,3 @@
SUBDIRS = src examples tests doc

EXTRA_DIST = lambda.pl
EXTRA_DIST = lambda.pl outsplit.pl varrep.pl nexus2fasta.pl
38 README
@@ -20,6 +20,8 @@ COMMAND LINE USAGE
-h: display help information
-?: same as -h

Dawg will read stdin if filename is "-".

FILE FORMAT
The file format takes a series of statements in the form of
"name = value," where "name" is alphanumeric and value can
@@ -36,28 +38,29 @@ OPTIONS
Name Type Description
--------------------------------------------------------------------------
Tree VT phylogeny
Scale N coefficient to scale branch lengths by
TreeScale N coefficient to scale branch lengths by
Sequence VS root sequences
Length VN length of generated root sequences
Rates VVN rate of evolution of each root nucleotide
Model S model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs VD nucleotide (ACGT) frequencies
Freqs VN nucleotide (ACGT) frequencies
Params VN parameters for the model of evolution
Gamma N coefficient of variance for rate heterogenity
Alpha N shape parameter
Iota N proportion of invariant sites
GapModel VS models of indel formation: NB|User
Width N block width for indels and recombination
Scale VN block position scales
Gamma VN coefficients of variance for rate heterogenity
Alpha VN shape parameters
Iota VN proportions of invariant sites
GapModel VS models of indel formation: NB|PL|US
Lambda VN rates of indel formation
GapParams VN parameter for the indel model
Frame N size of frame to restrict indels to
Reps N number of data sets to output
File S output file
Format S output format: Fasta|Nexus|Phylip
GapSingleChar B output gaps as a single character
GapPlus B distinguish insertions from deletions
NexusBlock S text to include between datasets in Nexus format
NexusBlockFile S file to read the Nexus block from
Seed VN PRNG seed
GapPlus B distinguish insertions from deletions in alignment
LowerCase B output sequences in lowercase
NexusCode S text or file to include between datasets in Nexus format
Seed VN PRNG seed (integers)

The meaning of the Params vector is different for each substitution model.
GTR: Substitution rates A-C, A-G, A-T, C-G, C-T, G-T
@@ -70,8 +73,15 @@ The meaning of the Params vector is different for each substitution model.
TN: Alpha1 (A-G), Alpha2 (C-T), Beta (Transversions)

The meaning of the GapParams vector is different for each gap model.
User: The distribution of gap sizes.
NB: The number of failures (r), the probability of success (q).
US: The distribution of gap sizes.
NB: The number of failures (r), the probability of success (q).
PL: The rate parameter (a), the maximum gap size.

To create a recombinant tree, you may need to specifically describe and label the
inner nodes at which the recombination events occur. See example2.fud.
inner nodes at which the recombination events occur. See example4.fud.

Gamma takes precidence over Alpha.

Sequence takes precidence over Length.

If NexusCode is the name of a file, the code is read from that file.
@@ -1,4 +1,4 @@
AC_INIT(dawg, 1.0.0-rc2, rac@uga.edu)
AC_INIT(dawg, 1.0.0-rc3, rac@uga.edu)
AC_CONFIG_SRCDIR([src/dawg.h])
AC_CONFIG_AUX_DIR([config])
AC_PREREQ(2.59)
@@ -7,6 +7,8 @@ COMMAND LINE USAGE
-h: display help information
-?: same as -h

Dawg will read stdin if filename is "-".

FILE FORMAT
The file format takes a series of statements in the form of
"name = value," where "name" is alphanumeric and value can
@@ -23,25 +25,26 @@ OPTIONS
Name Type Description
--------------------------------------------------------------------------
Tree VT phylogeny
Scale N coefficient to scale branch lengths by
TreeScale N coefficient to scale branch lengths by
Sequence VS root sequences
Length VN length of generated root sequences
Rates VVN rate of evolution of each root nucleotide
Model S model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN
Freqs VD nucleotide (ACGT) frequencies
Freqs VN nucleotide (ACGT) frequencies
Params VN parameters for the model of evolution
Gamma N coefficient of variance for rate heterogenity
Alpha N shape parameter
Iota N proportion of invariant sites
GapModel VS models of indel formation: NB|User
Width N block width for indels and recombination
Scale VN block position scales
Gamma VN coefficients of variance for rate heterogenity
Alpha VN shape parameters
Iota VN proportions of invariant sites
GapModel VS models of indel formation: NB|PL|US
Lambda VN rates of indel formation
GapParams VN parameter for the indel model
Frame N size of frame to restrict indels to
Reps N number of data sets to output
File S output file
Format S output format: Fasta|Nexus|Phylip
GapSingleChar B output gaps as a single character
GapPlus B distinguish insertions from deletions
NexusBlock S text to include between datasets in Nexus format
NexusBlockFile S file to read the Nexus block from
Seed VN PRNG seed
GapPlus B distinguish insertions from deletions in alignment
LowerCase B output sequences in lowercase
NexusCode S text or file to include between datasets in Nexus format
Seed VN PRNG seed (integers)
@@ -21,11 +21,10 @@ Freqs = {0.20, 0.30, 0.30, 0.20}
Reps = 10

#Output information
File = "example1.nex"
File = "example1.poo.nex"
Format = "Nexus"
NexusBlock = <<EOF
NexusCode = <<EOF
Begin Paup;
set cr=parsimony;
hsearch;
hsearch;
End;
EOF
@@ -7,7 +7,7 @@ Length = 300
Lambda = 0.1

# Use the user defined gap model
GapModel = {"User"}
GapModel = {"US"}

GapParams =
{
@@ -1,7 +1,7 @@
#! /usr/bin/perl -w

# lambda.pl v1.0 - An estimator for DAWG's lambda
# Copyright (2004) Reed A. Cartwright. All rights reserved.
# lambda.pl v1.1 - An estimator for DAWG's lambda
# Copyright (2004-2005) Reed A. Cartwright. All rights reserved.
#
# Usage: perl lambda.pl [treefile] [fastafile]
#
@@ -80,6 +80,8 @@
}
my $avgG = $suml/$numgaps-1.0;

my $gapclasses = keys(%gapsizes);

my %model = ();

#geometric model
@@ -134,7 +136,7 @@
$model{NegBin} = $rmodel{$r_ll};

# truncated power-law
my @gs = (sort(keys(%gapsizes)))[0..4];
my @gs = (sort(keys(%gapsizes)))[0..(($gapclasses >4) ? 4 : $gapclasses-1)];

my $sx = 0.0;
my $sx2 = 0.0;
@@ -143,6 +145,7 @@
my $n = @gs;
foreach(@gs)
{
next unless(defined($_));
my $x = log($_);
my $y = log($gapsizes{$_}/$numgaps);
$sx += $x;
@@ -0,0 +1,34 @@
#! /usr/bin/perl -w
# Copyright (2005) Reed A. Cartwright. All rights reserved.
#
# converts nexus sequences to fasta
#
# usage: perl nexus2fasta.pl < infile > outfile
#
# Distributed under the same license as DAWG
#


my $state = 0;

my %seqs = ();

local $/;
my $text = <>;

my ($data) = $text =~ /begin\s+data;.+?matrix\s*(.*?);\s*end;/is;
my @lines = split(/\n/, $data);

foreach(@lines)
{
s/^\s+//;
s/\s+$//;
next unless(/\w/);
my @sec = split(/\s+/, $_);
my $name = shift(@sec);
$seqs{$name} |= '';
$seqs{$name} .= join('', @sec);
}

print ">$_\n$seqs{$_}\n\n" foreach(sort(keys(%seqs)));

@@ -3,10 +3,13 @@
#
# outsplit.pl is used to extract sequences from Fasta and Phylip files
#
# usage perl -w outsplit.pl <file> <id>
# usage: perl -w outsplit.pl <file> <id>
#
# if <id> is "all" it it creates a directory and
# creates a new file for each alignment
#
# Distributed under the same license as DAWG
#

use strict;
use File::Basename;
@@ -131,10 +131,12 @@ bool Execute()
double dRevParams[6] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
vector<double> vdParams;
vector< vector<double> > vvdRates;
string ssModel = "JC", ssGapModel[2] = {"NB", "NB"};
string ssModel = "JC", ssGapModel[2] = {"US", "US"};

double dLambda[2] = {0.0, 0.0};
vector<double> vdGapModel[2];
vdGapModel[0].push_back(1.0);
vdGapModel[1].push_back(1.0);

vector<NewickNode*> vtTrees;
double dTreeScale = 1.0;
@@ -303,7 +305,7 @@ bool Execute()
dRevParams[0] = dRevParams[2] = dRevParams[3] = dRevParams[5] = vdParams[2];
}
else
return DawgError("Unknown Model, \"%s\"", ssModel.c_str());
return DawgError("Unknown substitution model, \"%s\".", ssModel.c_str());

IndelModel::Params paramsDel, paramsIns;
paramsIns.ssModel = ssGapModel[0];
@@ -7,6 +7,8 @@
" -h: display help information\n" \
" -?: same as -h\n" \
"\n" \
" Dawg will read stdin if filename is \"-\".\n" \
"\n" \
"FILE FORMAT\n" \
" The file format takes a series of statements in the form of\n" \
" \"name = value,\" where \"name\" is alphanumeric and value can\n" \
@@ -23,26 +25,27 @@
" Name Type Description\n" \
"--------------------------------------------------------------------------\n" \
" Tree VT phylogeny\n" \
" Scale N coefficient to scale branch lengths by\n" \
" TreeScale N coefficient to scale branch lengths by\n" \
" Sequence VS root sequences \n" \
" Length VN length of generated root sequences\n" \
" Rates VVN rate of evolution of each root nucleotide\n" \
" Model S model of evolution: GTR|JC|K2P|K3P|HKY|F81|F84|TN\n" \
" Freqs VD nucleotide (ACGT) frequencies \n" \
" Freqs VN nucleotide (ACGT) frequencies \n" \
" Params VN parameters for the model of evolution\n" \
" Gamma N coefficient of variance for rate heterogenity\n" \
" Alpha N shape parameter\n" \
" Iota N proportion of invariant sites\n" \
" GapModel VS models of indel formation: NB|User\n" \
" Width N block width for indels and recombination\n" \
" Scale VN block position scales\n" \
" Gamma VN coefficients of variance for rate heterogenity\n" \
" Alpha VN shape parameters\n" \
" Iota VN proportions of invariant sites\n" \
" GapModel VS models of indel formation: NB|PL|US\n" \
" Lambda VN rates of indel formation\n" \
" GapParams VN parameter for the indel model\n" \
" Frame N size of frame to restrict indels to\n" \
" Reps N number of data sets to output\n" \
" File S output file \n" \
" Format S output format: Fasta|Nexus|Phylip\n" \
" GapSingleChar B output gaps as a single character\n" \
" GapPlus B distinguish insertions from deletions\n" \
" NexusBlock S text to include between datasets in Nexus format\n" \
" NexusBlockFile S file to read the Nexus block from\n" \
" Seed VN PRNG seed\n" \
" GapPlus B distinguish insertions from deletions in alignment\n" \
" LowerCase B output sequences in lowercase\n" \
" NexusCode S text or file to include between datasets in Nexus format\n" \
" Seed VN PRNG seed (integers)\n" \

@@ -177,13 +177,20 @@ void Tree::ProcessNewickNode(NewickNode* pNode, Node::Handle hAnc)
ProcessNewickNode(pNode->m_pSib.get(), hAnc);

Node& node = m_map[pNode->m_ssLabel];
node.m_ssName = pNode->m_ssLabel;
node.m_mBranchLens[hAnc] = pNode->m_dLen;
node.m_vAncestors.resize(m_nSec+1, m_map.end());
node.m_vAncestors[m_nSec] = hAnc;


if(node.m_ssName.empty() && !pNode->m_pSub.get())
m_vTips.push_back(pNode->m_ssLabel);

if(node.m_ssName.empty())
node.m_ssName = pNode->m_ssLabel;

if(pNode->m_pSub.get())
ProcessNewickNode(pNode->m_pSub.get(), m_map.find(pNode->m_ssLabel));
ProcessNewickNode(pNode->m_pSub.get(), m_map.find(node.m_ssName));


}

void Tree::Evolve()
@@ -210,8 +217,8 @@ void Tree::Evolve()
(*it)[u].m_ucNuc = RandomBase();
}
}
for(Node::Handle it = m_map.begin(); it != m_map.end(); ++it)
Evolve(it->second);
for(vector<string>::const_iterator cit = m_vTips.begin(); cit != m_vTips.end(); ++cit)
Evolve(m_map[*cit]);
}

void Tree::Evolve(Node &rNode)
@@ -482,11 +489,14 @@ bool Tree::SetupEvolution(double pFreqs[], double pSubs[],
try {m_pInsertionModel.reset(new PowerModel(rIns.vdModel));}
catch(...) {return DawgError("Insertion model parameters not specified correctly.");}
}
else
else if(rIns.ssModel == "US")
{
try {m_pInsertionModel.reset(new UserModel(rIns.vdModel));}
catch(...) {return DawgError("Insertion model parameters not specified correctly.");}
}
else
return DawgError("Unknown insertion model: \"%s\".", rDel.ssModel.c_str());


m_dLambdaDel = rDel.dLambda;
if(m_dLambdaDel < DBL_EPSILON)
@@ -496,16 +506,19 @@ bool Tree::SetupEvolution(double pFreqs[], double pSubs[],
try {m_pDeletionModel.reset(new NegBnModel(rDel.vdModel));}
catch(...) {return DawgError("Deletion model parameters not specified correctly.");}
}
else if(rIns.ssModel == "PL")
else if(rDel.ssModel == "PL")
{
try {m_pDeletionModel.reset(new PowerModel(rDel.vdModel));}
catch(...) {return DawgError("Deletion model parameters not specified correctly.");}
}
else
else if(rDel.ssModel == "US")
{
try {m_pDeletionModel.reset(new UserModel(rDel.vdModel));}
catch(...) {return DawgError("Deletion model parameters not specified correctly.");}
}
}
else
return DawgError("Unknown deletion model: \"%s\".", rDel.ssModel.c_str());

m_funcRateIns.m = m_dLambdaIns;
m_funcRateIns.b = m_dLambdaIns;
m_funcRateSum.m = m_dLambdaDel+m_dLambdaIns;
@@ -573,7 +586,7 @@ void Tree::Align(Alignment &aln, bool bGapPlus, bool bGapSingleChar, bool bLower
vector<string> vNames;
for(Node::Map::const_iterator cit = m_map.begin(); cit != m_map.end(); ++cit)
{
vNames.push_back(cit->first);
vNames.push_back(cit->second.m_ssName);
Sequence s;
cit->second.Flatten(s);
vTable.push_back(s);

0 comments on commit 0f4b03f

Please sign in to comment.
You can’t perform that action at this time.