Skip to content

shamsbayzid/DynaDup

 
 

Repository files navigation

DESCRIPTION:
----------------------
DynaDup is a Java program based on the phylonet (bioinfo.cs.rice.edu/phylonet) for estimating species trees given a set of rooted gene trees, such that duplications or duplication and losses are minimized (approximately). The algorithms used are described in:

1. M. S. Bayzid, S. Mirarab, T. Warnow, "Optimal Species Trees under Gene Duplication and Loss", PSB 2013.

2. M. S. Bayzid, T. Warnow, "Gene Tree Parsimony for Incomplete Gene Trees" (submitted to WABI 2017).


INSTALLATION:
----------------------
There is no installation required to run DynaDup. You simply need to download the .zip file (from https://github.com/shamsbayzid/DynaDup/) and extract the contents to a folder of your choice. 

Altrenatively, you can replicate the github repository (https://github.com/shamsbayzid/DynaDup/) and run make.sh to build the project. 

DynaDup is a java-based application, and should run in any environment (Windows, Linux, Mac, etc.) as long as java is installed. Java 1.5 or later is required. We have tested DynaDup only on Linux.

EXECUTION:
---------------------
DynaDup currently has no GUI. You need to run it through command-line. In a terminal, go the location where you have downloaded the software, and issue the following command:

java -jar mgd.2.3.2.jar

This will give you a list of options available in DynaDup.

To find the species tree that minimizing duplications (MGD) given a set of gene trees in a file called in.tree, use:

java -jar mgd.2.3.2.jar -i in.tree

The results will be outputted to the standard output. To save the results in a file use the -o option:

java -jar mgd.2.3.2.jar -i in.tree -o out.tre

To minimize duplications and losses based on traditional definition using homomorphic restriction run (MGDL_h), run:

java -jar mgd.2.3.2.jar -i in.tree -dl

To minimize duplications and losses using our new definition using the original tree and considering missing taxa as losses (MGDL), run:

java -jar mgd.2.3.2.jar -i in.tree -dll

Note that, currently, the input gene trees need to be a) fully resolved, b) rooted. 

Name Mapping
--------------------------
Leaves of a gene trees need to have distinct labels; these labels will appear as the leaf names in the species tree by default. If the gene trees contain multiple copies of the gene for the same taxa, you need to provide that information to DynaDup using the -a option. The -a allows you to provide a mapping between names in gene trees to names in the species tree. Gene tree names that map to the same species tree name are considered to be multiple copies all belonging to the same species. The -a option can be used in two ways, as described below. 

If there is only one string after -a, it needs to be the name of a file that lists all the mappings in the following format:

species1: gene1.1,gene1.2,gene1.3;
species2: gene2.1,gene2.2;

Each line represents one speceies. Before the colon the species name is given, and then all the names in the gene tree that belong to that species are listed, with "," used to separate different names. In this example, species1 and species2 are the taxon names, and gene1.1, gene1.2, etc. are leave names from gene trees. If the name of the file containing these mappings is mapping.txt, the following command should be used:

java -jar mgd.2.3.2.jar -i in.tree -a mapping.txt

Alternatively, by providing two values after -a option, you can use an automatic name mapping based on (java) regular expressions. The first parameter after -a is a search pattern, and the second parameter is the replacement string. Any part of the leaf names from gene trees that matches the search pattern (a regular expression) is replaced by the replacement string. For example, 

java -jar mgd.2.3.2.jar -i in.tree -a "_.*" "_sp"

replaces anything after an underscore in the gene tree names to a "_sp" in the species name. So, human_1, human_2 and human_3 will all map to human_sp. If the replacement string is set to "/delete/", the string matched by the search pattern is simply removed. So, 

java -jar mgd.2.3.2.jar -i in.tree -a "_.*" "/delete/"

maps 'human_1', 'human_2', and 'human_3' to 'human'; 'horse_ab_c' to 'horse'; and 'mouse' to 'mouse'. 


Algorithmic Details:
------------------------
The based of our dynamic programming algorithm are described in our referenced publication. Here we describe some details not included in the paper. 

In our current implementation, the set of subtree bipartitions to be considered are constructed as follows. We first construct a set of clusters, and then for any two clusters X and Y in this set of clusters, we consider X|Y as a subtree bipartition. 

The set of clusters is created with the following rules. 

1- For any node v with subtree bipartition X|Y in gene trees, we include X, Y, X-Y and Y-X in the set of clusters. 
2- If a cluster A cannot be further resolved based on the available set of subtree bipartitions, we find the set of largest subclusters of A, and for each B in this set, we add A-B to the set of clusters. This ensures that we can always find at least one fully resolved tree for any given set of incomplete gene trees. 
3- If -cs and -cd options are given, extra subtree bipartitions are added as follows. For any cluster C if |C| >= cs*|taxa|, we add complementary clusters (with respect to C) of all subclusters of C if size of the subcluster is >= cd*|C|.
4- If extra trees are provided using -ex option, clusters from those trees are also added the set of clusters. 

Memory:
------------------------

For big datasets (say 500 taxon) increasing the memory available to Java can result in speed up. Note that you should give Java only as much as free memory you have available on your machine. So, for example, if you have 3GB of free memory, you can invoke DynaDup using the following command to make all the 3GB available to Java:

java -Xmx3000M -jar mgd.2.3.2.jar -i in.tree


Acknowledgment
-----------------------
DynaDup code uses code from PhyloNet package.

Bug Reports:
-----------------
contact smirarab@gmail.com or shams.bayzid@gmail.com

About

Software to Estimate Species Tree by minimizing Duplication and Loss

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Java 98.9%
  • Shell 1.1%