###Which trees to present in the UCE manuscript?

We generated two data sets: a 75% complete matrix with at least 90 taxa per gene alignment and a 95% complete matrix with at least 114 taxa per gene alignment. We analyzed each matrix with RAxML and ExaBayes.

There are some minor differences in the trees. I am going to focus on the 75% complete matrix for this first step.

####Update 4-11-2015
Brant has rerun the acanthomorph data set using PartitionFinder. We have ExaBayes results right now. I need to find out if there are RAxML results.

The new trees are here: **/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/**

Check whether the partitioned and unpartitioned ExaBayes trees are the same.

####Update 4-25-2015
The tree analyses are finalized. Here are the final trees for the 75% complete matrix:

- ExaBayes_ConsensusExtendedMajorityRuleNewick.Acanthomorph-75p-STDPART-1.5M-Burn25-FINAL.tre
- ExaBayes_ConsensusExtendedMajorityRuleNewick.Acanthomorph-75p-UNPART-1.5M-Burn25-FINAL.tre
- RAxML.acanthomorph-no-chauliodius-75p-STDPART.tre
- RAxML.acanthomorph-no-chauliodius-75p-UNPART.tre

and in 95% complete:

- ExaBayes_ConsensusExtendedMajorityRuleNewick.Acanthomorph-95p-UNPART-1M-Burn25-FINAL.tre
- RAxML.acanthomorph-no-chauliodius-95p-UNPART.tre

Will present the 75% complete tree in the paper, with a comparison of the RAxML and ExaBayes partitioned trees.

##Would be sensible to SAVE the tree data structures now so that I don't need to keep generating new uuids....

I figured out how to do this--see **acanthmorph_figure_april_2015.ipynb**

###general strategy for comparing trees

- pull them out of the dictionary
- find the clades in tree 1 that are not present in tree 2
    - plot it
- find the clades in tree 2 not present in tree 1
    - plot it
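
The clade comparison in the list above can be sketched without ETE. This is a toy stand-in, not the notebook's method: trees are nested tuples rather than ete2 `Tree` objects, and `clades()`, `clade_differences()`, and the two four-tip trees are made up for illustration. In the notebook itself the same step is done with `robinson_foulds()` and `get_common_ancestor()`.

```python
def clades(tree):
    """Return the clades of a nested-tuple tree, each as a frozenset of tip names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*[walk(child) for child in node])
        found.add(tips)  # record the clade subtended by this internal node
        return tips

    walk(tree)
    return found


def clade_differences(tree1, tree2):
    """Return (clades only in tree1, clades only in tree2)."""
    c1, c2 = clades(tree1), clades(tree2)
    return c1 - c2, c2 - c1


# Hypothetical four-tip trees that differ in where "C" attaches.
toy1 = ((("A", "B"), "C"), "D")
toy2 = (("A", "B"), ("C", "D"))
only_in_1, only_in_2 = clade_differences(toy1, toy2)
```

Each set returned by `clade_differences()` is a list of tip groups that could then be handed to something like `makeGroups()` to find the node to plot.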
    
Write some functions to help. makeGroups() takes a tree and a list of taxa in a subclade and returns the nodeid of that list's common ancestor together with the ancestor node itself.

random_color(), rgb2hex(), and hls2hex() are from the ETE2 example and help generate hues for the comparison
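
A minimal sketch of how evenly spaced hues give one pale color per clade. `hls_to_hex()` and `spaced_colors()` are hypothetical names, not functions from this notebook or from ETE; the lightness and saturation defaults (0.9 and 0.3) are the ones getColor() uses below.

```python
import colorsys


def hls_to_hex(h, l, s):
    """Convert hue, lightness, saturation (each 0..1) to a '#rrggbb' string."""
    r, g, b = colorsys.hls_to_rgb(h, l, s)
    return "#%02x%02x%02x" % (int(r * 255), int(g * 255), int(b * 255))


def spaced_colors(n, lum=0.9, sat=0.3):
    """Return n hex colors whose hues are spread evenly around the color wheel."""
    return [hls_to_hex(i / float(n), lum, sat) for i in range(n)]


palette = spaced_colors(4)
```

Fixing lightness and saturation and varying only the hue keeps every background pale enough that black tip labels stay readable.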

mylayout() is the layout style for plotting

also

1. rename tips to include family taxonomy
2. assign unique ID to nodes so I can refer to them later..
3. read in a file (ranks.txt) that holds families and their species as a dictionary. This file is currently here: **/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/converted_pngs/ranks.txt**
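
A rough sketch of the renaming step, assuming the ranks file is a Python dict literal that maps a cleaned-up species name to its family (this matches how `ranks` is used in mylayout() below). The two-entry dictionary, `tidy_name()`, and `label_with_family()` are illustrative only.

```python
from ast import literal_eval

# Hypothetical file contents: a dict literal mapping species to family.
ranks_text = (
    '{"Takifugu ocellatus": "Tetraodontidae", '
    '"Sargocentron coruscum": "Holocentridae"}'
)
ranks = literal_eval(ranks_text)  # parses the literal without running arbitrary code


def tidy_name(raw):
    """Turn a tip label such as 'takifugu_ocellatus' into 'Takifugu ocellatus'."""
    return raw.capitalize().replace("_", " ")


def label_with_family(raw, ranks):
    """Return the tidy name, followed by the family in parentheses when it is known."""
    name = tidy_name(raw)
    family = ranks.get(name)
    return "{} ({})".format(name, family) if family else name


example = label_with_family("takifugu_ocellatus", ranks)
```

Tips missing from the dictionary keep their plain name, so a gap in ranks.txt does not stop the rendering.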


###Update June 10

Adding the Astral 40 ML tree to the comparison



In [134]:
from __future__ import division
import colorsys
import os
import random
from ast import literal_eval

def makeGroups(tree, taxonList):
    #pass in tree, list of taxa subtending node
    subClade = tree.get_common_ancestor(taxonList)
    #print subClade.nodeid
    #return list with the unique id for the node and the tree itself
    return [subClade.nodeid, subClade]

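def renderClades(cladelist, name, style=None, writedir=None, scc=None):
    #render each clade in cladelist to its own pdf, named <name>_<index>.pdf
    #scc is unused; it is kept so existing calls do not break
    if style is None:
        style = TreeStyle()
    if not writedir:
        writedir = os.getcwd()
    for ii, clade in enumerate(cladelist):
        fname = name + "_{}.pdf".format(ii)
        #print "saving {} to: {}".format(fname, writedir)
        #clade.describe()
        #clade.get_tree_root().dist = 0
        #clade.describe()
        #clade.convert_to_ultrametric(10)
        #os.path.join works whether or not writedir ends in a slash
        clade.render(os.path.join(writedir, fname), tree_style=style)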
        
def random_color(h=None, l=None, s=None):
    if h is None:
        h = random.random()
    if l is None:
        l = random.random()
    if s is None:
        s = random.random()
    return hls2hex(h, l, s)

def rgb2hex(rgb):
    return '#%02x%02x%02x' % rgb

def hls2hex(h, l, s):
    return rgb2hex( tuple(map(lambda x: int(x*255), colorsys.hls_to_rgb(h, l, s))))


def getColor(hue):
    #given a hue between 0 and 1, return a pale hex color with fixed saturation and lightness
    sat = 0.3
    lum = 0.9
    color = random_color(hue, lum, sat)
    return color

def mylayout(node):
    
    #the layout style for these trees
    #node = node
    #subCladeColors = scc
    node.img_style["hz_line_width"]=1
    node.img_style["vt_line_width"]=1
    # If node is a leaf, add its name and, when it is in ranks, its family
    if node.is_leaf():
        node.img_style["size"] = 0
        color = "#000000"
        faces.add_face_to_node( faces.TextFace(node.name, ftype="Arial", fsize=12, fstyle = "italic", fgcolor=color), node, 0, position="branch-right" )
        #try adding the family to the tip as well
        if node.name in ranks:
            family = ranks[node.name]
            faces.add_face_to_node( faces.TextFace(family, ftype="Arial", fsize=6, fgcolor="purple"), node, 0, position = "branch-top" )
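        #NOTE: this block sits inside the is_leaf() branch, so it runs only for a
        #tree made of a single node; UNROOTED is not defined in this notebook
        if node.up is None:
            node.img_style["size"]=0
            node.dist = 0.35 # you may need to change this value to fit the aspect of your tree
            node.img_style["hz_line_color"] = "#ffffff"
            if UNROOTED:
                node.img_style["vt_line_type"] = 0.001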
        # Sets the style of leaf nodes
        #node.img_style["size"] = 3
        #node.img_style["shape"] = "circle"
    #If node is an internal node
    else:
        ###make bubble proportional to bootstrap support
        # Creates a sphere face whose size is proportional to the node's
        # support value (not drawn while add_face_to_node below is commented out)
        C = CircleFace(radius=node.support/20, color="Black", style="sphere")
        # Let's make the sphere transparent 
        #C.opacity = 0.0
        # And place as a float face over the tree
        #faces.add_face_to_node(C, node, 0, position="float")
        
    
        if node.nodeid in subCladeColors:
            #print "this node is in colors" + node.nodeid
            node.img_style["bgcolor"] = subCladeColors[str(node.nodeid)]
            #print BG_COLORS[node.nodeid]
        # Sets the style of internal nodes
        node.img_style["size"] = 1
        node.img_style["shape"] = "circle"
        node.img_style["fgcolor"] = "#000000"

def setTreeStyle(node_symbols = False):
    #cladeColors = scc
    # sets  default treestyle for rendering clades
    I = TreeStyle()
    #I.tree_width = 100
    if node_symbols:
        #leave layout_fn unset so that mylayout is not applied
        print("node_symbols=True: mylayout not applied")
    else:
        I.layout_fn = mylayout
    I.show_leaf_name = False
    I.show_branch_support = True
    I.optimal_scale_level = "semi"
    I.show_scale = False
    I.force_topology = True
    return I




#ranks maps each tip name to its family (see mylayout)
with open("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/converted_pngs/ranks.txt") as rankfile:
    rr = rankfile.read()
ranks = literal_eval(rr)

In [139]:
import os, uuid
os.chdir("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/compare_trees")
from ete2 import Tree, TreeStyle, AttrFace, NodeStyle, faces, CircleFace

exaPartTree = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/ExaBayes_ConsensusExtendedMajorityRuleNewick.Acanthomorph-75p-STDPART-1.5M-Burn25-FINAL.tre")
exaNoPartTree = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/ExaBayes_ConsensusExtendedMajorityRuleNewick.Acanthomorph-75p-UNPART-1.5M-Burn25-FINAL.tre")
raxPartTree = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/RAxML.acanthomorph-no-chauliodius-75p-STDPART.tre")
raxNoPartTree = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/RAxML.acanthomorph-no-chauliodius-75p-UNPART.tre")
ast50_111_BEST = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.50th-percentile-111-sites-BEST-genetrees.astral-4.7.8.BEST.species.tre")
ast50_111_MRE = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.50th-percentile-111-sites-BEST-genetrees.astral-4.7.8.MRE.species.tre")
ast75p_BEST = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.75p-acanthomorph.astral-4.7.8.BEST.species.tre")
ast75p_MRE = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.75p-acanthomorph.astral-4.7.8.MRE.species.tre")
ast75_147_BEST = Tree("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.75th-percentile-147-sites-BEST-genetrees.astral-4.7.8.BEST.species.tre")
ast75_147_MRE = Tree ("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/trees/75p/Astral.75th-percentile-147-sites-BEST-genetrees.astral-4.7.8.MRE.species.tre")

treeColl = {
"exa_75_part" : exaPartTree, 
"exa_75_no_part" : exaNoPartTree, 
"rax_75_part" : raxPartTree,  
"rax_75_no_part" : raxNoPartTree,
    "ast50_111_BEST" : ast50_111_BEST,
    "ast50_111_MRE" : ast50_111_MRE,
    "ast75p_BEST" : ast75p_BEST,
    "ast75p_MRE" : ast75p_MRE,
    "ast75_147_BEST" : ast75_147_BEST,
    "ast75_147_MRE" : ast75_147_MRE
    
    
}

##need to root the exabayes trees, set root branch to 0, assign uuid
for name in treeColl.keys():
    tt = treeColl[name]
    ancestor = "alepisaurus_ferox"
    tt.set_outgroup(ancestor)
    tt.get_tree_root().dist = 0
    for node in tt.traverse("postorder"):
        #node.add_features( nodeid = str(uuid.uuid1() ) )
        node.add_features( nodeid = node.get_topology_id() )
        if node.is_leaf():
            node.name = node.name.capitalize().replace("_", " ")
            #print node.name
        #correct misspelled or provisional tip names (assignment, not comparison)
        if node.name == "Takifugu occelatus":
            node.name = "Takifugu ocellatus"
        if node.name == "Ostorhinchus nigrofasciatus":
            node.name = "Ostorhinchus nigrofasciatus"
        if node.name == "Sargocentron coruscum2":
            node.name = "Sargocentron coruscum"
        if node.name == "Aulostomus sp":
            node.name = "Aulostomus maculatus"
        if node.name == "Emblemariopsis  sp":
            node.name = "Emblemariopsis  randalli"
            

####RF distance of partitioned and unpartitioned exabayes

This code

- goes through each pair of trees in treeColl
- calculates the RF distance between them
- renders the clades that differ between the two trees as PDFs
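
The pairwise loop can be sketched on toy data. This is a simplified stand-in, not ete2's `robinson_foulds()`: trees are nested tuples, the distance is just the number of rooted clades found in only one of the two trees, and `clade_set()`, `rooted_rf()`, and `toy_trees` are hypothetical.

```python
from itertools import combinations


def clade_set(tree):
    """Return the clades of a nested-tuple tree, each as a frozenset of tip names."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a tip
            return frozenset([node])
        tips = frozenset().union(*[walk(child) for child in node])
        found.add(tips)
        return tips

    walk(tree)
    return found


def rooted_rf(tree1, tree2):
    """Count the clades that appear in exactly one of the two rooted trees."""
    return len(clade_set(tree1) ^ clade_set(tree2))


# Hypothetical collection; t1 and t3 share a topology.
toy_trees = {
    "t1": ((("A", "B"), "C"), "D"),
    "t2": (("A", "B"), ("C", "D")),
    "t3": ((("A", "B"), "C"), "D"),
}

distances = {}
for name1, name2 in combinations(sorted(toy_trees), 2):
    distances[(name1, name2)] = rooted_rf(toy_trees[name1], toy_trees[name2])
```

Two identical trees give a distance of 0 and no differing clades, so any code that divides by the number of differing clades needs to handle that case.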


In [140]:
##go through each pair of items in a dictionary and compare the RF distance
##print this distance

from itertools import combinations



def cladeDiffVisualizer(tree1, tree2):
    #rf, rf_max, common_attrs, names, edges_t1, edges_t2, discarded_edges_t1, discarded_edges_t2
    #print len(tree1.robinson_foulds(tree2))
    rf, max_rf, common_leaves, parts_t1, parts_t2, discarded_edges_t1, discarded_edges_t2 = tree1.robinson_foulds(tree2)
    #T1notInT2 = parts_t1 - parts_t2
    #T2notInT1 = parts_t2 - parts_t1   
    not_in_tree_2 = parts_t1 - parts_t2
    not_in_tree_1 = parts_t2 - parts_t1

    #tree2_not_tree1 maps node id -> ETE node for clades present in tree 2 but not in tree 1
    #holder = {}
    tree2_not_tree1 = {}
    #clades in tree 2 that are not present in tree 1
    for taxa in not_in_tree_1:
        groups =  makeGroups(tree2, taxa)
        #print len(taxa), len(set(taxa))
        #if groups[0] in tree1_not_tree2.keys():
            #print "already present {}".format(groups[0])
            #print groups[1], tree1_not_tree2[groups[0]]
        if groups[0] in tree2_not_tree1:
            print "duplicate: {}".format(groups[0])
        tree2_not_tree1[groups[0]] = groups[1]   
        
    tree1_not_tree2 = {}
    #clades present in tree 1 that are not present in tree 2
    for taxa in not_in_tree_2:
        groups =  makeGroups(tree1, taxa)
        tree1_not_tree2[groups[0]] = groups[1]
  
    totclades = len(not_in_tree_2) + len (not_in_tree_1)
    #print "the length of total clades is {}".format(totclades)
    subCladeColors = {} # this will hold all of the colors of outstanding clades
    #get a set of hue values based on number of subclades; the max() calls avoid a
    #zero division when the trees are identical and a zero step when totclades > 360
    possibleHues = range(360, 0, -max(1, int( 360./ max(totclades, 1) ) ) )

    for cladeID in tree1_not_tree2.keys():
        #print "the length of tree1_not_tree2.keys is {}".format(len(tree1_not_tree2.keys()))
        hue = possibleHues.pop()/360
        subCladeColors[cladeID] = getColor(hue)
        #print "the len of subCladeColors is {}".format(len(subCladeColors.keys()))

    for cladeID in tree2_not_tree1.keys():
        #print "the length of tree2_not_tree1.keys is {}".format(len(tree1_not_tree2.keys()))

        hue = possibleHues.pop()/360
        subCladeColors[cladeID] = getColor(hue)
        #print "the len of subCladeColors is {}".format(len(subCladeColors.keys()))
   
    #change to a directory for subclade trees
    #os.chdir("/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/")
    
    t1Clades = tree1_not_tree2.values()
    t2Clades = tree2_not_tree1.values()
    
    return [rf, t1Clades, t2Clades, subCladeColors]

for pairs in combinations(treeColl.keys(), 2):
    tname_1, tname_2 = pairs
    tree1 = treeColl[tname_1]
    tree2 = treeColl[tname_2]
    res = cladeDiffVisualizer(tree1, tree2)
    rf = res[0]
    t1clades = res[1]
    t2clades = res[2]
    #mylayout reads subCladeColors as a global, so set it before rendering
    subCladeColors = res[3]
    
    #set the treestyle
    
    TS = setTreeStyle()
    
    print "\nthe RF distance between the {} and {} is {}\n".format(tname_1, tname_2, rf )
    writedir = "/Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/"
    renderClades(t1clades, "{}.not.{}".format(tname_1, tname_2), TS, writedir)
    renderClades(t2clades, "{}.not.{}".format(tname_2, tname_1), TS, writedir)
    


the RF distance between the ast50_111_BEST and exa_75_no_part is 36

saving ast50_111_BEST.not.exa_75_no_part_0.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.not.exa_75_no_part_1.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.not.exa_75_no_part_2.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.not.exa_75_no_part_3.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.not.exa_75_no_part_4.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.not.exa_75_no_part_5.pdf to: /Users/michael_alfaro/Dropbox/malfaro-acanthomorph/manuscript/pnas tex/ETE_work/rendered_subclades/
saving ast50_111_BEST.no

In [12]:
res = astralTree.robinson_foulds(exaPartTree)
%qtconsole

In [22]:
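#notInUnPart, notInPart, partNotUnpart and UnpartNotPart are not defined in the cells
#above (cladeDiffVisualizer() builds its own equivalents), hence the NameError below
totclades = len(notInUnPart) + len (notInPart)
subCladeColors = {} # this will hold all of the colors of outstanding clades
possibleHues = range(360, 0, -int( 360./ totclades) ) #get a set of color values based on number of subclades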

for cladeID in partNotUnpart.keys():
    hue = possibleHues.pop()/360
    subCladeColors[cladeID] = getColor(hue)

for cladeID in UnpartNotPart.keys():
    hue = possibleHues.pop()/360
    subCladeColors[cladeID] = getColor(hue)
    
#print subCladeColors.viewitems()   

I = TreeStyle()
I.tree_width = 100
I.layout_fn = mylayout
#I.show_branch_length = False
#I.show_branch_support = True
I.show_leaf_name = False
I.show_branch_support = True
#I.mode = "c"
#I.arc_span = 180
#I.arc_start = -180
#I.force_topology = True
#I.legend_position = 3
#I.extra_branch_line_type = 1
#I.guiding_lines_type = 1
#I.guiding_lines_color = "#666666"
#I.extra_branch_line_color = "#666666"
I.optimal_scale_level = "semi"
#I.root_opening_factor = 0
#I.scale = 1000
#tt.show(tree_style=I)
#tt.render("%%inline", w=400)
#current_tree.show(tree_style=I)


NameError: name 'notInUnPart' is not defined

####RF distance of ML to Bayes


In [164]:


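#notInBayes, notInML, mlXbayes and bayesXml are not defined in the cells above
#(cladeDiffVisualizer() builds its own equivalents), hence the NameError below
totclades = len(notInBayes) + len (notInML)
subCladeColors = {} # this will hold all of the colors of outstanding clades
possibleHues = range(360, 0, -int( 360./ totclades) ) #get a set of color values based on number of subclades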

for cladeID in mlXbayes.keys():
    hue = possibleHues.pop()/360
    subCladeColors[cladeID] = getColor(hue)

for cladeID in bayesXml.keys():
    hue = possibleHues.pop()/360
    subCladeColors[cladeID] = getColor(hue)
    
#print subCladeColors.viewitems()    

NameError: name 'notInBayes' is not defined

Using **I** as the treestyle 

In [144]:
I = TreeStyle()
I.tree_width = 100
I.layout_fn = mylayout
#I.show_branch_length = False
#I.show_branch_support = True
I.show_leaf_name = False
I.show_branch_support = True
#I.mode = "c"
#I.arc_span = 180
#I.arc_start = -180
#I.force_topology = True
#I.legend_position = 3
#I.extra_branch_line_type = 1
#I.guiding_lines_type = 1
#I.guiding_lines_color = "#666666"
#I.extra_branch_line_color = "#666666"
I.optimal_scale_level = "semi"
#I.root_opening_factor = 0
#I.scale = 1000
#tt.show(tree_style=I)
#tt.render("%%inline", w=400)
#current_tree.show(tree_style=I)