# A simple tree in Python

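Here is a minimal data structure for representing a tree in Python. A tree is just a node with zero or more children, each of which is itself a tree. A node without children is a leaf. Each node also stores an optional name and the length of the branch leading to it.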

In [3]:
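class Node:
    """A tree node: branch length, optional name, and a list of child nodes."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length      # length of the branch leading to this node
        self.name = name          # node label, or None
        self.children = children  # list of Node objects, or None for a leaf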

Let us make the tree from slide 16 of the slides "Evolutionary_Trees.pdf".

In [4]:
leaf_a = Node(length=0.1, name="A", children=None)
leaf_b = Node(length=0.2, name="B", children=None)
leaf_c = Node(length=0.3, name="C", children=None)
leaf_d = Node(length=0.4, name="D", children=None)
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(length=None, name="F", children=[leaf_a, leaf_b, node_e])
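Because every child is itself a tree, functions on this structure are naturally recursive. As a small sketch (the function name total_length is our own, not from the slides), here is the sum of all branch lengths in the tree, where a missing length, such as the root's, counts as 0:

```python
class Node:
    """Same minimal node as above."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length
        self.name = name
        self.children = children


def total_length(r):
    """Sum of all branch lengths in the subtree rooted at r (None counts as 0)."""
    own = r.length if r.length is not None else 0.0
    if r.children is None:  # a leaf contributes only its own branch
        return own
    # an internal node adds the total length of each child subtree
    return own + sum(total_length(c) for c in r.children)


# Rebuild the example tree from slide 16.
leaf_a = Node(length=0.1, name="A")
leaf_b = Node(length=0.2, name="B")
leaf_c = Node(length=0.3, name="C")
leaf_d = Node(length=0.4, name="D")
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(name="F", children=[leaf_a, leaf_b, node_e])

print(round(total_length(node_f), 10))  # 1.5
```

Rounding is only there to hide floating-point noise from adding 0.1, 0.2 and so on.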

Let us write a function that prints a tree in Newick format.

In [35]:
def print_newick(r):
    """
    Print out the tree rooted at r in Newick format using a depth-first traversal of the tree.
    """
    df_print_newick(r)
    print(";")

def df_print_newick(r):
    # Internal node: print the children in parentheses, separated by commas
    if r.children:
        print("(", end="")
        for c in r.children[:-1]:
            df_print_newick(c)
            print(",", end="")
        df_print_newick(r.children[-1])
        print(")", end="")
    # Then the node's own name and branch length, if present
    if r.name is not None:
        print(r.name, end="")
    if r.length is not None:
        print(":", r.length, sep="", end="")

In [36]:
print_newick(node_f)

(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
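Printing piece by piece works, but it can be handy to get the Newick text as a string, for example to write it to a file or compare it with other output. A minimal sketch of such a variant (newick_string is a hypothetical name), which builds the same text recursively with str.join instead of printing:

```python
class Node:
    """Same minimal node as above."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length
        self.name = name
        self.children = children


def newick_string(r):
    """Return the tree rooted at r as a Newick string, including the final ';'."""
    return _newick(r) + ";"


def _newick(r):
    s = ""
    if r.children:  # non-empty list of children: an internal node
        s += "(" + ",".join(_newick(c) for c in r.children) + ")"
    if r.name is not None:
        s += r.name
    if r.length is not None:
        s += ":" + str(r.length)
    return s


leaf_a = Node(length=0.1, name="A")
leaf_b = Node(length=0.2, name="B")
leaf_c = Node(length=0.3, name="C")
leaf_d = Node(length=0.4, name="D")
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(name="F", children=[leaf_a, leaf_b, node_e])

print(newick_string(node_f))  # (A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
```

The join takes care of placing commas only between children, so there is no need to treat the last child specially.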


We can use the BaseTree data structure from the Phylo module in Biopython to represent a tree. You can read about Phylo at https://biopython.org/wiki/Phylo. You can also take a look at chapter 13 in the Biopython tutorial at http://biopython.org/DIST/docs/tutorial/Tutorial.html.

The BaseTree data structure in the Phylo module of Biopython is (of course) more advanced than the simple data structure above, but it resembles it. In Phylo terminology a node is a clade, and a tree holds more information than its root clade, for example whether the tree is rooted.

Here is how to construct the tree from slide 16 of the slides "Evolutionary_Trees.pdf". Compare with how it is done above.

In [20]:
from Bio import Phylo
leaf_a = Phylo.BaseTree.Clade(branch_length=0.1, name="A")
leaf_b = Phylo.BaseTree.Clade(branch_length=0.2, name="B")
leaf_c = Phylo.BaseTree.Clade(branch_length=0.3, name="C")
leaf_d = Phylo.BaseTree.Clade(branch_length=0.4, name="D")
node_e = Phylo.BaseTree.Clade(branch_length=0.5, name="E", clades=[leaf_c, leaf_d])
node_f = Phylo.BaseTree.Clade(name="F", clades=[leaf_a, leaf_b, node_e])
tree = Phylo.BaseTree.Tree(root=node_f)

In [22]:
print(tree)

Tree(rooted=True)
    Clade(name='F')
        Clade(branch_length=0.1, name='A')
        Clade(branch_length=0.2, name='B')
        Clade(branch_length=0.5, name='E')
            Clade(branch_length=0.3, name='C')
            Clade(branch_length=0.4, name='D')
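The indented listing above, with one line per clade and children indented below their parent, is itself a depth-first (preorder) traversal. A rough sketch that produces a similar outline for the simple Node class (outline is a hypothetical helper; it imitates the look of Phylo's output, not its code):

```python
class Node:
    """Same minimal node as above."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length
        self.name = name
        self.children = children


def outline(r, depth=0):
    """Return one line per node in preorder, indented 4 spaces per level."""
    fields = []
    if r.length is not None:
        fields.append(f"length={r.length}")
    if r.name is not None:
        fields.append(f"name={r.name!r}")
    lines = ["    " * depth + "Node(" + ", ".join(fields) + ")"]
    for c in r.children or []:  # a leaf (children is None) has nothing to add
        lines.extend(outline(c, depth + 1))
    return lines


leaf_a = Node(length=0.1, name="A")
leaf_b = Node(length=0.2, name="B")
leaf_c = Node(length=0.3, name="C")
leaf_d = Node(length=0.4, name="D")
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(name="F", children=[leaf_a, leaf_b, node_e])

print("\n".join(outline(node_f)))
```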


We can write a similar recursive depth-first traversal to print this tree in Newick format. Note that a clade (a node) in Phylo has a method, is_terminal(), that tells us whether the clade is a leaf.

In [32]:
def print_newick_biopython(t):
    """
    Print out the Phylo tree t in Newick format using a depth-first traversal of the tree.
    """
    df_print_newick_biopython(t.clade)
    print(";")

def df_print_newick_biopython(r):
    # Internal clade: print the child clades in parentheses, separated by commas
    if not r.is_terminal():
        print("(", end="")
        for c in r.clades[:-1]:
            df_print_newick_biopython(c)
            print(",", end="")
        df_print_newick_biopython(r.clades[-1])
        print(")", end="")
    # Then the clade's own name and branch length, if present
    if r.name is not None:
        print(r.name, end="")
    if r.branch_length is not None:
        print(":", r.branch_length, sep="", end="")

In [33]:
print_newick_biopython(tree)

(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;
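Phylo trees also have a get_terminals() method that returns the leaves. For comparison, here is a hand-written sketch for the simple Node class, assuming we add an is_terminal() method that mirrors the Phylo one (both is_terminal and get_terminals below are our own illustration, not Biopython's implementation):

```python
class Node:
    """The minimal node from above, extended with an is_terminal() method."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length
        self.name = name
        self.children = children

    def is_terminal(self):
        """True for a leaf, i.e. a node with no children (None or empty list)."""
        return not self.children


def get_terminals(r):
    """Return the leaves of the subtree rooted at r, left to right (depth-first)."""
    if r.is_terminal():
        return [r]
    leaves = []
    for c in r.children:
        leaves.extend(get_terminals(c))
    return leaves


leaf_a = Node(length=0.1, name="A")
leaf_b = Node(length=0.2, name="B")
leaf_c = Node(length=0.3, name="C")
leaf_d = Node(length=0.4, name="D")
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(name="F", children=[leaf_a, leaf_b, node_e])

print([leaf.name for leaf in get_terminals(node_f)])  # ['A', 'B', 'C', 'D']
```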


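Phylo provides much more functionality, including writing a tree in Newick format, so it is worth exploring the documentation. Note that the string below writes branch lengths with five decimals and gives the root a branch length of 0.00000.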

In [35]:
tree.format("newick")

'(A:0.10000,B:0.20000,(C:0.30000,D:0.40000)E:0.50000)F:0.00000;\n'
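This output differs from ours in two ways: branch lengths have five decimals, and the root's missing branch length is written as 0.00000. Purely as an illustration, a variant of the simple-tree printer that imitates that string (newick_fixed and its digits parameter are hypothetical; it matches this one example and does not reproduce Biopython's Newick writer or its options):

```python
class Node:
    """Same minimal node as above."""

    def __init__(self, length=None, name=None, children=None):
        self.length = length
        self.name = name
        self.children = children


def newick_fixed(r, digits=5):
    """Newick string with fixed-precision branch lengths, ending in ';' and a newline."""
    return _fixed(r, digits) + ";\n"


def _fixed(r, digits):
    s = ""
    if r.children:
        s += "(" + ",".join(_fixed(c, digits) for c in r.children) + ")"
    if r.name is not None:
        s += r.name
    # Assumption: a missing length is written as 0, imitating the root in the output above
    length = r.length if r.length is not None else 0.0
    s += f":{length:.{digits}f}"
    return s


leaf_a = Node(length=0.1, name="A")
leaf_b = Node(length=0.2, name="B")
leaf_c = Node(length=0.3, name="C")
leaf_d = Node(length=0.4, name="D")
node_e = Node(length=0.5, name="E", children=[leaf_c, leaf_d])
node_f = Node(name="F", children=[leaf_a, leaf_b, node_e])

print(repr(newick_fixed(node_f)))
```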