# Day 13 notebook

The objectives of this notebook are to practice:

* representing trees as strings
* visualizing trees
* enumerating trees
* examining trees

## The `toytree` module

In this activity we will use the third-party `toytree` module to work with phylogenetic trees.  In particular, we will use its Newick-format parsing and visualization functionality.  I encourage you to look through the [`toytree` documentation](https://toytree.readthedocs.io/en/latest/4-tutorial.html).

In [1]:
import toytree # for working with tree data structures

## The newick format for trees
The classic string representation for trees is the [Newick format](https://en.wikipedia.org/wiki/Newick_format).  In addition to the Wikipedia article, you can also read the [description by Joe Felsenstein](http://evolution.genetics.washington.edu/phylip/newicktree.html), one of the leaders in the field of phylogenetics (and also a UW-Madison alumnus!).

For example, the Newick formatted string `"((A, B),((C, D),E));"` corresponds to the tree below:

In [2]:
example_newick = "((A, B), ((C, D), E));"
example_tree = toytree.tree(example_newick)
canvas, axes = example_tree.draw(use_edge_lengths=False)
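To make the correspondence between nesting and parentheses concrete, here is a minimal pure-Python sketch (not part of `toytree`) that writes a tree stored as nested tuples in Newick format. The nested-tuple representation and the helper name `to_newick` are assumptions made for illustration: a leaf is a plain string, and every internal node is a tuple of its children. Branch lengths are not handled.

```python
def to_newick(node):
    """Return the Newick string (with trailing ';') for a tree given as nested tuples.

    Hypothetical helper for illustration: a leaf is a string and an internal node
    is a tuple of child nodes. Branch lengths are not supported.
    """
    def write(n):
        if isinstance(n, str):
            return n  # a leaf is written as its name
        # an internal node is its children, comma-separated, inside parentheses
        return "(" + ",".join(write(child) for child in n) + ")"

    return write(node) + ";"


example_nested = (("A", "B"), (("C", "D"), "E"))
print(to_newick(example_nested))  # ((A,B),((C,D),E));
```

Each pair of parentheses corresponds to one internal node, so the depth of nesting of a leaf name equals the number of internal nodes between it and the root (counting the root).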

You can specify branch lengths in a Newick string using a colon (`:`) followed by a number after the child node of the branch.  For example, the Newick formatted string `"((A:3, B:3):1,((C:1,D:1):2,E:3):1);"` corresponds to the tree below:

In [3]:
example2_newick = "((A:3, B:3):1, ((C:1, D:1):2, E:3):1);"
example2_tree = toytree.tree(example2_newick)
canvas, axes = example2_tree.draw(use_edge_lengths=True)
# Turn on the x axis so that branch lengths can be measured
axes.show = True
axes.y.show = False # the y-axis is meaningless
axes.x.ticks.show = True
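Adding up the branch lengths along the path from the root to each leaf is a quick way to check a Newick string with branch lengths against the tree you intended. The sketch below is plain Python; the representation (a node is either a leaf name or a list of `(child, branch_length)` pairs) and the helper name `tip_depths` are assumptions for illustration. It encodes the example tree above and shows that every leaf lies at distance 4 from the root, e.g. A: 3 + 1 = 4 and C: 1 + 2 + 1 = 4.

```python
def tip_depths(node, depth=0.0):
    """Map each leaf name to its root-to-leaf distance.

    Hypothetical representation: a node is either a leaf name (str) or a list of
    (child, branch_length) pairs, where branch_length is the length of the branch
    leading from this node down to that child.
    """
    if isinstance(node, str):
        return {node: depth}
    depths = {}
    for child, length in node:
        depths.update(tip_depths(child, depth + length))
    return depths


# encodes "((A:3, B:3):1, ((C:1, D:1):2, E:3):1);"
example2_nested = [
    ([("A", 3), ("B", 3)], 1),
    ([([("C", 1), ("D", 1)], 2), ("E", 3)], 1),
]
print(tip_depths(example2_nested))
# {'A': 4.0, 'B': 4.0, 'C': 4.0, 'D': 4.0, 'E': 4.0}
```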

Here is an alternative visualization of the same tree.  Note that in the previous visualization only horizontal distances (read off the x axis) encode branch lengths; the lengths of the vertical lines are not meaningful.

In [4]:
canvas, axes = example2_tree.draw(tree_style="c", tip_labels=True, node_labels=False, orient="right")

The Newick format also allows for specifying the names and support values of internal nodes, but we will not generally use those features, so do not worry about that aspect of the format for now.

## PROBLEM 1: Newick string for a tree without branch lengths (1 POINT)

Give a Newick string (without branch lengths) that represents the tree shown below.  Assign that string to the variable `problem1_newick` in the cell below.
![problem1_tree](problem1_tree.png)

In [5]:
### BEGIN SOLUTION TEMPLATE=problem1_newick=?
problem1_newick = "((((A, B), C), D), (E, F));"
canvas, axes = toytree.tree(problem1_newick).draw(use_edge_lengths=False)
### END SOLUTION

In [6]:
# tests for problem1_newick
problem1_tree = toytree.tree(problem1_newick) # this will raise an exception if the string is not valid
### BEGIN HIDDEN TESTS
problem1_tree.treenode.sort_descendants()
assert problem1_tree.write(tree_format=9) == "((((A,B),C),D),(E,F));"
### END HIDDEN TESTS
print("SUCCESS: problem1_newick passed all visible tests")

SUCCESS: problem1_newick passed all visible tests
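Because the order of the children at each node is arbitrary, many different Newick strings describe the same rooted topology; for example, `((E,F),(D,(C,(B,A))));` is the same tree as `((((A,B),C),D),(E,F));`. The hidden test above handles this with `sort_descendants()`. The pure-Python sketch below illustrates the same idea without `toytree`: a small recursive-descent parser reads the string into nested tuples, and the children are then sorted recursively so that equivalent trees compare equal. The helper names `parse_topology` and `canonical` are hypothetical, and the parser assumes no branch lengths, internal node labels, comments, or quoted names.

```python
def parse_topology(newick):
    """Parse a Newick string without branch lengths into nested tuples of leaf names.

    Minimal sketch: assumes no branch lengths, internal node labels, comments,
    or quoted names. Whitespace is ignored.
    """
    s = "".join(newick.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while s[pos] == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            assert s[pos] == ")", "unbalanced parentheses"
            pos += 1  # consume ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in "(),":
            pos += 1
        return s[start:pos]  # a leaf name

    tree = parse_node()
    assert pos == len(s), "unexpected trailing characters"
    return tree


def canonical(node):
    """Recursively sort children so trees that differ only in child order become equal."""
    if isinstance(node, str):
        return node
    return tuple(sorted((canonical(child) for child in node), key=repr))


print(canonical(parse_topology("((E,F),(D,(C,(B,A))));")) ==
      canonical(parse_topology("((((A,B),C),D),(E,F));")))  # True
```

Sorting by `repr` is just a convenient way to get a deterministic order when leaves (strings) and subtrees (tuples) appear side by side.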


## PROBLEM 2: Newick string for a tree with branch lengths (1 POINT)

Give a Newick string (with branch lengths) that represents the tree shown below.  Assign that string to the variable `problem2_newick` in the cell below.
![problem2_tree](problem2_tree.png)

In [7]:
### BEGIN SOLUTION TEMPLATE=problem2_newick=?
problem2_newick = "((A:3, (B:2, C:2):1):1, ((D:1, E:1):1, F:2):2);"
canvas, axes = toytree.tree(problem2_newick).draw(tree_style="c", tip_labels=True, node_labels=False, orient="right")
### END SOLUTION

In [8]:
# tests for problem2_newick
problem2_tree = toytree.tree(problem2_newick) # this will raise an exception if the string is not valid
### BEGIN HIDDEN TESTS
problem2_tree.treenode.sort_descendants()
assert problem2_tree.write(tree_format=5) == "((A:3,(B:2,C:2):1):1,((D:1,E:1):1,F:2):2);"
### END HIDDEN TESTS
print("SUCCESS: problem2_newick passed all visible tests")

SUCCESS: problem2_newick passed all visible tests
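A useful self-check for a Newick string with branch lengths is to compute every leaf's distance from the root directly from the string. The sketch below extends the recursive-descent idea to branch lengths; the helper name `newick_tip_depths` is hypothetical, it assumes no quoted names or comments, and it treats a missing branch length as 0 (so every non-root length should be written explicitly). Applied to the string that the hidden test above expects, every leaf comes out at distance 4 from the root.

```python
def newick_tip_depths(newick):
    """Return {leaf name: distance from the root} for a Newick string with branch lengths.

    Minimal sketch: assumes no quoted names or comments; a missing branch length is
    treated as 0. Internal node labels are skipped.
    """
    s = "".join(newick.split()).rstrip(";")  # drop whitespace and the final ';'
    pos = 0

    def read_label():
        # read characters up to the next structural symbol
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in "(),:":
            pos += 1
        return s[start:pos]

    def read_length():
        # read an optional ':number' giving the length of the branch above a node
        nonlocal pos
        if pos < len(s) and s[pos] == ":":
            pos += 1
            return float(read_label())
        return 0.0

    def parse_node():
        # return {leaf: distance from this node's parent down to the leaf}
        nonlocal pos
        if s[pos] == "(":
            pos += 1  # consume '('
            below = parse_node()
            while s[pos] == ",":
                pos += 1  # consume ','
                below.update(parse_node())
            pos += 1  # consume ')'
            read_label()  # skip an optional internal node label
        else:
            below = {read_label(): 0.0}
        length = read_length()
        return {leaf: d + length for leaf, d in below.items()}

    return parse_node()


problem2_expected_newick = "((A:3,(B:2,C:2):1):1,((D:1,E:1):1,F:2):2);"
print(newick_tip_depths(problem2_expected_newick))
# {'A': 4.0, 'B': 4.0, 'C': 4.0, 'D': 4.0, 'E': 4.0, 'F': 4.0}
```

Each internal node's branch length appears only after its closing parenthesis, which is why `parse_node` returns distances measured from the parent and adds its own length at the end rather than passing a running depth downward.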


## PROBLEM 3: Newick strings for all rooted trees with four leaves (1 POINT)
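List all possible binary rooted trees with four leaves, $\{A, B, C, D\}$.  Trees that differ only in the left-to-right order of the children at some node are the same tree, so include each tree only once.  Assign a list containing the Newick strings for the trees to the variable `problem3_newick_list` below.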

In [9]:
### BEGIN SOLUTION TEMPLATE=problem3_newick_list=?
problem3_newick_list = ["((A,B),(C,D));",
                        "((A,C),(B,D));",
                        "((A,D),(B,C));",
                        "(A,(B,(C,D)));",
                        "(A,(C,(B,D)));",
                        "(A,(D,(B,C)));",
                        "(B,(A,(C,D)));",
                        "(B,(C,(A,D)));",
                        "(B,(D,(A,C)));",
                        "(C,(A,(B,D)));",
                        "(C,(B,(A,D)));",
                        "(C,(D,(A,B)));",
                        "(D,(A,(B,C)));",
                        "(D,(B,(A,C)));",
                        "(D,(C,(A,B)));"]

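def all_splits(leaves):
    """Yield each split of `leaves` into two nonempty groups exactly once."""
    n = len(leaves)
    # Bit (n - 1 - i) of the pattern assigns leaf i to group 0 or 1.  Because the
    # patterns stop below 2**(n - 1), the first leaf always lands in group 0, so a
    # split and its mirror image are never both generated.
    for split_bit_pattern in range(1, 2**(n - 1)):
        split = ([], [])
        for i, leaf in enumerate(leaves):
            bit_pos = n - 1 - i
            split[(split_bit_pattern >> bit_pos) & 1].append(leaf)
        yield split

def all_binary_rooted_trees(leaves):
    """Yield a Newick string (without the final ';') for every binary rooted tree on `leaves`."""
    if len(leaves) == 1:
        yield leaves[0]
    else:
        # The root divides the leaves into two clades; build every subtree on each side.
        for left_leaves, right_leaves in all_splits(leaves):
            for left_tree in all_binary_rooted_trees(left_leaves):
                for right_tree in all_binary_rooted_trees(right_leaves):
                    yield "(%s,%s)" % (left_tree, right_tree)

# Cross-check the hand-written list: both should contain 15 trees.
generated_newick_list = [s + ";" for s in all_binary_rooted_trees("ABCD")]
assert len(generated_newick_list) == len(problem3_newick_list) == 15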

trees = toytree.Multitree.MultiTree("\n".join(problem3_newick_list))
for tree in trees.treelist:
    tree.draw(use_edge_lengths=False)
### END SOLUTION

In [10]:
# tests for problem3_newick_list
# the line below will raise an exception if the strings are not valid
problem3_trees = toytree.Multitree.MultiTree("\n".join(problem3_newick_list)) 
### BEGIN HIDDEN TESTS
def standardize_newick(s):
    t = toytree.tree(s)
    t.treenode.sort_descendants()
    return t.write(tree_format=9)

problem3_newick_list_solution = [
    "((A,B),(C,D));",
    "((A,C),(B,D));",
    "((A,D),(B,C));",
    "(A,(B,(C,D)));",
    "(A,(C,(B,D)));",
    "(A,(D,(B,C)));",
    "(B,(A,(C,D)));",
    "(B,(C,(A,D)));",
    "(B,(D,(A,C)));",
    "(C,(A,(B,D)));",
    "(C,(B,(A,D)));",
    "(C,(D,(A,B)));",
    "(D,(A,(B,C)));",
    "(D,(B,(A,C)));",
    "(D,(C,(A,B)));"]

assert (sorted(map(standardize_newick, problem3_newick_list)) == 
        sorted(map(standardize_newick, problem3_newick_list_solution)))
### END HIDDEN TESTS
print("SUCCESS: problem3_newick_list passed all visible tests")

SUCCESS: problem3_newick_list passed all visible tests
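The count of 15 follows a general pattern. A rooted binary tree with $k$ leaves has $2k - 2$ branches, plus one more place to attach above the root, so a new leaf can be added in $2k - 1$ ways, each giving a different tree. Starting from the single tree on one leaf, the number of binary rooted trees on $n$ labeled leaves is therefore $1 \cdot 3 \cdot 5 \cdots (2n - 3) = (2n - 3)!!$. A short sketch of that product (the helper name `num_rooted_binary_trees` is hypothetical):

```python
def num_rooted_binary_trees(n):
    """Number of distinct binary rooted trees on n labeled leaves, (2n - 3)!!."""
    count = 1
    # going from k to k + 1 leaves, the new leaf can join any of 2k - 1 branches
    for k in range(1, n):
        count *= 2 * k - 1
    return count


print([num_rooted_binary_trees(n) for n in range(1, 8)])
# [1, 1, 3, 15, 105, 945, 10395]
```

The count grows faster than exponentially, which is why exhaustive enumeration like the one in this problem is only feasible for a handful of leaves.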


## PROBLEM 4:  A primate phylogenetic tree (1 POINT)
[Perelman et al.](https://doi.org/10.1371/journal.pgen.1001342) sequenced DNA from 186 primate species and constructed one of the most refined phylogenetic trees of the primates to date.  This phylogenetic tree is provided as the file `primates.newick` in this activity.

Examine this tree (you will likely need to use large "height" and "width" arguments to the `tree.draw` method to see the tree clearly).  Are humans more closely related to [galagos](https://en.wikipedia.org/wiki/Galago), [tarsiers](https://en.wikipedia.org/wiki/Tarsier), or [lemurs](https://en.wikipedia.org/wiki/Lemur), according to this tree?  Give your answer by assigning the string "galagos", "tarsiers", or "lemurs" to the variable `humans_most_closely_related_to`.

In [11]:
### BEGIN SOLUTION TEMPLATE=humans_most_closely_related_to=?
primates_tree_filename = "primates.newick"
with open(primates_tree_filename) as primates_file:
    primates_tree_newick = primates_file.read()
primates_tree = toytree.tree(primates_tree_newick)
primates_tree.draw(height=3000, width=2000)
humans_most_closely_related_to = "tarsiers"
### END SOLUTION

In [12]:
# tests for humans_most_closely_related_to
assert humans_most_closely_related_to in {"galagos", "tarsiers", "lemurs"}
### BEGIN HIDDEN TESTS
assert humans_most_closely_related_to == "tarsiers"
### END HIDDEN TESTS
print("SUCCESS: humans_most_closely_related_to passed all visible tests")

SUCCESS: humans_most_closely_related_to passed all visible tests
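In a rooted tree, "more closely related" means sharing a more recent common ancestor. The common ancestors of a query leaf with each candidate all lie on the query's path from the root, so the closest relative is the candidate whose path from the root shares the longest prefix with the query's path. The sketch below applies that rule to a small nested-tuple tree; the leaf labels Q, X, Y, Z, W, the tree itself, and the helper names `leaf_paths`, `mrca_depth`, and `closest_relative` are all hypothetical and are not taken from `primates.newick`.

```python
def leaf_paths(node, path=()):
    """Map each leaf name to its path from the root, as a tuple of child indices."""
    if isinstance(node, str):
        return {node: path}
    paths = {}
    for i, child in enumerate(node):
        paths.update(leaf_paths(child, path + (i,)))
    return paths


def mrca_depth(paths, a, b):
    """Number of edges from the root to the most recent common ancestor of leaves a and b."""
    depth = 0
    for step_a, step_b in zip(paths[a], paths[b]):
        if step_a != step_b:
            break
        depth += 1
    return depth


def closest_relative(tree, query, candidates):
    """Candidate that shares the deepest (most recent) common ancestor with `query`."""
    paths = leaf_paths(tree)
    return max(candidates, key=lambda c: mrca_depth(paths, query, c))


# hypothetical toy tree ((X, (Q, Y)), (Z, W)) as nested tuples
toy_tree = (("X", ("Q", "Y")), ("Z", "W"))
print(closest_relative(toy_tree, "Q", ["X", "Y", "Z"]))  # Y
```

Only the topology matters for this comparison, which is why branch lengths are ignored here.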
