# ETE3 demo for Soltis Lab March 2017

Let's look at the __[ETE Toolkit](http://etetoolkit.org/)__ for working with phylogenetic trees in Python. You can __[download ETE from here](http://etetoolkit.org/download/)__. For this demo, we'll start by mostly working through the __[ETE tutorial](http://etetoolkit.org/docs/latest/tutorial/index.html)__. <br>

## Thinking about trees generally
As an intro to trees, the tutorial has this to say:<br>
<div class="alert alert-block alert-info">"In bioinformatics, trees are the result of many analyses, such as phylogenetics or clustering. Although each case entails specific considerations, many properties remains constant among them. In this respect, ETE is a python toolkit that assists in the automated manipulation, analysis and visualization of any type of hierarchical trees. It provides general methods to handle and visualize tree topologies, as well as specific modules to deal with phylogenetic and clustering trees."
</div>

## Let's go...
Import ete3 and play with some trees:

In [102]:
from ete3 import Tree

# Load a tree structure from a Newick string. The returned variable 't' is the root node of the tree.
t = Tree("(A:0.5,(B:1,(E:1,D:1):0.5):0.5);" )
print(t)


   /-A
--|
  |   /-B
   \-|
     |   /-E
      \-|
         \-D


### get_common_ancestor
We'll come back to fancy graphical trees later, but for now, we have a decent representation of a tree and can start doing things with it.

We can find the node that is the first common ancestor of two tips, for example E and B; printing that node shows the sub-tree below it. This is done a __[bit later](http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#find-the-first-common-ancestor)__ in the tutorial.

Remember that sub-trees are basically the same as trees, so we could get the leaves on that sub-tree with <code>get_leaves()</code>:

In [66]:
ancestor = t.get_common_ancestor("E", "B")
print(ancestor)
descendants = ancestor.get_leaves()
print(descendants)


   /-B
--|
  |   /-E
   \-|
      \-D
[Tree node 'B' (0x1131d132), Tree node 'E' (0x1131e2a9), Tree node 'D' (-0x7fffffffeece1dcb)]


### Searching
We can also search in trees, or test if a taxon is in a tree:

In [67]:
print(t.get_leaves_by_name("B"))
for taxon in ["A","X"]:
    if t.get_leaves_by_name(taxon):
        print("%s is in the tree" %(taxon))
    else:
        print("%s is not in the tree" %(taxon))

[Tree node 'B' (0x1131d132)]
A is in the tree
X is not in the tree


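<code>search_nodes()</code> does a similar job, but it looks at every node (not just the leaves) and matches on node attributes such as the name: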

In [85]:
my_node = t.search_nodes(name = "A")
print (my_node)

[Tree node 'A' (0x1131d10f)]


### Custom searching functions
For more complex searches, you will need to write your own search function. Here's the one from __[this part of the tutorial](http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html#search-all-nodes-matching-a-given-criteria)__, modified a bit to print the nodes found. The random tree is named t2 so that it doesn't overwrite the tree t from above.

In [105]:
def search_by_size(node, size):
    "Finds nodes with a given number of leaves"
    matches = []
    for n in node.traverse():
        if len(n) == size:
            matches.append(n)
    return matches

t2 = Tree()
t2.populate(40)
# returns nodes containing 6 leaves
subtrees = search_by_size(t2, size=6)
for node in subtrees:
    print(node)


      /-aaaaaaaaab
   /-|
  |  |   /-aaaaaaaaac
  |   \-|
--|      \-aaaaaaaaad
  |
  |   /-aaaaaaaaae
   \-|
     |   /-aaaaaaaaaf
      \-|
         \-aaaaaaaaag

      /-aaaaaaaabh
   /-|
  |   \-aaaaaaaabi
  |
--|      /-aaaaaaaabj
  |   /-|
  |  |   \-aaaaaaaabk
   \-|
     |   /-aaaaaaaabl
      \-|
         \-aaaaaaaabm


Let's try a search function to find branches at or under a set length:

In [106]:
def find_short_branches(node, length):
    "Finds nodes whose branch length is at or under the set length"
    matches = []
    for n in node.traverse():
        if n.dist <= length:
            matches.append(n)
    return matches

subtrees = find_short_branches(t, 0.5)
for node in subtrees:
    print(node)


   /-A
--|
  |   /-B
   \-|
     |   /-E
      \-|
         \-D

--A

   /-B
--|
  |   /-E
   \-|
      \-D

   /-E
--|
   \-D


### Shortcuts
<div class="alert alert-block alert-info">"Finally, ETE implements a built-in method to find the first node matching a given name, which is one of the most common tasks needed for tree analysis. This can be done through the operator & (AND). Thus, TreeNode&”A” will always return the first node whose name is “A” and that is under the tree “MyTree”. The syntaxis may seem confusing, but it can be very useful in some situations."</div>

In [112]:
t = Tree("((H:0.3,I:0.1):0.5, A:1, (B:0.4,(C:1,(J:1, (F:1, D:1):0.5):0.5):0.5):0.5);")
# Get the node D in a very simple way
D = t&"D"
# Get the path from D to the root
node = D
path = []
while node.up:
    path.append(node)
    node = node.up
print (t)
# I subtract the D node from the total number of visited nodes
print ("There are", len(path)-1, "nodes between D and the root")
# Wrapping the & search in parentheses lets you use the result
# directly as a node instance
Csparent= (t&"C").up #MAG: Changed name of variable for consistency
Bsparent= (t&"B").up
Jsparent= (t&"J").up
# I check if nodes belong to certain partitions
print ("It is", Csparent in Bsparent, "that C's parent is under B's ancestor")
print ("It is", Csparent in Jsparent, "that C's parent is under J's ancestor")



      /-H
   /-|
  |   \-I
  |
--|--A
  |
  |   /-B
   \-|
     |   /-C
      \-|
        |   /-J
         \-|
           |   /-F
            \-|
               \-D
There are 4 nodes between D and the root
It is True that C's parent is under B's ancestor
It is False that C's parent is under J's ancestor
