In [82]:
# This notebook uses Julia 0.4.0.

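# One package I found interesting is Phylogenetics.jl, which provides tools
# for analyzing evolutionary relationships and evolutionary history.
Pkg.init()                 # Initialize the package repository (nothing happens if it already exists)
Pkg.add("Phylogenetics")   # Install Phylogenetics.jl
Pkg.update()               # Bring installed packages up to date
using Phylogenetics        # Load the package into the session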

INFO: Initializing package repository /home/juser/.julia/v0.4
INFO: Package directory /home/juser/.julia/v0.4 is already initialized.
INFO: Nothing to be done
INFO: Updating METADATA...
INFO: Computing changes...
INFO: No packages to install, update or remove


In [22]:
# In the study of phylogeny, a common way to visualize evolution is with a phylogenetic tree.
# This is a branching diagram that represents evolutionary relationships between species,
# typically inferred from similarities in their genomes. Branches are sometimes labeled with
# distances. These trees are usually written in Newick format (https://en.wikipedia.org/wiki/Newick_format)

# This package can represent both cladograms (no distances)
# and phylogenetic trees (with distances) parsed from Newick format.

# Here is how to create a simple cladogram (type "Clado") from a Newick string
simple_clado = tr"(A,B,(C,D));"

# Notice that although we only entered 4 tips (A, B, C, D), the result also refers to nodes 5 and 6.
# These are the 2 internal nodes needed to join the tips: node 6 joins C and D, and node 5 is the root.
# Each row of the edge matrix is a (parent, child) pair.
# This image shows the two internal nodes (called F and E there): https://en.wikipedia.org/wiki/Newick_format#/media/File:NewickExample.svg




Phylogenetics.Clado("",5x2 Array{Int64,2}:
 5  1
 5  2
 5  6
 6  3
 6  4,2,AbstractString["A","B","C","D"],AbstractString["",""])
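
To make the node numbering concrete, here is a minimal sketch of one way to turn a Newick string into an edge list. It is not the parser used by Phylogenetics.jl: `parse_newick` and its helpers are hypothetical names, branch lengths and internal labels are skipped, the input is assumed to be well formed with an internal node as root, and the code is plain Julia written for a current release rather than 0.4. Tips are numbered in order of appearance and internal nodes follow in the order their parentheses open, which reproduces the edge matrix printed above for this small tree.

```julia
# Hypothetical helper: parse a Newick string into tip labels and (parent, child) edges.
function parse_newick(s)
    chars = [c for c in s if !isspace(c)]   # drop whitespace
    tips = String[]                         # tip labels in order of appearance
    edges = Tuple{Int,Int}[]                # internal nodes carry negative ids until renumbered
    pos = 1                                 # current read position in chars
    ninternal = 0                           # number of internal nodes opened so far

    # Read a (possibly empty) label and skip an optional ":length" suffix.
    function read_label()
        start = pos
        while pos <= length(chars) && !(chars[pos] in "(),:;")
            pos += 1
        end
        label = String(chars[start:pos-1])
        if pos <= length(chars) && chars[pos] == ':'
            pos += 1
            while pos <= length(chars) && !(chars[pos] in "(),;")
                pos += 1
            end
        end
        return label
    end

    # Parse one subtree hanging below `parent` (0 means "no parent", i.e. the root).
    function read_node(parent)
        if chars[pos] == '('
            ninternal += 1
            id = -ninternal
            parent == 0 || push!(edges, (parent, id))
            pos += 1                        # consume '('
            while true
                read_node(id)
                if chars[pos] == ','
                    pos += 1                # another child follows
                else
                    break
                end
            end
            pos += 1                        # consume ')'
            read_label()                    # internal label and length are ignored
        else
            push!(tips, read_label())
            push!(edges, (parent, length(tips)))
        end
    end

    read_node(0)
    n = length(tips)
    # Internal node k (stored as -k) becomes node n + k.
    fix(id) = id < 0 ? n - id : id
    return tips, [(fix(p), fix(c)) for (p, c) in edges]
end

tips, edges = parse_newick("(A,B,(C,D));")
println(tips)
println(edges)
```

The same edges come out when the string carries branch lengths, because this sketch only reads the topology.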

In [None]:
# You can also create a phylogenetic tree with distances using Newick format

simple_phylo = tr"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"  


In [23]:
# One of the most useful functions in this package is getkids, which returns an array whose
# i-th entry lists the child nodes of node i (tips have an empty list)
getkids(simple_clado)

# If you refer to the image of the tree (https://en.wikipedia.org/wiki/Newick_format#/media/File:NewickExample.svg),
# the only two nodes with children are F (node 5, with children A, B, E) and E (node 6, with children C, D)

6-element Array{Array{Int64,N},1}:
 Int64[]
 Int64[]
 Int64[]
 Int64[]
 [1,2,6]
 [3,4]  
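
The result above can be reproduced directly from the edge matrix. The following is a minimal sketch in plain Julia, not the package's implementation; `kids_from_edges` is a hypothetical name, and it assumes each row of the matrix is a (parent, child) pair with nodes numbered from 1.

```julia
# Hypothetical helper: build one child list per node from a (parent, child) edge matrix.
function kids_from_edges(edge)
    nnodes = maximum(edge)                 # nodes are numbered 1..nnodes
    kids = [Int[] for _ in 1:nnodes]       # start with an empty list for every node
    for r in 1:size(edge, 1)
        push!(kids[edge[r, 1]], edge[r, 2])
    end
    return kids
end

# Edge matrix of simple_clado as printed earlier in the notebook.
edge = [5 1; 5 2; 5 6; 6 3; 6 4]
kids = kids_from_edges(edge)
println(kids)
```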

In [43]:
# Another useful function is getroot, which finds the root node of the tree.
# Here it is given the child column (column 2) and the parent column (column 1) of the edge matrix.
getroot(simple_clado.edge[:,2],simple_clado.edge[:,1])

# As you can see, it returns the correct root node (recall that node 5 corresponds to F in the tree image)

5
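
The idea behind finding the root is that it is the only node that appears as a parent but never as a child. Here is a minimal sketch of that idea in plain Julia; `find_root` is a hypothetical name whose argument order simply mirrors the call above, and it is not the package's own code.

```julia
# Hypothetical helper: the root is the parent that never occurs in the child column.
function find_root(children, parents)
    for p in parents
        if !(p in children)
            return p
        end
    end
    error("no root found: every parent also appears as a child")
end

# Edge matrix of simple_clado as printed earlier in the notebook.
edge = [5 1; 5 2; 5 6; 6 3; 6 4]
root = find_root(edge[:, 2], edge[:, 1])
println(root)
```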

In [44]:
# Now we can use these tools on a real-life data set describing the shared evolutionary history of
# raccoons, bears, seals, sea lions, monkeys, cats, weasels, and dogs.

# We first construct the phylogenetic tree (with branch lengths) from its Newick representation

real_tree = tr"((raccoon:19.19959,bear:6.80041):0.84600,((sea_lion:11.99700, seal:12.00300):7.52973,((monkey:100.85930,cat:47.14069):20.59201, weasel:18.87953):2.09460):3.87382,dog:25.46154);"



Phylogenetics.Phylo("",13x2 Array{Int64,2}:
  9  10
 10   1
 10   2
  9  11
 11  12
 12   3
 12   4
 11  13
 13  14
 14   5
 14   6
 13   7
  9   8,6,AbstractString["raccoon","bear","sea_lion","seal","monkey","cat","weasel","dog"],[0.846,19.19959,6.80041,3.87382,7.52973,11.997,12.003,2.0946,20.59201,100.8593,47.14069,18.87953,25.46154],AbstractString["","","","","",""],-1.0)
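
In the output above, the i-th branch length belongs to the i-th row of the edge matrix (for example, 19.19959 goes with the edge from node 10 to node 1, the raccoon). As a small illustration of how the two arrays work together, the sketch below sums branch lengths from a node up to the root. `root_distance` is a hypothetical helper written in plain Julia, not a function of the package, and the arrays are copied from the printed result.

```julia
# Edge matrix and branch lengths copied from the printed Phylo object.
edge = [9 10; 10 1; 10 2; 9 11; 11 12; 12 3; 12 4; 11 13; 13 14; 14 5; 14 6; 13 7; 9 8]
edge_length = [0.846, 19.19959, 6.80041, 3.87382, 7.52973, 11.997, 12.003,
               2.0946, 20.59201, 100.8593, 47.14069, 18.87953, 25.46154]

# Hypothetical helper: total branch length between `node` and the root.
function root_distance(edge, edge_length, node)
    total = 0.0
    current = node
    while true
        row = 0
        for r in 1:size(edge, 1)           # find the edge that ends at `current`
            if edge[r, 2] == current
                row = r
                break
            end
        end
        row == 0 && return total           # no incoming edge: we reached the root
        total += edge_length[row]
        current = edge[row, 1]             # move up to the parent
    end
end

raccoon_depth = root_distance(edge, edge_length, 1)   # raccoon is tip 1
dog_depth = root_distance(edge, edge_length, 8)       # dog is tip 8
println(raccoon_depth)
println(dog_depth)
```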

In [64]:
# Which of these animals is the raccoon's closest relative?
# In a phylogenetic tree, it is the animal that shares the most recent common ancestor
# (the nearest shared parent node) with the raccoon.

# To figure this out, we first need to find the index that represents the raccoon in the tree.

# Here is the type definition for Phylo (real_tree is an instance of it):
#immutable Phylo <: Phylogeny
#	name::String
#	edge::Array{Int,2}
#	Nnode::Int
#	tipLabel::Array{String}
#	edgeLength::Array{Float64}
#	nodeLabel::Array{String}
#	rootEdge::Float64
#end

# The field containing the tip (leaf) labels is "tipLabel", so we search it for "raccoon" to retrieve the index.
# Note the argument order: in Julia 0.4, findfirst(A, v) returns the position of v within A.
index = findfirst(real_tree.tipLabel, "raccoon")


1
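
What is needed here is the position of a label within the tip-label vector, since that position is also the tip's node number. When several labels have to be looked up, one option is to build a label-to-index table once. The sketch below does this in plain Julia for a current release; `tip_index` is a hypothetical name and the label vector is copied from the printed tree.

```julia
# Tip labels copied from the printed Phylo object.
labels = ["raccoon", "bear", "sea_lion", "seal", "monkey", "cat", "weasel", "dog"]

# Map each label to its position, which is also the node number of that tip.
tip_index = Dict(label => i for (i, label) in enumerate(labels))

raccoon = tip_index["raccoon"]
seal = tip_index["seal"]
# get with a default avoids an error for labels that are not in the tree.
otter = get(tip_index, "otter", 0)
println((raccoon, seal, otter))
```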

In [81]:
# Now that "index" holds the raccoon's node number, we can use getkids again to find its parent.
kids = getkids(real_tree)

for i in 1:length(kids) # for every node's list of children
    childlist = kids[i]
    if !isempty(findin(childlist, index)) # if the raccoon is one of this node's children
        if i <= length(real_tree.tipLabel) # tips are numbered first, so this node would be a labeled tip
            print("Parent is ")
            print(real_tree.tipLabel[i])
        else # otherwise the parent is an unlabeled internal node
            print("Parent is additional node ")
            print(i)
            print("\n")
            print("Children are: ")
            # Indexing tipLabel works here because both children are tips
            print(real_tree.tipLabel[childlist])
        end
    end
end

# The printout tells us that the raccoon's parent is internal node 10 and that its only "sibling" in the tree is
# the bear. So, of the animals in this list, the bear is the most closely related to the raccoon.


Parent is additional node 10
Children are: AbstractString["raccoon","bear"]
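
The same lookup can be done straight from the edge matrix, which also handles siblings that are internal nodes and therefore have no tip label. This is a minimal sketch in plain Julia under the assumption that each row is a (parent, child) pair; `parent_and_siblings` and `describe` are hypothetical helpers, not part of Phylogenetics.jl.

```julia
# Edge matrix and tip labels copied from the printed Phylo object.
edge = [9 10; 10 1; 10 2; 9 11; 11 12; 12 3; 12 4; 11 13; 13 14; 14 5; 14 6; 13 7; 9 8]
labels = ["raccoon", "bear", "sea_lion", "seal", "monkey", "cat", "weasel", "dog"]

# Hypothetical helper: return the parent of `node` and the other children of that parent.
function parent_and_siblings(edge, node)
    parent = 0
    for r in 1:size(edge, 1)
        if edge[r, 2] == node
            parent = edge[r, 1]
        end
    end
    parent == 0 && error("node $node has no parent (it is the root or not in the tree)")
    siblings = Int[]
    for r in 1:size(edge, 1)
        if edge[r, 1] == parent && edge[r, 2] != node
            push!(siblings, edge[r, 2])
        end
    end
    return parent, siblings
end

# Tips have labels; internal nodes are described by their number.
describe(labels, id) = id <= length(labels) ? labels[id] : "internal node $id"

parent, siblings = parent_and_siblings(edge, 1)        # 1 is the raccoon
println("Parent: ", describe(labels, parent))
println("Siblings: ", [describe(labels, s) for s in siblings])
```

For a tip such as the dog (node 8), the siblings are internal nodes, which is why `describe` falls back to the node number instead of indexing the label vector.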