## Basic functions of `distmetric`: a working example
Goal: Demonstrate every function in `distmetric` and the output each one produces.

In [72]:
import distmetric
import toytree
import numpy as np
import pandas as pd

### 1) INPUT: Generating/Parsing trees
- Trees are handled as Newick-formatted strings, parsed with toytree
- Random input trees can be generated with the `Generator` class

In [73]:
# generate 10 random trees as input with a Generator object; each tree has 10 tips by default
TREES = distmetric.Generator(10)
randomtrees = TREES.get_randomtrees()

In [74]:
# examine what the trees look like
toytree.mtree(randomtrees).draw(ts='c', nrows=2, ncols=5);

In [75]:
randomtrees[1].write()

'((r9:833333,(r8:666667,r7:666667)100:166667)100:166667,((r6:666667,r5:666667)100:166667,(r4:666667,(r3:500000,(r2:333333,(r1:166667,r0:166667)100:166667)100:166667)100:166667)100:166667)100:166667);'

### 2) ANALYSIS: Calculate distances between trees in given user input
- 3 comparison options (pairwise, random, relative to consensus tree); trees are numbered from 0
    - Pairwise: compare each tree with the next one in input order (tree 0 vs. 1, tree 1 vs. 2, etc.)
    - Random: compare randomly chosen trees (e.g. tree 9 vs. 4, tree 4 vs. 3, etc.)
    - Consensus: compare every tree with the consensus tree
- 2 methods (Robinson-Foulds, Quartets)
- Data output: pandas DataFrame with two columns (the trees that were compared, the calculated distance metric)
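The three comparison options can be sketched as a small helper that lists which trees get paired. This is a minimal illustration inferred from the outputs shown in this notebook, not the `distmetric` implementation: `comparison_pairs` and its `seed` parameter are hypothetical names, and treating "random" as neighbours in a shuffled order is an assumption.

```python
import random


def comparison_pairs(n_trees, mode, seed=None):
    """Return the (left, right) labels compared under each mode (hypothetical helper)."""
    indices = list(range(n_trees))
    if mode == "pairwise":
        # neighbours in input order: (0, 1), (1, 2), ...
        return list(zip(indices, indices[1:]))
    if mode == "random":
        # assumption: shuffle the order, then compare neighbours in that order
        rng = random.Random(seed)
        rng.shuffle(indices)
        return list(zip(indices, indices[1:]))
    if mode == "consensus":
        # every tree is compared with the single consensus tree
        return [(i, "consensus") for i in indices]
    raise ValueError(f"unknown mode: {mode!r}")


print(comparison_pairs(4, "pairwise"))   # [(0, 1), (1, 2), (2, 3)]
print(comparison_pairs(4, "random", seed=1))
print(comparison_pairs(4, "consensus"))
```

Under this sketch, pairwise and random each yield one fewer comparison than there are trees, while consensus yields one comparison per tree.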

In [76]:
# Quartets method + consensus example
quart = distmetric.Quartets(randomtrees, "consensus")
quart.run()
quart.output

|   | trees | Quartet_intersection |
|---|---|---|
| 0 | 0, consensus | 0.814286 |
| 1 | 1, consensus | 0.838095 |
| 2 | 2, consensus | 0.866667 |
| 3 | 3, consensus | 0.942857 |
| 4 | 4, consensus | 0.809524 |
| 5 | 5, consensus | 1.0 |
| 6 | 6, consensus | 0.8 |
| 7 | 7, consensus | 1.0 |
| 8 | 8, consensus | 1.0 |
| 9 | 9, consensus | 0.728571 |
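The values above are all multiples of 1/210, which is consistent with a fraction of the C(10, 4) = 210 four-tip subsets of a 10-tip tree. As a rough sketch of that idea, and not the `distmetric` code, the block below computes the fraction of quartets that two small trees resolve the same way. Trees are written as nested tuples of tip names, and `quartets` and `shared_quartet_fraction` are hypothetical helper names.

```python
from itertools import combinations
from math import comb


def quartets(tree):
    """Return (tips, set of resolved quartets) for a nested-tuple tree."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.append(tips)
        return tips

    all_tips = walk(tree)
    resolved = set()
    for clade in found:
        outside = all_tips - clade
        # each split clade|outside resolves every quartet with two tips per side
        for pair_in in combinations(sorted(clade), 2):
            for pair_out in combinations(sorted(outside), 2):
                resolved.add(frozenset([frozenset(pair_in), frozenset(pair_out)]))
    return all_tips, resolved


def shared_quartet_fraction(tree_a, tree_b):
    """Fraction of all four-tip subsets resolved identically in both trees."""
    tips_a, quartets_a = quartets(tree_a)
    tips_b, quartets_b = quartets(tree_b)
    if tips_a != tips_b:
        raise ValueError("trees must have the same tips")
    return len(quartets_a & quartets_b) / comb(len(tips_a), 4)


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(shared_quartet_fraction(t1, t2))  # 3 of 5 quartets agree: 0.6
```

The brute-force enumeration is fine for small trees like these; it is meant to show what the metric measures, not to be efficient.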


In [77]:
# RF method + pairwise example
rfs = distmetric.RF(randomtrees, "pairwise")
rfs.run()
rfs.output

|   | trees | RF |
|---|---|---|
| 0 | 0, 1 | 0.333333 |
| 1 | 1, 2 | 0.277778 |
| 2 | 2, 3 | 0.222222 |
| 3 | 3, 4 | 0.277778 |
| 4 | 4, 5 | 0.222222 |
| 5 | 5, 6 | 0.222222 |
| 6 | 6, 7 | 0.222222 |
| 7 | 7, 8 | 0.0 |
| 8 | 8, 9 | 0.277778 |
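For readers unfamiliar with the metric, here is a minimal sketch of a Robinson-Foulds distance: count the splits (bipartitions of the tips) found in one tree but not the other. It is an illustration only. The nested-tuple tree encoding, the helper names `splits` and `rf_distance`, and the normalization by the total number of non-trivial splits are all assumptions, so it may not reproduce the exact values `distmetric` reports.

```python
def splits(tree):
    """Return the non-trivial splits of a nested-tuple tree, treated as unrooted."""
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.append(tips)
        return tips

    all_tips = walk(tree)
    result = set()
    for clade in found:
        other = all_tips - clade
        # skip trivial splits that separate a single tip or nothing at all
        if len(clade) > 1 and len(other) > 1:
            result.add(frozenset([clade, other]))
    return result


def rf_distance(tree_a, tree_b):
    """Return (raw RF, RF normalized by the total number of splits)."""
    splits_a, splits_b = splits(tree_a), splits(tree_b)
    raw = len(splits_a ^ splits_b)  # splits present in exactly one tree
    max_raw = len(splits_a) + len(splits_b)
    return raw, (raw / max_raw if max_raw else 0.0)


t1 = ((("a", "b"), "c"), ("d", "e"))
t2 = ((("a", "c"), "b"), ("d", "e"))
print(rf_distance(t1, t2))  # (2, 0.5)
print(rf_distance(t1, t1))  # (0, 0.0)
```

A normalized value of 0.0, as in the "7, 8" row above, means the two trees share every split, i.e. they have the same topology.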


In [78]:
# RF method + random example
rfs = distmetric.RF(randomtrees, "random")
rfs.run()
rfs.output

|   | trees | RF |
|---|---|---|
| 0 | 9, 4 | 0.388889 |
| 1 | 4, 3 | 0.277778 |
| 2 | 3, 0 | 0.222222 |
| 3 | 0, 8 | 0.166667 |
| 4 | 8, 6 | 0.222222 |
| 5 | 6, 1 | 0.222222 |
| 6 | 1, 2 | 0.277778 |
| 7 | 2, 5 | 0.166667 |
| 8 | 5, 7 | 0.0 |


### 3) INTERPRETING DATA: Summary statistics and histogram
- Summary statistics: mean, std, min, max
- Computed with numpy operations
- A histogram function is built into the package so the user can quickly see the distribution of the resulting metrics; use toyplot for other types of plots

In [79]:
# summarize data
df = rfs.output
stats = distmetric.SumStat(df, "RF")
stats.get_sumstat()

('mean: 0.21604938271604937',
 'std: 0.09953404627529075',
 'min: 0.0',
 'max: 0.3888888888888889')
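As a cross-check, the same four statistics can be recomputed by hand from the RF column of the random-comparison table. The sketch below uses only the standard library and the rounded values printed in that table, so it agrees with the output above to about six decimal places. It uses `statistics.pstdev` (population standard deviation), which corresponds to numpy's default `np.std` with `ddof=0`; the `summary` dictionary is just an illustrative container, not part of `distmetric`.

```python
import statistics

# RF values copied from the random-comparison output (rounded to 6 decimals)
rf_values = [0.388889, 0.277778, 0.222222, 0.166667, 0.222222,
             0.222222, 0.277778, 0.166667, 0.0]

summary = {
    "mean": statistics.fmean(rf_values),
    "std": statistics.pstdev(rf_values),  # population std, like np.std(ddof=0)
    "min": min(rf_values),
    "max": max(rf_values),
}

for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

Using the sample standard deviation instead (`statistics.stdev`, or `ddof=1` in numpy) would give a slightly larger value, so the choice matters when comparing numbers across tools.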

In [80]:
stats.histogram();