# Brief Tutorial on IPython Notebooks
+ What is an IPython Notebook?

Briefly, an IPython notebook is a format for running and visualizing code. It takes advantage of your browser's ability to render attractive HTML documents in order to weave together words, code fragments and output.

+ What is it good for?

Notebooks are good for teaching, since they let you intersperse notes with the code, and for visualizing results and sharing notes.

That is what we're going to do today.

In [None]:
import dendropy
from dendropy.calculate import treemeasure
import pandas
import sys
import glob
import numpy as np

Above, I imported the libraries for this notebook. Not all of them are used below (treemeasure, sys and numpy are not); I thought I might need them.

In [None]:
def initializer():
	# Load the reference tree from a fixed path
	tree = dendropy.Tree.get(path='test/p3p511.tre', schema="nexus", rooting="default-unrooted")
	# Get the edge (branch) lengths from the tree, in preorder
	edges = [edge.length for edge in tree.preorder_edge_iter()]
	# The first edge in preorder belongs to the root; set its length to 0
	edges[0] = 0
	# Start a pandas dataframe, using the true edge lengths as the row index
	df = pandas.DataFrame(pandas.Series(edges, edges), columns=['true'])
	return(df)

Above, I read in a tree from a fixed file path (test/p3p511.tre), extracted the branch lengths, and loaded them into a pandas dataframe with a single column, 'true'.
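To see what that dataframe looks like without needing the tree file, here is a minimal sketch. The branch lengths are made up for illustration, and I use a slightly more explicit spelling of the constructor than the cell above; treat it as a picture of the idea rather than the exact output you will get.

```python
import pandas

# Hypothetical branch lengths standing in for the values read from the tree.
# The root edge usually has no length, so it arrives as None.
edges = [None, 0.12, 0.05, 0.30]
edges[0] = 0  # replace the missing root length with 0

# One column named 'true', with the true lengths also used as row labels
df = pandas.DataFrame({'true': edges}, index=edges)
print(df)
```

Each row is labelled with its own true branch length, which makes the later comparisons easy to read by eye.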

In [None]:
initializer()

In [None]:
def get_tree_list():
	# Find every .con file in the test directory
	container = glob.glob('test/*.con')
	treelist = dendropy.TreeList()
	for file in container:
		print("processing file %s" % file)
		# Read the tree and parse the annotations stored in its comments
		tree = dendropy.Tree.get(path=file, schema="nexus", extract_comment_metadata=True, rooting="default-unrooted")
		treelist.append(tree)
	return(treelist, container)

Above, we use glob to find all the files with a certain extension (.con). Then we iterate over those files, reading each one in with DendroPy and parsing any annotations on it. Finally, we return the list of trees and the list of filenames to use in the next functions.
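If the glob step is unfamiliar, here is a small self-contained sketch of it. The helper name find_files and the file names are hypothetical; the sketch builds a throwaway directory so it runs anywhere. I also sort the result, which the function above does not do, because glob makes no promise about the order it returns files in.

```python
import glob
import os
import tempfile

def find_files(directory, extension):
    """Return the sorted paths in `directory` whose names end in `extension`."""
    pattern = os.path.join(directory, '*' + extension)
    return sorted(glob.glob(pattern))

# Build a temporary directory with two .con files and one unrelated file
with tempfile.TemporaryDirectory() as tmp:
    for name in ('a.con', 'b.con', 'notes.txt'):
        open(os.path.join(tmp, name), 'w').close()
    found = [os.path.basename(path) for path in find_files(tmp, '.con')]

print(found)
```

Only the two .con files come back; notes.txt is ignored because it does not match the pattern.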

In [None]:
get_tree_list()

In [None]:
def proc_trees(treelist):
	print(treelist)
	df_list = []
	for tree in treelist:
		print('Calculating tree: %s' % tree)
		# Collect the 95% HPD interval and the median branch length annotated on each node
		node_hpd = [nd.annotations.findall(name='length_hpd95') for nd in tree.preorder_node_iter()]
		node_med = [nd.annotations.findall(name='length_median') for nd in tree.preorder_node_iter()]
		kvs = [nd.values_as_dict() for nd in node_hpd]
		# list() makes the dictionary values indexable on Python 3
		gnocchi = [list(kv.values()) for kv in kvs]
		hpd_max = [float(line[0][1]) for line in gnocchi]
		hpd_min = [float(line[0][0]) for line in gnocchi]
		# df is the global dataframe returned by initializer()
		df['min'] = pandas.Series(hpd_min, index=df.index)
		df['max'] = pandas.Series(hpd_max, index=df.index)
		# Is the true length above the lower bound, and below the upper bound?
		df['boolcol'] = df['min'] < df['true']
		df['boolcolmax'] = df['max'] > df['true']
		kvs = [nd.values_as_dict() for nd in node_med]
		gnocchi = [list(kv.values()) for kv in kvs]
		med = [float(line[0]) for line in gnocchi]
		df['med'] = pandas.Series(med, index=df.index)
		# Deviation of the estimated median from the true length
		df['devcol'] = df['med'] - df['true']
		# Append a copy so that each tree keeps its own values
		df_list.append(df.copy())
	return(df_list)

This takes the values in node_hpd and breaks them apart into individual lists of values.
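Here is a minimal sketch of that unpacking step in plain Python. The dictionaries are hypothetical stand-ins for what values_as_dict() gives back for each node; I am assuming each one maps the annotation name to a [lower, upper] pair of strings, which is what the indexing in the function above expects.

```python
# Hypothetical per-node annotation dictionaries (an assumption, not real output)
kvs = [
    {'length_hpd95': ['0.01', '0.05']},
    {'length_hpd95': ['0.10', '0.30']},
]

# Pull the single value (the [lower, upper] pair) out of each dictionary
pairs = [list(kv.values())[0] for kv in kvs]

# Split the pairs into one list of lower bounds and one of upper bounds
lower = [float(pair[0]) for pair in pairs]
upper = [float(pair[1]) for pair in pairs]

print(lower)
print(upper)
```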

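Then we crunch them into a pandas dataframe and create two boolean columns: boolcol is True where the true branch length lies above the lower bound of the interval, and boolcolmax is True where it lies below the upper bound.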
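A small worked example may make the two boolean columns clearer. The numbers are invented for illustration, and the combined column called inside is my own addition; the function above does not compute it.

```python
import pandas

# Hypothetical true lengths and interval bounds for three branches
df = pandas.DataFrame({
    'true': [0.10, 0.20, 0.30],
    'min':  [0.05, 0.25, 0.10],
    'max':  [0.15, 0.40, 0.20],
})

# Same comparisons as in proc_trees
df['boolcol'] = df['min'] < df['true']
df['boolcolmax'] = df['max'] > df['true']

# A branch is covered only when both checks pass (illustrative extra column)
df['inside'] = df['boolcol'] & df['boolcolmax']

# Summing a boolean column counts its True values
above_min = int(df['boolcol'].sum())
below_max = int(df['boolcolmax'].sum())
covered = int(df['inside'].sum())
print(above_min, below_max, covered)
```

Only the first branch has its true length inside the interval: the second sits below the lower bound and the third sits above the upper bound.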

In [None]:

def count_correct(df_list, container):
	# Pair each input file with the dataframe built from its tree
	for file, df in zip(container, df_list):
		print('Exporting %s' % file)
		# Summing a boolean column counts the True values
		min_true = int(df.boolcol.sum())
		max_true = int(df.boolcolmax.sum())
		df.to_csv("%s.csv" % file)
		print('Number of nodes above minimum age: %s\nNumber of nodes under maximum: %s' % (min_true, max_true))
	# Return the counts for the last file processed
	return(min_true, max_true)
		

So there, all our functions are defined. Now we can call them all.

In [None]:
if __name__ == "__main__":
	df = initializer()
	treelist, container = get_tree_list()
	df_list = proc_trees(treelist)
	# Avoid shadowing the built-in min() and max()
	min_true, max_true = count_correct(df_list, container)


In [None]:
print(df)