## Plotting trees with iTOL

This is a notebook that provides a basic tutorial on how to plot trees with [iTOL](https://itol.embl.de/).

At minimum you'll need a tree in Newick format. You can also annotate the leaves with other information, both discrete and continuous values; here we'll focus only on continuous values.

To get started, make sure you have `ete3`, `pandas`, and `itolapi` installed (all of them are pip-installable).

_Note_: for some reason, iTOL requires the tree file you upload to end with the suffix `.tree`. Keep that in mind if you see cryptic errors!
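If you ever upload a tree file directly rather than through a helper that writes it for you, a small standard-library sketch like the one below can copy it to a `.tree` name first. `ensure_tree_suffix` is a hypothetical helper written for this illustration; it is not part of `ete3` or `itolapi`.

```python
import shutil
from pathlib import Path


def ensure_tree_suffix(path):
    """Return a path ending in `.tree` that holds the same tree as `path`.

    If `path` already ends in `.tree` it is returned unchanged; otherwise the
    file is copied alongside the original with its last suffix replaced by
    `.tree` (an existing file with that name is overwritten).
    """
    src = Path(path)
    if src.suffix == ".tree":
        return src
    dst = src.with_suffix(".tree")
    shutil.copyfile(src, dst)
    return dst
```

For example, `ensure_tree_suffix("../../data/Cassiopeia_trees/tree_test.txt")` would produce `tree_test.tree` in the same directory.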

In [19]:
import os
%load_ext autoreload
%autoreload 2



In [41]:
import os
import sys

from ete3 import Tree
import pandas as pd

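# directory holding the local plot_tree_itol and itol_manager helper modules (adjust to your checkout)
sys.path.append('/Users/khalilouardini/Desktop/projects/scVI/scvi/notebooks/plots')
import plot_tree_itol, itol_manager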

In [42]:
#tree_name = '/Users/khalilouardini/Desktop/projects/scVI/scvi/notebooks/plots/3726_NT_T1_tree.processed.tree'
# tree_name = "../../data/Cassiopeia_trees/lg7_tree_hybrid_priors.alleleThresh.collapsed.txt"
tree_name = "../../data/Cassiopeia_trees/tree_test.txt"
# the first line of the file holds the Newick string; format=1 keeps internal node names
with open(tree_name, "r") as myfile:
    tree_string = myfile.readlines()
tree = Tree(tree_string[0], format=1)

#tree = Tree(tree_name, 1)
# read in data to be plotted
# continuous data

#fitness = pd.read_csv(f"/Users/khalilouardini/Desktop/projects/scVI/scvi/notebooks/plots/mean_fitness.3726_NT_T1.txt", sep='\t', index_col = 0)
plots_dir = '/Users/khalilouardini/Desktop/projects/scVI/scvi/notebooks/plots'

def read_tsv(filename):
    # tab-separated table; the first column (the node ID) becomes the index
    return pd.read_csv(os.path.join(plots_dir, filename), sep='\t', index_col=0)

ge = read_tsv('GE-1-imputed.txt')
ge_leaves = read_tsv('GE-1-leaves.txt')
ge_gt = read_tsv('GE-1-gt.txt')
ge_avg = read_tsv('GE-1-avg.txt')
ge_scvi = read_tsv('GE-1-scvi.txt')
variance = read_tsv('variance.txt')

ge.head(5), ge_leaves.head(5), ge_gt.head(5), ge_avg.head(5), ge_scvi.head(5), variance.head(5)

(                                                    GE-1-imputed
 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0...             0
 0|0|0|0|0|0|0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0...             0
 2|2|2|2|6|11|0|9|2|0|2|0|7|4|0|0|0|0|6|4|0|4|0|...             0
 6|3|0|4|9|0|8|5|0|0|4|10|11|0|0|0|0|0|2|9|0|0|0...             0
 0|2|2|2|2|0|2|0|2|2|2|0|2|0|0|0|0|0|0|0|0|2|0|3...             0,
                        GE-1-leaves
 L6.GTTTCTAGTGAAGGCT-1            0
 L6.CCCTCCTCAGCTGCTG-1            4
 L6.TGGCGCATCTCGTATT-1            3
 L6.CGTGAGCTCACTCCTG-1            2
 L6.CCTCTGAAGAAACGAG-1            0,
                                                     GE-1-groundtruth
 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0...                 2
 0|0|0|0|0|0|0|0|0|2|0|0|0|0|0|0|0|0|0|0|0|0|0|0...                 0
 2|2|2|2|6|11|0|9|2|0|2|0|7|4|0|0|0|0|6|4|0|4|0|...                 0
 6|3|0|4|9|0|8|5|0|0|4|10|11|0|0|0|0|0|2|9|0|0|0...                 1
 0|2|2|2|2|0|2|0|2|2|2|0|2|0|0|0|0|0
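The tree cell takes `readlines()[0]`, i.e. it assumes the first line of the file holds the whole Newick string. A slightly more defensive, hypothetical helper skips leading blank lines and checks for the terminating `;` before the string is handed to ete3:

```python
def read_first_newick(path):
    """Return the first non-empty line of `path`, stripped, as a Newick string.

    Raises ValueError if the file has no non-empty line, or if that line does
    not end with ';' (every Newick string must).
    """
    with open(path, "r") as handle:
        for line in handle:
            line = line.strip()
            if line:
                if not line.endswith(";"):
                    raise ValueError(f"{path}: first tree is not terminated by ';'")
                return line
    raise ValueError(f"{path}: file contains no tree")
```

In this notebook it would be used as `tree = Tree(read_first_newick(tree_name), format=1)`.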

### Set up iTOL

You'll have to create your own iTOL account using the link above and then find your API key, which lets you "batch" upload trees to the server.

After doing this, we'll create the files needed to plot the tree with the leaves annotated by gene expression. This makes use of the local `itol_manager` and `plot_tree_itol` utility modules.

In [43]:
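# read the key from an environment variable (hypothetical name) instead of hard-coding it in the notebook
apiKey = os.environ['ITOL_API_KEY']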
projectName = 'CasVI'
tree_out = "Collapsed Tree gt"

Color gradient of the leaf gene expression (`GE-1-leaves`):

In [44]:
files = []
files += itol_manager.create_gradient_from_df(ge_leaves, tree, 'GE-1-leaves')
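`itol_manager.create_gradient_from_df` is a local helper whose output isn't shown here. As a rough, hypothetical illustration of what an iTOL gradient dataset for continuous leaf values can look like, the sketch below writes a minimal `DATASET_GRADIENT` file by hand. The function name, default colors, and output filename are assumptions, and the real helper may write additional fields.

```python
def write_itol_gradient(values, label, out_path, color_min="#0000ff", color_max="#ff0000"):
    """Write a minimal iTOL DATASET_GRADIENT file (hypothetical helper).

    `values` maps node IDs, which must match node names in the uploaded tree,
    to numeric values; iTOL maps each value onto the min/max color range.
    """
    lines = [
        "DATASET_GRADIENT",
        "SEPARATOR TAB",
        f"DATASET_LABEL\t{label}",
        f"COLOR\t{color_max}",
        f"COLOR_MIN\t{color_min}",
        f"COLOR_MAX\t{color_max}",
        "DATA",
    ]
    # one "ID<TAB>value" line per node, in the order given
    lines += [f"{node_id}\t{value}" for node_id, value in values.items()]
    with open(out_path, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return out_path
```

With the table loaded above it could be called as `write_itol_gradient(ge_leaves['GE-1-leaves'].to_dict(), 'GE-1-leaves', 'GE-1-leaves.gradient.txt')`.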

Annotation files for the imputed, ground-truth, `avg`, and `scvi` values:

In [45]:
files += [itol_manager.create_annotation_file_for_itol(tree, ge, 'annotations/annotations_imputed.txt')]
files += [itol_manager.create_annotation_file_for_itol(tree, ge_gt, 'annotations/annotations_gt.txt')]
files += [itol_manager.create_annotation_file_for_itol(tree, ge_avg, 'annotations/annotations_avg.txt')]
files += [itol_manager.create_annotation_file_for_itol(tree, ge_scvi, 'annotations/annotations_scvi.txt')]
#files += [itol_manager.create_annotation_file_for_itol(tree, variance, 'annotations/variance_analysis.txt')]
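The cell above writes into an `annotations/` directory. If `create_annotation_file_for_itol` does not create that directory itself (an assumption, since its source isn't shown), a small hypothetical helper can create it and build the output paths, which also removes the repetition:

```python
import os


def annotation_paths(suffixes, out_dir="annotations"):
    """Create `out_dir` if needed and return one annotation file path per suffix."""
    os.makedirs(out_dir, exist_ok=True)
    return [os.path.join(out_dir, f"annotations_{suffix}.txt") for suffix in suffixes]


# Possible use in this notebook (itol_manager is the local helper module):
# for df, path in zip([ge, ge_gt, ge_avg, ge_scvi],
#                     annotation_paths(["imputed", "gt", "avg", "scvi"])):
#     files.append(itol_manager.create_annotation_file_for_itol(tree, df, path))
```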

In [46]:
files

['GE-1-leaves.GE-1-leaves.txt',
 'annotations/annotations_imputed.txt',
 'annotations/annotations_gt.txt',
 'annotations/annotations_avg.txt',
 'annotations/annotations_scvi.txt']

In [47]:
plot_tree_itol.upload_to_itol(tree, apiKey, projectName, 
                              files = files, outfp = tree_out,
                              tree_name = 'collapsed_tree_gt')

iTOL output: ERR 8: In annotations_avg.txt: Couldn't find ID 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 in the tree
ERR 8: In annotations_gt.txt: Couldn't find ID 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 in the tree
ERR 8: In annotations_imputed.txt: Couldn't find ID 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 in the tree
ERR 8: In annotations_scvi.txt: Couldn't find ID 0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0 in the tree
SUCCESS: 18423239310401607468647

Tree Web Page URL: http://itol.embl.de/external.cgi?tree=18423239310401607468647&restore_saved=1
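No error was reported for the leaf gradient, but iTOL rejected the four annotation files with `ERR 8 ... Couldn't find ID ... in the tree`: an ID such as `0|0|...|0` did not match any node name in the uploaded tree. A quick way to catch this before uploading is to compare each table's index with the tree's node names. The hypothetical `missing_ids` below does that in plain Python. Note also that iTOL annotation files can use `ID1|ID2` to refer to the last common ancestor of two nodes, so IDs that themselves contain `|` may be interpreted that way; this is a possible, unconfirmed contributor to the errors above.

```python
def missing_ids(dataset_ids, tree_node_names):
    """Return the dataset IDs that name no node in the tree, in input order."""
    names = set(tree_node_names)
    return [node_id for node_id in dataset_ids if node_id not in names]


# With ete3, as in this notebook:
# node_names = [node.name for node in tree.traverse()]
# missing_ids(ge_avg.index, node_names)
```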


In [18]:
# clean up workspace
for file in files:
    os.remove(file)
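`os.remove` raises `FileNotFoundError` if a file is already gone, for example when the cleanup cell is run twice. A hypothetical, more forgiving variant using `pathlib` skips missing files:

```python
from pathlib import Path


def remove_files(paths):
    """Delete each path, silently skipping any that no longer exist."""
    for path in paths:
        Path(path).unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

Called as `remove_files(files)`, it leaves the empty `annotations/` directory in place.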