## Sequence Alignment and Phylogenetic Tree

I wanted to create a single figure combining a sequence alignment with a phylogenetic tree. For this I used the ETE toolkit (http://etetoolkit.org/docs/latest/tutorial/tutorial_trees.html).

The protein sequence alignment was generated with MUSCLE on the EBI website, from which I downloaded the alignment (.clw, Clustal format) and the tree (.nh, Newick format). I then trimmed the alignment to the relevant region and converted it to FASTA.

To make Figure 1 for the publication, I imported the SVG into Adobe Illustrator and gave the acetylated lysine residues a red background. The ETE toolkit did not give me easy control over individual residues within an alignment, so this was done by hand. The full alignment, with residues colored by the peptide homology viewer, is in the HTML files in the repo.
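The trimming and FASTA conversion happened outside the notebook, so the exact procedure is not shown here. As a hedged illustration only, here is a minimal pure-Python sketch, assuming a standard Clustal layout (a header line, then blocks of `name  sequence` lines, with consensus lines starting with whitespace). The helpers `read_clustal`, `trim_columns`, `to_fasta`, and `residue_to_column` are hypothetical names, not part of ETE; the last one shows how the alignment column of a given residue (for example an acetylated lysine) could be located instead of found by eye.

```python
# Hypothetical helpers (not part of ETE or the original workflow): parse a
# Clustal alignment, trim it to a column window, format it as FASTA, and
# find the alignment column that holds a given residue.


def read_clustal(text):
    """Parse Clustal-format text into {name: aligned_sequence}.

    Assumes a header line starting with 'CLUSTAL' (or 'MUSCLE'), then blocks
    of 'name  chunk [count]' lines; consensus lines begin with whitespace.
    """
    seqs = {}  # dicts keep insertion order, so sequence order is preserved
    for line in text.splitlines():
        if not line.strip() or line[0].isspace():
            continue  # blank line or consensus line
        if line.startswith(("CLUSTAL", "MUSCLE")):
            continue  # header line
        fields = line.split()
        name, chunk = fields[0], fields[1]  # ignore optional residue count
        seqs[name] = seqs.get(name, "") + chunk  # concatenate across blocks
    return seqs


def trim_columns(alignment, start, end):
    """Keep alignment columns start..end (1-based, inclusive)."""
    return {name: seq[start - 1:end] for name, seq in alignment.items()}


def to_fasta(alignment, width=60):
    """Format {name: sequence} as FASTA, wrapping sequences at `width`."""
    lines = []
    for name, seq in alignment.items():
        lines.append(">" + name)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def residue_to_column(aligned_seq, residue_number, gap="-"):
    """Return the 1-based alignment column of the residue_number-th
    (1-based) residue, counting only non-gap characters."""
    count = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char != gap:
            count += 1
            if count == residue_number:
                return column
    raise ValueError(f"sequence has fewer than {residue_number} residues")


if __name__ == "__main__":
    # Toy two-block alignment standing in for the real .clw file
    clw = (
        "CLUSTAL multiple sequence alignment by MUSCLE (3.8)\n\n"
        "seqA      MK-\n"
        "seqB      MKT\n"
        "           **\n\n"
        "seqA      LAKE\n"
        "seqB      LVKE\n"
        "          * **\n"
    )
    aln = read_clustal(clw)  # seqA: MK-LAKE, seqB: MKTLVKE
    print(to_fasta(trim_columns(aln, 2, 6)), end="")
    print(residue_to_column(aln["seqA"], 5))  # 5th residue of seqA (K) -> column 6
```

The column window actually used for the publication is not stated here, so `start` and `end` are placeholders. Writing the output of `to_fasta` to a file would give something `link_to_alignment(..., alg_format="fasta")` can read; ETE attaches sequences to leaves by name, so the FASTA headers have to match the Newick leaf names.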

```python
from ete3 import PhyloTree, TreeStyle, NodeStyle, AttrFace
from ete3.treeview import faces

# MSA display: overwrite ETE's internal (underscore-prefixed) residue color
# tables so every residue is drawn as black text on a white background
for aa in faces._aabgcolors:
    faces._aabgcolors[aa] = "white"

for aa in faces._aafgcolors:
    faces._aafgcolors[aa] = "black"

## file i/o
enolase_tree = PhyloTree("K01689_tree_structure.nh")  # Newick tree from MUSCLE@EBI
FilePath = "K01689__invariantRegion_trimmed.fasta"
# Attach each FASTA sequence to the tree leaf with the same name
enolase_tree.link_to_alignment(alignment=FilePath, alg_format="fasta")

# Shade each clade by giving its common-ancestor node a background color
# (colors: http://etetoolkit.org/docs/latest/reference/reference_treeview.html#ete3.SVG_COLORS)

# Actinobacteria
a_style = NodeStyle()
a_style["bgcolor"] = "#b8ab88"
actino_root = enolase_tree.get_common_ancestor("B_bifidum", "B_infantis", "M_smegmatis", "C_gilvus", "M_luteus", "S_venezuelae")
actino_root.set_style(a_style)

# Firmicutes
f_style = NodeStyle()
f_style["bgcolor"] = "#ffffc0"
firmicutes_root = enolase_tree.get_common_ancestor("L_casei", "P_hydrogenalis")
firmicutes_root.set_style(f_style)

# Bacteroidetes
b_style = NodeStyle()
b_style["bgcolor"] = "#d7914d"
bacteroides_root = enolase_tree.get_common_ancestor("B_thetatiotaomicron", "Algoriphagus_sp.")
bacteroides_root.set_style(b_style)

# Proteobacteria: one color for the first clade, a second color shared by two more
p_style = NodeStyle()
p_style["bgcolor"] = "#6fabbf"
proteobacteria_root1 = enolase_tree.get_common_ancestor("F_novicida", "D_acidovorans")
proteobacteria_root1.set_style(p_style)
p2_style = NodeStyle()
p2_style["bgcolor"] = "#c48d94"
proteobacteria_root2 = enolase_tree.get_common_ancestor("A_cryptum", "P_denitrificans")
proteobacteria_root2.set_style(p2_style)
proteobacteria_root3 = enolase_tree.get_common_ancestor("M_xanthus", "S_aurantiaca")
proteobacteria_root3.set_style(p2_style)

# Hide leaf names and render to SVG (the SVG was later edited in Illustrator)
ts = TreeStyle()
ts.show_leaf_name = False
# enolase_tree.show(tree_style=ts)  # interactive viewer, optional
enolase_tree.render("eno_tree_trimmed.svg", dpi=300, tree_style=ts)
```

{'faces': [[217.1469879518072, 290.0, 268.1469879518072, 307.0, 39, 'L_casei'],
  [366.08433734939763, 292.0, 366.08433734939763, 305.0, 39, None],
  [196.3277108433735, 171.0, 266.3277108433735, 188.0, 24, 'F_novicida'],
  [366.08433734939763, 173.0, 366.08433734939763, 186.0, 24, None],
  [204.75662650602408, 477.0, 289.7566265060241, 494.0, 59, 'P_ruminicola'],
  [366.08433734939763, 479.0, 366.08433734939763, 492.0, 59, None],
  [199.89879518072286, 494.0, 259.89879518072286, 511.0, 61, 'B_fragilis'],
  [366.08433734939763, 496.0, 366.08433734939763, 509.0, 61, None],
  [209.69397590361447, 188.0, 275.6939759036145, 205.0, 27, 'C_freundii'],
  [366.08433734939763, 190.0, 366.08433734939763, 203.0, 27, None],
  [181.78313253012044, 18.0, 280.78313253012044, 35.0, 4, 'L_pneumophila'],
  [366.08433734939763, 20.0, 366.08433734939763, 33.0, 4, None],
  [203.75180722891565, 324.0, 282.75180722891565, 341.0, 43, 'P_polymyxa'],
  [366.08433734939763, 326.0, 366.08433734939763, 339.0, 43, 