# Map mutations on branches

Load the toy example.

In [1]:
import scistree2 as s2
import numpy as np
import pandas as pd
gp = s2.probability.from_csv('./data/toy_probs.csv')

Run inference

In [2]:
caller = s2.ScisTree2(threads=8)  # use 8 threads
tree, imputed_genotype, likelihood = caller.infer(gp)  # run ScisTree2 inference
print('Imputed genotype from SPR: \n', imputed_genotype)
print('Newick of the SPR tree: ', tree)
print('Likelihood of the SPR tree: ', likelihood)

Imputed genotype from SPR: 
       cell1  cell2  cell3  cell4  cell5
snp1      1      0      1      0      0
snp2      0      1      0      1      0
snp3      1      0      1      0      0
snp4      0      0      0      0      1
snp5      1      0      1      0      0
snp6      1      1      1      1      0
Newick of the SPR tree:  (((cell1,cell3),(cell2,cell4)),cell5);
Likelihood of the SPR tree:  -6.271255186813891


Take a look at the inferred tree.

In [3]:
tree.draw()

           ┌cell5
 564f445691┤
           │                     ┌cell4
           │          ┌8aa238bf62┤
           │          │          └cell2
           └8c5a57e560┤
                      │          ┌cell3
                      └ba4d175664┤
                                 └cell1


You can now find out where mutations are placed on the tree. Each branch is identified by the node it ends at, and that node's `mutations` attribute lists the mutations placed on the branch.

In [4]:
for node in tree.get_all_nodes():
    print(f'Mutations at branch ending at {tree[node].name}:', tree[node].mutations)

Mutations at branch ending at cell5: ['snp4']
Mutations at branch ending at 8c5a57e560: ['snp6']
Mutations at branch ending at 8aa238bf62: ['snp2']
Mutations at branch ending at ba4d175664: ['snp1', 'snp3', 'snp5']
Mutations at branch ending at cell4: []
Mutations at branch ending at cell2: []
Mutations at branch ending at cell3: []
Mutations at branch ending at cell1: []
Mutations at branch ending at 564f445691: []
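The placement above can be cross-checked by hand. The following is a minimal, library-independent sketch, not ScisTree2's internal method: it assumes that `1` in the imputed genotype marks a mutated cell and that the genotype fits the tree perfectly, so each mutation belongs to the branch whose descendant leaves are exactly the cells carrying it. The names `collect_clades` and `map_mutations` are hypothetical helpers introduced for illustration, and the data is copied from the output above.

```python
# Toy data copied from the printed output: rows are SNPs, columns are cells.
cells = ["cell1", "cell2", "cell3", "cell4", "cell5"]
genotype = {
    "snp1": [1, 0, 1, 0, 0],
    "snp2": [0, 1, 0, 1, 0],
    "snp3": [1, 0, 1, 0, 0],
    "snp4": [0, 0, 0, 0, 1],
    "snp5": [1, 0, 1, 0, 0],
    "snp6": [1, 1, 1, 1, 0],
}

# The inferred topology written as nested tuples; leaves are strings.
topology = ((("cell1", "cell3"), ("cell2", "cell4")), "cell5")


def collect_clades(node, clades):
    """Return the leaf set below `node` and append every clade to `clades`."""
    if isinstance(node, str):
        leaves = frozenset([node])
    else:
        leaves = frozenset().union(*(collect_clades(child, clades) for child in node))
    clades.append(leaves)
    return leaves


def map_mutations(topology, genotype, cells):
    """Assign each SNP to the clade whose leaves are exactly its carrier cells."""
    clades = []
    collect_clades(topology, clades)
    placement = {clade: [] for clade in clades}
    for snp, row in genotype.items():
        carriers = frozenset(c for c, g in zip(cells, row) if g == 1)
        if carriers in placement:  # SNPs that match no clade are skipped
            placement[carriers].append(snp)
    return placement


placement = map_mutations(topology, genotype, cells)
for clade, snps in placement.items():
    print(sorted(clade), snps)
```

For this toy data, the clade `{cell1, cell3}` receives `snp1`, `snp3` and `snp5`, which agrees with the branch ending at `ba4d175664`. A mutation whose carrier set matches no clade would be silently skipped by this sketch.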


You can inject branch information into the Newick string by passing a custom function to `tree.output` through its `branch_length_func` argument. The function receives a node, extracts the desired value from it (for example, the number of mutations), and the returned value is written into the string as that node's branch length.

This lets you represent the number of mutations as the branch length. The resulting Newick string can then be used for further analysis or visualization in other packages. Libraries such as ete3 are well suited to this, although we won't cover that here.
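To make the callback idea concrete, here is a small sketch of how a serializer might apply such a function. It is an illustration only: `ToyNode` and `to_newick` are hypothetical stand-ins written for this example and say nothing about how scistree2 implements `tree.output`. The toy tree is built by hand from the mutation lists printed earlier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyNode:
    """Hypothetical stand-in for a tree node; not the scistree2 node class."""
    name: str
    mutations: List[str] = field(default_factory=list)
    children: List["ToyNode"] = field(default_factory=list)


def to_newick(node, branch_length_func=None):
    """Serialize the subtree below `node`, without the trailing semicolon."""
    if node.children:
        inner = ",".join(to_newick(c, branch_length_func) for c in node.children)
        label = f"({inner})"  # internal node names are left out
    else:
        label = node.name
    if branch_length_func is not None:
        # The callback decides what is written after the colon.
        label += f":{branch_length_func(node)}"
    return label


# Hand-built copy of the inferred tree, with mutations on the ending nodes.
root = ToyNode("root", [], [
    ToyNode("n1", ["snp6"], [
        ToyNode("n2", ["snp1", "snp3", "snp5"], [ToyNode("cell1"), ToyNode("cell3")]),
        ToyNode("n3", ["snp2"], [ToyNode("cell2"), ToyNode("cell4")]),
    ]),
    ToyNode("cell5", ["snp4"]),
])

plain = to_newick(root) + ";"
with_counts = to_newick(root, lambda n: len(n.mutations)) + ";"
print(plain)
print(with_counts)
```

Because the callback only needs to return something printable, the same pattern could carry other per-branch values, as long as downstream tools can parse them as branch lengths.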

In [5]:
def get_num_mutations(node):
    return len(node.mutations)

print(tree.output(branch_length_func=get_num_mutations)) # Newick string format: branch lengths represent the number of mutations.

print(tree.output(branch_length_func=lambda x: len(x.mutations))) # or simply, using a lambda expression.

(((cell1:0,cell3:0):3,(cell2:0,cell4:0):1):1,cell5:1):0;
(((cell1:0,cell3:0):3,(cell2:0,cell4:0):1):1,cell5:1):0;
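As one example of further analysis, the branch lengths can be summed along each root-to-leaf path to get the number of mutations carried by each cell. The sketch below uses a deliberately small, hand-written parser rather than a phylogenetics library; `root_to_leaf_lengths` is a hypothetical helper that assumes the simple Newick form printed above (unquoted leaf names, unnamed internal nodes, numeric lengths). A dedicated library would be the better choice for general Newick input.

```python
def root_to_leaf_lengths(newick):
    """Sum branch lengths from the root down to each leaf of a Newick string."""
    text = newick.strip().rstrip(";")
    pos = 0

    def parse_node():
        """Parse one node at `pos` and return (name, length, children)."""
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # skip "("
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            pos += 1  # skip ")"
        start = pos
        while pos < len(text) and text[pos] not in ",():":
            pos += 1
        name = text[start:pos]
        length = 0.0
        if pos < len(text) and text[pos] == ":":
            pos += 1  # skip ":"
            start = pos
            while pos < len(text) and text[pos] not in ",()":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    totals = {}

    def walk(node, acc):
        """Accumulate lengths down the tree and record the total at each leaf."""
        name, length, children = node
        acc += length
        if not children:
            totals[name] = acc
        for child in children:
            walk(child, acc)

    walk(parse_node(), 0.0)
    return totals


newick = "(((cell1:0,cell3:0):3,(cell2:0,cell4:0):1):1,cell5:1):0;"
mutations_per_cell = root_to_leaf_lengths(newick)
for cell, count in sorted(mutations_per_cell.items()):
    print(cell, int(count))
```

For the toy tree, these totals equal the column sums of the imputed genotype matrix (for instance, cell1 carries snp1, snp3, snp5 and snp6), which is a quick consistency check between the tree and the genotypes.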
