# Calculate enrichment scores across tree - BEAST trees

This is an initial attempt to break this code into reasonable chunks that can be coded, tested, and executed separately. In this notebook, we will read in a tree, enumerate all the mutations on that tree, and calculate an enrichment score for each of them. These enrichment scores are based on [this code](https://github.com/sheppardlab/pGWAS/blob/master/assomap_given_phylo.py) (lines 245-273), written for [this paper](https://www.nature.com/articles/s41467-018-07368-7#Sec10), and are calculated from the following contingency table:

|host|presence|absence|
|:------|:-------|:------|
|host 1|A|B| 
|host 2|C|D|

where A, B, C, and D are counts of the mutation's presence and absence in host 1 and host 2. The odds ratio is then calculated as: `OR = (A * D)/(B * C)`
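
As a quick worked example of the formula above, here is a minimal sketch that computes the odds ratio from the four cells of the table. The function name `odds_ratio` and the counts are made up for illustration; they are not taken from the pGWAS code or from the data used later in this notebook.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table: a, b = presence, absence in host 1; c, d = the same in host 2."""
    return (a * d) / (b * c)

# Hypothetical counts: 6 of 10 host-1 tips and 2 of 10 host-2 tips carry the mutation.
example_or = odds_ratio(a=6, b=4, c=2, d=8)
print(example_or)
```

A value above 1 indicates enrichment in host 1, and a value below 1 indicates enrichment in host 2. The formula is undefined when B or C is 0.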

This code was originally written to parse the tree JSON format output by Nextstrain. In this notebook, it is adapted to run on a posterior set of BEAST trees.


### A NOTE ON BALTIC: 
Currently, in the posterior set of trees, mutations are annotated as traits as `&typeTrait=domestic,mutations="G730A,A846G,C1203A,A1278G"`. However, baltic is not reading in all the mutations properly, instead only reading in the first mutation in the list. This is due to line 1116 of baltic, which attempts to find trait strings. In order to make this work, I added in a comma as an acceptable character in the 4th block of the string search. I saved the version with this small edit as `../baltic/baltic/baltic-modified-for-muts.py`. I will use that baltic version in this notebook. Additionally, for this to work, the typeTrait also needs to be a string. 
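
To illustrate why the comma matters, here is a small sketch using Python's `re` module. The two patterns are hypothetical and much simpler than the expression baltic actually uses; they only show how a character class without a comma stops at the first mutation, while one that allows commas captures the whole list.

```python
import re

annotation = '&typeTrait=domestic,mutations="G730A,A846G,C1203A,A1278G"'

# Hypothetical patterns for illustration only; these are NOT baltic's regular expressions.
without_comma = re.compile(r'mutations="([A-Z0-9]+)')
with_comma = re.compile(r'mutations="([A-Z0-9,]+)"')

first_only = without_comma.search(annotation).group(1)
all_muts = with_comma.search(annotation).group(1).split(",")

print(first_only)
print(all_muts)
```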


Also, a note on trees: as currently written, this does some switching between numNames and strain names. If the tree you are testing it on doesn't have those, the code will need to be changed. 

In [89]:
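import glob, json
import re, copy
import importlib.util
import pandas as pd 
import numpy as np
from io import StringIO
import time
from Bio import SeqIO
from Bio.Seq import Seq


# for this to work, you will need to download the most recent version of baltic, available here 
# the imp module was removed in Python 3.12, so load the modified baltic file with importlib instead
# (to use unmodified baltic, point baltic_path at '../baltic/baltic/baltic.py')
baltic_path = '../baltic/baltic/baltic-modified-for-muts.py'
spec = importlib.util.spec_from_file_location('baltic', baltic_path)
bt = importlib.util.module_from_spec(spec)
spec.loader.exec_module(bt)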

## Deal with posterior

In [90]:
def get_taxa_lines(tree_path):    

    lines_to_write = ""
    with open(tree_path, 'r') as infile:  # the 'U' flag is no longer accepted by Python 3.11+
        for line in infile: ## iterate through each line
            if 'state' not in line.lower(): #going to grab all the interesting stuff in the .trees file prior to the newick tree strings
                lines_to_write = lines_to_write + line

    return(lines_to_write)

In [91]:
def convert_strain_to_number(taxa_lines):
    
    output_dict = {}
    
    translation_block = taxa_lines.split("Translate\n")[1]
    translation_list = translation_block.replace("\t","").split("\n")
    
    for t in translation_list: 
        information = t.lstrip().replace(",","")  # remove leading white spaces and commas
        
        if len(information.split(" ")) == 2:
            numeric_id = information.split(" ")[0]
            strain_name = information.split(" ")[1]
        
            output_dict[numeric_id] = strain_name
            
        else: 
            pass
        
    return(output_dict)

In [92]:
def convert_leaves_to_strains(input_leaves, strains_dict): 
    output_list = []
    
    for l in input_leaves: 
        strain_name = strains_dict[l]
        output_list.append(strain_name)
        
    return(output_list)

In [93]:
def get_burnin_value(tree_path, burnin_percent):
    with open(tree_path, 'r') as infile:  # the 'U' flag is no longer accepted by Python 3.11+
        numtrees = 0
        for line in infile: ## iterate through each line
            if 'state' in line.lower(): # lines containing 'state' hold the sampled trees, so count them
                numtrees += 1
    
    burnin = numtrees * burnin_percent
    return(burnin)
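
Because the burn-in value is a float, it can help to see which tree is the first one kept when it is later compared against an integer tree counter with `tree_counter >= burnin`. The sketch below is an illustration under that assumption; `first_kept_tree` is a hypothetical helper, not part of the notebook.

```python
import math

def first_kept_tree(n_trees, burnin_fraction):
    """1-based index of the first tree that satisfies `tree_counter >= n_trees * burnin_fraction`."""
    return math.ceil(n_trees * burnin_fraction)

n_trees = 434  # number of trees reported in the run further down
first_kept = first_kept_tree(n_trees, 0.99)
n_kept = n_trees - first_kept + 1
print(first_kept, n_kept)
```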

## Translations and ancestral sequence reconstruction

In [94]:
def read_alignment(alignment_file):
    alignment_dict = {}

    for seq in SeqIO.parse(alignment_file, "fasta"):
        seqName = seq.description 
        sequence = str(seq.seq)
        alignment_dict[seqName] = sequence
        
    return(alignment_dict)

In [95]:
def return_cds_coordinates(genbank_ref_file):
    
    from Bio import GenBank
    with open(genbank_ref_file) as handle:
        for record in GenBank.parse(handle):

            # pull out the CDS feature; the gene coordinates are in the feature.location. Get help with help(feature)
            for f in record.features:
                if f.key == "CDS":
                    cds_start = int(f.location.split("..")[0])
                    cds_stop = int(f.location.split("..")[1])
                
    return(cds_start, cds_stop)

In [96]:
def return_mutations_on_branch(branch):
    if branch is None: 
        mutations = []
    elif "mutations" in branch.traits: 
        mutations = branch.traits["mutations"].split(",")
    else:
        mutations = []
    
    return(mutations)

In [97]:
def return_mutated_sequence(sequence, muts, cds_start, cds_stop):
    # make into a list because strings are immutable, while lists are not
    mutated_sequence = list(sequence)
    
    for m in muts:
        site = int(m[1:-1])-1   # -1 is because of 0 indexing
        ancestral_nt = m[0]
        mutated_nt = m[-1]
        
        # since we are going backwards up the tree, we are reconstructing the ancestral sequence
        mutated_sequence[site] = ancestral_nt
    
    mutated_sequence = "".join(mutated_sequence)
    mutated_aa_sequence, aa_muts = return_mutated_aa_sequence(sequence, mutated_sequence, cds_start, cds_stop)
    # return a string
    return(mutated_sequence, mutated_aa_sequence, aa_muts)
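
The core idea of walking up the tree is that each mutation on a branch is undone to recover the parent's sequence. Here is a toy sketch of that single step, assuming mutations are written as ancestral base, 1-based site, derived base (for example `G3A`). The helper name `revert_mutations` and the sequences are made up for illustration.

```python
def revert_mutations(sequence, muts):
    """Undo mutations written as <ancestral><1-based site><derived>, e.g. 'G3A'."""
    bases = list(sequence)
    for m in muts:
        site = int(m[1:-1]) - 1   # convert the 1-based site to a 0-based index
        bases[site] = m[0]        # put the ancestral base back
    return "".join(bases)

child = "ATAC"                    # carries the derived A at site 3
parent = revert_mutations(child, ["G3A"])
print(parent)
```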

In [98]:
def return_aa_sequence(sequence, cds_start, cds_stop):
    
    ha_cds = str(sequence)[cds_start-1:cds_stop-1]    # slice string based on cds coordinates
    ha_cds_seq = Seq(ha_cds)    # make it a Seq object
    translation = ha_cds_seq.translate()
    
    return(str(translation))

In [99]:
def return_mutated_aa_sequence(sequence, mutated_sequence, cds_start, cds_stop):
    
    ha_cds = str(sequence)[cds_start-1:cds_stop-1]    # slice string based on cds coordinates
    ha_cds_seq = Seq(ha_cds)    # make it a Seq object
    translation = ha_cds_seq.translate()
    mutated_translation = str(Seq(str(mutated_sequence)[cds_start-1:cds_stop-1]).translate())  # same as above but on 1 line
    
    aa_muts = []
    for i in range(len(translation)):
        if mutated_translation[i] != translation[i]:
            aa_mut = mutated_translation[i] + str(i+1) + translation[i]
            aa_muts.append(aa_mut)

    return(mutated_translation, aa_muts)
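
The naming convention for amino acid mutations can be sketched without Biopython by comparing two already-translated protein strings. This is a simplified illustration: `name_aa_changes` and the toy proteins are hypothetical, and the sketch assumes both strings have the same length.

```python
def name_aa_changes(parent_protein, child_protein):
    """List changes as <parent residue><1-based position><child residue>."""
    changes = []
    for i, (anc, der) in enumerate(zip(parent_protein, child_protein)):
        if anc != der:
            changes.append(f"{anc}{i + 1}{der}")
    return changes

aa_changes = name_aa_changes("MKAV", "MRAV")
print(aa_changes)
```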

In [100]:
def return_all_parents(k, parents_dict, sequence, cds_start, cds_stop):
    mutations = return_mutations_on_branch(k)
    
    # if at root
    if k.parent is None:
        return(parents_dict)
    
    # if not at root yet
    elif k.branchType == "leaf":
        
        # do something else here....we've already recorded the mutations and stuff so we should just go up one
        parents_dict = return_all_parents(k.parent, parents_dict, sequence, cds_start, cds_stop)
    
    else:
        sequence, aa_sequence, aa_muts = return_mutated_sequence(sequence, mutations, cds_start, cds_stop)
        parents_dict[k] = {"nt_muts": mutations, "sequence":sequence, "aa_sequence":aa_sequence, 
                                  "aa_muts":aa_muts}
        parents_dict = return_all_parents(k.parent, parents_dict, sequence, cds_start, cds_stop)
        
    return(parents_dict)

In [101]:
def return_sequence_map(tree, alignment_dict, cds_start, cds_stop):
    
    all_nodes = {}

    for k in tree.Objects: 
        if k.branchType == "leaf":
            sequence = alignment_dict[k.name]
            aa_sequence = return_aa_sequence(sequence, cds_start, cds_stop)
            mutations = return_mutations_on_branch(k)
            mutated_sequence, mutated_aa_sequence, aa_muts = return_mutated_sequence(sequence, mutations, cds_start, cds_stop)
            all_nodes[k] = {"muts":mutations, "aa_muts":aa_muts, "leaves":"NA"}
            
            # parents dict will include all parental nodes from the tip back to the root with their mutations, 
            # nucleotide sequences, and names as 'branchName':{'nt_muts':[list of nt muts], 'sequence': str(nt sequence)}
            parents_dict = {}
            parents_dict = return_all_parents(k, parents_dict, sequence, cds_start, cds_stop)
            
            # make a master list of internal nodes we've already inferred to not repeat work
            for p in parents_dict:
                leaves = p.leaves
                sequence = parents_dict[p]['sequence']
                aa_sequence = parents_dict[p]['aa_sequence']
                aa_muts = parents_dict[p]['aa_muts']
                muts = parents_dict[p]['nt_muts']
                
                # check to see if the name matches and if the leaves match; sometimes baltic assigns the same numeric
                # name to 2 different nodes!
                if p in all_nodes:
                    if all_nodes[p]['leaves'] == leaves:
                        pass
                    else:
                        all_nodes[p] = {"muts":muts, "aa_muts":aa_muts, "leaves":leaves}
                else: 
                    all_nodes[p] = {"muts":muts, "aa_muts":aa_muts, "leaves":leaves}

    return(all_nodes)

## Infer each mutation that occurs across the tree

In [102]:
"""count the number of tips on the tree corresponding to each host category"""
def return_all_host_tips(tree):
    host_counts = {'human':0, 'domestic':0, 'wild':0}
    
    for k in tree.Objects: 
        if k.branchType == "leaf":
            host = k.traits['typeTrait']
            host_counts[host] += 1
    return(host_counts)

In [103]:
"""given a branch, return the mutations present on that branch"""

def return_muts_on_branch(branch):
    muts = []
    
    if 'mutations' in branch.traits:
        muts = branch.traits['mutations'].split(",")
                            
    return(muts)

In [104]:
"""this function does 3 things: 1. for each branch, it records the branch name and its branch length in a 
dictionary; 2. it adds up the total branch length on the tree. For the beast trees, we only have branch lengths 
in time. However, we can get a reasonable branch length (and I think this is perfectly reasonable for this purpose)
by just summing the total mutations on the branch and dividing by the total number of sites. The only purpose 
in this analysis for the total tree branch length is to get an idea of the number of mutations that should 
occur across the tree. So this should work.  3. Gather all mutations across the tree."""

def return_total_tree_branch_length(tree, n_sites_alignment, sequence_map):
    total_branch_length = 0
    branch_lengths = {}
    all_nt_muts = []
    all_aa_muts = []
    
    for k in tree.Objects:
        muts_on_branch = return_muts_on_branch(k)
        aa_muts_on_branch = sequence_map[k]['aa_muts']
        
        all_nt_muts.extend(muts_on_branch)
        all_aa_muts.extend(aa_muts_on_branch)
        
        branch_length_time = k.length
        branch_length_divergence = len(muts_on_branch)/n_sites_alignment
                
        total_branch_length += branch_length_divergence
        branch_lengths[k] = branch_length_divergence
    
    all_nt_muts = list(set(all_nt_muts))
    all_aa_muts = list(set(all_aa_muts))

    return(total_branch_length, branch_lengths, all_nt_muts, all_aa_muts)
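
As a small worked example of the divergence approximation described above, the sketch below converts per-branch mutation lists into branch lengths in mutations per site and sums them. The branch names and mutation lists are hypothetical; 1762 is the alignment length used later in this notebook.

```python
def divergence_branch_length(muts_on_branch, n_sites):
    """Approximate branch length as mutations on the branch divided by alignment length."""
    return len(muts_on_branch) / n_sites

# Hypothetical branches and their mutations.
toy_branches = {
    "branch_a": ["G730A", "A846G"],
    "branch_b": [],
    "branch_c": ["C1203A"],
}
n_sites = 1762

toy_lengths = {name: divergence_branch_length(muts, n_sites) for name, muts in toy_branches.items()}
toy_total = sum(toy_lengths.values())
print(toy_total)
```

Multiplying the total back by the number of sites recovers the number of mutation events across the tree, which is 3 in this toy case.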

In [105]:
"""return the total number of times that a particular mutation arises on the phylogeny. This includes instances 
of the mutation on internal nodes and on tips and counts each with the same weight."""

def return_number_times_on_tree(tree, mut):
    times_on_tree = 0
    
    for k in tree.Objects:
        if 'mutations' in k.traits:
        
            nt_muts = k.traits['mutations'].split(",")
            if mut in nt_muts: 
                times_on_tree += 1
                                                
    return(times_on_tree)

In [106]:
"""return the summed branch length (in divergence, i.e. mutations per site) of every branch on which the mutation 
arises. Internal branches and tips are counted with the same weight"""

def return_branch_length_mut_on_tree(tree, mut,n_sites_alignment):
    branch_length = 0
            
    for k in tree.Objects:
        if 'mutations' in k.traits:
            nt_muts = k.traits['mutations'].split(",")
                
            # if the mutation arises on this branch
            if mut in nt_muts: 
                branch_length_div = len(nt_muts)/n_sites_alignment
                    
                branch_length += branch_length_div
                    
    return(branch_length)

In [107]:
"""Given a starting internal node, and a tip you would like to end at, traverse the full path from that node to
tip. Along the way, gather mutations that occur along that path. Once you have reached the ending 
tip, return the list of mutations that fell along that path. Input for the ending tip here is a tip name, while 
the starting node is a node object"""

def return_all_muts_on_path_to_tip(starting_node, ending_tip, muts, aa_muts, strains_dict, sequence_map):
    
    # set an empty list of mutations and enumerate the children of the starting node; children can be tips or nodes
    children = starting_node.children
    
    for child in children:
        local_muts = []
        local_aa_muts = []
        
        """if the child is a leaf: if leaf is the target end tip, add the mutations that occur on that branch to 
        the list and return the list; if leaf is not the target end tip, move on"""
        """if the child is an internal node: first, test whether that child node contains the target tips in its 
        children. child.leaves will output a list of the names of all tips descending from that node. If not, pass. 
        if the node does contain the target end tip in its leaves, keep traversing down that node recursively, 
        collecting mutations as you go"""

        if child.branchType == "leaf":
            if child.name != ending_tip:
                pass
            elif child.name == ending_tip:
                host = child.traits["typeTrait"]
                local_muts = return_muts_on_branch(child)
                local_aa_muts = sequence_map[child]['aa_muts']
                muts.extend(local_muts)
                aa_muts.extend(local_aa_muts)
                return(host, muts, aa_muts)
        
        elif child.branchType == "node":
            strain_leaves = convert_leaves_to_strains(child.leaves, strains_dict)
            if ending_tip not in strain_leaves:
                pass
            else:
                local_muts = return_muts_on_branch(child)
                local_aa_muts = sequence_map[child]['aa_muts']
                muts.extend(local_muts)
                aa_muts.extend(local_aa_muts)
                host, muts, aa_muts = return_all_muts_on_path_to_tip(child, ending_tip, muts, aa_muts, strains_dict, sequence_map)
    
    return(host, muts, aa_muts)

In [108]:
"""at times, will need to check whether the revertant mutation occurs downstream. Return the revertant mutation"""

def return_opposite_mutation(mut):
    
    site = mut[1:-1]
    ref = mut[0]
    alt = mut[-1]
    opposite = alt+site+ref
    
    return(opposite)

In [120]:
"""given a tree and a nucleotide mutation, return the number of tips in each host that carry that mutation"""

def return_host_distribution_mutation(tree, mut, strains_dict, sequence_map):
    
    host_counts_dict = {'human':0, 'domestic':0, 'wild':0}
    back_mutation = return_opposite_mutation(mut)
    
    # iterate through tree
    for k in tree.Objects:
        if 'mutations' in k.traits:
            nt_muts = k.traits['mutations'].split(",")
                
            # if we have reached a node or tip in the tree with the target mutation, enumerate descendants
            if mut in nt_muts: 
                    
                # if the mutation occurs on a leaf, record the host and move on 
                if k.branchType == 'leaf':
                    host = k.traits['typeTrait']
                    host_counts_dict[host] += 1
                    
                # else, if the mutation occurs on a node, traverse the children and return host
                elif k.branchType == "node":
                    all_leaves = k.leaves
                    for leaf in all_leaves: 
                        muts = []
                        aa_muts = []
                        strain_name = strains_dict[leaf]
                        host, muts, aa_muts = return_all_muts_on_path_to_tip(k, strain_name, muts, aa_muts, strains_dict, sequence_map)
                        
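                        if back_mutation in muts: 
                            pass  # the mutation reverted on the path to this tip, so the tip is not counted
                        # the mutation arises a second time on the path to this tip without a reversion;
                        # flag it and do not count the tip
                        elif mut in muts:
                            print("something odd happened",leaf, back_mutation, mut)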
                        else:
                            host_counts_dict[host] += 1
                                
    return(host_counts_dict)
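
The reversion check used when counting tips can be sketched in isolation. Here `opposite` and `tip_keeps_mutation` are hypothetical helpers written for this example, and the path mutation lists are made up. The sketch covers only the reversion case and leaves out the flagging of repeated mutations.

```python
def opposite(mut):
    """Swap the ancestral and derived states, e.g. 'K205R' -> 'R205K'."""
    return mut[-1] + mut[1:-1] + mut[0]

def tip_keeps_mutation(mut, muts_on_path):
    """True unless the reversion of `mut` appears on the path down to the tip."""
    return opposite(mut) not in muts_on_path

kept = tip_keeps_mutation("K205R", ["A150V", "M5T"])
reverted = tip_keeps_mutation("K205R", ["A150V", "R205K"])
print(kept, reverted)
```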

In [121]:
"""given a tree and an amino acid mutation, return the number of tips in each host that carry that mutation"""

def return_host_distribution_aa_mutation(tree, aa_mut, strains_dict, sequence_map):
    
    aa_host_counts_dict = {'human':0, 'domestic':0, 'wild':0}
    back_mutation = return_opposite_mutation(aa_mut)
    
    # iterate through tree
    for k in tree.Objects:
        aa_muts = sequence_map[k]['aa_muts']
                
        # if we have reached a node or tip in the tree with the target mutation, enumerate descendants
        if aa_mut in aa_muts: 
                    
            # if the mutation occurs on a leaf, record the host and move on 
            if k.branchType == 'leaf':
                host = k.traits['typeTrait']
                aa_host_counts_dict[host] += 1
                    
            # else, if the mutation occurs on a node, traverse the children and return host
            elif k.branchType == "node":
                all_leaves = k.leaves
                for leaf in all_leaves: 
                    muts = []
                    aa_muts = []
                    strain_name = strains_dict[leaf]
                    host, muts, aa_muts = return_all_muts_on_path_to_tip(k, strain_name, muts, aa_muts, strains_dict, sequence_map)
                        
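                    if back_mutation in aa_muts: 
                        pass  # the mutation reverted on the path to this tip, so the tip is not counted
                    # the mutation arises a second time on the path to this tip without a reversion;
                    # flag it and do not count the tip
                    elif aa_mut in aa_muts:
                        print("something odd happened",leaf, back_mutation, aa_mut)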
                    else:
                        aa_host_counts_dict[host] += 1
                                
    return(aa_host_counts_dict)

## Calculate the enrichment scores

In [122]:
"""calculate an enrichment score for an individual mutation, based on the counts across hosts"""

def calculate_enrichment_score_counts(mut_counts_dict, host1, host2, host_counts):
    total_host1_tree = host_counts[host1]
    total_host2_tree = host_counts[host2]
    
    mut_host1 = mut_counts_dict[host1]
    mut_host2 = mut_counts_dict[host2]
    
    # this is calculating this table as counts
    presence_host1 = mut_host1
    absence_host1 = total_host1_tree - mut_host1
    presence_host2 = mut_host2
    absence_host2 = total_host2_tree - mut_host2
    
    if presence_host2 == 0:
        presence_host2 = 1
    if absence_host1 == 0:
        absence_host1 = 1

    # this score is calculated in terms of its enrichment in host 1
    score = (presence_host1 * absence_host2)/(presence_host2 * absence_host1)
#     score = (presence_host1 + absence_host2) - (presence_host2 + absence_host1)
    return(score)
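
The zero handling in the counts-based score is easy to miss, so here is a minimal sketch of its effect. It assumes the same guard as the function above (a zero in either denominator term is replaced by 1); `odds_ratio_with_floor` and the counts are hypothetical.

```python
def odds_ratio_with_floor(presence1, absence1, presence2, absence2):
    """Odds ratio in which a zero denominator term is replaced by 1 to avoid division by zero."""
    presence2 = presence2 or 1
    absence1 = absence1 or 1
    return (presence1 * absence2) / (presence2 * absence1)

# Hypothetical mutation found in 3 of 10 host-1 tips and in none of 10 host-2 tips.
only_in_host1 = odds_ratio_with_floor(3, 7, 0, 10)
# Hypothetical mutation found in all 10 host-1 tips and in 2 of 10 host-2 tips.
fixed_in_host1 = odds_ratio_with_floor(10, 0, 2, 8)
print(only_in_host1, fixed_in_host1)
```

One consequence of this guard is that a mutation absent from host 2 gets the same score as one present in exactly one host-2 tip with the same absence count, which is worth keeping in mind when ranking scores.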

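In [123]:
"""calculate an enrichment score for an individual mutation, based on the proportion of tips in each host. Note 
that this score is a difference of proportions, (A + D) - (B + C), rather than an odds ratio"""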

def calculate_enrichment_score_proportions(mut_counts_dict, host1, host2, host_counts):
    total_host1_tree = host_counts[host1]
    total_host2_tree = host_counts[host2]
    
    mut_host1 = mut_counts_dict[host1]
    mut_host2 = mut_counts_dict[host2]
    
    total_tips_in_tree = total_host1_tree + total_host2_tree
    
    # this is calculating this table as proportions
    presence_host1 = (mut_host1)/total_tips_in_tree
    absence_host1 = (total_host1_tree - mut_host1)/total_tips_in_tree
    presence_host2 = (mut_host2)/total_tips_in_tree
    absence_host2 = (total_host2_tree - mut_host2)/total_tips_in_tree
    
#     if presence_host2 == 0:
#         presence_host2 = 1
#     if absence_host1 == 0:
#         absence_host1 = 1

    # this score is calculated in terms of its enrichment in host 1
    #score = (presence_host1 * absence_host2)/(presence_host2 * absence_host1)
    score = (presence_host1 + absence_host2) - (presence_host2 + absence_host1)
    return(score)
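
The proportions-based score can be worked through by hand. The sketch below computes it directly and also via an algebraically equivalent form, `(2 * (m1 - m2) + (n2 - n1)) / (n1 + n2)`, where `m` is the number of tips with the mutation and `n` the total tips for each host. The variable names and counts are made up for illustration.

```python
def proportion_score(m1, n1, m2, n2):
    """(presence1 + absence2) - (presence2 + absence1), each expressed as a proportion of n1 + n2."""
    total = n1 + n2
    presence1, absence1 = m1 / total, (n1 - m1) / total
    presence2, absence2 = m2 / total, (n2 - m2) / total
    return (presence1 + absence2) - (presence2 + absence1)

def proportion_score_simplified(m1, n1, m2, n2):
    """Same quantity after collecting terms."""
    return (2 * (m1 - m2) + (n2 - n1)) / (n1 + n2)

# Hypothetical counts: 6 of 10 host-1 tips and 2 of 10 host-2 tips carry the mutation.
direct = proportion_score(6, 10, 2, 10)
simplified = proportion_score_simplified(6, 10, 2, 10)
print(direct, simplified)
```

The score runs from -1 (mutation in every host-2 tip and no host-1 tip) to 1 (the reverse), so it needs no special handling for zero counts.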

In [112]:
"""for a tree and all nucleotide mutations, calculate the enrichment scores across the tree"""
def calculate_enrichment_scores(tree, nt_muts, host1, host2, min_required_count, method, host_counts, n_sites_alignment, strains_dict, sequence_map):
    scores = []
    scores_dict = {}
    times_detected_dict = {}
    branch_lengths_dict = {}
    host_counts_dict2 = {}
    
    if method == "counts":
        enrichment_calculation_function = calculate_enrichment_score_counts
    elif method == "proportions":
        enrichment_calculation_function = calculate_enrichment_score_proportions
    
    for n in nt_muts:
        times_detected = return_number_times_on_tree(tree, n)
        times_detected_dict[n] = times_detected

        branch_length_mut = return_branch_length_mut_on_tree(tree, n, n_sites_alignment)
        branch_lengths_dict[n] = branch_length_mut

        host_counts_dict = return_host_distribution_mutation(tree, n, strains_dict, sequence_map)
        host_counts_dict2[n] = host_counts_dict
        total_tips_with_mut = host_counts_dict['human'] + host_counts_dict['domestic'] + host_counts_dict['wild']

        if total_tips_with_mut >= min_required_count:
            enrichment_score = enrichment_calculation_function(host_counts_dict, host1,host2, host_counts)
            #print(enrichment_score, total_tips_with_mut, host_counts_dict)
            scores.append(enrichment_score)
            scores_dict[n] = enrichment_score
            
            
    return(scores, scores_dict, times_detected_dict, branch_lengths_dict, host_counts_dict2)

In [125]:
"""for a tree and all amino acid mutations, calculate the enrichment scores across the tree"""
def calculate_enrichment_scores_aa_muts(tree, aa_muts, host1, host2, min_required_count, method, host_counts, n_sites_alignment, strains_dict, sequence_map):
    scores = []
    scores_dict = {}
    times_detected_dict = {}
    branch_lengths_dict = {}
    host_counts_dict2 = {}
    
    if method == "counts":
        enrichment_calculation_function = calculate_enrichment_score_counts
    elif method == "proportions":
        enrichment_calculation_function = calculate_enrichment_score_proportions
    
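    for a in aa_muts:
        # NOTE: the next two helpers search the nucleotide 'mutations' trait on each branch, so an amino acid
        # mutation name is compared against nucleotide mutation names here; treat these two values with caution
        times_detected = return_number_times_on_tree(tree, a)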
        times_detected_dict[a] = times_detected

        branch_length_mut = return_branch_length_mut_on_tree(tree, a, n_sites_alignment)
        branch_lengths_dict[a] = branch_length_mut

        host_counts_dict = return_host_distribution_aa_mutation(tree, a, strains_dict, sequence_map)
        host_counts_dict2[a] = host_counts_dict
        total_tips_with_mut = host_counts_dict['human'] + host_counts_dict['domestic'] + host_counts_dict['wild']

        if total_tips_with_mut >= min_required_count:
            enrichment_score = enrichment_calculation_function(host_counts_dict, host1,host2, host_counts)
            #print(enrichment_score, total_tips_with_mut, host_counts_dict)
            scores.append(enrichment_score)
            scores_dict[a] = enrichment_score
            
            
    return(scores, scores_dict, times_detected_dict, branch_lengths_dict, host_counts_dict2)

## Run on posterior

In [127]:
def run_on_posterior_trees(all_trees, burnin, n_sites_alignment, genbank_ref_file, alignment):
    start_time = time.time()
    all_scores = {}
    all_scores_dict = {}
    all_times_detected_dict = {}
    all_branch_lengths_dict = {}
    all_host_counts_dict2 = {}
    
    cds_start, cds_stop = return_cds_coordinates(genbank_ref_file)
    alignment_dict = read_alignment(alignment)

    with open(all_trees, "r") as infile:
        
        taxa_lines = get_taxa_lines(all_trees)
        strains_dict = convert_strain_to_number(taxa_lines)

        tree_counter = 0
        muts = []

        for line in infile:
            if 'tree STATE_' in line:
                tree_counter += 1
                
                if tree_counter >= burnin:
                    temp_tree = StringIO(taxa_lines + line)
                    tree = bt.loadNexus(temp_tree)
                    
                    # generate the sequence map, which maps for each branch the mutations, aa muts, and sequences
                    sequence_map = return_sequence_map(tree, alignment_dict, cds_start, cds_stop)

                    host_counts = return_all_host_tips(tree)
                    x, y, all_nt_muts, all_aa_muts = return_total_tree_branch_length(tree, n_sites_alignment, sequence_map)
                    
                    # to run on nucleotides instead, switch the function
                    scores, scores_dict, times_detected_dict, branch_lengths_dict, host_counts_dict2 = calculate_enrichment_scores_aa_muts(tree, all_aa_muts, "human","domestic", 1, "counts",host_counts, n_sites_alignment, strains_dict, sequence_map)
                    all_scores[tree_counter] = scores
                    all_scores_dict[tree_counter] = scores_dict
                    all_times_detected_dict[tree_counter] = times_detected_dict
                    all_branch_lengths_dict[tree_counter] = branch_lengths_dict
                    all_host_counts_dict2[tree_counter] = host_counts_dict2

    # print the amount of time this took
    total_time_seconds = time.time() - start_time
    total_time_minutes = total_time_seconds/60
    print("this took", total_time_seconds, "seconds (", total_time_minutes," minutes) to run on", tree_counter, "trees")
    return(all_scores, all_scores_dict, all_times_detected_dict, all_branch_lengths_dict, all_host_counts_dict2)

In [140]:
n_sites_alignment = 1762
min_required_count = 1
burnin_percent = 0.99

alignment = "../../h5n1-host-classification/beast/alignments/aligned_h5n1_ha-3deme-1per-country-month-host-downsampled-bad-dates-2021-06-09-with-annotations-2021-07-06.fasta"
genbank_ref_file = "../test-data/reference_h5n1_ha.gb"
all_trees = "../../h5n1-host-classification/beast/beast-runs/2022-04-19-mascot-3deme-skyline-fixed-muts-logger/it3/2022-04-19-mascot-3deme-skyline-tipdates.muts.trees"

In [141]:
taxa_lines = get_taxa_lines(all_trees)
burnin = get_burnin_value(all_trees, burnin_percent)
print(burnin)

429.65999999999997




In [142]:
scores, scores_dict, times_detected_dict, branch_lengths_dict, host_counts_dict2 = run_on_posterior_trees(all_trees, burnin, n_sites_alignment, genbank_ref_file, alignment)



something odd happened 362 R205K K205R
something odd happened 222 P157S S157P
something odd happened 222 P157S S157P
this took 110.43089318275452 seconds ( 1.840514886379242  minutes) to run on 434 trees


In [143]:
# C470T is A150V
# for s in scores_dict: 
#     print(scores_dict[s]['C470T'])
#
for s in scores_dict: 
    print(scores_dict[s]['A150V'])

13.875
13.875
13.875
13.875
13.875
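
Looking up one mutation directly in every tree raises a `KeyError` if a tree lacks that mutation. A possible way to collect and summarize scores across trees is sketched below, assuming the per-tree structure `{tree_number: {mutation: score}}`. The dictionary `toy_scores` is a hypothetical stand-in (the `M5T` values in particular are invented), and `scores_for_mutation` is not part of the notebook.

```python
from statistics import median

# Hypothetical stand-in for the per-tree score dictionary.
toy_scores = {
    430: {"A150V": 13.875, "M5T": 2.0},
    431: {"A150V": 13.875},
    432: {"A150V": 13.875, "M5T": 4.0},
}

def scores_for_mutation(scores_by_tree, mut):
    """Collect the score of `mut` from every tree in which it was scored."""
    return [tree_scores[mut] for tree_scores in scores_by_tree.values() if mut in tree_scores]

a150v_scores = scores_for_mutation(toy_scores, "A150V")
m5t_scores = scores_for_mutation(toy_scores, "M5T")
print(median(a150v_scores), len(m5t_scores), median(m5t_scores))
```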


In [139]:
scores_dict

{434: {'A143V': 0.0,
  'S239R': 0.7572590011614402,
  'A201E': 0.0,
  'R339K': 1.3346774193548387,
  'M5T': 2.6774193548387095,
  'V63G': 2.685483870967742,
  'I87T': 0.035959809624537285,
  'N252S': 2.6774193548387095,
  'T204V': 0.0,
  'G104D': 0.0,
  'K178T': 2.685483870967742,
  'R343K': 2.685483870967742,
  'R205K': 1.8036290322580646,
  'N140D': 0.3623812085350547,
  'R205M': 3.1519396551724137,
  'S137Y': 0.0,
  'A279T': 0.22131447384465308,
  'M448L': 0.0,
  'V364I': 0.0,
  'P210Q': 0.0,
  'N406K': 2.685483870967742,
  'N268Y': 2.685483870967742,
  'P337L': 0.0,
  'M82L': 0.05498502353444587,
  'R239S': 0.0,
  'T156A': 0.0,
  'I204K': 0.0,
  'Q223L': 0.6885964912280702,
  'T211N': 0.0,
  'R178K': 0.5194188722669736,
  'V11I': 0.15646484808161454,
  'L13I': 0.7836206896551724,
  'I132M': 2.685483870967742,
  'K30R': 0.0,
  'Q31R': 0.0,
  'R341T': 0.0,
  'S72R': 2.6774193548387095,
  'G286E': 0.0,
  'N100K': 0.0,
  'R473K': 0.07206284153005464,
  'I178M': 0.0,
  'T183I': 0.0,
  '