You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hi, thank you for writing BTE, we in the Matsen lab have found it very useful.
I ran into a potential bug. I've been using the tree.translate() method to analyze codon-level mutations on an UShER tree. And for some nodes, the alternative codon appears to be incorrect. This always happens in cases where there are two nucleotide mutations to the same codon. For instance, the below table shows the set of translated mutations for the node Indonesia/NT-EIJK26/2020|EPI_ISL_766048|2020-04-14:
There are a total of three nucleotide mutations. The bottom two are in the same gene (spike) in the same codon (348). Given these mutations, it seems like the codon-level mutation should be GCA->CAA (reflecting both nucleotide mutations). Instead, it is GCA->CCA for both rows, which is correct for the single mutation in the middle row, but ignores the single mutation in the bottom row.
I appreciate that the output of tree.translate() (one entry per nucleotide mutation) makes it a little unclear how to report multi-nucleotide mutations. But just wanted to raise this in case you all had not considered it before.
The below is code what I've used to identify nodes that show this problem. Of the first 10,000 nodes in the tree I'm analyzing, 15 showed the problem.
Thanks!
import bte
from collections import Counter
# Read in tree
tree = bte.MATree(usher_tree_file)
# Get translated mutations
translations = tree.translate(ref_gtf, ref_fasta)
# Iterate over nodes and identify ones with potential problem
nodes = tree.depth_first_expansion()
problematic_nodes = []
for (i, node) in enumerate(nodes):
if node.id not in translations:
continue
codon_muts = translations[node.id]
counts = Counter((codon_mut.gene, codon_mut.aa_index, codon_mut.original_codon, codon_mut.alternative_codon) for codon_mut in codon_muts)
if max(counts.values()) > 1:
problematic_nodes.append(node)
if i > 10000:
break
print(len(problematic_nodes))
The text was updated successfully, but these errors were encountered:
Hi, thank you for raising this issue. Glad BTE has been of use to you and your team. Definitely looks like a bug.
BTE is largely a wrapper around UShER's core library, but does do some of its own custom processing for mutation translation, so the bug might be in BTE or in UShER upstream. Do you find that the tabular output for UShER's translate command exhibits the same behavior?
Also, could you attach a small file and command/script that reproduces the issue?
Hi, thank you for writing BTE, we in the Matsen lab have found it very useful.
I ran into a potential bug. I've been using the
tree.translate()
method to analyze codon-level mutations on an UShER tree. And for some nodes, the alternative codon appears to be incorrect. This always happens in cases where there are two nucleotide mutations to the same codon. For instance, the below table shows the set of translated mutations for the nodeIndonesia/NT-EIJK26/2020|EPI_ISL_766048|2020-04-14
:gene, aa_index, original_codon, alternative_codon, nuc
S, 347, TTT, TTG, T22603G
S, 348, GCA, CCA, G22604C
S, 348, GCA, CCA, C22605A
There are a total of three nucleotide mutations. The bottom two are in the same gene (spike) in the same codon (348). Given these mutations, it seems like the codon-level mutation should be GCA->CAA (reflecting both nucleotide mutations). Instead, it is GCA->CCA for both rows, which is correct for the single mutation in the middle row, but ignores the single mutation in the bottom row.
I appreciate that the output of
tree.translate()
(one entry per nucleotide mutation) makes it a little unclear how to report multi-nucleotide mutations. But just wanted to raise this in case you all had not considered it before.The below is code what I've used to identify nodes that show this problem. Of the first 10,000 nodes in the tree I'm analyzing, 15 showed the problem.
Thanks!
The text was updated successfully, but these errors were encountered: