Empty Eventalign.txt for Negative Sense Viral RNA #151

AlexFitzgeraldBryan · 2024-02-17T16:53:58Z

I previously posted a response to another issue because I've been having trouble with some of my data. I'm going to briefly recap what I wrote, as I think the original comment was probably missed:

I've been having issues with the analysis of negative-sense viral RNA genomes. I figured the issue was likely something to do with the eventalign.txt file, as the test data was working fine, so I spent an afternoon trying to break the eventalign.txt file manually. When I combined parts of the test data with my own, the dataprep process terminated as soon as it hit my data. The only difference I could discern was that the test data had the same sequence in both the reference_kmer and model_kmer columns, whereas my data contained a reverse-complement sequence in reference_kmer.
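A rough way to confirm that observation is to tally how each row's reference_kmer relates to its model_kmer. The sketch below is only illustrative: it assumes a tab-separated eventalign.txt with a header row containing the reference_kmer and model_kmer columns, and the helper names (reverse_complement, classify_row, summarise) are hypothetical, not part of m6anet or nanopolish.

```python
import csv
from collections import Counter

# Complement table; U is treated like T so RNA k-mers compare cleanly.
_COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(kmer):
    """Return the reverse complement of a k-mer as a DNA-alphabet string."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]


def classify_row(reference_kmer, model_kmer):
    """Describe how model_kmer relates to reference_kmer for one row."""
    ref = reference_kmer.upper().replace("U", "T")
    model = model_kmer.upper().replace("U", "T")
    if set(model) == {"N"}:
        return "unmodelled"          # e.g. NNNNN
    if ref == model:
        return "match"
    if reverse_complement(ref) == model:
        return "reverse_complement"
    return "other"


def summarise(handle):
    """Count row categories in an open, tab-separated eventalign file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(
        classify_row(row["reference_kmer"], row["model_kmer"]) for row in reader
    )
```

Usage would be something like `with open("eventalign.txt") as fh: print(summarise(fh))`. If most rows from the affected reads fall under "reverse_complement", that supports the orientation explanation rather than a failed alignment.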

When I filtered my eventalign.txt files to exclude any rows where the reference and model kmers did not match, m6anet was able to carry out the dataprep and inference steps "normally", resulting in filled data.json and data.log files. However, this alone isn't really a fix, as it may filter out huge amounts of data (I would assume my NS genomic RNA?). My guess is that this has something to do with the way nanopolish handles read orientation: the negative-sense RNA genomes are having their reference_kmer reversed for alignment to the genome, something that would work with mRNA but doesn't with negative-sense genomes.
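For reference, the filtering step described here can be sketched roughly as follows. This is an assumption-laden illustration, not the exact script used: it assumes a tab-separated file with a header row, and filter_matching_rows is a hypothetical helper name. It also reports how many rows were dropped, which makes the data loss mentioned above visible.

```python
import csv


def filter_matching_rows(src, dst):
    """Copy rows where reference_kmer equals model_kmer from src to dst.

    src and dst are open text handles. Returns (kept, dropped) row counts.
    """
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        if row["reference_kmer"] == row["model_kmer"]:
            writer.writerow(row)
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

It could be called with two files opened via `open(..., newline="")`; a large dropped count relative to kept would indicate that most of the negative-sense signal is being discarded.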

Looking through the dataprep_utils.py file, I found this line in the combine() function at line 279: cond_successfully_eventaligned = eventalign_result['reference_kmer'] == eventalign_result['model_kmer']. It looks like this checks whether the values in these two columns are equal and will halt the process if they are not. I assume this is used to account for NNNNN model kmers?

Is there a simple workaround for this? I don't understand how nanopolish generates these kmer_models, or how important the specific sequence orientation is for the dataprep step.

edit: Thanks in advance, you guys have built a great tool here!

yuukiiwa · 2024-02-27T00:28:02Z

Hi @AlexFitzgeraldBryan,

Here are several things I can think of that you can try:

I'm not sure whether it makes sense to convert your viral RNA genome to its reverse complement, because we normally align our samples to cDNA (aka the transcriptome) for xpore and m6anet.
f5c eventalign has an --rna option. I know that it points to the kmer model, but I'm not sure whether it would convert the reference kmer to its reverse complement.

Thanks!

Best wishes,
Yuk Kei
