Empty Eventalign.txt for Negative Sense Viral RNA #151

Open
AlexFitzgeraldBryan opened this issue Feb 17, 2024 · 1 comment


AlexFitzgeraldBryan commented Feb 17, 2024

I previously posted a response to another issue, as I've been having trouble with some of my data. I'm just going to briefly recap what I wrote, as I think the original comment was probably missed:

I've been having issues analysing negative sense viral RNA genomes. I figured the issue was likely something to do with the eventalign.txt file, as the test data was working fine, so I spent an afternoon trying to break the eventalign.txt file manually. When I combined parts of the test data with my own, the dataprep process terminated as soon as it hit my data. The only difference I could discern was that the test data had the same sequence in both the reference_kmer and model_kmer columns, whereas my data contained a reverse complement sequence in the reference_kmer column.

When I filtered my eventalign.txt files to exclude any rows where the reference and model kmers did not match, m6anet was able to carry out the dataprep and inference steps "normally", producing populated data.json and data.log files. However, this alone isn't really a fix, as it may filter out huge amounts of data (I would assume my negative sense genomic RNA?). My guess is that this has something to do with the way nanopolish handles the read orientation of the sequence: for negative sense RNA genomes the reference_kmer is reversed for alignment to the genome, something that would work with mRNA but doesn't with negative sense genomes.
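For anyone wanting to reproduce the filtering step described above, here is a minimal sketch using only the Python standard library. It assumes a tab-separated eventalign file with a header row containing reference_kmer and model_kmer columns; the function name filter_matching_kmers and the sample rows are hypothetical and made up for illustration.

```python
import csv
import io


def filter_matching_kmers(src, dst):
    """Copy rows whose reference_kmer equals model_kmer; return (kept, dropped)."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = dropped = 0
    for row in reader:
        if row["reference_kmer"] == row["model_kmer"]:
            writer.writerow(row)
            kept += 1
        else:
            dropped += 1
    return kept, dropped


# Made-up sample rows (only a subset of the real eventalign columns).
SAMPLE = (
    "contig\tposition\treference_kmer\tmodel_kmer\n"
    "ref\t10\tAACTG\tAACTG\n"   # kmers match: kept
    "ref\t11\tACTGG\tCCAGT\n"   # model kmer is the reverse complement: dropped
    "ref\t12\tCTGGA\tNNNNN\n"   # no model kmer assigned: dropped
)

out = io.StringIO()
kept, dropped = filter_matching_kmers(io.StringIO(SAMPLE), out)
print(f"kept={kept} dropped={dropped}")
```

Reporting the dropped count alongside the kept count makes it easy to see how much data the filter throws away, which is the main concern with this workaround. For real files, pass handles opened with open(path, newline="") instead of the StringIO objects.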

Looking through the dataprep_utils.py file, I found the combine() function at line 279 ( cond_successfully_eventaligned = eventalign_result['reference_kmer'] == eventalign_result['model_kmer'] ). It looks like this function checks whether the values in these two columns are equal and will halt the process if they are not. I assume this is used to account for NNNNN model kmers?
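To check whether the mismatching rows really are reverse complements rather than NNNNN placeholders, a small diagnostic along these lines might help. This is only a sketch and not part of m6anet: reverse_complement and classify_pair are hypothetical helpers, the category names are my own, and the example pairs are made up.

```python
from collections import Counter

# Complement table on the DNA alphabet; U is treated like T, N stays N.
_COMPLEMENT = str.maketrans("ACGTUN", "TGCAAN")


def reverse_complement(kmer):
    """Return the reverse complement of a kmer."""
    return kmer.upper().translate(_COMPLEMENT)[::-1]


def classify_pair(reference_kmer, model_kmer):
    """Label how a reference_kmer / model_kmer pair relate to each other."""
    if set(model_kmer) == {"N"}:
        return "unaligned"            # e.g. NNNNN
    if reference_kmer == model_kmer:
        return "match"
    if reverse_complement(reference_kmer) == model_kmer:
        return "reverse_complement"
    return "other"


# Made-up (reference_kmer, model_kmer) pairs for illustration.
pairs = [
    ("AACTG", "AACTG"),
    ("ACTGG", "CCAGT"),
    ("CTGGA", "NNNNN"),
    ("GGATC", "TTTTT"),
]
counts = Counter(classify_pair(ref, model) for ref, model in pairs)
print(dict(counts))
```

If nearly all mismatches in a real file fall into the reverse_complement category, that would support the read-orientation explanation rather than a failed alignment.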

Is there a simple workaround for this? I don't understand how nanopolish generates these kmer models, or how important the specific sequence orientation is for the dataprep step.

edit: Thanks in advance, you guys have built a great tool here!

@yuukiiwa
Collaborator

Hi @AlexFitzgeraldBryan,

Here are several things I can think of that you can try:

  1. Convert your viral RNA genome reference to its reverse complement. I'm not sure whether this makes sense, because we normally align our samples to cDNA (i.e. the transcriptome) for xpore and m6anet.
  2. f5c eventalign has an --rna option. I know that it points to the kmer model, but I'm not sure whether it would convert the reference kmer to its reverse complement.
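For the first suggestion, a reverse-complemented reference could be produced with something like the sketch below. It is untested against m6anet or nanopolish, so whether it resolves the problem is an open question; reverse_complement_fasta is a hypothetical helper and the example record is made up.

```python
import io

# Complement table on the DNA alphabet; U is treated like T, N stays N.
_COMPLEMENT = str.maketrans("ACGTUNacgtun", "TGCAANtgcaan")


def reverse_complement_fasta(src, dst, width=60):
    """Write every record of a FASTA stream as its reverse complement."""

    def flush(header, chunks):
        if header is None:
            return
        seq = "".join(chunks).translate(_COMPLEMENT)[::-1]
        dst.write(header + "\n")
        for i in range(0, len(seq), width):
            dst.write(seq[i:i + width] + "\n")

    header, chunks = None, []
    for line in src:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush(header, chunks)      # emit the previous record, if any
            header, chunks = line, []
        else:
            chunks.append(line)
    flush(header, chunks)              # emit the last record


# Made-up two-line record for illustration.
fasta_in = io.StringIO(">seg1\nAACTG\nGG\n")
fasta_out = io.StringIO()
reverse_complement_fasta(fasta_in, fasta_out)
print(fasta_out.getvalue(), end="")
```

Note that positions change after this transformation: a kmer of length k starting at 0-based position p in a sequence of length L starts at L - p - k in the reverse complement, so any downstream coordinates would need to be mapped back to the original genome.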

Thanks!

Best wishes,
Yuk Kei
