Out of bound reads #17
Comments
Hi @sguizard, Not that I can think of off the top of my head. It would be great to get the offending read as well as the genome and GTF files so that I can reproduce it. Would that be possible? Best,
The genome and GTF are from the human genome GRCh38 (downloaded from iGenomes).
Hi @ksahlin, Thanks for looking into the issue. Please find the corresponding sequence below.
Also, these are the full paths to the reference files:
I have found the error. Indeed, the read is aligned to The line causing the error seems to be
On this line, uLTRA uses the reference sequences in the reference fasta to form a template for the final SAM-file. uLTRA uses temporary SAM files to produce the alignment (in case of multiprocessing). Now, in the temporary SAM-file, reference accession The read as mapped with uLTRA has already set all fields for the read, e.g., This leads me to believe that the library I need to figure out why uLTRA uses only a subset of the references in its header (creating the diff), and I will fix this asap. Fixing this bug will most likely resolve issue #2 as well.
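One plausible way such a header difference can switch reference names, assuming the read's reference is stored as an index into the header's reference list (as BAM-style records do), is sketched below. This is a toy illustration in plain Python, not uLTRA's code: the header orderings are invented and `resolve_name` is a hypothetical helper.

```python
# Hypothetical header orders. The reference names appear in this thread,
# but the ordering and the missing entry are invented for illustration.
full_header = ["chr1", "chr11", "chr1_KI270713v1_random", "chr11_KI270721v1_random"]
subset_header = ["chr1", "chr11", "chr11_KI270721v1_random"]  # one reference missing


def resolve_name(header, ref_id):
    """Return the reference name stored at index ref_id in a header."""
    return header[ref_id]


# The read is written against the subset header ...
ref_id = subset_header.index("chr11_KI270721v1_random")
name_in_temp = resolve_name(subset_header, ref_id)

# ... but the same index is later interpreted against the full header.
name_in_final = resolve_name(full_header, ref_id)

print(name_in_temp, "->", name_in_final)
```

In this toy setup the position is untouched while the name changes, which would be consistent with the identical POS seen in the faulty mapping and in the minimap2 alignment.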
Hi @sguizard and @husensofteng, I have pushed a fix for this to master. Could you verify that the latest master solves the problem (and does not introduce something unexpected)? The fix was pretty minimal, so I think it should not introduce anything unexpected, but it is good to check regardless. Ideally, it would be good to have a diff or something similar that looks at the number of lines affected in the SAM between the versions. There could be other reads where this reference name switch has occurred silently because they are not out of bounds. @sguizard, if everything works with the latest master, I kindly ask for your expertise in also updating the bioconda version whenever you have time :) I made sure to update the setup.py script to v0.0.4.2
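A rough sketch of such a comparison, assuming both versions' outputs are available as uncompressed SAM text. The function names `placements` and `changed_reads` and the toy records are made up for illustration; only the read name, RNAME and POS columns are compared.

```python
from collections import defaultdict


def placements(sam_text):
    """Map each read name to the set of (RNAME, POS) pairs it is aligned to."""
    result = defaultdict(set)
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        result[fields[0]].add((fields[2], int(fields[3])))
    return result


def changed_reads(old_sam_text, new_sam_text):
    """Return the sorted read names whose placements differ between two SAM texts."""
    old = placements(old_sam_text)
    new = placements(new_sam_text)
    return sorted(n for n in old.keys() | new.keys() if old.get(n) != new.get(n))


# Toy records (invented): read1 changes reference name, read2 is unchanged.
tail = "\t60\t*\t*\t0\t0\t*\t*\n"
old_sam = "@HD\tVN:1.6\n" + "read1\t0\tchr1_KI270713v1_random\t50361" + tail + "read2\t0\tchr11\t100" + tail
new_sam = "@HD\tVN:1.6\n" + "read1\t0\tchr11_KI270721v1_random\t50361" + tail + "read2\t0\tchr11\t100" + tail

print(changed_reads(old_sam, new_sam))
```

The length of the returned list gives the number of reads affected, including those that were silently renamed without going out of bounds.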
Hi @ksahlin, Thanks for your investigation.
I tested the patch with the complete set of reads.
So, it seems OK to me.
@ksahlin do you plan to create a 0.0.4.2 release?
Yes, was just waiting for your input. Will do it asap and get back to you here when done.
@ksahlin the updated bioconda recipe is now live. I will take care of the nf-core modules today.
super, thanks!
Hi Kristoffer,
I need your help with a result that a user of the nf-core/isoseq pipeline obtained.
He contacted me about an issue with the pipeline.
He was able to run the complete pipeline with minimap2 but not with uLTRA.
I identified the read causing the failure and compared the bam files from both aligners.
I found the following result:
In the minimap2 alignment, the read is mapped once on the chromosome chr11_KI270721v1_random at POS 50361.
With uLTRA, it's mapped on chromosomes chr11 and chr1_KI270713v1_random at POS 1995179 and 50361.
The lengths of the sequences are
As you can see, the third mapping is outside the chromosome itself.
Do you have an idea of what could cause that?
Is it possible that there is an error in the sequence name resolution? The POS value of the faulty mapping and minimap2 are the same.
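For reference, mappings like this can be flagged by comparing each alignment's POS with the reference length declared in the `@SQ` header lines. Below is a minimal sketch in plain Python, assuming uncompressed SAM text. The function names and the toy records are made up for illustration, and only the start position is checked, not the alignment end.

```python
def reference_lengths(sam_text):
    """Collect reference lengths from the @SQ header lines (SN and LN tags)."""
    lengths = {}
    for line in sam_text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def out_of_bounds(sam_text):
    """Return (QNAME, RNAME, POS) for alignments starting past the reference end."""
    lengths = reference_lengths(sam_text)
    hits = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        qname, rname, pos = fields[0], fields[2], int(fields[3])
        # A reference missing from the header is treated as length 0,
        # so any mapped read on it is reported as well.
        if rname != "*" and pos > lengths.get(rname, 0):
            hits.append((qname, rname, pos))
    return hits


# Toy SAM text with invented reference names and lengths.
tail = "\t60\t*\t*\t0\t0\t*\t*\n"
sam = (
    "@SQ\tSN:refA\tLN:1000\n"
    "@SQ\tSN:refB\tLN:100\n"
    "read1\t0\trefA\t500" + tail
    + "read2\t0\trefB\t500" + tail
)

print(out_of_bounds(sam))
```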