Inconsistent sequence and quality string for unaligned reads #3

Closed
skoren opened this issue Jul 18, 2017 · 2 comments

skoren commented Jul 18, 2017

In some SAM outputs, the sequence and quality strings of unaligned reads don't match, as in this example:

f3920def-68ae-4956-8900-6398064c4f93_Basecall_Alignment_template	4	*	0	0	*	*	0	0	CTTTTATCTGCCACCTCTTCCACCTCACC	#$$$%&'%%#&''&%$#$$$''&&$&'&&2

The sequence in this case is 29 bases long but the quality string is 30 characters, so I think the quality is being output incorrectly (it has an extra "2" at the end that isn't in the input). Here is the corresponding read from the FASTQ:

@f3920def-68ae-4956-8900-6398064c4f93_Basecall_Alignment_template llssbzms2p35x_20161128_FNFAB45271_MN17073_mux_scan_Hu_Bir_R94_1Dlig_fc5_13909_ch310_read17_strand
CTTTTATCTGCCACCTCTTCCACCTCACC
+f3920def-68ae-4956-8900-6398064c4f93_Basecall_Alignment_template llssbzms2p35x_20161128_FNFAB45271_MN17073_mux_scan_Hu_Bir_R94_1Dlig_fc5_13909_ch310_read17_strand
#$$$%&'%%#&''&%$#$$$''&&$&'&&
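
To make the mismatch concrete, the SEQ and QUAL fields (columns 10 and 11 of a SAM record) can be compared directly. The minimal Python sketch below rebuilds the SAM record above from its tab-separated fields and counts characters. The helper name seq_qual_lengths is hypothetical and not part of minimap2.

```python
def seq_qual_lengths(sam_line: str) -> tuple[int, int]:
    """Return (len(SEQ), len(QUAL)) for one tab-separated SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    seq, qual = fields[9], fields[10]  # SEQ is column 10, QUAL is column 11 (1-based)
    return len(seq), len(qual)

# The unmapped record from this issue, rebuilt field by field.
sam_record = "\t".join([
    "f3920def-68ae-4956-8900-6398064c4f93_Basecall_Alignment_template",
    "4", "*", "0", "0", "*", "*", "0", "0",
    "CTTTTATCTGCCACCTCTTCCACCTCACC",
    "#$$$%&'%%#&''&%$#$$$''&&$&'&&2",
])
# Quality line of the same read in the input FASTQ.
fastq_qual = "#$$$%&'%%#&''&%$#$$$''&&$&'&&"

seq_len, qual_len = seq_qual_lengths(sam_record)
print(seq_len, qual_len, len(fastq_qual))  # 29 30 29
```

The FASTQ quality line matches the 29-base sequence, while the SAM QUAL field is one character longer because of the trailing "2".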

I am running minimap2 as:

minimap2 -ax map10k -t 16 GRCh38_full_analysis_set_plus_decoy_hla.mmi rel3-nanopore-wgs-152889212-FAB45271.fastq.gz

Mapping just this read on its own produces correct output, so there might be a buffer overflow or some other issue that only shows up on the full set. This read set is available from the NA12878 consortium: http://s3.amazonaws.com/nanopore-human-wgs/rel3-nanopore-wgs-152889212-FAB45271.fastq.gz. It was mapped to the 1000 Genomes reference: ftp://ftp.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/. So far I've only seen this on unmapped reads.

lh3 added a commit that referenced this issue Jul 18, 2017
Only happen to unmapped read, with quality, and in the SAM output

lh3 commented Jul 18, 2017

Fixed via b9b0b6f. Thanks so much!

lh3 closed this as completed Jul 18, 2017

lh3 commented Jul 18, 2017

I have a script to check for inconsistent SAM records (e.g. a CIGAR length inconsistent with the sequence length). However, the first step of the script is to skip unmapped reads, so it failed to catch this bug. Thanks again. It is really helpful to have someone catch bugs before they become widespread.
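
The checking script mentioned above is not included in this thread. As a hedged illustration of the idea only, here is a hypothetical Python checker (check_sam_record and check_sam_file are illustrative names) that deliberately does not skip unmapped reads. It flags a QUAL length that differs from the SEQ length and, for records that have a CIGAR, a query length that differs from the SEQ length. It assumes the SAM specification's rule that the M, I, S, = and X operations consume query bases, and treats "*" in SEQ, QUAL or CIGAR as an absent field to skip.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_CONSUMING = set("MIS=X")  # CIGAR ops that consume query bases (per the SAM spec)

def check_sam_record(line: str) -> list[str]:
    """Return the problems found in one SAM line; an empty list means consistent.

    Every alignment record is checked, including unmapped ones (FLAG 0x4).
    """
    if line.startswith("@"):
        return []  # header line, nothing to check
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        return [f"expected at least 11 fields, got {len(f)}"]
    qname, cigar, seq, qual = f[0], f[5], f[9], f[10]
    problems = []
    # QUAL must have one character per base whenever both fields are present.
    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append(f"{qname}: SEQ length {len(seq)} != QUAL length {len(qual)}")
    # The query length implied by the CIGAR must equal the SEQ length.
    if cigar != "*" and seq != "*":
        qlen = sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in QUERY_CONSUMING)
        if qlen != len(seq):
            problems.append(f"{qname}: CIGAR query length {qlen} != SEQ length {len(seq)}")
    return problems

def check_sam_file(path: str) -> int:
    """Print every problem found in a SAM file and return how many there were."""
    count = 0
    with open(path) as fh:
        for line in fh:
            for problem in check_sam_record(line):
                print(problem)
                count += 1
    return count
```

Applied to the unmapped record from this issue, check_sam_record reports a SEQ length of 29 against a QUAL length of 30, which a checker that filters out FLAG 0x4 records never sees.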

lh3 added the bug label Oct 12, 2017
pacbbbbot pushed a commit to PacificBiosciences/minimap2 that referenced this issue Jun 19, 2020
* commit '4f4a5b2be849a5a436e429de224724ce5c363610':
  Mask out warnings and sanitizer failures
  Avoid trying to access uninitialized variable
  Avoid buffer overflow in mm_sync_regs
  Add zlib fallback
  Add PacBio Meson