Description
Hi, @ArtRand
First, thank you for developing such a useful tool.
I have a question about the definition and behavior of "skipped reads" when using the modkit extract full command. I've noticed a significant difference in the ratio of skipped reads when using the --include-bed flag.
Here are the two commands I ran and their respective outputs:
- Command without --include-bed:
```bash
modkit extract full \
    --force \
    --mapped-only \
    --log-filepath "output.full.wo_bed.log" \
    "input.bam" \
    "output.full.txt"
```
Result: processed 3,491,586 reads, skipped ~601,161 reads, failed ~535,375 reads
(The fastq.gz file with ML/MM tags that was used to produce this input BAM contains 3,491,641 reads in total, almost the same as the number of processed reads.)
- Command with --include-bed:
```bash
modkit extract full \
    --force \
    --mapped-only \
    --include-bed "my_regions.bed" \
    --log-filepath "output.full.with_bed.log" \
    "input.bam" \
    "output.full.with_bed.txt"
```
Result: processed 1,407,735 reads, skipped ~1,600,198 reads, failed ~217,237 reads
As you can see, the ratio of skipped reads increased dramatically when using --include-bed. I also checked the log file specified by --log-filepath, and while it contained detailed information on failed reads, there was no information about skipped reads.
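To quantify that change, here is a small Python sketch that divides the skipped and failed counts by the processed count for each run. Using processed reads as the denominator is my own assumption, because I am not sure whether skipped reads are counted inside or outside the processed total. The counts are the approximate values modkit reported.

```python
# Approximate counts copied from the two `modkit extract full` runs in this issue.
# Assumption: "processed" is used as the denominator; how skipped/failed relate
# to processed is part of what I am asking.
RUNS = {
    "without --include-bed": {"processed": 3_491_586, "skipped": 601_161, "failed": 535_375},
    "with --include-bed": {"processed": 1_407_735, "skipped": 1_600_198, "failed": 217_237},
}


def fractions(counts: dict[str, int]) -> dict[str, float]:
    """Express the skipped and failed counts as fractions of processed reads."""
    processed = counts["processed"]
    return {
        "skipped": counts["skipped"] / processed,
        "failed": counts["failed"] / processed,
    }


for name, counts in RUNS.items():
    f = fractions(counts)
    print(f"{name}: skipped/processed = {f['skipped']:.1%}, failed/processed = {f['failed']:.1%}")
```

With these counts, skipped/processed rises from about 17.2% to about 113.7%, while failed/processed stays near 15% in both runs.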
I would appreciate it if you could clarify the following:
- What is the exact definition of a "skipped read" in modkit? (e.g., secondary/supplementary alignments, unmapped reads, etc.)
- Why does the count of skipped reads increase so much with the --include-bed flag? Is my assumption correct that all reads not overlapping the regions in the BED file are counted as "skipped"?
- Is it the intended behavior for the log file to contain no information on skipped reads?
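For the first two questions, this is roughly the tally I would expect if my assumption were correct, so it could be compared with modkit's skipped count. It is a hypothetical sketch, not modkit's actual logic. `Alignment`, `classify` and the sample records are made up for illustration, and counting non-primary records and reads outside the BED regions as "skipped" is exactly the assumption I am asking about. Only the FLAG bits (0x4 unmapped, 0x100 secondary, 0x800 supplementary) come from the SAM specification. Both the BED regions and the alignment spans are treated as 0-based, half-open intervals.

```python
from typing import NamedTuple

# SAM FLAG bits, as defined in the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record with a 0-based, half-open reference span."""
    name: str
    chrom: str
    start: int
    end: int
    flag: int


def is_primary_mapped(flag: int) -> bool:
    """True if the record is mapped and neither secondary nor supplementary."""
    return not flag & (FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY)


def overlaps_bed(aln: Alignment, regions: list[tuple[str, int, int]]) -> bool:
    """True if the alignment span intersects at least one BED region."""
    return any(
        chrom == aln.chrom and aln.start < end and start < aln.end
        for chrom, start, end in regions
    )


def classify(alignments: list[Alignment], regions: list[tuple[str, int, int]]) -> dict[str, int]:
    """Tally records under my (unconfirmed) hypothesis of what "skipped" means."""
    counts = {"non_primary_or_unmapped": 0, "outside_bed": 0, "inside_bed": 0}
    for aln in alignments:
        if not is_primary_mapped(aln.flag):
            counts["non_primary_or_unmapped"] += 1
        elif overlaps_bed(aln, regions):
            counts["inside_bed"] += 1
        else:
            counts["outside_bed"] += 1
    return counts


# Made-up example data.
REGIONS = [("chr1", 100, 200)]
ALIGNMENTS = [
    Alignment("r1", "chr1", 150, 300, 0),      # primary, overlaps the region
    Alignment("r2", "chr1", 200, 250, 0),      # primary, starts at the region end: no overlap
    Alignment("r3", "chr1", 120, 180, 0x100),  # secondary alignment
    Alignment("r4", "chr2", 100, 150, 0x10),   # primary (reverse strand), other chromosome
]

print(classify(ALIGNMENTS, REGIONS))
# {'non_primary_or_unmapped': 1, 'outside_bed': 2, 'inside_bed': 1}
```

On a real BAM, the same fields could be filled from pysam `AlignedSegment` objects (`query_name`, `reference_name`, `reference_start`, `reference_end`, `flag`).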
Thank you for your time and help.
Best regards,
Jen