Question about "skipped reads" #525

@jennyp76

Hi, @ArtRand

First, thank you for developing such a useful tool.

I have a question about the definition and behavior of "skipped reads" when using the modkit extract full command. I've noticed a significant difference in the ratio of skipped reads when using the --include-bed flag.

Here are the two commands I ran and their respective outputs:

  1. Command without --include-bed:
modkit extract full \
--force \
--mapped-only \
--log-filepath "output.full.wo_bed.log" \
"input.bam" \
"output.full.txt"

Result: processed 3,491,586 reads, skipped ~601,161 reads, failed ~535,375 reads
(The fastq.gz with ML/MM tags used to produce this input BAM contains 3,491,641 reads in total, almost the same as the number of processed reads.)

  2. Command with --include-bed:
modkit extract full \
--force \
--mapped-only \
--include-bed "my_regions.bed" \
--log-filepath "output.full.with_bed.log" \
"input.bam" \
"output.full.with_bed.txt"

Result: processed 1,407,735 reads, skipped ~1,600,198 reads, failed ~217,237 reads

As you can see, the ratio of skipped reads increased dramatically when using --include-bed. I also checked the log file specified by --log-filepath, and while it contained detailed information on failed reads, there was no information about skipped reads.
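To make the comparison concrete, here is a small Python sketch of the ratios I am referring to. It only uses the counts reported above. The variable and function names are my own, and dividing by the processed count is an assumption on my part, since I don't know how modkit's "processed", "skipped", and "failed" counters relate to each other.

```python
# Counts copied from the two modkit runs above (the "~" values are approximate).
runs = {
    "without --include-bed": {"processed": 3_491_586, "skipped": 601_161, "failed": 535_375},
    "with --include-bed": {"processed": 1_407_735, "skipped": 1_600_198, "failed": 217_237},
}


def skipped_ratio(counts):
    """Skipped reads relative to processed reads (the denominator is my assumption)."""
    return counts["skipped"] / counts["processed"]


ratios = {name: skipped_ratio(c) for name, c in runs.items()}
for name, r in ratios.items():
    print(f"{name}: skipped/processed = {r:.3f}")
```

With these numbers the ratio goes from roughly 0.17 without the BED file to roughly 1.14 with it, so with --include-bed more reads are reported as skipped than as processed.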

I would appreciate it if you could clarify the following:

  1. What is the exact definition of a "skipped read" in modkit? (e.g., secondary/supplementary alignments, unmapped reads, etc.)
  2. Why does the count of skipped reads increase so much with the --include-bed flag? Is my assumption correct that all reads not overlapping the regions in the BED file are counted as "skipped"?
  3. Is it the intended behavior for the log file to contain no information on skipped reads?
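To illustrate the categories I have in mind for question 1, the sketch below sorts SAM FLAG values into unmapped, secondary, supplementary, and primary records using the standard flag bits from the SAM specification. This is only my guess at what might count as "skipped". It is not taken from modkit's code, and the helper name classify_flag is made up for this example.

```python
# Standard SAM FLAG bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def classify_flag(flag: int) -> str:
    """Hypothetical helper: label an alignment record by its SAM FLAG value."""
    if flag & UNMAPPED:
        return "unmapped"
    if flag & SECONDARY:
        return "secondary"
    if flag & SUPPLEMENTARY:
        return "supplementary"
    return "primary"


# Example FLAG values: 0 = primary forward, 16 = primary reverse,
# 4 = unmapped, 256 = secondary, 2064 = supplementary reverse.
for flag in (0, 16, 4, 256, 2064):
    print(flag, classify_flag(flag))
```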

Thank you for your time and help.
Best regards,
Jen
