modkit pileup --interval-size 1000000 --chunk-size 100 #537

@blipinskiaima

Hello

When running modkit pileup on the same BAM and reference genome, I observed output differences depending on whether interval and chunk parameters were set.

Commands

modkit pileup <input.bam> <output_default.bedMethyl>
modkit pileup --interval-size 1000000 --chunk-size 100 <input.bam> <output_chunked.bedMethyl>

Data

  • Same sample (129.4M reads)
  • Same basecalling, alignment, and input BAM
  • Same modkit version and compute environment

Results
bedMethyl lines with default settings: 2,397,050
bedMethyl lines with --interval-size 1000000 --chunk-size 100: 2,397,256
Difference: +206 lines

Even with identical inputs, setting --interval-size and --chunk-size produces a slightly larger bedMethyl output (206 additional lines). This suggests these parameters may affect which positions are reported, and therefore the determinism or completeness of the CpG methylation calls.
Could the team please confirm whether this behavior is expected, or whether it should be flagged for further investigation in a future release?
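
To help narrow this down, the extra records could be listed by comparing the two outputs position by position. The following is only a minimal sketch, not something that was run for this report: it assumes the first six whitespace-separated bedMethyl columns are chrom, start, end, modified base code, score, and strand, and the function names (bedmethyl_keys, diff_keys, compare_files) are hypothetical helpers, not part of modkit.

```python
from typing import Iterable, List, Set, Tuple

# (chrom, start, end, modified base code, strand) identifies one record.
# Assumption: these are columns 1-4 and 6 of each bedMethyl line.
Key = Tuple[str, int, int, str, str]


def bedmethyl_keys(lines: Iterable[str]) -> Set[Key]:
    """Collect the identifying key of every record in a bedMethyl stream."""
    keys: Set[Key] = set()
    for line in lines:
        fields = line.split()
        # Skip blank lines, short lines, and any header row.
        if len(fields) < 6 or not fields[1].isdigit():
            continue
        chrom, start, end, mod_code, _score, strand = fields[:6]
        keys.add((chrom, int(start), int(end), mod_code, strand))
    return keys


def diff_keys(a_lines: Iterable[str], b_lines: Iterable[str]) -> Tuple[List[Key], List[Key]]:
    """Return (records only in A, records only in B), each sorted."""
    a = bedmethyl_keys(a_lines)
    b = bedmethyl_keys(b_lines)
    return sorted(a - b), sorted(b - a)


def compare_files(path_a: str, path_b: str) -> Tuple[List[Key], List[Key]]:
    """Convenience wrapper that reads two bedMethyl files from disk."""
    with open(path_a) as fa, open(path_b) as fb:
        return diff_keys(fa, fb)
```

Calling compare_files on the default and chunked outputs would return the records unique to each file. If the 206 extra positions cluster near multiples of the interval size, that would point at interval boundary handling rather than at the methylation calls themselves, though this is a guess.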

Let's stay in touch.
Many thanks in advance.
Best,
Boris
