Open
Labels
troubleshooting, workflow and data preparation questions
Description
Hello
When running modkit pileup on the same BAM and reference genome, I observed output differences depending on whether interval and chunk parameters were set.
Commands
modkit pileup <input.bam> <output_default.bedMethyl>
modkit pileup --interval-size 1000000 --chunk-size 100 <input.bam> <output_chunked.bedMethyl>
Data
- Same sample (129.4M reads)
- Same basecalling, alignment, and input BAM
- Same modkit version and compute environment
Results
bedMethyl lines with default settings = 2,397,050
bedMethyl lines with --interval-size 1000000 --chunk-size 100 = 2,397,256
Difference: +206 lines
Even with identical inputs, setting --interval-size and --chunk-size produces a slightly larger bedMethyl output (206 extra lines, roughly 0.009% more). This suggests these parameters may affect the determinism or completeness of the CpG methylation calls.
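To narrow down where the extra lines come from, the two outputs could be compared record by record. Below is a minimal sketch of how I would do that, not part of modkit itself. It assumes the first six bedMethyl columns are chrom, start, end, modification code, score, and strand, and that fields are whitespace-separated; the function names (record_keys, compare_bedmethyl, compare_files) are hypothetical helpers made up for this illustration.

```python
from collections import Counter


def record_keys(lines):
    """Count records by (chrom, start, end, mod_code, strand).

    Assumption: the first six whitespace-separated columns are
    chrom, start, end, modification code, score, strand.
    """
    keys = Counter()
    for line in lines:
        fields = line.split()
        # Skip comment lines and anything too short to be a record.
        if line.startswith("#") or len(fields) < 6:
            continue
        chrom, start, end, mod_code, _score, strand = fields[:6]
        keys[(chrom, int(start), int(end), mod_code, strand)] += 1
    return keys


def compare_bedmethyl(default_lines, chunked_lines):
    """Return records present only in the first or only in the second input.

    Counter subtraction keeps duplicates, so a position reported twice
    in one file and once in the other shows up as one extra record.
    """
    a = record_keys(default_lines)
    b = record_keys(chunked_lines)
    only_default = sorted((a - b).elements())
    only_chunked = sorted((b - a).elements())
    return only_default, only_chunked


def compare_files(default_path, chunked_path):
    """Hypothetical convenience wrapper that reads two bedMethyl files."""
    with open(default_path) as fa, open(chunked_path) as fb:
        return compare_bedmethyl(fa, fb)
```

Calling compare_files on the two output files would list the positions unique to each run, which should show whether the 206 extra lines are new positions or duplicated positions, and whether they cluster around particular coordinates.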
Could the team please confirm if this behavior is expected, or if it should be flagged for further investigation in a future release?
Let's stay in touch.
Many thanks in advance.
Best,
Boris