Modkit pileup - segmentation fault and performance issues in high-coverage regions #607

@mfurla

Dear ONT staff,

I am working with modkit to profile m6A and inosine in a dRNA-seq sample, and I am encountering a couple of issues.

First, when I use the --modified-bases parameter, modkit crashes with a segmentation fault and gives no further information about the cause.

This is the command I am using:

~/Software/dist_modkit_v0.6.1_481e3c9/modkit pileup \
  /path/to/aligned/bam \
  /path/to/output/bed.gz \
  --reference /path/to/fa.gz \
  --log /path/to/log \
  --threads 10 \
  --modified-bases inosine \
  --region 1

This is the output:

parsing region 1
discarded 0 contigs with zero aligned reads
parsed 1 base modification(s). Base modifications other than 'A:17596' will be counted as 'N_other'.
adding single-base motif: 'A 0'
Segmentation fault (core dumped)

Second, a couple of genomic regions are covered by ~1M reads (due to high expression of specific transcripts). On these regions modkit appears to get stuck, with only 2 of the 10 CPUs actively used according to top.

This is the output:

parsing region 2
discarded 0 contigs with zero aligned reads
attempting to sample 10042 reads
Threshold of 0.69921875 for base A is low. Consider increasing the filter-percentile or specifying a higher threshold.
using general workers
93037146 B written to output: Test.bed.gz [4.78 MB/s]
[00:00:18] ###############------------------------- 88000000/242193529 genome positions 4,732,930.0021/s 33s
1222798 rows written
0 ~records errored

Could you recommend a workaround to avoid this issue and/or improve performance so that all available CPUs are effectively utilized? I also tried the --high-depth --max-depth 100 options, but they did not help.

Finally, I attempted to skip the two chromosomes with very high coverage. However, when I pass a comma-separated list of chromosome names to the --region option (as suggested in the documentation), I encounter a “contig missing” error.
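
One way to sidestep the comma-separated --region list would be to run pileup once per contig and leave out the high-coverage ones. The sketch below is only an illustration under assumptions: build_pileup_cmds is a hypothetical helper, modkit is assumed to be on PATH, the per-contig output path is a placeholder, and the contig names "1", "2", "3" and the skip list are made up. It prints the commands rather than running them, and it uses only the flags shown in the command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one `modkit pileup` command per contig, skipping
# the contigs named in the first argument. Nothing is executed here; pipe the
# output to `bash` to actually run the commands.
# Assumption: contig names arrive one per line on stdin, for example from
# `samtools idxstats /path/to/aligned/bam | cut -f1`.

build_pileup_cmds() {
  local skip=" $1 "   # space-separated contig names to skip, padded for matching
  local contig
  while IFS= read -r contig; do
    # Ignore empty lines and the "*" row that idxstats emits for unmapped reads.
    [ -z "$contig" ] && continue
    [ "$contig" = "*" ] && continue
    # Ignore contigs on the skip list (whole-name match only).
    case "$skip" in *" $contig "*) continue ;; esac
    # Paths are placeholders; the output file name per contig is an assumption.
    printf 'modkit pileup /path/to/aligned/bam /path/to/output/%s.bed.gz --reference /path/to/fa.gz --threads 10 --region %s\n' \
      "$contig" "$contig"
  done
}

# Example with made-up contig names: skip "2" and "3", keep "1".
printf '1\n2\n3\n*\n' | build_pileup_cmds "2 3"
```

The per-contig BED files would still need to be concatenated afterwards, and this does not address the stall on the high-coverage contigs themselves.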

Thank you very much for your support.

Best regards,

Mattia

Labels: troubleshooting (workflow and data preparation questions)
