Segmentation fault (core dumped) #9

Closed
raphaelbetschart opened this issue Dec 4, 2023 · 6 comments

Comments

@raphaelbetschart

raphaelbetschart commented Dec 4, 2023

Hi, I am trying to compress a PacBio HiFi GIAB sample (https://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/ChineseTrio/HG006_NA24694-huCA017E_father/PacBio_CCS_15kb_20kb_chemistry2/uBAMs/m64017_191213_003759.hifi_reads.bam). With this specific sample I always get a "Segmentation fault (core dumped)" message during or after "Counting k-mers". I use the following command:

colord compress-pbhifi --qual org --threads 8 --reference-genome hs38DH.fa m64017_191213_003759.hifi_reads.fastq.gz m64017_191213_003759.hifi_reads.fastq.colord

The BAM file was converted to fastq.gz with the pbtk bam2fastq (from here: https://github.com/PacificBiosciences/pbtk#bam2fastx).
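Since the input went through a BAM to fastq.gz conversion, one thing worth ruling out is a damaged file. The sketch below is a hypothetical helper (`check_fastq_gz` is not part of colord or pbtk). It tests the gzip stream and checks that the line count is a multiple of four, which assumes the usual four-line FASTQ records:

```shell
# Hypothetical helper, not part of colord or pbtk.
# Assumes four-line FASTQ records (header, sequence, '+', qualities).
check_fastq_gz() {
    fq="$1"
    # gzip -t verifies the integrity of the compressed stream.
    if ! gzip -t "$fq" 2>/dev/null; then
        echo "$fq: corrupt or truncated gzip stream"
        return 1
    fi
    # Count decompressed lines; tr strips padding that some wc builds add.
    lines=$(gzip -dc "$fq" | wc -l | tr -d ' ')
    if [ $((lines % 4)) -ne 0 ]; then
        echo "$fq: $lines lines, not a multiple of 4"
        return 1
    fi
    echo "$fq: $((lines / 4)) records"
}
```

It could be called as `check_fastq_gz m64017_191213_003759.hifi_reads.fastq.gz`. A clean result only shows the file is structurally plausible; it says nothing about whether colord handles it correctly.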

I am using colord 1.2.0.

Other samples worked fine, but I am having trouble with this specific one.
Any ideas?

@marekkokot
Collaborator

Hi,

I cannot reproduce this :(
This is how I run it:

./pbindex ../m64017_191213_003759.hifi_reads.bam
./bam2fastq -o m64017_191213_003759.hifi_reads ../m64017_191213_003759.hifi_reads.bam

The first run was without a reference sequence:

colord/bin/colord compress-pbhifi --qual org --threads 8 m64017_191213_003759.hifi_reads.fastq.gz m64017_191213_003759.hifi_reads.fastq.colord
Counting k-mers.
Stage 1: 100%
Stage 2: 100%
Filtering k-mers.
100%
Running compression.
100%
DNA size        : 968988426
Quality size    : 7181648469
Header size     : 1203516
Meta size       : 54
Info size       : 203
Total time      : 948.253s

For the second run, I downloaded the reference sequence (is this the same file you used?):

wget https://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/GRCh38_reference_genome/GRCh38_full_analysis_set_plus_decoy_hla.fa
mv GRCh38_full_analysis_set_plus_decoy_hla.fa hs38DH.fa

And then:

/usr/bin/time -v colord/bin/colord compress-pbhifi  --qual org --threads 8 --reference-genome hs38DH.fa m64017_191213_003759.hifi_reads.fastq.gz m64017_191213_003759.hifi_reads.fastq.colord+ref
Counting k-mers.
Stage 1: 100%
Stage 2: 100%
Filtering k-mers.
100%
Running compression.
100%
DNA size        : 279460182
Quality size    : 7180109405
Header size     : 1203515
Meta size       : 83
Info size       : 236
Total time      : 3842.36s
        Command being timed: "colord/bin/colord compress-pbhifi --qual org --threads 8 --reference-genome hs38DH.fa m64017_191213_003759.hifi_reads.fastq.gz m64017_191213_003759.hifi_reads.fastq.colord+ref"
        User time (seconds): 21653.69
        System time (seconds): 81.94
        Percent of CPU this job got: 565%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:04:02
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 12528628
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 48
        Minor (reclaiming a frame) page faults: 18038895
        Voluntary context switches: 3727105
        Involuntary context switches: 22179
        Swaps: 0
        File system inputs: 7128
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I ran this on WSL.
Please let me know your operating system and hardware, and whether you can try it in a different environment.
Best
Marek
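For anyone answering a request like the one above, a small sketch for gathering the operating system and hardware details to paste into the issue. `collect_env` is a hypothetical helper, and it assumes a Linux system with `/proc` and `/etc/os-release` available (missing files are simply skipped):

```shell
# Hypothetical helper for gathering details to paste into a bug report.
collect_env() {
    echo "== kernel =="
    uname -srm
    echo "== distribution =="
    # /etc/os-release is present on most modern Linux distributions.
    if [ -r /etc/os-release ]; then
        grep '^PRETTY_NAME=' /etc/os-release || true
    fi
    echo "== cpu =="
    if [ -r /proc/cpuinfo ]; then
        # First "model name" line only; some architectures do not have one.
        grep -m 1 'model name' /proc/cpuinfo || true
    fi
    echo "logical CPUs: $(getconf _NPROCESSORS_ONLN)"
    echo "== memory =="
    if [ -r /proc/meminfo ]; then
        grep '^MemTotal' /proc/meminfo || true
    fi
    return 0
}
```

Running `collect_env` prints a few labelled lines that can be copied into a comment as-is.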

@raphaelbetschart
Author

raphaelbetschart commented Dec 6, 2023

Hi Marek, thanks for your reply.
I can get colord to run without a reference genome, but as soon as I specify one I get the segmentation fault (I've tried the one you mentioned, plus hs38DH.fa and the standard hg38.fa). Interestingly, it works when I specify the reference genome and use only a single thread. Two and three threads work fine too, but more than 4 lead to the segmentation fault.
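A thread-count dependence like this can be narrowed down with a small sweep. The sketch below is only an illustration: `run_colord` and `sweep_threads` are hypothetical helpers, the colord flags are the ones from the command earlier in this thread, and the per-run output file name is made up so that runs do not overwrite each other.

```shell
# Hypothetical wrapper: runs colord with the thread count given in $1.
# Flags are taken from the command in this thread; the output name is illustrative.
run_colord() {
    colord compress-pbhifi --qual org --threads "$1" \
        --reference-genome hs38DH.fa \
        m64017_191213_003759.hifi_reads.fastq.gz \
        "m64017_191213_003759.hifi_reads.t$1.colord"
}

# sweep_threads "<counts>" <command...>: calls the command once per count,
# passing the count as the last argument, and prints the exit status.
sweep_threads() {
    counts="$1"
    shift
    for n in $counts; do
        status=0
        "$@" "$n" >/dev/null 2>&1 || status=$?
        # A process killed by SIGSEGV (signal 11) is reported as 128 + 11 = 139.
        if [ "$status" -eq 139 ]; then
            echo "threads=$n status=$status (SIGSEGV)"
        else
            echo "threads=$n status=$status"
        fi
    done
}
```

A sweep would then be started with `sweep_threads "1 2 3 4 8" run_colord`. Each run of the full sample takes a long time, so a smaller subset of reads may be more practical, if the crash still reproduces on it.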

I'm running it on Rocky Linux 9.2, with AMD Epyc 7742 CPUs.

Best,
Raphael

Edit: I have the following md5sum:
3c0a0006322b140e6e39bb02cdf207a2 m64017_191213_003759.hifi_reads.fastq.gz
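To compare a local copy against a checksum like the one above, a minimal sketch using `md5sum -c` from GNU coreutils. `check_md5` is a hypothetical helper name:

```shell
# Hypothetical helper: check_md5 <expected-digest> <file>
# md5sum -c expects lines of the form "<digest>  <path>" (two spaces).
check_md5() {
    printf '%s  %s\n' "$1" "$2" | md5sum -c -
}
```

For this thread the call would be `check_md5 3c0a0006322b140e6e39bb02cdf207a2 m64017_191213_003759.hifi_reads.fastq.gz`. The function returns a non-zero status when the digests differ. Note that gzip output can vary between tools and versions, so a differing digest does not necessarily mean the reads themselves differ.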

@marekkokot
Collaborator

Hi Raphael,

My md5sum is the same.
I am able to reproduce this on another machine. I hope to fix it as soon as possible.

marekkokot added a commit that referenced this issue Dec 8, 2023
@marekkokot
Collaborator

Hi Raphael,

It should now be fixed with 3e87a22.
Please try to verify this. I have also created a new release (1.2.1) containing this fix, in case you cannot compile the code yourself.
Let me know if it works in your environment now.

@raphaelbetschart
Author

Hi @marekkokot,

I can confirm that the bug is fixed, thanks for the quick fix.

Best,
Raphael

@marekkokot
Collaborator

Great, I am closing this.
