
Chromap process gets killed #59

Closed
JoseEspinosa opened this issue Jan 25, 2022 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@JoseEspinosa

Hi there,
When running chromap with the following command: chromap --preset chip -x genome.index -r genome.fa -1 SPT5_INPUT_REP1_T1_1_val_1.fq.gz -2 SPT5_INPUT_REP1_T1_2_val_2.fq.gz -o SPT5_INPUT_REP1_T1.bed, the process gets killed, as you can see below:

Preset parameters for ChIP-seq are used.
Start to map reads.
Parameters: error threshold: 8, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 2000, MAPQ-threshold: 30, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Output mappings in BED/BEDPE format.
Reference file: genome.fa
Index file: genome.index
1th read 1 file: SPT5_INPUT_REP1_T1_1_val_1.fq.gz
1th read 2 file: SPT5_INPUT_REP1_T1_2_val_2.fq.gz
Output file: SPT5_INPUT_REP1_T1.bed
Loaded all sequences successfully in 0.02s, number of sequences: 17, number of bases: 12157105.
Kmer size: 17, window size: 7.
Lookup table size: 2857826, occurrence table size: 243803.
Loaded index successfully in 0.03s.
Mapped 99321 read pairs in 0.67s.
Mapped all reads in 0.92s.
Number of reads: 198642.
Number of mapped reads: 187220.
Number of uniquely mapped reads: 165306.
Number of reads have multi-mappings: 21914.
Number of candidates: 403416.
Number of mappings: 187220.
Number of uni-mappings: 165306.
Number of multi-mappings: 21914.
Killed

I reproduced the error using the test data:

$ chromap --preset chip -x ref.index -r ref.fa -1 read1.fq -2 read2.fq -o test.bed
Preset parameters for ChIP-seq are used.
Start to map reads.
Parameters: error threshold: 8, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 2000, MAPQ-threshold: 30, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Output mappings in BED/BEDPE format.
Reference file: ref.fa
Index file: ref.index
1th read 1 file: read1.fq
1th read 2 file: read2.fq
Output file: test.bed
Loaded all sequences successfully in 0.00s, number of sequences: 1, number of bases: 100000.
Kmer size: 17, window size: 7.
Lookup table size: 25079, occurrence table size: 0.
Loaded index successfully in 0.00s.
Mapped 10 read pairs in 0.00s.
Mapped all reads in 0.00s.
Number of reads: 20.
Number of mapped reads: 20.
Number of uniquely mapped reads: 20.
Number of reads have multi-mappings: 0.
Number of candidates: 20.
Number of mappings: 20.
Number of uni-mappings: 20.
Number of multi-mappings: 0.
Killed

However, if I use just one of the fastq files, it works, both with the real and the test data, e.g. with the test data: chromap --preset chip -x ref.index -r ref.fa -1 read1.fq -o test.bed
I am running the latest version of chromap, 0.1.5-r302, in a conda environment on Ubuntu 18.04.6.
Thanks a lot!

@haowenz
Owner

haowenz commented Jan 25, 2022

Can you provide me with the test data so that I can reproduce the error and figure out the issue? Thanks!

@JoseEspinosa
Author

Sure, sorry. I reproduced the error I got with my own files using the test files in this repo: https://github.com/haowenz/chromap/tree/master/test

@haowenz
Owner

haowenz commented Jan 26, 2022

Sure, sorry. I reproduced the error I got with my own files using the test files in this repo: https://github.com/haowenz/chromap/tree/master/test

I see. I had no issue running it on my machine. It seems there is some problem running Chromap on Ubuntu 18, as in #37. This is really weird. I will get such a machine and try to reproduce the error.

@haowenz
Owner

haowenz commented Jan 27, 2022

I was only able to reproduce the error on a low-memory machine, so this should be an out-of-memory issue. How much memory do you have on your machine?
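When a process dies with a bare "Killed" message on Linux, the kernel OOM killer is the usual suspect. A minimal sketch of how to check, assuming Linux with /proc/meminfo and readable kernel logs (not something chromap itself provides):

```shell
# Report available memory before rerunning chromap (Linux-only sketch).
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "Available memory: $((avail_kb / 1024)) MB"

# If a run was killed, the kernel log usually records the OOM event:
#   dmesg | grep -i 'out of memory'
```

If dmesg shows an "Out of memory: Killed process" entry naming chromap, the fix is more RAM or a smaller memory footprint, not a code bug.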

@JoseEspinosa
Author

I freed some memory on my machine and then it worked, but it needed ~11 GB for the test data.

Actually, I started to perform this test because I was also trying to run some real data on my institution's cluster (running Scientific Linux release 7.2) and the process got killed, similar to what was reported in #37:

41 Segmentation fault      (core dumped) chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam

I requested 62.0 GB for the execution above (since a previous execution with just 32.0 GB got the same error). I just found #46, but in my case, although the adapters have been trimmed, I am not using the atac preset but the chip preset, so in principle adapter trimming is turned off. As in the case of #42, some other samples worked fine with exactly the same command and just 32 GB.

Thanks a lot for the help!
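A quick way to see how much memory a run like the ones above actually peaks at is GNU time (a sketch assuming /usr/bin/time is the GNU implementation, not the shell builtin; file names follow the test-data example earlier in the thread):

```shell
# Print chromap's peak resident set size (in KB) once the run finishes.
# "Maximum resident set size" is the line GNU time uses for peak memory.
/usr/bin/time -v chromap --preset chip -x ref.index -r ref.fa \
    -1 read1.fq -2 read2.fq -o test.bed 2>&1 \
    | grep 'Maximum resident set size'
```

Comparing that peak against the memory requested from the scheduler makes it easy to tell an OOM kill apart from a genuine crash.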

@haowenz
Owner

haowenz commented Jan 27, 2022

I freed some memory on my machine and then it worked, but it needed ~11 GB for the test data.

As the human genome index is more than 10 GB, Chromap will take at least 10 GB of memory when running with human sequencing data. This was made an assumption and hard-coded at the moment, but it is not necessary for small genomes like the test genome. I will tune this down for the next version.

Actually, I started to perform this test because I was also trying to run some real data on my institution's cluster (running Scientific Linux release 7.2) and the process got killed, similar to what was reported in #37:

41 Segmentation fault      (core dumped) chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam

I requested 62.0 GB for the execution above (since a previous execution with just 32.0 GB got the same error). I just found #46, but in my case, although the adapters have been trimmed, I am not using the atac preset but the chip preset, so in principle adapter trimming is turned off. As in the case of #42, some other samples worked fine with exactly the same command and just 32 GB.

Thanks a lot for the help!

This may not be an out-of-memory issue. If your data is publicly available, let me know how to download it and I will debug with it. Otherwise, as you said, since the adapters have been trimmed, it is very likely caused by the same reason. In that case, I will fix #46 first and let you know; then you can try your data again and see if the problem is still there.
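One way to tell these two failure modes apart is to rerun under an explicit virtual-memory cap, so an out-of-memory condition surfaces as an allocation failure instead of a silent SIGKILL, while a genuine segfault still segfaults. A sketch, assuming a POSIX shell on Linux; the 20 GB cap is illustrative, not a chromap requirement:

```shell
# Run chromap under a virtual-memory cap (ulimit -v takes KB) inside a
# subshell so the limit does not affect the rest of the session.
(
  ulimit -v $((20 * 1024 * 1024))   # 20 GB example cap
  chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa \
      -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam
)
```

If the capped run aborts with a memory-allocation error, the problem is memory; if it still dumps core with a segmentation fault well below the cap, it is a bug.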

@JoseEspinosa
Author

As the human genome index is more than 10 GB, Chromap will take at least 10 GB of memory when running with human sequencing data. This was made an assumption and hard-coded at the moment, but it is not necessary for small genomes like the test genome. I will tune this down for the next version.

Fair enough!

This may not be an out-of-memory issue. If your data is publicly available, let me know how to download it and I will debug with it. Otherwise, as you said, since the adapters have been trimmed, it is very likely caused by the same reason. In that case, I will fix #46 first and let you know; then you can try your data again and see if the problem is still there.

You can download the files from here. The commands that I used were:

# index creation cmd
chromap \
    -i \
    -t 6 \
    -r genome.fa \
    -o genome.index

# mapping cmd
chromap \
    --preset chip --SAM \
    -t 6 \
    -x genome.index \
    -r genome.fa \
    -1 INPUT_TKO_REP1_T1_trimmed.fq.gz \
    -o INPUT_TKO_REP1_T1.Lb.sam

Thanks a lot! 😃

@haowenz
Owner

haowenz commented Feb 19, 2022

I was able to confirm that the error is caused by running out of memory. After the pull request, I was able to finish mapping this dataset in 18 GB of memory. You can use the master branch if you want to give it a try, or you can wait a couple more days and we will include this fix in the next release on conda.

@haowenz haowenz self-assigned this Feb 19, 2022
@haowenz haowenz added the enhancement New feature or request label Feb 19, 2022
@JoseEspinosa
Author

Thanks a lot @haowenz! I will give version v0.2.0 a try and come back to you if I find any other issue.

@JoseEspinosa
Author

Hi again, I checked and version v0.2.1 solves this issue, thanks a lot again! 😄

@haowenz haowenz closed this as completed Mar 31, 2022