
Chromap process gets killed #59

Closed
JoseEspinosa opened this issue Jan 25, 2022 · 10 comments
Assignees
Labels
enhancement New feature or request

Comments

@JoseEspinosa

Hi there,
When running chromap with the following command: chromap --preset chip -x genome.index -r genome.fa -1 SPT5_INPUT_REP1_T1_1_val_1.fq.gz -2 SPT5_INPUT_REP1_T1_2_val_2.fq.gz -o SPT5_INPUT_REP1_T1.bed, the process gets killed, as you can see below:

Preset parameters for ChIP-seq are used.
Start to map reads.
Parameters: error threshold: 8, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 2000, MAPQ-threshold: 30, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Output mappings in BED/BEDPE format.
Reference file: genome.fa
Index file: genome.index
1th read 1 file: SPT5_INPUT_REP1_T1_1_val_1.fq.gz
1th read 2 file: SPT5_INPUT_REP1_T1_2_val_2.fq.gz
Output file: SPT5_INPUT_REP1_T1.bed
Loaded all sequences successfully in 0.02s, number of sequences: 17, number of bases: 12157105.
Kmer size: 17, window size: 7.
Lookup table size: 2857826, occurrence table size: 243803.
Loaded index successfully in 0.03s.
Mapped 99321 read pairs in 0.67s.
Mapped all reads in 0.92s.
Number of reads: 198642.
Number of mapped reads: 187220.
Number of uniquely mapped reads: 165306.
Number of reads have multi-mappings: 21914.
Number of candidates: 403416.
Number of mappings: 187220.
Number of uni-mappings: 165306.
Number of multi-mappings: 21914.
Killed

I reproduced the error using the test data:

$ chromap --preset chip -x ref.index -r ref.fa -1 read1.fq -2 read2.fq -o test.bed
Preset parameters for ChIP-seq are used.
Start to map reads.
Parameters: error threshold: 8, min-num-seeds: 2, max-seed-frequency: 500,1000, max-num-best-mappings: 1, max-insert-size: 2000, MAPQ-threshold: 30, min-read-length: 30, bc-error-threshold: 1, bc-probability-threshold: 0.90
Number of threads: 1
Analyze bulk data.
Won't try to remove adapters on 3'.
Will remove PCR duplicates after mapping.
Will remove PCR duplicates at bulk level.
Won't allocate multi-mappings after mapping.
Only output unique mappings after mapping.
Only output mappings of which barcodes are in whitelist.
Output mappings in BED/BEDPE format.
Reference file: ref.fa
Index file: ref.index
1th read 1 file: read1.fq
1th read 2 file: read2.fq
Output file: test.bed
Loaded all sequences successfully in 0.00s, number of sequences: 1, number of bases: 100000.
Kmer size: 17, window size: 7.
Lookup table size: 25079, occurrence table size: 0.
Loaded index successfully in 0.00s.
Mapped 10 read pairs in 0.00s.
Mapped all reads in 0.00s.
Number of reads: 20.
Number of mapped reads: 20.
Number of uniquely mapped reads: 20.
Number of reads have multi-mappings: 0.
Number of candidates: 20.
Number of mappings: 20.
Number of uni-mappings: 20.
Number of multi-mappings: 0.
Killed

However, if I use just one of the fastq files, it works, both with the real and the test data, e.g. with the test data: chromap --preset chip -x ref.index -r ref.fa -1 read1.fq -o test.bed
I am running the latest version of chromap, 0.1.5-r302, in a conda environment on Ubuntu 18.04.6.
Thanks a lot!

@haowenz
Owner

haowenz commented Jan 25, 2022

Can you provide me with the test data so that I can reproduce the error and figure out the issue? Thanks!

@JoseEspinosa
Author

Sure, sorry. I reproduced the error I got with my own files using the test files in this repo: https://github.com/haowenz/chromap/tree/master/test

@haowenz
Owner

haowenz commented Jan 26, 2022

Sure, sorry. I reproduced the error I got with my own files using the test files in this repo: https://github.com/haowenz/chromap/tree/master/test

I see. I had no issue running it on my machine. It seems there is some problem running Chromap on Ubuntu 18, as in #37. This is really weird. I will get such a machine and try to reproduce the error.

@haowenz
Owner

haowenz commented Jan 27, 2022

I was only able to reproduce the error on a low-memory machine, so this should be an out-of-memory issue. How much memory do you have on your machine?
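When a process dies with a bare "Killed" message on Linux, the kernel OOM killer is the usual suspect. A minimal sketch of how to check, assuming Linux with /proc/meminfo and readable kernel logs (not something chromap itself provides):

```shell
# Report available memory before rerunning chromap (Linux-only sketch).
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
echo "Available memory: $((avail_kb / 1024)) MB"

# If a run was killed, the kernel log usually records the OOM event:
#   dmesg | grep -i 'out of memory'
```

If dmesg shows an "Out of memory: Killed process" entry naming chromap, the fix is more RAM or a smaller memory footprint, not a code bug.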

@JoseEspinosa
Author

I freed some memory on my machine and then it worked, but it needed ~11 GB for the test data.

Actually, I started to perform this test because I was also trying to run some real data on my institution's cluster (running Scientific Linux release 7.2) and the process got killed, similar to what was reported in #37:

41 Segmentation fault      (core dumped) chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam

I requested 62.0 GB for the execution above (since a previous execution with just 32.0 GB got the same error). I just found #46, but in my case, although the adapters have been trimmed, I am not using the atac preset but the chip preset, so in principle adapter trimming is turned off. As in the case of #42, some other samples worked fine with exactly the same command and just 32 GB.

Thanks a lot for the help!
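A quick way to see how much memory a run like the ones above actually peaks at is GNU time (a sketch assuming /usr/bin/time is the GNU implementation, not the shell builtin; file names follow the test-data example earlier in the thread):

```shell
# Print chromap's peak resident set size (in KB) once the run finishes.
# "Maximum resident set size" is the line GNU time uses for peak memory.
/usr/bin/time -v chromap --preset chip -x ref.index -r ref.fa \
    -1 read1.fq -2 read2.fq -o test.bed 2>&1 \
    | grep 'Maximum resident set size'
```

Comparing that peak against the memory requested from the scheduler makes it easy to tell an OOM kill apart from a genuine crash.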

@haowenz
Owner

haowenz commented Jan 27, 2022

I freed some memory on my machine and then it worked, but it needed ~11 GB for the test data.

As the human genome index is more than 10 GB, Chromap will take at least 10 GB of memory when running with human sequencing data. This was made an assumption and hard-coded at the moment, but it is not necessary for small genomes like the test genome. I will tune this down for the next version.

Actually, I started to perform this test because I was also trying to run some real data on my institution's cluster (running Scientific Linux release 7.2) and the process got killed, similar to what was reported in #37:

41 Segmentation fault      (core dumped) chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam

I requested 62.0 GB for the execution above (since a previous execution with just 32.0 GB got the same error). I just found #46, but in my case, although the adapters have been trimmed, I am not using the atac preset but the chip preset, so in principle adapter trimming is turned off. As in the case of #42, some other samples worked fine with exactly the same command and just 32 GB.

Thanks a lot for the help!

This may not be an out-of-memory issue. If your data is publicly available, let me know how to download it and I will debug with it. Otherwise, as you said, since the adapters have been trimmed, it is very likely caused by the same reason. In that case, I will fix #46 first and let you know; then you can try your data again and see if the problem is still there.
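One way to tell these two failure modes apart is to rerun under an explicit virtual-memory cap, so an out-of-memory condition surfaces as an allocation failure instead of a silent SIGKILL, while a genuine segfault still segfaults. A sketch, assuming a POSIX shell on Linux; the 20 GB cap is illustrative, not a chromap requirement:

```shell
# Run chromap under a virtual-memory cap (ulimit -v takes KB) inside a
# subshell so the limit does not affect the rest of the session.
(
  ulimit -v $((20 * 1024 * 1024))   # 20 GB example cap
  chromap --preset chip --SAM -t 12 -x genome.index -r genome.fa \
      -1 INPUT_TKO_REP1_T1_trimmed.fq.gz -o INPUT_TKO_REP1_T1.Lb.sam
)
```

If the capped run aborts with a memory-allocation error, the problem is memory; if it still dumps core with a segmentation fault well below the cap, it is a bug.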

@JoseEspinosa
Author

As the human genome index is more than 10 GB, Chromap will take at least 10 GB of memory when running with human sequencing data. This was made an assumption and hard-coded at the moment, but it is not necessary for small genomes like the test genome. I will tune this down for the next version.

Fair enough!

This may not be an out-of-memory issue. If your data is publicly available, let me know how to download it and I will debug with it. Otherwise, as you said, since the adapters have been trimmed, it is very likely caused by the same reason. In that case, I will fix #46 first and let you know; then you can try your data again and see if the problem is still there.

You can download the files from here. The commands that I used were:

# index creation cmd
chromap \
    -i \
    -t 6 \
    -r genome.fa \
    -o genome.index

# mapping cmd
chromap \
    --preset chip --SAM \
    -t 6 \
    -x genome.index \
    -r genome.fa \
    -1 INPUT_TKO_REP1_T1_trimmed.fq.gz \
    -o INPUT_TKO_REP1_T1.Lb.sam

Thanks a lot! 😃

@haowenz
Owner

haowenz commented Feb 19, 2022

I was able to confirm that the error is caused by running out of memory. After the pull request, I was able to finish mapping this dataset in 18 GB of memory. You can use the master branch if you want to give it a try, or you can wait a couple more days and we will include this fix in the next release on conda.

@haowenz haowenz self-assigned this Feb 19, 2022
@haowenz haowenz added the enhancement New feature or request label Feb 19, 2022
@JoseEspinosa
Author

Thanks a lot @haowenz! I will give version v0.2.0 a try and come back to you if I find any other issue.

@JoseEspinosa
Author

Hi again, I checked and version v0.2.1 solves this issue, thanks a lot again! 😄

@haowenz haowenz closed this as completed Mar 31, 2022