
not generating .best and .sing2 output files #69

Open
slyahn opened this issue Aug 18, 2020 · 5 comments


slyahn commented Aug 18, 2020

I am processing 8 multiplexed samples through demuxlet and everything seems to run fine until the very end. Demuxlet generates the .single file but not the .best and .sing2 files. The standard output shows that it finishes processing the droplets ("Finished processing 21976 droplets total") but then reports a segmentation fault (core dumped) error.

I started with 60 GB of memory and went up to 180 GB, and that did not fix it. The VCF is filtered to include only biallelic SNPs, it is sorted, and the contigs match between the BAM and the VCF. I don't know what could be causing it to fail at the very end when writing the .best file. Do you have any suggestions?

Edited to add:
I've tried downsampling the BAM to 10% of the original, and I still get the same segmentation fault and only the .single file is generated, so I don't think it's a memory issue.

I should note that this experiment is essentially a simulation using real data. We combined FASTQ files from 8 individual runs to simulate a multiplexed run. The combined FASTQ was processed with Cell Ranger without error. The genotype VCF was generated by a private company that did low-pass whole-genome sequencing and imputation.


ddsouz5 commented Nov 11, 2020

Hi @slyahn, did you figure out what the problem was? Having the same issue too!

@VincentGardeux

Would Fix #59 fix the memory issue?
We tested it on ~50 genotypes / 5M SNPs and it runs without running out of RAM.

@boxiangliu

Dear @hyunminkang, I am having the same issue as stated above. The software runs to the step where *.single has been generated, but reports a segmentation fault. The *.sing2 and *.best files are empty.

I am not sure how to debug this error. Memory does not seem to be the issue (my machine has 386 GB of RAM). Do you have any idea why this would happen? Could you point us to the right path?


VincentGardeux commented May 18, 2022

Hi @boxiangliu,

The way it's coded in demuxlet is definitely not the best, i.e. it generates huge, HUGE arrays (which is both suboptimal and unnecessary). For example, there is a line which creates an array gpAB:

double* gpAB = new double[scl.nsnps * nv * nv * 9];

So in my example of 5M SNPs (nsnps) and 50 genotypes (nv), and since a double is 8 bytes, this would generate an array of size 5,000,000 x 50 x 50 x 9 x 8 bytes = 900 GB. Do you have 900 GB of RAM? :D
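To check the arithmetic for your own run, here is a minimal sketch that computes the footprint of an array shaped like the allocation quoted above. The function names `gpab_bytes` and `to_gb` are made up for this illustration and are not part of demuxlet; the sketch also assumes an 8-byte double, as in the calculation above.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not from demuxlet): bytes needed for an array of
// doubles with nsnps * nv * nv * 9 elements, computed in 64-bit arithmetic.
std::uint64_t gpab_bytes(std::uint64_t nsnps, std::uint64_t nv) {
    const std::uint64_t bytes_per_snp = nv * nv * 9 * sizeof(double);
    // Guard against the product wrapping around for absurdly large inputs.
    assert(bytes_per_snp == 0 ||
           nsnps <= std::numeric_limits<std::uint64_t>::max() / bytes_per_snp);
    return nsnps * bytes_per_snp;
}

// Convert bytes to decimal gigabytes (1 GB = 1e9 bytes).
double to_gb(std::uint64_t bytes) {
    return static_cast<double>(bytes) / 1e9;
}
```

With nsnps = 5,000,000 and nv = 50, `to_gb(gpab_bytes(5000000, 50))` gives the 900 GB figure above. With 8 samples, each SNP costs 8 x 8 x 9 x 8 = 4608 bytes, so plugging in the number of SNPs in your VCF shows whether this one array alone would fit in the memory you requested.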

That's why I suggested Fix #59 two years ago: it does not create the array and instead computes the data on the fly without storing it in RAM. But it was never merged into the main branch.
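The general idea of computing values on the fly can be sketched as follows. This is only an illustration of the pattern, not the code from Fix #59 and not demuxlet's actual formula: `term` is a hypothetical placeholder for whatever is computed per SNP and sample pair, and `streaming_pair_scores` is a made-up name. The point is that only nv x nv accumulators stay in memory, instead of nsnps x nv x nv x 9 stored values.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical placeholder for the per-SNP value of component k (0..8)
// for the sample pair (a, b). NOT demuxlet's real computation.
double term(std::size_t snp, std::size_t a, std::size_t b, std::size_t k) {
    return static_cast<double>(snp + a + b + k);
}

// Accumulate a score for every sample pair while streaming over SNPs.
// Memory use is nv * nv doubles, independent of the number of SNPs.
std::vector<double> streaming_pair_scores(std::size_t nsnps, std::size_t nv) {
    std::vector<double> score(nv * nv, 0.0);
    for (std::size_t snp = 0; snp < nsnps; ++snp) {
        for (std::size_t a = 0; a < nv; ++a) {
            for (std::size_t b = 0; b < nv; ++b) {
                double s = 0.0;
                // The 9 components are computed, used, and discarded here
                // rather than being written into a giant precomputed array.
                for (std::size_t k = 0; k < 9; ++k) {
                    s += term(snp, a, b, k);
                }
                score[a * nv + b] += s;
            }
        }
    }
    return score;
}
```

The trade-off is that each value is recomputed whenever it is needed instead of being looked up, which costs CPU time but keeps memory proportional to the number of sample pairs.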

I guess you can maybe try it (Fix #59), to see if it solves your issue.

Hope this helps.

Cheers

Contributor

hyunminkang commented Oct 11, 2022 via email


5 participants