
terminate called after throwing an instance of 'std::bad_array_new_length' #73

Open
vkartha opened this issue Oct 20, 2020 · 9 comments


vkartha commented Oct 20, 2020

Hi! I had run demuxlet successfully before, but am now encountering an error:

NOTICE [2020/10/19 20:10:39] - Processing 7470000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7480000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7490000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7500000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7510000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7520000 markers...
NOTICE [2020/10/19 20:10:39] - Identifying best-matching individual..
NOTICE [2020/10/19 20:10:39] - Processing 1000 droplets...
NOTICE [2020/10/19 20:10:39] - Finished processing 1153 droplets total
terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
Aborted (core dumped)

My call was as follows (same as the one I used before, which worked for a different bam/vcf file combo):

demuxlet --sam ./sample.bam --tag-group DB --field GT --geno-error 0.1 --min-TD 0 --alpha 0.5 --vcf ./hg38_merged_final_filtered.vcf_sorted.vcf --out ./test_demuxlet.out

I haven't seen this error before, and I noticed another (perhaps related?) issue that suggested something about memory. Does this point to something similar, or is it different? After running it, I see all 3 output files (.best, .sing2, and .single), but the .best and .sing2 files are empty, presumably because the run was terminated.

Any help would be greatly appreciated!


vkartha commented Oct 20, 2020

Sorry, as a follow-up to that, here are the QC logs from just before the markers were processed and the error was thrown:

NOTICE [2020/10/19 20:10:31] - Finished reading 7527981 markers from the VCF file
NOTICE [2020/10/19 20:10:31] - Total number input reads : 12252483
NOTICE [2020/10/19 20:10:31] - Total number valid droplets observed : 1153
NOTICE [2020/10/19 20:10:31] - Total number valid SNPs observed : 7527981
NOTICE [2020/10/19 20:10:31] - Total number of read-QC-passed reads : 12214806
NOTICE [2020/10/19 20:10:31] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2020/10/19 20:10:31] - Total number of non-skipped reads with considered barcodes : 11926820
NOTICE [2020/10/19 20:10:31] - Total number of gapped/noninformative reads : 10590479
NOTICE [2020/10/19 20:10:31] - Total number of base-QC-failed reads : 0
NOTICE [2020/10/19 20:10:31] - Total number of redundant reads : 196126
NOTICE [2020/10/19 20:10:31] - Total number of pass-filtered reads : 1140215
NOTICE [2020/10/19 20:10:31] - Total number of pass-filtered reads overlapping with multiple SNPs : 108279
NOTICE [2020/10/19 20:10:31] - Starting to prune out cells with too few reads...
NOTICE [2020/10/19 20:10:31] - Finishing pruning out 0 cells with too few reads...
NOTICE [2020/10/19 20:10:36] - Starting to identify best matching individual IDs

I was testing a pool of 2 samples against a joint reference consisting of 97 samples.


rhart604 commented Aug 9, 2021

I'm also seeing this error, with output identical to vkartha's above. I tried recompiling htslib and demuxlet in case it had something to do with newer compilers, but I get the same error.

Any thoughts?


hyunminkang commented Aug 10, 2021 via email

rhart604 commented:

It turns out to be due to the large number of SNPs in the VCF file. I originally had 9 million SNPs, and that crashed demuxlet. I filtered down to fewer than 2 million and it works now; more than about 2 million causes the error. Could demuxlet be modified to allow larger VCF files / more SNPs?
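
For anyone looking for a concrete way to do that kind of thinning: a minimal sketch using a minor-allele-frequency filter with bcftools (assuming bcftools is installed; the 1% cutoff and the output name are only examples, and if the VCF lacks AC/AN/AF INFO tags you may need to add them first with bcftools +fill-tags):

bcftools view -q 0.01:minor -o thinned.vcf hg38_merged_final_filtered.vcf_sorted.vcf   # keep sites with minor allele frequency above 1%
bcftools view -H thinned.vcf | wc -l                                                   # check how many SNPs remain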


hyunminkang commented Aug 14, 2021 via email

VicenteFR commented:

Has anyone found a clever way to overcome this issue? Since it is related to memory, I thought that downsampling the variants in the VCF file would help, but I can't seem to find an efficient and safe way to downsample VCF files. If anyone has found a good way to do this, would you please share it?

Thanks in advance!


bdferris642 commented Nov 17, 2023

This was happening to me when I tried demultiplexing with 45M+ SNPs. Demuxlet succeeded with ~6M rows in the VCF. I played around with the numbers to find an upper bound below which demuxlet would not abort. I suspect that this bound is sensitive to the memory constraints of your machine, but I'm not sure.

@VicenteFR I don't know if you're using imputed SNPs or genotyped ones, but a couple of principled ways to subset the VCF would be to filter on imputation R^2 or on minor allele frequency, if you have access to that information (it is sometimes included in the INFO field). If neither is available, you could assume that all SNPs are equally informative, in which case randomly downsampling the non-header rows of the VCF would be "safe". You could downsample with different seeds and compare the outputs; this is not ideal, but it would give you some idea of what fraction of SNPs is necessary to achieve consistent results.
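
To make the two options above concrete, here is a minimal sketch with bcftools and GNU shuf; the tag names (R2, MAF), the 0.8 and 0.01 thresholds, the 2,000,000 target, the seed, and the file names are all placeholders to adapt to your data:

# Option 1: filter on INFO tags, if your imputation pipeline wrote them (tag names vary)
bcftools view -i 'INFO/R2>0.8 && INFO/MAF>0.01' -o filtered.vcf input.vcf

# Option 2: reproducible random downsampling of the non-header rows, then re-sort
bcftools view -h input.vcf > downsampled.vcf                                                # keep the header
bcftools view -H input.vcf | shuf -n 2000000 --random-source=<(yes 42) >> downsampled.vcf   # sample 2M records with a fixed seed
bcftools sort downsampled.vcf -o downsampled.sorted.vcf                                     # demuxlet expects a coordinate-sorted VCF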


yimmieg commented Nov 17, 2023 via email

hyunminkang commented:

If you are using it for scRNA-seq, filtering to 1000G exonic SNPs with MAF > 1% (usually ~300K SNPs) should be sufficient.
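
A minimal sketch of that kind of filter with bcftools, assuming you have a BED file of exonic regions (exons.bed below is a placeholder, e.g. derived from a gene annotation) and that allele-frequency tags are present in the VCF:

bcftools view -T exons.bed -q 0.01:minor -o exonic_common.vcf input.vcf   # keep exonic sites with MAF > 1%
bcftools view -H exonic_common.vcf | wc -l                                # should land in the low hundreds of thousands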
