
terminate called after throwing an instance of 'std::bad_array_new_length' #73

Open
vkartha opened this issue Oct 20, 2020 · 9 comments


vkartha commented Oct 20, 2020

Hi! I had run demuxlet successfully before, but am now encountering an error:

NOTICE [2020/10/19 20:10:39] - Processing 7470000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7480000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7490000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7500000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7510000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7520000 markers...
NOTICE [2020/10/19 20:10:39] - Identifying best-matching individual..
NOTICE [2020/10/19 20:10:39] - Processing 1000 droplets...
NOTICE [2020/10/19 20:10:39] - Finished processing 1153 droplets total
terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
Aborted (core dumped)

My call was as follows (same as the one I used before, which worked for a different bam/vcf file combo):

demuxlet --sam ./sample.bam --tag-group DB --field GT --geno-error 0.1 --min-TD 0 --alpha 0.5 --vcf ./hg38_merged_final_filtered.vcf_sorted.vcf --out ./test_demuxlet.out

I haven't seen this error before, and I noticed another (perhaps related?) issue that suggested something about memory. Does this point to something similar, or is it different? After running it, I see all 3 output files (.best, .sing2, and .single), but the .best and .sing2 files are empty, presumably because the run was terminated.

Any help would be greatly appreciated!


vkartha commented Oct 20, 2020

Sorry, as a follow-up to that, here are the QC logs from just before the markers were processed and the error was thrown:

NOTICE [2020/10/19 20:10:31] - Finished reading 7527981 markers from the VCF file
NOTICE [2020/10/19 20:10:31] - Total number input reads : 12252483
NOTICE [2020/10/19 20:10:31] - Total number valid droplets observed : 1153
NOTICE [2020/10/19 20:10:31] - Total number valid SNPs observed : 7527981
NOTICE [2020/10/19 20:10:31] - Total number of read-QC-passed reads : 12214806
NOTICE [2020/10/19 20:10:31] - Total number of skipped reads with ignored barcodes : 0
NOTICE [2020/10/19 20:10:31] - Total number of non-skipped reads with considered barcodes : 11926820
NOTICE [2020/10/19 20:10:31] - Total number of gapped/noninformative reads : 10590479
NOTICE [2020/10/19 20:10:31] - Total number of base-QC-failed reads : 0
NOTICE [2020/10/19 20:10:31] - Total number of redundant reads : 196126
NOTICE [2020/10/19 20:10:31] - Total number of pass-filtered reads : 1140215
NOTICE [2020/10/19 20:10:31] - Total number of pass-filtered reads overlapping with multiple SNPs : 108279
NOTICE [2020/10/19 20:10:31] - Starting to prune out cells with too few reads...
NOTICE [2020/10/19 20:10:31] - Finishing pruning out 0 cells with too few reads...
NOTICE [2020/10/19 20:10:36] - Starting to identify best matching individual IDs

I was testing a pool of 2 samples against a joint reference consisting of 97 samples.


rhart604 commented Aug 9, 2021

I'm also seeing this error, with output identical to vkartha's above. I tried recompiling htslib and demuxlet in case it had something to do with newer compilers, but I get the same error.

Any thoughts?


hyunminkang commented Aug 10, 2021 via email

rhart604 commented:

It turns out to be due to the large number of SNPs in the VCF file. I originally had 9 million SNPs, and that crashed demuxlet. I filtered down to fewer than 2 million and it works now; more than about 2 million causes the error. Could demuxlet be modified to allow larger VCF files / more SNPs?
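
For anyone looking for a concrete way to do that kind of thinning: a minimal sketch using a minor-allele-frequency filter with bcftools (assuming bcftools is installed; the 1% cutoff and the output name are only examples, and if the VCF lacks AC/AN/AF INFO tags you may need to add them first with bcftools +fill-tags):

bcftools view -q 0.01:minor -o thinned.vcf hg38_merged_final_filtered.vcf_sorted.vcf   # keep sites with minor allele frequency above 1%
bcftools view -H thinned.vcf | wc -l                                                   # check how many SNPs remain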


hyunminkang commented Aug 14, 2021 via email

VicenteFR commented:

Has anyone found a clever way to overcome this issue? Since it is related to memory, I thought that downsampling the variants in the VCF file would help, but I can't seem to find an efficient and safe way to downsample VCF files. If anyone has found a good way to do this, would you please share it?

Thanks in advance!


bdferris642 commented Nov 17, 2023

This was happening to me when I tried demultiplexing with 45M+ SNPs. Demuxlet succeeded with ~6M rows in the VCF. I played around with the numbers to find an upper bound below which demuxlet would not abort. I suspect that this bound is sensitive to the memory constraints of your machine, but I'm not sure.

@VicenteFR I don't know if you're using imputed SNPs or genotyped ones, but a couple of principled ways to subset the VCF would be to filter on imputation R^2 or on minor allele frequency, if you have access to that information (it is sometimes included in the INFO field). If neither is available, you could assume that all SNPs are equally informative, in which case randomly downsampling the non-header rows of the VCF would be "safe". You could downsample with different seeds and compare the outputs; this is not ideal, but it would give you some idea of what fraction of SNPs is necessary to achieve consistent results.
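
To make the two options above concrete, here is a minimal sketch with bcftools and GNU shuf; the tag names (R2, MAF), the 0.8 and 0.01 thresholds, the 2,000,000 target, the seed, and the file names are all placeholders to adapt to your data:

# Option 1: filter on INFO tags, if your imputation pipeline wrote them (tag names vary)
bcftools view -i 'INFO/R2>0.8 && INFO/MAF>0.01' -o filtered.vcf input.vcf

# Option 2: reproducible random downsampling of the non-header rows, then re-sort
bcftools view -h input.vcf > downsampled.vcf                                                # keep the header
bcftools view -H input.vcf | shuf -n 2000000 --random-source=<(yes 42) >> downsampled.vcf   # sample 2M records with a fixed seed
bcftools sort downsampled.vcf -o downsampled.sorted.vcf                                     # demuxlet expects a coordinate-sorted VCF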


yimmieg commented Nov 17, 2023 via email

hyunminkang commented:

If you are using it for scRNA-seq, filtering to 1000G exonic SNPs with MAF > 1% (usually ~300K SNPs) should be sufficient.
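
A minimal sketch of that kind of filter with bcftools, assuming you have a BED file of exonic regions (exons.bed below is a placeholder, e.g. derived from a gene annotation) and that allele-frequency tags are present in the VCF:

bcftools view -T exons.bed -q 0.01:minor -o exonic_common.vcf input.vcf   # keep exonic sites with MAF > 1%
bcftools view -H exonic_common.vcf | wc -l                                # should land in the low hundreds of thousands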
