terminate called after throwing an instance of 'std::bad_array_new_length' #73
Sorry, a follow-up to that: here is the QC log line prior to the markers being processed and the error being thrown:
NOTICE [2020/10/19 20:10:31] - Finished reading 7527981 markers from the VCF file
I was testing a pool of 2 samples against a joint reference consisting of 97 samples.
I'm also seeing this error. Identical output as vkartha, above. I tried re-compiling htslib and demuxlet in case it was something to do with newer compilers, but I get the same errors. Any thoughts?
This seems like a memory-related issue. How many variants and individuals are you using?
Hyun.
-----------------------------------------------------
Hyun Min Kang, Ph.D.
Associate Professor of Biostatistics
University of Michigan, Ann Arbor
Email : ***@***.***
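For reference, the variant and sample counts Hyun asks about can be checked directly from the VCF header and body. Below is a sketch using a hypothetical two-sample toy VCF (for a compressed `.vcf.gz`, substitute `zcat`/`zgrep`); the file name and data are made up for illustration:

```shell
# Build a tiny VCF for illustration (hypothetical data)
printf '##fileformat=VCFv4.2\n' > example.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n' >> example.vcf
printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0/1\t0/0\n' >> example.vcf
printf 'chr1\t200\t.\tC\tT\t.\tPASS\t.\tGT\t1/1\t0/1\n' >> example.vcf

# Variant count: every non-header line
grep -vc '^#' example.vcf

# Individual (sample) count: columns after the 9 fixed VCF columns
grep '^#CHROM' example.vcf | awk -F'\t' '{print NF - 9}'
```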
It turns out to be due to the large number of SNPs in the VCF file. I originally had 9 million SNPs, and that crashed demuxlet. I filtered down to fewer than 2 million and it works now; more than about 2 million causes the error. Could demuxlet be modified to allow larger VCF files / more SNPs?
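For anyone hitting the same limit, one crude way to thin a VCF below the ~2-million-record threshold reported above is to keep every Nth variant line while preserving the header. A sketch with a toy input; the file names are placeholders, and with real data you would choose the step so the output stays under the limit:

```shell
# Toy VCF with 2 header lines and 8 variant records (hypothetical data)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > input.vcf
for pos in 100 200 300 400 500 600 700 800; do
    printf 'chr1\t%s\t.\tA\tG\t.\tPASS\t.\n' "$pos" >> input.vcf
done

# Print all header lines; of the variant lines, keep every 4th one
awk '/^#/ { print; next } { n++ } n % 4 == 1 { print }' input.vcf > thinned.vcf

grep -vc '^#' thinned.vcf   # 2 variant records remain
```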
I think it is possible, and I believe there was a pull request that I have not had a chance to merge yet. I cannot promise a timeline, though.
Thanks,
Hyun.
Has anyone found a clever way to overcome this issue? Since it is related to memory, I thought that downsampling the variants in the VCF file would do, but I can't seem to find an efficient and safe way to downsample VCF files. If anyone has found a good way to do this, would you please share it? Thanks in advance!
This was happening to me when I tried demultiplexing with 45M+ SNPs. Demuxlet succeeded with ~6M rows in the VCF. I played around with the numbers to find an upper bound for which demuxlet would not abort; I suspect that this will be sensitive to the memory constraints of your machine, but I'm not sure.
@VicenteFR I don't know if you're using imputed SNPs or genotyped ones, but a couple of principled ways to subset the VCF would be to filter on the basis of imputation R^2 or on minor allele frequency, if you have access to that information (sometimes included in the INFO field). If neither is available, you could assume that all SNPs are equally informative, in which case randomly downsampling the non-header rows of the VCF would be "safe". You could downsample with different seeds and compare the output, which is not ideal but would give you some idea of what fraction of SNPs is necessary to achieve consistent results.
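The random-downsampling idea above can be sketched in awk: print every header line, and keep each variant line with some probability. Fixing the seed makes a given subset reproducible, so subsets from different seeds can be compared as suggested. The fraction, seed, and file names below are placeholders:

```shell
# Toy VCF: 2 header lines + 10 variant records (hypothetical data)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > input.vcf
for pos in $(seq 100 100 1000); do
    printf 'chr1\t%s\t.\tA\tG\t.\tPASS\t.\n' "$pos" >> input.vcf
done

# Keep ~50% of variant records at random, preserving every header line
awk -v seed=42 -v frac=0.5 \
    'BEGIN { srand(seed) } /^#/ { print; next } rand() < frac { print }' \
    input.vcf > downsampled.vcf
```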
Do you need this many SNPs? We often filter for variants in 1000 Genomes...
~J
If you are using it for scRNA-seq, filtering on 1000G exonic SNPs with MAF > 1% (usually ~300K SNPs) should be sufficient.
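The MAF filter suggested above is normally done with `bcftools view -q 0.01:minor`. As an illustration only, a sketch that approximates it with awk, assuming the VCF's INFO column carries an `AF` (allele frequency) tag; the file names and data are hypothetical:

```shell
# Toy VCF whose INFO field carries AF (hypothetical data)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > annotated.vcf
printf 'chr1\t100\t.\tA\tG\t.\tPASS\tAF=0.30\n' >> annotated.vcf
printf 'chr1\t200\t.\tC\tT\t.\tPASS\tAF=0.002\n' >> annotated.vcf
printf 'chr1\t300\t.\tG\tA\t.\tPASS\tAF=0.95\n' >> annotated.vcf

# Extract AF from INFO (column 8), fold it to a minor allele frequency,
# and keep variants with MAF > 1%
awk 'BEGIN { FS = OFS = "\t" }
     /^#/ { print; next }
     match($8, /AF=[0-9.eE+-]+/) {
         af = substr($8, RSTART + 3, RLENGTH - 3) + 0
         maf = (af > 0.5) ? 1 - af : af
         if (maf > 0.01) print
     }' annotated.vcf > common.vcf
```

Of the three toy records, the AF=0.30 and AF=0.95 (MAF 0.05) variants survive and the AF=0.002 variant is dropped.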
Hi! I had run demuxlet successfully before, but am now encountering an error:
NOTICE [2020/10/19 20:10:39] - Processing 7470000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7480000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7490000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7500000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7510000 markers...
NOTICE [2020/10/19 20:10:39] - Processing 7520000 markers...
NOTICE [2020/10/19 20:10:39] - Identifying best-matching individual..
NOTICE [2020/10/19 20:10:39] - Processing 1000 droplets...
NOTICE [2020/10/19 20:10:39] - Finished processing 1153 droplets total
terminate called after throwing an instance of 'std::bad_array_new_length'
what(): std::bad_array_new_length
Aborted (core dumped)
My call was as follows (same as the one I used before, which worked for a different bam/vcf file combo):
demuxlet --sam ./sample.bam --tag-group DB --field GT --geno-error 0.1 --min-TD 0 --alpha 0.5 --vcf ./hg38_merged_final_filtered.vcf_sorted.vcf --out ./test_demuxlet.out
I haven't seen this error before, and noticed another (perhaps related?) issue that suggested something about memory. Does this point to something similar, or is it different? After running it, I see all 3 output files (.best, .sing2, and .single), but the .best and .sing2 files are empty, presumably because the run was terminated.
Any help would be greatly appreciated!