paftools.js call with k8 v1 can't handle references > 1.5 Gb #1166

Closed
ASLeonard opened this issue Feb 29, 2024 · 3 comments

@ASLeonard

When using paftools call -f <reference> with k8-1.0 I now get OOM errors like

<--- Last few GCs --->
[26894:0x192f630]     3176 ms: Mark-sweep (reduce) 1449.0 (1450.8) -> 1449.0 (1450.6) MB, 0.6 / 0.0 ms  (average mu = 0.997, current mu = 0.991) allocation failure; scavenge might not succeed
<--- JS stacktrace --->
#
# Fatal javascript OOM in Reached heap limit

Trying to adjust the heap size with export NODE_OPTIONS="--max-old-space-size=10000" etc did not help. I checked this with both commits 9506e7a and bc588c0. I tested the exact same call and files with k8-0.2.5 and it works fine.

I was playing around in the code and it seems to be an issue with the size of the sequence dictionary h. I can get this to run (up to an expected error about missing contigs) by selecting only the first 12 contigs (totalling 1431554824 bp), but it crashes with 13 contigs (totalling 1518796476 bp). It also works after removing the first 15 or so contigs (totalling 1470948544 bp), so it is definitely related to the size of the values rather than the size of the keys.

Changing the following line to store some random short string rather than the sequence line allows the fasta to be read; it then crashes later because the sequence lengths obviously don't match the variant positions.

} else seq.set(line);

Changing the fasta line lengths (seqtk seq -l 60) also doesn't help.

I have no idea why it appears the dict can't handle >1.5 Gb of strings in k8 v1 but can in k8 v0.2.5, but that appears to be the case.
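
A rough sanity check on the numbers above, as a sketch only. It assumes about one byte of heap per base and that the "MB" in the GC log are mebibytes; neither is confirmed in this thread. Under those assumptions the 13-contig total lands just under the 1449.0 MB figure in the log, while the two working subsets sit below it. The helper name bp_to_mib is hypothetical.

```shell
#!/bin/sh
# Convert a base-pair count to whole MiB.
# Assumption: one byte of heap per base (not confirmed in the thread).
bp_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Totals quoted in the report above.
echo "first 12 contigs:          $(bp_to_mib 1431554824) MiB"
echo "first 13 contigs:          $(bp_to_mib 1518796476) MiB"
echo "after dropping ~15 contigs: $(bp_to_mib 1470948544) MiB"
```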

Best,
Alex

@ASLeonard ASLeonard changed the title from "paftools.js call with k8 v1 can't handle" to "paftools.js call with k8 v1 can't handle references > 1.5 Gb" Feb 29, 2024
@lh3
Owner

lh3 commented Mar 9, 2024

Could you provide a test case? Thanks.

@ASLeonard
Author

To double check, I downloaded a fresh copy of k8, minimap2, and the cattle reference. If I run

curl https://api.ncbi.nlm.nih.gov/datasets/v2alpha/genome/accession/GCF_002263795.3/download?include_annotation_type=GENOME_FASTA > ncbi.zip
unzip ncbi.zip

touch empty.paf

./k8-1.0/k8-x86_64-Linux minimap2/misc/paftools.js call -f ncbi_dataset/data/GCF_002263795.3/GCF_002263795.3_ARS-UCD2.0_genomic.fna empty.paf

I can still get the OOM heap limit error. If I reduce the reference to

seqtk seq ncbi_dataset/data/GCF_002263795.3/GCF_002263795.3_ARS-UCD2.0_genomic.fna | head -n 24 > small_enough.fa
./k8-1.0/k8-x86_64-Linux minimap2/misc/paftools.js call -f small_enough.fa empty.paf

I get the expected VCF output (which is the header and 0s for all the stats).
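
For narrowing down the failing size, a small helper can cut the reference to its first N records without relying on seqtk or on each record being exactly two lines. This is a sketch; first_contigs is a hypothetical name and not part of minimap2 or k8.

```shell
#!/bin/sh
# Print the first N FASTA records read from stdin.
# Counts '>' header lines, so multi-line sequences are handled too.
first_contigs() {
    awk -v n="$1" '/^>/ { c++ } c > n { exit } { print }'
}

# Example (paths as in the comment above; not executed here):
#   first_contigs 12 < GCF_002263795.3_ARS-UCD2.0_genomic.fna > small_enough.fa
```

Varying N then gives the same kind of 12-versus-13-contig comparison described earlier in the thread.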

@lh3 lh3 added the bug label Mar 11, 2024
@lh3 lh3 closed this as completed in 0efc886 Mar 11, 2024
@lh3
Owner

lh3 commented Mar 11, 2024

Thanks for the example. It should be fixed now. paftools.js is still not fully compatible with k8 v1.0. A few seldom-used subcommands are not working. I will fix them when I need them or when users raise issues.

Actually, you can make the unfixed paftools.js work with

k8 --max-old-space-size=10000 paftools.js call -f hs38.fa empty.paf

Nonetheless, the fix is still preferred and may be a little faster.

Note that k8 inherits all v8 command-line options and --max-old-space-size is one of them. You can use k8 --help to see all v8 options. K8 requires the node source code to compile because it is very difficult to compile the original v8 and node makes this much easier. However, k8 only needs v8 and doesn't have node-specific features. It doesn't parse NODE_OPTIONS.
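
As a sketch of the workaround for an unfixed paftools.js, the heap limit can be derived from the reference size and passed to k8 directly on the command line instead of through NODE_OPTIONS. The helper name heap_mb_for, the factor of 2, and the extra 2048 MB are arbitrary assumptions for illustration, not recommended values.

```shell
#!/bin/sh
# Suggest a --max-old-space-size value (in MB) for a FASTA file:
# twice the file size in MiB plus 2048 MB of headroom.
# Both constants are assumptions chosen for illustration.
heap_mb_for() {
    bytes=$(wc -c < "$1")
    echo $(( bytes / 1048576 * 2 + 2048 ))
}

# Example (not executed here; ref.fa is a placeholder path):
#   k8 --max-old-space-size="$(heap_mb_for ref.fa)" paftools.js call -f ref.fa empty.paf
```

The flag goes before the script name, in the same position as in the command shown above.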
