low complexity plant samples are completely masked "xxxxx" and use up a lot of memory #236

Open
mclaugsf opened this issue Sep 7, 2023 · 0 comments


mclaugsf commented Sep 7, 2023

We're creating a database using bracken-build, and the plant option includes some low-complexity sequences that seem to be causing a memory leak in kraken2.

We limited the database size with --max-db-size 16000000000 when running kraken2-build, but found that these problem sequences seem to cause kraken2 to use upwards of 70 GB of memory when it is called to create the database.kraken file. Chopping the sequences into chunks seemed to help, as it reduced the suspected memory leak to some extent. I then found three sequences in the plant database that k2mask had masked to 100% "x", and it was these three sequences, run by themselves, that were using a large amount of extra memory. The sequences are:

kraken:taxid|45834|NW_026605105.1
kraken:taxid|45834|NW_026605106.1
kraken:taxid|45834|NW_026605107.1
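
For anyone who wants to check their own library for similar records, here is a minimal Python sketch that lists FASTA records whose sequence consists only of mask characters. It assumes masked bases appear as "x" or "X" in the masked FASTA; the function names are hypothetical and not part of kraken2 or bracken.

```python
from typing import Iterable, Iterator, List, Tuple

MASK_CHARS = "xX"  # assumption: masked bases are written as "x" (or "X")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_count(seq: str) -> int:
    """Count the characters in seq that are mask characters."""
    return sum(seq.count(ch) for ch in MASK_CHARS)


def fully_masked_ids(lines: Iterable[str]) -> List[str]:
    """Return headers of non-empty records made up entirely of mask characters."""
    return [
        header
        for header, seq in read_fasta(lines)
        if seq and masked_count(seq) == len(seq)
    ]
```

Passing an open file handle for the masked FASTA as `lines` returns the headers of the fully masked records.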

This looks like a bug: a sequence that is entirely masked with "x" characters shouldn't need any memory to look up strings of "x". Alternatively, these sequences could just be thrown out entirely. I'm not sure whether this is a kraken2 bug or a bracken bug, but it seems to be caused in part by the way bracken masks the sequences and then runs kraken2. I also wonder whether this is why so many people report needing very large instances because of memory issues.
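
As a possible workaround for the "thrown out entirely" option, the following Python sketch copies a masked FASTA file while skipping records that contain nothing but mask characters. It is only an illustration: `drop_fully_masked` is a hypothetical helper, not a kraken2 or bracken feature, and it assumes the mask character is "x" or "X".

```python
import io
from typing import Iterable, List, Optional, TextIO

MASK_CHARS = frozenset("xX")  # assumption: masked bases are "x" (or "X")


def _is_fully_masked(seq_lines: List[str]) -> bool:
    """True if the record has sequence data and all of it is masked."""
    seq = "".join(seq_lines)
    return bool(seq) and all(ch in MASK_CHARS for ch in seq)


def _flush(header: Optional[str], seq_lines: List[str],
           dst: TextIO, dropped: List[str]) -> None:
    """Write one buffered record to dst, or record it as dropped."""
    if header is None:
        return
    if _is_fully_masked(seq_lines):
        dropped.append(header[1:].strip())
        return
    dst.write(header + "\n")
    for line in seq_lines:
        dst.write(line + "\n")


def drop_fully_masked(src: Iterable[str], dst: TextIO) -> List[str]:
    """Copy FASTA records from src to dst, skipping fully masked ones.

    Returns the headers of the records that were skipped.
    """
    dropped: List[str] = []
    header: Optional[str] = None
    seq_lines: List[str] = []
    for raw in src:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            _flush(header, seq_lines, dst, dropped)
            header, seq_lines = line, []
        elif header is not None and line:
            seq_lines.append(line)
    _flush(header, seq_lines, dst, dropped)
    return dropped
```

The sketch buffers one record at a time, so its memory use is bounded by the longest single sequence rather than by the whole library.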
