We're creating a database using `bracken-build`, and the `plant` option has some low-complexity sequences that seem to be causing a memory leak in `kraken2`.
We limited the database size with `--max-db-size 16000000000` when running `kraken2-build`, but these troublesome sequences seem to cause `kraken2` to use upwards of 70 GB of memory when it is called to create the `database.kraken` file. Chopping the sequences into chunks seemed to help, as it reduced the suspected memory leak to some extent. But then I discovered 3 sequences in the plant database that were masked to 100% "x" by `k2mask`, and it was actually these three sequences, run by themselves, that were using up a ton of extra memory. The sequences are:
This seems to be a bug, since a sequence that is entirely masked by "x" characters really shouldn't need any memory to look up strings of "x". Or could these sequences just be thrown out entirely? I'm not sure whether this is a kraken2 or a bracken bug, but it seems to be caused in part by the way bracken masks the sequences and then calls kraken2. I also wonder if this is the reason so many people report having to use very large instances due to memory issues?
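As a possible stopgap, here is a minimal sketch of throwing out fully masked records before the build. It is not part of kraken2 or bracken: the function names (`read_fasta`, `masked_fraction`, `drop_fully_masked`) are made up for illustration, and it assumes the mask character is "x" (either case), as seen in the masked plant sequences.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumption: masked bases appear as "x"/"X" in the masked FASTA.
MASK_CHARS = frozenset("xX")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def masked_fraction(seq: str) -> float:
    """Fraction of positions that are mask characters (empty counts as fully masked)."""
    if not seq:
        return 1.0
    return sum(c in MASK_CHARS for c in seq) / len(seq)


def drop_fully_masked(
    records: Iterable[Tuple[str, str]], drop_at: float = 1.0
) -> Iterator[Tuple[str, str]]:
    """Keep only records whose masked fraction is below drop_at."""
    for header, seq in records:
        if masked_fraction(seq) < drop_at:
            yield header, seq


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python drop_masked.py < masked.fna > filtered.fna
    for header, seq in drop_fully_masked(read_fasta(sys.stdin)):
        print(f">{header}")
        print(seq)
```

Lowering `drop_at` (say to 0.99) would also discard sequences that are almost entirely masked. Whether that is safe for a given library is a judgment call I haven't tested.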