Migrating this from our Discord channel, as reported by a user. Basically, when a few thousand `CREATE` statements are sent via individual transactions, the on-disk size of the database directory is much larger (400 MB) than when the same data is sent as batched transactions via `COPY` (50 MB). The question is whether some sort of compaction process can be triggered to reduce disk usage when batched transactions are not possible.
Original message:
Is there a way to reduce the on-disk footprint of a Kuzu graph DB? Can we trigger some sort of compaction on demand? If I load a few thousand transactions (nodes and edges combined), my on-disk size grows to 400MB. The same transactions, when batched, take less than 50MB. However, batching is not practical in many cases, so is there a way to load the transactions and then trigger compaction to reduce the disk footprint?
Questions we asked for clarification:
1. "A few thousand transactions, nodes and edges combined": is this creation only, i.e., no deletions?
2. Does "batched" mean a few thousand creations within a single transaction?
3. Does most of the size difference come from the data.kz file?
Clarifications:
1. Creation only, no deletion.
2. It was not a single batch, but around 20 batches.
3. The size difference comes entirely from the data.kz file, which holds most of the data.
Reproducible example:
Here is a test that reproduces the issue. It uses just two node types and creates 1,000 instances of each, once batched and once non-batched, then prints the directory size for each.
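A minimal sketch of just the measurement step, assuming the batched and non-batched runs write to two separate database directories. This stdlib-only Python walks a directory and reports its total size plus a per-file breakdown (largest first), which is one way to confirm that the difference sits in data.kz. The function names `file_sizes` and `report` and the example paths are hypothetical; loading the data itself would go through Kuzu's own API, which this sketch does not cover.

```python
import os


def file_sizes(path: str) -> dict[str, int]:
    """Map each regular file under `path` (as a relative path) to its size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # skip symlinks so their targets are not double-counted
            sizes[os.path.relpath(full, path)] = os.path.getsize(full)
    return sizes


def report(path: str) -> None:
    """Print the total directory size and a per-file breakdown, largest first."""
    sizes = file_sizes(path)
    print(f"{path}: {sum(sizes.values()) / 1e6:.1f} MB total")
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        print(f"  {name}: {size / 1e6:.1f} MB")
```

For example, `report("./db_batched")` followed by `report("./db_unbatched")` (hypothetical paths) prints both totals side by side and shows which file accounts for the gap.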
Looking at the script, one hypothesis is that the size difference is due to compression: a few thousand transactions might trigger re-compression of existing tuples, and right now we don't reclaim the space left behind by re-compression (this will definitely be added later), whereas a single `COPY` statement doesn't trigger re-compression at all.
We will profile a bit more to verify whether that's the case.
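To make the re-compression hypothesis concrete, here is a toy model in Python. It is purely illustrative and assumes a made-up storage scheme, not Kuzu's actual on-disk format: integer values in a column chunk are bit-packed at a single width; when a commit needs a wider width, the whole chunk is re-compressed into freshly allocated space and the old region is never reclaimed; otherwise new values are appended in place. A single bulk load compresses once and leaks nothing, while loading one value per transaction leaks every superseded region.

```python
def bit_width(max_value: int) -> int:
    """Bits needed to bit-pack non-negative integers up to max_value (at least 1)."""
    return max(1, max_value.bit_length())


def simulate(values: list[int], batch_size: int) -> tuple[int, int]:
    """Commit `values` in batches and return (live_bits, allocated_bits).

    Toy assumptions (hypothetical, NOT Kuzu's real storage format):
    - one bit-packing width is used for the whole column chunk;
    - if a batch needs a wider width, all stored values plus the batch are
      re-compressed into freshly allocated space, and the old region is
      never reclaimed;
    - otherwise the batch is appended in place at the current width.
    """
    width = 0      # current bit-packing width (0 = nothing stored yet)
    stored = 0     # number of values currently in the chunk
    allocated = 0  # total bits ever allocated (live + leaked)
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        stored += len(batch)
        new_width = max(width, bit_width(max(batch)))
        if new_width > width:
            # Re-compress everything at the wider width into new space;
            # the previous region (if any) becomes unreclaimed garbage.
            width = new_width
            allocated += stored * width
        else:
            # Values still fit: append in place without touching old data.
            allocated += len(batch) * width
    live = stored * width
    return live, allocated


if __name__ == "__main__":
    values = list(range(1000))
    for batch_size in (1000, 1):
        live, allocated = simulate(values, batch_size)
        print(f"batch_size={batch_size}: live={live} bits, "
              f"allocated={allocated} bits, leaked={allocated - live} bits")
```

For the values 0 to 999, one batch allocates exactly the 10,000 live bits, while one transaction per value allocates 18,194 bits, of which 8,194 are leaked by the nine width increases. The toy model only shows the direction of the effect, not the magnitude reported above.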