Currently one of the databases I'm working with is around 85 GB uncompressed.
Compressed with e.g. xz it's only around 2 GB.
When you have a lot of databases lying around for different purposes, it really takes a toll on the remaining free space on the hard drive.
Supporting compressed database files would free up hundreds of GBs on my drive, and indexing/querying performance shouldn't suffer much.
The individual lines representing documents are already compressed individually. But for small documents there's a lot of duplication across documents once they are base64 encoded, so per-line compression doesn't capture it.
The main issue right now is that, to read compressed files efficiently, we have to use a BufferedInputStream. So even though we can count the number of uncompressed bytes read from the stream, the count of compressed bytes read is not accurate.
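Just to illustrate the layering I have in mind (a minimal sketch, using GZIP as a stand-in for xz and a hypothetical CountingInputStream wrapper, not the project's actual classes): the counter sits above the decompressor, so it sees exact uncompressed offsets, while the buffered stream underneath reads ahead and hides the true compressed position.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Counts the bytes that pass through it; placed above the decompressor it
// yields uncompressed offsets.
final class CountingInputStream extends FilterInputStream {
    private long count = 0;

    CountingInputStream(InputStream in) { super(in); }

    long getCount() { return count; }

    @Override
    public int read() throws IOException {
        int b = in.read();
        if (b != -1) count++;
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = in.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }
}

public class CompressedRead {
    public static void main(String[] args) throws IOException {
        // file <- BufferedInputStream <- GZIPInputStream <- CountingInputStream
        try (CountingInputStream counter = new CountingInputStream(
                new GZIPInputStream(
                    new BufferedInputStream(new FileInputStream(args[0]))))) {
            byte[] buf = new byte[8192];
            while (counter.read(buf) != -1) {
                // counter.getCount() is an exact uncompressed offset here, but the
                // position in the compressed file is only known up to the read-ahead
                // of the BufferedInputStream below the decompressor.
            }
            System.out.println("uncompressed bytes read: " + counter.getCount());
        }
    }
}
```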
This can probably be mitigated by using a seekable input stream on top of the compressed one. Then we can index based on the uncompressed byte count. The downside is that we may need to keep a seekable input stream open for as long as we keep the database value, to avoid opening and closing a whole pipe of streams for every query (see the sketch below).
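To show why keeping the stream open matters, here is a sketch of the naive alternative: reopen the compressed file per query and skip forward through the decompressor to the indexed uncompressed offset. The names (openAt, readDocumentAt) and the GZIP stand-in are purely illustrative assumptions, not the project's API; the point is that every query pays for re-decompressing everything up to the offset.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class UncompressedSeek {

    // Position a fresh decompressing stream at the given uncompressed offset.
    static InputStream openAt(String path, long uncompressedOffset) throws IOException {
        InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(path)));
        long remaining = uncompressedOffset;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {              // skip() may return 0; fall back to read()
                if (in.read() == -1) break;  // hit EOF before reaching the offset
                skipped = 1;
            }
            remaining -= skipped;
        }
        return in;
    }

    // Read one newline-terminated document line starting at the indexed offset.
    static String readDocumentAt(String path, long uncompressedOffset) throws IOException {
        try (InputStream in = openAt(path, uncompressedOffset)) {
            StringBuilder line = new StringBuilder();
            int b;
            while ((b = in.read()) != -1 && b != '\n') {
                line.append((char) b);
            }
            return line.toString();
        }
    }
}
```

Keeping one long-lived stream (or a small pool of them) per database value would avoid this per-query skip cost, at the price of holding file handles open.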
I'm open to other ideas! :)