Support compressed databases #12

Open
luposlip opened this issue Oct 10, 2022 · 0 comments
luposlip commented Oct 10, 2022

Currently, one of the databases I'm using is around 85 GB uncompressed.
Compressed with e.g. xz, it's only around 2 GB.

With a lot of databases lying around for different purposes, this really takes a toll on the free space left on the hard drive.

For me personally, supporting compressed database files would free up hundreds of GBs on my drive, and indexing/querying performance shouldn't suffer much.

The individual lines representing documents are actually compressed already. But for small documents, there's a lot of duplication across the documents once they get base64 encoded.
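To illustrate the duplication point, here is a rough sketch. It assumes a made-up layout where every line is `base64(gzip(document))`; the real on-disk format may differ, and gzip from the JDK stands in for whatever per-line compression is actually used. The class and method names are hypothetical. It builds such a file in memory and compresses the whole thing once more, so the two sizes can be compared.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

final class CrossLineRedundancyDemo {

    /** Gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] raw) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(raw);
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a file where each line is base64(gzip(doc)). ASSUMED layout, for illustration only. */
    static byte[] buildDatabase(int docs) {
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        for (int i = 0; i < docs; i++) {
            String doc = "{\"id\":" + i + ",\"type\":\"example\"}";
            byte[] line = Base64.getEncoder().encode(gzip(doc.getBytes(StandardCharsets.UTF_8)));
            file.write(line, 0, line.length);
            file.write('\n');
        }
        return file.toByteArray();
    }

    public static void main(String[] args) {
        byte[] db = buildDatabase(1000);
        byte[] whole = gzip(db);
        System.out.println("line-compressed file: " + db.length + " bytes");
        System.out.println("whole file compressed again: " + whole.length + " bytes");
    }
}
```

Base64 text only uses 6 of every 8 bits, and small compressed documents share headers and similar prefixes, so a second pass over the whole file still has redundancy to remove.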

The main issue right now is that reading compressed files efficiently requires a BufferedInputStream. So even though we can count the number of uncompressed bytes read from the stream, the count of compressed bytes read is not accurate, since the buffering reads ahead of what has actually been consumed.
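A minimal sketch of the effect, under some assumptions: gzip from the JDK stands in for xz (the JDK has no xz support), the data lives in memory, and `CountingInputStream` is a hypothetical helper written for this example. It reads just the first line through a buffered, decompressing stream and reports how many uncompressed bytes the caller consumed versus how many compressed bytes were pulled from the source.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class ReadAheadDemo {

    /** Hypothetical helper: counts bytes pulled from the wrapped stream. */
    static final class CountingInputStream extends FilterInputStream {
        long count = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }
        @Override public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Gzipped sample data: 1000 short lines ("line-0\n", "line-1\n", ...). */
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("line-").append(i).append('\n');
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(sb.toString().getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns {uncompressed bytes consumed, compressed bytes pulled from source, total compressed size}. */
    static long[] readFirstLine(byte[] compressed) {
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(compressed));
        try (InputStream in = new BufferedInputStream(new GZIPInputStream(source))) {
            long consumed = 0;
            int b;
            while ((b = in.read()) != -1) {
                consumed++;               // exact: one per byte handed to the caller
                if (b == '\n') break;     // stop after the first line
            }
            return new long[] {consumed, source.count, compressed.length};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = readFirstLine(sample());
        System.out.println("uncompressed bytes consumed:  " + r[0]);
        System.out.println("compressed bytes pulled:      " + r[1]);
        System.out.println("total compressed size:        " + r[2]);
    }
}
```

The uncompressed count matches the line exactly, while the compressed count reflects how much the decompressor and buffer fetched in advance, so it cannot serve as a per-line offset.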

This can probably be mitigated by using a seekable input stream on top of the compressed one. Then we can index based on the uncompressed byte count. The downside is that we may need to keep a seekable input stream open for as long as we keep the database value, so that we don't have to open and close a pipe of streams for every query.
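A rough sketch of indexing by uncompressed offsets, with assumptions stated up front: gzip from the JDK stands in for xz, the data is held in memory, and all names are hypothetical. Because a plain gzip stream is not seekable, `readLineAt` falls back to the naive approach of opening a fresh stream and skipping forward on every query. That is exactly the per-query cost described above; a truly seekable stream kept open would replace the reopen-and-skip step.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class UncompressedOffsetIndex {

    /** Gzip-compresses a string in memory (sample data helper). */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static InputStream open(byte[] compressed) throws IOException {
        return new BufferedInputStream(new GZIPInputStream(new ByteArrayInputStream(compressed)));
    }

    /** Streams the data once and records the uncompressed start offset of every line. */
    static List<Long> buildIndex(byte[] compressed) {
        try (InputStream in = open(compressed)) {
            List<Long> offsets = new ArrayList<>();
            long pos = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) { offsets.add(pos); atLineStart = false; }
                pos++;
                if (b == '\n') atLineStart = true;
            }
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Naive lookup: reopen, skip to the uncompressed offset, read one line. */
    static String readLineAt(byte[] compressed, long offset) {
        try (InputStream in = open(compressed)) {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    // skip() may return 0; fall back to a single read to detect EOF
                    if (in.read() == -1) throw new EOFException("offset beyond end of data");
                    skipped = 1;
                }
                remaining -= skipped;
            }
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1 && b != '\n') line.write(b);
            return new String(line.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] db = gzip("alpha\nbravo\ncharlie\n");
        List<Long> index = buildIndex(db);
        System.out.println("offsets: " + index);
        System.out.println("second line: " + readLineAt(db, index.get(1)));
    }
}
```

With this fallback, every lookup decompresses everything before the requested offset, which is why keeping one seekable stream open per database value looks attractive despite the resource cost.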

I'm open to other ideas! :)

luposlip added the "enhancement" (New feature or request) label Oct 10, 2022
luposlip self-assigned this Oct 10, 2022
luposlip changed the title from "Support for compressed databases" to "Support compressed databases" Oct 10, 2022