
FSST-based string compression #2861

Open
semihsalihoglu-uw opened this issue Feb 10, 2024 · 0 comments
Labels
feature New features or missing components of existing features performance optimization

Comments

@semihsalihoglu-uw
Contributor

I suggest we apply FSST-based string compression, which assigns short codes to frequently occurring substrings and stores those codes in place of the substrings. This should decrease our database size for many string columns, especially on RDFGraphs, where the size of the resource table is dominated by the long IRIs it stores. We could define a new data type called IRI for these, but that would be a very specialized optimization. Assuming FSST gets us most of the same benefit, it would be the better choice, as any string column benefits from it.
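To make the idea concrete, here is a rough Python sketch of symbol-table coding on one IRI. It is only an illustration, not the FSST library's implementation: the symbol table `SYMBOLS` is hand-picked and hypothetical, whereas FSST learns its table from a sample of the data and uses a much faster matching scheme. The sketch does follow the basic layout described in the paper: symbols of 1 to 8 bytes, one-byte codes, and code 255 reserved as an escape for bytes that no symbol covers.

```python
ESCAPE = 255  # reserved code: the next byte is a literal, not a symbol code


def build_lookup(symbols):
    """Map each symbol (1 to 8 bytes) to a one-byte code in 0..254."""
    if len(symbols) > 255:
        raise ValueError("at most 255 symbols fit in one-byte codes")
    for s in symbols:
        if not 1 <= len(s) <= 8:
            raise ValueError("symbols must be 1 to 8 bytes long")
    return {s: code for code, s in enumerate(symbols)}


def compress(data, symbols):
    """Greedily replace the longest matching symbol at each position."""
    lookup = build_lookup(symbols)
    max_len = max((len(s) for s in symbols), default=0)
    out = bytearray()
    i = 0
    while i < len(data):
        for length in range(min(max_len, len(data) - i), 0, -1):
            code = lookup.get(data[i:i + length])
            if code is not None:
                out.append(code)
                i += length
                break
        else:
            # No symbol matches here: emit the escape code plus the raw byte.
            out.append(ESCAPE)
            out.append(data[i])
            i += 1
    return bytes(out)


def decompress(encoded, symbols):
    """Expand each code back to its symbol; copy escaped bytes verbatim."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        if encoded[i] == ESCAPE:
            out.append(encoded[i + 1])
            i += 2
        else:
            out += symbols[encoded[i]]
            i += 1
    return bytes(out)


# Hypothetical hand-picked table; FSST would derive one from the column data.
SYMBOLS = [b"http://", b"example.", b"org/", b"resource", b"/"]

iri = b"http://example.org/resource/42"
packed = compress(iri, SYMBOLS)
assert decompress(packed, SYMBOLS) == iri
print(len(iri), "->", len(packed), "bytes")
```

Because each value is decoded independently with only the shared symbol table, individual strings can be read without decompressing a whole block, which is what makes this style of compression attractive for string columns.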

Here are some pointers to get started on this:

FSST paper: https://dl.acm.org/doi/abs/10.14778/3407790.3407851
FSST library by Boncz: https://github.com/cwida/fsst
Integration of FSST into DuckDB for reference: duckdb/duckdb#4366

@semihsalihoglu-uw semihsalihoglu-uw added feature New features or missing components of existing features performance optimization labels Feb 10, 2024