I suggest we apply FSST-based string compression, which replaces frequently occurring substrings with short codes. This should reduce database size for many string columns, especially in RDFGraphs, where the size of the resource table is dominated by the long IRIs it stores. We could define a dedicated IRI data type for these, but that would be a very specialized optimization. If FSST gives us most of the same benefit, it would be the better choice, since any string column benefits from it.
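To make the idea concrete, here is a toy sketch of symbol-table compression applied to an IRI. It is not the FSST algorithm itself: FSST learns its symbol table from a sample of the data, whereas the `SYMBOLS` list below is hand-picked, and the function names and example IRI are hypothetical. It only mirrors the general scheme of one-byte codes for substrings of up to 8 bytes, with one code reserved as an escape for literal bytes.

```python
# Toy illustration of symbol-table string compression (NOT the real FSST algorithm).
# The symbol table is hand-picked here; FSST would learn it from a data sample.

ESCAPE = 255  # code reserved to mark a literal byte that matches no symbol


def build_table(symbols):
    """Map each symbol (1 to 8 bytes long) to a one-byte code."""
    assert len(symbols) <= 255 and all(1 <= len(s) <= 8 for s in symbols)
    return {s: code for code, s in enumerate(symbols)}


def compress(data, table):
    """Greedy longest-match encoding of `data` using `table`."""
    out = bytearray()
    max_len = max((len(s) for s in table), default=0)
    i = 0
    while i < len(data):
        # Try the longest candidate substring first, then shorter ones.
        for length in range(min(max_len, len(data) - i), 0, -1):
            code = table.get(data[i:i + length])
            if code is not None:
                out.append(code)
                i += length
                break
        else:
            # No symbol matched: emit the escape code followed by the raw byte.
            out += bytes([ESCAPE, data[i]])
            i += 1
    return bytes(out)


def decompress(blob, symbols):
    """Invert `compress`: expand codes to symbols, copy escaped bytes."""
    out = bytearray()
    i = 0
    while i < len(blob):
        if blob[i] == ESCAPE:
            out.append(blob[i + 1])
            i += 2
        else:
            out += symbols[blob[i]]
            i += 1
    return bytes(out)


# Hypothetical symbols that might be frequent in an IRI-heavy resource table.
SYMBOLS = [b"http://", b"www.", b"example.", b"org/", b"resource"]
TABLE = build_table(SYMBOLS)

iri = b"http://www.example.org/resource/42"
packed = compress(iri, TABLE)
assert decompress(packed, SYMBOLS) == iri
print(len(iri), "->", len(packed), "bytes")
```

Because IRIs in the same graph tend to share long prefixes, a handful of symbols covers most of each string, which is why a general string compressor of this kind may capture much of what a dedicated IRI type would.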
Here are some pointers to get started on this:
FSST paper: https://dl.acm.org/doi/abs/10.14778/3407790.3407851
FSST library by Boncz: https://github.com/cwida/fsst
Integration of FSST into DuckDB for reference: duckdb/duckdb#4366