
Feature request - Column Dictionary Compression #2684

Open
goranschwarz opened this issue Jun 3, 2020 · 1 comment

@goranschwarz

Feature request -- Column Dictionary Compression

The idea behind Dictionary Compression:

  • The "same" kind of idea as what the JVM may do with Strings (the literal pool, or interning)
  • Store each distinct column value only once (this technique is used by some column-store databases)
    • So when many rows share the same column content, the value is stored just once!
    • On update: old values may become "orphaned"; these could be removed during a "shutdown compress/defrag"
  • For "log" tables (nearly 100% inserts), which probably contain many rows with the same column content, this would be a tremendous space saver
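The idea above can be sketched in a few lines of Python. This is only an illustration of dictionary encoding as a concept, not H2's actual storage format; the class and method names are made up for the example.

```python
# Minimal sketch of column dictionary encoding (illustrative only;
# not H2's actual storage format).
class DictEncodedColumn:
    """Stores each distinct value once; rows hold small integer codes."""

    def __init__(self):
        self.dictionary = []        # code -> value (each value stored once)
        self.codes_by_value = {}    # value -> code (for fast lookup on insert)
        self.rows = []              # per-row codes instead of full values

    def append(self, value):
        code = self.codes_by_value.get(value)
        if code is None:
            code = len(self.dictionary)
            self.dictionary.append(value)
            self.codes_by_value[value] = code
        self.rows.append(code)

    def get(self, row):
        return self.dictionary[self.rows[row]]


# A "log" table scenario: many rows repeat the same message text.
col = DictEncodedColumn()
for i in range(1000):
    col.append("connection timed out" if i % 2 else "user logged in")

print(len(col.rows))        # 1000 rows
print(len(col.dictionary))  # only 2 distinct values actually stored
```

With 1000 rows but only 2 distinct strings, the column stores 1000 small integers plus 2 strings instead of 1000 strings, which is where the space saving comes from.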

At the column level, set a compression option; for example (the syntax could probably be improved):

create table t1 (
    id         int                               not null,
    c1         varchar(30)                       not null,
    c2         clob            compress=dict         null,
    c3         varchar(4000)   compress=dict         null
)

or possibly

create table t1 (
    id         int             not null,
    c1         varchar(30)     not null,
    c2         clob                null,
    c3         varchar(4000)       null
)
with dictionary_compression on (c2, c3)

Is this a good idea?

@katzyn
Contributor

katzyn commented Jun 4, 2020

It creates a lot of complexity and reduces performance in most of the cases where people use general-purpose database systems to store their data.

If you need simple, compact storage for character data with some search capabilities, you can use something like Apache Lucene instead.

But if you need a database, I suggest taking a fresh look at your schema and normalizing it instead. Better database design is superior to any storage optimization the DBMS itself can make.
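The normalization katzyn suggests can be sketched with Python's built-in sqlite3 module. The schema here (a `message` lookup table referenced from `log`) is an invented example, not taken from the thread: it achieves by hand, at the schema level, the same "store each distinct value once" effect the feature request asks the engine for.

```python
import sqlite3

# Sketch of schema normalization (illustrative schema): instead of
# repeating the same long message text in every log row, move it to a
# lookup table and reference it by id.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE message (id INTEGER PRIMARY KEY, text TEXT UNIQUE NOT NULL);
    CREATE TABLE log (id INTEGER PRIMARY KEY,
                      message_id INTEGER REFERENCES message(id));
""")

def log_event(text):
    # Store each distinct message once; reuse its id for repeat events.
    con.execute("INSERT OR IGNORE INTO message (text) VALUES (?)", (text,))
    (msg_id,) = con.execute(
        "SELECT id FROM message WHERE text = ?", (text,)).fetchone()
    con.execute("INSERT INTO log (message_id) VALUES (?)", (msg_id,))

for i in range(1000):
    log_event("connection timed out" if i % 2 else "user logged in")

(rows,) = con.execute("SELECT COUNT(*) FROM log").fetchone()
(distinct,) = con.execute("SELECT COUNT(*) FROM message").fetchone()
print(rows, distinct)  # 1000 log rows, 2 distinct messages stored
```

The trade-off is an extra join on read, which is the performance cost katzyn alludes to when arguing against pushing this into the engine.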
