Schema DDL: add support for ZSTD encoding #237

Closed
miike opened this issue Mar 14, 2017 · 8 comments

@miike

miike commented Mar 14, 2017

Should ZSTD be implemented and considered as a compression type, particularly for VARCHAR columns?

@chuwy
Contributor

chuwy commented Mar 14, 2017

Thanks @miike! That's a question for @bogaert, I think.

@bogaert

bogaert commented Mar 14, 2017

@miike have you run any tests to see how this compares to LZO (our current default)?

@miike
Author

miike commented Mar 24, 2017

@bogaert I'm getting pretty good compression ratios all round for VARCHAR columns, but ZSTD works on all types. Is there any sort of standardised datasets we can use for Snowplow that would make sense so we can make like for like comparisons?

@alexanderdean
Member

> Is there any sort of standardised datasets we can use for Snowplow that would make sense so we can make like for like comparisons?

We have a retail dataset that we could potentially use - @bogaert knows more...

@bogaert

bogaert commented Mar 24, 2017

I have created an internal card to see what we can do. I'll keep you posted @miike.

@miike
Author

miike commented Mar 25, 2017

Thanks @bogaert!

@bernardosrulzon

bernardosrulzon commented Apr 22, 2017

@miike @alexanderdean @bogaert I'm seeing very good results with ZSTD. The org.w3/performance_timing table was reduced to 30% of its original size after encoding everything as ZSTD. It should be safe to use as default as it works on all data types.
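
For readers unfamiliar with what "encoding everything as ZSTD" looks like in a table definition, here is a minimal sketch in Python. It only renders a Redshift `CREATE TABLE` string in which every column carries the same `ENCODE` clause. The helper `render_create_table`, the table name and the column names are hypothetical and chosen for illustration; this is not how Iglu or schema-ddl generates DDL.

```python
from typing import List, Tuple


def render_create_table(
    table: str,
    columns: List[Tuple[str, str]],
    encoding: str = "ZSTD",
) -> str:
    """Render a CREATE TABLE statement with one encoding on every column.

    Hypothetical helper for illustration only; `columns` is a list of
    (column name, SQL type) pairs.
    """
    column_lines = [
        f'    "{name}" {sql_type} ENCODE {encoding}'
        for name, sql_type in columns
    ]
    return f"CREATE TABLE {table} (\n" + ",\n".join(column_lines) + "\n);"


# Illustrative table and columns (assumed, not taken from a real schema).
ddl = render_create_table(
    "atomic.example_table",
    [("page_url", "VARCHAR(4096)"), ("load_ms", "INT")],
)
print(ddl)
```

Passing `encoding="LZO"` renders the same statement with the previous default, which may help when building two copies of a table for a side-by-side size comparison.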

@miike
Author

miike commented Jun 26, 2017

Bumping this - we've just compressed a moderately sized (~3TB) atomic.events dataset down to 30% of its size using ZSTD, which has resulted in some speed improvements. I think it's worth further testing and including in Iglu as a default over LZO (ANALYZE COMPRESSION will now recommend ZSTD over LZO, apart from some instances in which columns contain a large number of nulls).
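
To make figures like "down to 30% of its size" comparable across tables, one option is to compute the ratio the same way every time. The sketch below is a hypothetical helper, not part of any Snowplow tooling. The example sizes are rounded illustrations derived from the approximate numbers above (roughly 3 TB before, 30% remaining), not measured values.

```python
def percent_of_original(before_gb: float, after_gb: float) -> float:
    """Return the compressed size as a percentage of the original size."""
    if before_gb <= 0:
        raise ValueError("before_gb must be positive")
    return 100.0 * after_gb / before_gb


# Rounded illustration: ~3 TB (3000 GB) before, ~900 GB after re-encoding.
before_gb = 3000
after_gb = 900
print(f"{percent_of_original(before_gb, after_gb):.0f}% of original size")
print(f"{before_gb - after_gb} GB saved")
```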

@chuwy chuwy added this to the Release 8 Stamp TBC milestone Dec 15, 2017
@oguzhanunlu oguzhanunlu changed the title RFC: Inclusion of ZSTD Schema DDL: add support for ZSTD encoding Dec 18, 2017
@oguzhanunlu oguzhanunlu self-assigned this Dec 18, 2017
@oguzhanunlu oguzhanunlu mentioned this issue Jan 16, 2018
19 tasks
@chuwy chuwy closed this as completed in 6f5be11 Feb 7, 2018
rzats pushed a commit to snowplow/schema-ddl that referenced this issue Mar 25, 2019