Snappy Compression Support #38

Open
jcannelos opened this issue Aug 23, 2018 · 1 comment

Comments

@jcannelos

  • Spectrify version: 1.0.1
  • Python version: 3.5
  • Operating System: Win 10 Pro

Hi all, this is a great product and has saved us a ton of time on our Redshift to Spectrum transition. Question: Is it possible to store the Parquet files in snappy format, rather than gzip? I can see in Writer._get_writer where it's being specified as gzip. Do I have to sub-class Writer, then CsvManifestConverter and ConcurrentManifestConverter in order to specify snappy or is there a simpler way?

Thanks!

Sincerely,

J'son

@c-nichols
Collaborator

Hi J'son :)

Happy you've found Spectrify useful. Regarding your question -- I think subclassing is the easiest way right now... if it's any comfort, it used to be significantly more difficult, so at least it's trending in the right direction!

The default is gzip because in the benchmarks I performed:

  • it was actually faster to convert with gzip
  • there didn't seem to be a query performance difference on the Spectrum side

Either/both of those may have been artifacts of our configuration, or may have changed since those tests (last October).

Ways forward:

  • If you're up for contributing, maybe a PR implementing an environment variable SPECTRIFY_COMPRESSION=snappy?
  • Provide convincing benchmarks and I'll implement it myself :)
  • In talking with a few other users, it seems like subclasses become necessary over time anyways -- so maybe it's not such a bad option.

Thanks,
Colin
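
For anyone picking up the environment-variable idea above, here is a minimal sketch of how the setting might be read and validated. Only the variable name SPECTRIFY_COMPRESSION and the gzip default come from this thread; the function name, the set of allowed codecs, and the error handling are assumptions for illustration, not Spectrify's actual code.

```python
import os

# Variable name suggested in this thread; gzip is the current default.
ENV_VAR = "SPECTRIFY_COMPRESSION"
DEFAULT_CODEC = "gzip"

# Assumption: only the two codecs discussed here are accepted.
ALLOWED_CODECS = frozenset({"gzip", "snappy"})


def resolve_compression(environ=None):
    """Return the Parquet compression codec to use (hypothetical helper).

    Falls back to gzip when the variable is unset or empty, and raises
    ValueError for anything outside ALLOWED_CODECS.
    """
    if environ is None:
        environ = os.environ
    # Tolerate stray whitespace and mixed case, e.g. " Snappy".
    codec = environ.get(ENV_VAR, "").strip().lower() or DEFAULT_CODEC
    if codec not in ALLOWED_CODECS:
        raise ValueError(
            "{}={!r} is not supported; expected one of {}".format(
                ENV_VAR, codec, sorted(ALLOWED_CODECS)
            )
        )
    return codec
```

The returned string would then be passed wherever the writer currently hard-codes gzip (the thread points at Writer._get_writer), which keeps the default behavior unchanged for existing users.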

@c-nichols c-nichols changed the title Parquet + Snappy Question Snappy Compression Support Jan 10, 2019