Hi all, this is a great product and has saved us a ton of time on our Redshift to Spectrum transition. Question: is it possible to write the Parquet files with snappy compression rather than gzip? I can see that Writer._get_writer specifies gzip. Do I have to subclass Writer, and then CsvManifestConverter and ConcurrentManifestConverter, in order to specify snappy, or is there a simpler way?
Thanks!
Sincerely,
J'son
Happy you've found Spectrify useful. Regarding your question -- I think that's the easiest way right now... if it's any comfort, it used to be significantly more difficult, so at least it's trending in the right direction!
The default is gzip because in the benchmarks I performed:
- it was actually faster to convert with gzip
- there didn't seem to be a performance difference on the Spectrum side
Either/both of those may have been artifacts of our configuration, or may have changed since those tests (last October).
Ways forward:
- If you're up for contributing, maybe a PR implementing an environment variable SPECTRIFY_COMPRESSION=snappy?
- Provide convincing benchmarks and I'll implement it myself :)
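For the environment-variable route, the core of such a PR might look roughly like this minimal sketch. SPECTRIFY_COMPRESSION is only the variable proposed above, not an existing Spectrify setting, and `resolve_compression` is a hypothetical helper name. Unset keeps today's gzip default; the accepted names are the codecs pyarrow's Parquet writer understands.

```python
import os
from typing import Mapping, Optional

# Hypothetical: the variable proposed in this thread, not an existing setting.
ENV_VAR = "SPECTRIFY_COMPRESSION"
DEFAULT_CODEC = "gzip"  # preserve current behaviour when the variable is unset

# Codec names accepted by pyarrow's Parquet writer.
SUPPORTED_CODECS = frozenset({"none", "snappy", "gzip", "brotli", "lz4", "zstd"})


def resolve_compression(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the Parquet codec named by SPECTRIFY_COMPRESSION, defaulting to gzip."""
    env = os.environ if environ is None else environ
    codec = env.get(ENV_VAR, DEFAULT_CODEC).strip().lower()
    if not codec:
        codec = DEFAULT_CODEC  # treat an empty value the same as unset
    if codec not in SUPPORTED_CODECS:
        raise ValueError(
            f"{ENV_VAR}={codec!r} is not supported; "
            f"expected one of {sorted(SUPPORTED_CODECS)}"
        )
    return codec
```

Wherever Writer._get_writer currently passes gzip, it would pass `resolve_compression()` instead. Validating up front means a typo like `snapy` fails immediately rather than partway through a long conversion.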
In talking with a few other users, it seems like subclasses become necessary over time anyway, so maybe it's not such a bad option.