Add Apache parquet as an output format for enriched data #87
Comments
I'm interested in picking this up. Would a PR for this work get merged? |
yup sure 👍 |
Any update? @darrenhaken are you working on it? |
I’ve been away on leave; I can take a look when I’m back in a few weeks.
Feel free to pick it up if you want it sooner though!
|
@darrenhaken any updates? |
Any update here? |
Is there any progress on this? |
To use S3 as a queryable data lake, it would be beneficial to store the enriched data in a columnar format such as Apache Parquet [1]. We ran some performance tests with Athena, and it seems to perform better with Parquet than with TSV.
Thanks!
[1] https://parquet.apache.org/