This repository has been archived by the owner on Jan 11, 2021. It is now read-only.

Add Arrow Support #186

Open
6 tasks
sunchao opened this issue Nov 6, 2018 · 8 comments

Comments

@sunchao
Owner

sunchao commented Nov 6, 2018

This is the umbrella ticket to track adding Apache Arrow support. Tasks:

@liurenjie1024
Contributor

I think the next tasks will be:

  • Add reader that reads parquet into arrow.
  • Complete the converter to convert arrow schema to parquet schema.
  • Add writer to save arrow data to parquet format.
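To make the schema-converter task concrete, here is a minimal sketch of mapping a few Arrow primitive types to Parquet physical types. The enums `ArrowType` and `ParquetPhysicalType` and the functions `arrow_to_parquet` and `convert_schema` are hypothetical stand-ins, not the actual arrow or parquet-rs APIs; a real converter would also need to handle logical types, nullability, and nested fields.

```rust
/// Hypothetical stand-in for a small subset of Arrow data types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArrowType {
    Boolean,
    Int32,
    Int64,
    Float32,
    Float64,
    Utf8,
}

/// Hypothetical stand-in for the matching Parquet physical types.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ParquetPhysicalType {
    Boolean,
    Int32,
    Int64,
    Float,
    Double,
    ByteArray,
}

/// Map one Arrow type to the Parquet physical type that stores it.
/// Utf8 strings are stored as BYTE_ARRAY (a real converter would also
/// attach the UTF8 logical annotation, omitted in this sketch).
pub fn arrow_to_parquet(t: ArrowType) -> ParquetPhysicalType {
    match t {
        ArrowType::Boolean => ParquetPhysicalType::Boolean,
        ArrowType::Int32 => ParquetPhysicalType::Int32,
        ArrowType::Int64 => ParquetPhysicalType::Int64,
        ArrowType::Float32 => ParquetPhysicalType::Float,
        ArrowType::Float64 => ParquetPhysicalType::Double,
        ArrowType::Utf8 => ParquetPhysicalType::ByteArray,
    }
}

/// Convert a flat list of (name, Arrow type) fields, preserving order.
pub fn convert_schema(fields: &[(&str, ArrowType)]) -> Vec<(String, ParquetPhysicalType)> {
    fields
        .iter()
        .map(|(name, t)| (name.to_string(), arrow_to_parquet(*t)))
        .collect()
}
```

Calling `convert_schema(&[("id", ArrowType::Int64), ("name", ArrowType::Utf8)])` would yield an `Int64` column followed by a `ByteArray` column.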

@sunchao
Owner Author

sunchao commented Nov 7, 2018

Thanks @liurenjie1024. Updated the description with some potential tasks.

@sadikovi
Collaborator

sadikovi commented Nov 8, 2018

I suggest adding an item to update the existing doc to reflect the addition of arrow reader/writer.

@andygrove
Contributor

andygrove commented Nov 8, 2018 via email

@sunchao
Owner Author

sunchao commented Nov 8, 2018

@sadikovi Thanks - added.
@andygrove cool - will take a look.

@liurenjie1024
Contributor

@andygrove Yes, I'll take that as a reference. I'll also refer to the C++ implementation of the Arrow adapter for Parquet.

@andygrove
Contributor

I am very interested in this. I am wondering if we can add a generic reader trait to the main arrow project and then have an implementation in parquet-rs.

I have a CSV reader for arrow that could be published as a separate crate and implement the same trait.

@andygrove
Contributor

Actually, maybe this is as simple as implementing Iterator<Item = Arc<RecordBatch>>
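A rough sketch of what the iterator idea could look like. `RecordBatch` here is a simplified stand-in (a single `i64` column) rather than the real arrow type, and `InMemoryReader` and `total_rows` are hypothetical names used only for illustration. A Parquet or CSV reader would implement the same `Iterator` bound over its own data source.

```rust
use std::sync::Arc;

/// Simplified stand-in for arrow's RecordBatch: a single i64 column.
#[derive(Debug, PartialEq)]
pub struct RecordBatch {
    pub values: Vec<i64>,
}

impl RecordBatch {
    pub fn num_rows(&self) -> usize {
        self.values.len()
    }
}

/// Hypothetical reader that serves in-memory rows in fixed-size batches.
pub struct InMemoryReader {
    rows: Vec<i64>,
    batch_size: usize,
    pos: usize,
}

impl InMemoryReader {
    pub fn new(rows: Vec<i64>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        InMemoryReader { rows, batch_size, pos: 0 }
    }
}

impl Iterator for InMemoryReader {
    type Item = Arc<RecordBatch>;

    /// Yield the next batch, or None once all rows are consumed.
    fn next(&mut self) -> Option<Self::Item> {
        if self.pos >= self.rows.len() {
            return None;
        }
        let end = (self.pos + self.batch_size).min(self.rows.len());
        let batch = RecordBatch {
            values: self.rows[self.pos..end].to_vec(),
        };
        self.pos = end;
        Some(Arc::new(batch))
    }
}

/// Generic consumer: accepts any source of batches, whatever the format.
pub fn total_rows<I>(reader: I) -> usize
where
    I: Iterator<Item = Arc<RecordBatch>>,
{
    reader.map(|batch| batch.num_rows()).sum()
}
```

Code written against the `Iterator<Item = Arc<RecordBatch>>` bound, like `total_rows`, would not need to know which format produced the batches. A real reader would probably want the item type to be a `Result` so that I/O and decoding errors can be reported.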
