Write support for Parquet (low-level writes) #116
Comments
@sunchao Would you mind commenting on this issue? I would like to know your opinion and any suggestions you have on the write support and overall direction, or if there is anything that I missed. I am happy to take ownership of this work; my original plan was to explore this in my branch and create incremental PRs to support write functionality. Let me know what you think and if there are any changes you want to make. Thanks!
Thanks @sadikovi, this will be a great feature to have. I'm not very familiar with the write path so far, but I will take a look at the existing implementations and add my input. I do suggest we make this an umbrella issue and break the task into pieces (just discovered this feature in GitHub).
Sorry for the delay. I can finally start working on this; I will experiment in my branch and see what happens.
No worries at all! I was also busy with some other stuff recently and haven't done much at all on parquet-rs... I do plan to start looking at arrow-parquet integration as well as understanding how write can be implemented for parquet-rs.
Initial write support is proposed in #127 and is considered WIP at the moment. The PR implements a low-level write API, which means we provide some basic traits and structs for the user to write data:
Unfortunately, the PR contains some other code for metadata conversion, etc., which might complicate
Overview
You can have a look at
User creates
Each
All reads are sequential, when user asks for a row group, she cannot write data in parallel, it must
User is expected to write column by column; in fact, it is enforced in the implementation.
Every time
Technically
The overall API resembles the structure of read path, including
The main files are:
Features
Gotchas and limitations
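The column-by-column constraint described in the overview could be sketched roughly as follows. This is a minimal illustration only: `RowGroupSketch`, `ColumnSketch`, `WriteError`, and `demo` are hypothetical names invented here, not the traits or structs from #127, and values are kept in memory instead of being encoded into pages.

```rust
/// Hypothetical error type for this sketch (not part of parquet-rs).
#[derive(Debug, PartialEq)]
enum WriteError {
    ColumnStillOpen,
    NoMoreColumns,
}

/// Hypothetical column writer: it only buffers values in memory.
struct ColumnSketch {
    index: usize,
    values: Vec<i32>,
}

impl ColumnSketch {
    /// Appends a batch of values and returns how many were written.
    fn write_batch(&mut self, batch: &[i32]) -> usize {
        self.values.extend_from_slice(batch);
        batch.len()
    }
}

/// Hypothetical row group writer that hands out columns one at a time.
struct RowGroupSketch {
    num_columns: usize,
    next_index: usize,
    column_open: bool,
    written: Vec<Vec<i32>>,
}

impl RowGroupSketch {
    fn new(num_columns: usize) -> Self {
        RowGroupSketch { num_columns, next_index: 0, column_open: false, written: Vec::new() }
    }

    /// Opens the next column in order. Fails while the previous column
    /// is still open, or when every column has already been written.
    fn next_column(&mut self) -> Result<ColumnSketch, WriteError> {
        if self.column_open {
            return Err(WriteError::ColumnStillOpen);
        }
        if self.next_index >= self.num_columns {
            return Err(WriteError::NoMoreColumns);
        }
        self.column_open = true;
        Ok(ColumnSketch { index: self.next_index, values: Vec::new() })
    }

    /// Closes a column and records its values, allowing the next one to open.
    fn close_column(&mut self, column: ColumnSketch) {
        assert_eq!(column.index, self.next_index, "columns must be closed in order");
        self.written.push(column.values);
        self.column_open = false;
        self.next_index += 1;
    }
}

/// Writes two columns sequentially and returns what was recorded.
fn demo() -> Vec<Vec<i32>> {
    let mut row_group = RowGroupSketch::new(2);

    let mut col = row_group.next_column().unwrap();
    col.write_batch(&[1, 2, 3]);
    // A second column cannot be opened while the first is still open.
    assert_eq!(row_group.next_column().err(), Some(WriteError::ColumnStillOpen));
    row_group.close_column(col);

    let mut col = row_group.next_column().unwrap();
    col.write_batch(&[4, 5, 6]);
    row_group.close_column(col);

    row_group.written
}
```

Calling `demo()` from a `main` function exercises the sequence. The sketch enforces ordering with a runtime flag; an actual implementation might choose a different mechanism.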
@sunchao I opened a PR with the initial write support. Could you review when you have time? We can discuss the details and high level approaches in the PR. Thanks!
I updated the description with the sub-tasks.
Prototype did not use |
@sunchao Would it be okay to open a PR with just the column writer and some changes in the page writer, and add tests in a separate pull request? It could be difficult to review them in one PR.
@sadikovi Sure. Please go ahead. Thanks.
@sunchao There are 2 stories left to address. Would you like to wait until we fix them, or would you like to close the write support issue and move those 2 stories into separate issues?
It's up to you :) We can tackle them in later releases if you think they do not affect the functionality of the writer and are just improvements.
I moved the last two tasks into separate issues, because they are merely enhancements and do not affect the core functionality of the writers. I am going to close this issue, indicating that write support has been added and that we can start writing files using column writers, or build on top of them using Arrow or some other approach. Any new features or bugs will be tracked as separate issues. Thanks!
This is an RFC for write support in this crate.
Prototype PR: #127. The API closely resembles read support. See the design overview and implementation details below in the comments.
Sub-tasks:
Add reader properties
Update the batch size (Add reader properties #161).