This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Write key value metadata after rows #397

Closed
tschaub opened this issue Nov 2, 2022 · 2 comments · Fixed by #399
Labels
enhancement Improve a feature that already exists question Further information is requested

Comments

@tschaub
Contributor

tschaub commented Nov 2, 2022

I see how to use the KeyValueMetadata function to configure a writer when constructing that writer. This is useful if the key value metadata is known ahead of writing the rows. I'm hoping to find a way to write key value metadata after writing rows. In my case, the key value metadata includes summary information about the rows. I'm trying to write the parquet file in a streaming manner and wanted to avoid buffering all rows to create the key value metadata summary.

I tried using NewWriterConfig, creating a writer with NewGenericWriter, and then modifying config.KeyValueMetadata after writing rows. However, NewGenericWriter creates a new config internally, so the one I am able to modify is no longer used.

Is there a way to add key value metadata after writing rows and before writing the file footer?
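The config behavior described above can be modeled with a small, standard-library-only sketch. This is not parquet-go's code: `Config`, `Writer`, `NewWriter`, and `FooterMetadata` are hypothetical stand-ins, and the assumption is simply that the constructor takes its own copy of the configuration, so changes made to the caller's value afterwards never reach the writer.

```go
package main

import "fmt"

// Config is a hypothetical stand-in for a writer configuration.
type Config struct {
	KeyValueMetadata map[string]string
}

// Writer is a hypothetical writer that keeps a private copy of its config.
type Writer struct {
	config Config
}

// NewWriter copies the caller's config, including the metadata map,
// so later changes to the caller's value are not observed by the writer.
func NewWriter(c *Config) *Writer {
	own := Config{KeyValueMetadata: make(map[string]string, len(c.KeyValueMetadata))}
	for k, v := range c.KeyValueMetadata {
		own.KeyValueMetadata[k] = v
	}
	return &Writer{config: own}
}

// FooterMetadata returns the metadata the writer would place in the footer.
func (w *Writer) FooterMetadata() map[string]string {
	return w.config.KeyValueMetadata
}

func main() {
	c := &Config{KeyValueMetadata: map[string]string{"creator": "example"}}
	w := NewWriter(c)

	// Mutating the caller's config after construction does not affect the writer.
	c.KeyValueMetadata["row_count"] = "3"

	fmt.Println(len(c.KeyValueMetadata), len(w.FooterMetadata())) // prints "2 1"
}
```

Under that assumption, the only way to get late metadata into the footer is for the writer itself to expose a method that accepts it.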

@achille-roussel achille-roussel self-assigned this Nov 3, 2022
@achille-roussel achille-roussel added question Further information is requested enhancement Improve a feature that already exists labels Nov 3, 2022
@achille-roussel
Contributor

Hello @tschaub, thanks for reaching out!

You are correct that there is no way of doing this at this time. This is a use case that we did not anticipate.

We will probably have to add new APIs to support it. Let me know if you have a suggestion for what would work well for your use case; otherwise I'll submit a proposal.

@tschaub
Contributor Author

tschaub commented Nov 3, 2022

Thanks for the reply, @achille-roussel. For my use case, a writer.KeyValueMetadata function (with behavior similar to the parquet.KeyValueMetadata function) would be sufficient. I've opened #399 with a proposed addition.

I'm not attached to that, and if a different API change would be better, I can adapt what is there.
