[data] Allow users to specify how many rows/bytes to write per file for write ops #41219

Closed
raulchen opened this issue Nov 16, 2023 · 0 comments · Fixed by #42694
Labels: data, data-ux, P0, ray 2.10, size:small

Comments

raulchen (Contributor) commented Nov 16, 2023

When we have a massive amount of data to write as Parquet files, we usually want to save many rows in each file instead of ending up with too many small files.

Today the write API doesn't have a parameter to control how many rows/bytes to write per file. One workaround is to add a no-op map_batches op: ds.map_batches(lambda x: x, batch_size=N).write_parquet(...). But the output of map_batches is still chunked based on target_max_block_size, so a batch of N rows can be split into several smaller blocks before it is written.
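To illustrate why the workaround does not reliably give N rows per file, here is a rough, simplified model in plain Python. It is not Ray's actual implementation: the helper name `split_batch_into_blocks`, the fixed bytes-per-row estimate, and the assumption that each output block becomes one file are all hypothetical, and the 128 MiB limit is only an illustrative value for target_max_block_size.

```python
def split_batch_into_blocks(batch_rows, bytes_per_row, target_max_block_size):
    """Return the row count of each block when a batch is capped by byte size.

    Simplified model (an assumption, not Ray's real logic): every row has the
    same size, and a block may hold at most target_max_block_size bytes.
    """
    if batch_rows <= 0:
        return []
    # At least one row per block, even if a single row exceeds the limit.
    max_rows_per_block = max(1, target_max_block_size // bytes_per_row)
    blocks = []
    remaining = batch_rows
    while remaining > 0:
        take = min(remaining, max_rows_per_block)
        blocks.append(take)
        remaining -= take
    return blocks


# Ask for N = 1,000,000 rows per batch, with ~1 KiB rows and a 128 MiB cap.
blocks = split_batch_into_blocks(
    batch_rows=1_000_000,
    bytes_per_row=1024,
    target_max_block_size=128 * 1024 * 1024,
)
print(len(blocks), blocks[0], blocks[-1])
```

Under these assumptions the single requested batch is split into 8 blocks of at most 131,072 rows each, which is why a dedicated rows-per-file or bytes-per-file parameter on the write call would be more predictable than the map_batches workaround.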

raulchen added the P1, data, ray 2.10, and data-ux labels on Nov 16, 2023.
anyscalesam assigned raulchen and c21, and unassigned raulchen, on Dec 8, 2023.
c21 assigned bveeramani and unassigned c21 on Dec 11, 2023.
anyscalesam added the P0 label and removed the P1 label on Jan 10, 2024.