Add one or more options that let the user specify a strategy for splitting the dataset across multiple files.
For example, it would be great to have:
info:
  output_name: test
  output_format: parquet
  rows: 2_000_000
  files: 5
Each file would then contain approximately 2M / 5 = 400k rows.
We could offer parameters such as:
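As an illustration only, here is a minimal Python sketch of how a `files` option could divide the requested rows when the total does not split evenly. The function name `rows_per_file` is hypothetical and not part of the project; it assumes that any remainder goes to the first files, one extra row each.

```python
def rows_per_file(total_rows: int, files: int) -> list[int]:
    """Hypothetical helper: spread total_rows over `files` files as evenly as possible."""
    if files <= 0:
        raise ValueError("files must be positive")
    base, remainder = divmod(total_rows, files)
    # The first `remainder` files each receive one extra row.
    return [base + 1 if i < remainder else base for i in range(files)]


print(rows_per_file(2_000_000, 5))  # [400000, 400000, 400000, 400000, 400000]
print(rows_per_file(10, 3))         # [4, 3, 3]
```

With this approach the per-file counts always sum to `rows` and differ by at most one row.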
- `files`: split the output into a fixed number of files, as described above.
- `target_size`: start a new file once the current one exceeds a given size threshold (for example, to test the optimal HDFS block size).
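A possible shape for `target_size` is a rollover rule: keep appending rows to the current file and start a new one as soon as the threshold is reached. The sketch below is hypothetical. It works on already-serialized rows and measures their raw byte length. For real Parquet output the on-disk size also depends on encoding and compression, so an actual implementation would probably need to check the writer's output size rather than sum row sizes.

```python
from typing import Iterable, Iterator


def split_by_target_size(rows: Iterable[bytes], target_size: int) -> Iterator[list[bytes]]:
    """Hypothetical helper: group serialized rows into per-file chunks,
    rolling over to a new chunk once the current one reaches target_size bytes."""
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    chunk: list[bytes] = []
    size = 0
    for row in rows:
        chunk.append(row)
        size += len(row)
        # Threshold reached: close this "file" and start a new one.
        if size >= target_size:
            yield chunk
            chunk, size = [], 0
    # Emit the final, possibly smaller, chunk.
    if chunk:
        yield chunk


chunks = list(split_by_target_size([b"x" * 40] * 5, 100))
print([len(c) for c in chunks])  # [3, 2]
```

Because the check happens after a row is added, each file can overshoot the target by up to one row. Checking before appending would instead keep files at or under the threshold, apart from a single row that is larger than the target on its own.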