
Write metadata file #864

Merged
merged 1 commit into from
Feb 20, 2024


RobbeSneyders
Member

We currently don't preserve the divisions of the data when writing it and reading it back, which leads to errors when merging datasets with a low and a high number of partitions. This PR enables writing a metadata file, which should fix this.

This was originally introduced in #391, which contains more information, but was later reverted in #403 without a clear reason.

Let's reactivate it, and if there's a reason to remove it again, let's document it properly.

Contributor

@mrchtr left a comment


Thanks @RobbeSneyders.

@RobbeSneyders RobbeSneyders merged commit f7c7e10 into main Feb 20, 2024
9 checks passed
@RobbeSneyders RobbeSneyders deleted the feature/metadata-file branch February 20, 2024 13:38
RobbeSneyders added a commit that referenced this pull request Feb 24, 2024
We reintroduced writing the metadata file in #864 to preserve the
divisions of the data when writing and reading again. We turned this
behavior off in the past, but without proper documentation of the
reason.

However, I'm now running into Dask workers dying when writing large
datasets, presumably because of the metadata file, as documented in
these Dask issues:
- dask/dask#6600
- dask/dask#3873
- dask/dask#8901

Also, while I ran into issues with the preservation of divisions before,
I can't reproduce them locally with a small example. Let's turn off
writing the metadata file again and validate whether we still run into
these issues.