Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Remove unexpected files from parallel text exports #11921

Merged

Conversation

chrisvittal
Copy link
Collaborator

This fixes an issue introduced by #11910, where extra part files may be
created by speculative execution. This is an issue for parallel export
where we did not delete any files.

Now we list the output directory, and compare it to the proper list of
files. The paths that are not in the list of 'good' paths are deleted.

@chrisvittal chrisvittal requested a review from danking June 16, 2022 16:55
Comment on lines 514 to 516
val outputFiles = fs.listStatus(outputPath).map(_.getPath).toSet
outputFiles.diff(files.toSet).foreach(fs.delete(_, false))
val fileset = files.toSet
val diff = outputFiles.diff(fileset)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the local backend, fs.listStatus is returning results with paths being URIs with the file scheme, causing this set difference to return every file in the directory on the local backend. I'm not sure how to get around this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a systemic fix (we really need to get everything on one file system implementation and agree on what scheme (if any) local paths have). The rapid fix is to use fs.canonicalize_path.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to canonicalize the paths in the cleanup method.

This fixes an issue introduced by hail-is#11910, where extra part files may be
created by speculative execution. This is an issue for parallel export
where we did not delete any files.

Now we list the output directory, and compare it to the proper list of
files. The paths that are not in the list of 'good' paths are deleted.
@chrisvittal chrisvittal force-pushed the lowering/better-speculation-overwrite-fix branch from e18b143 to b6d74c3 Compare June 17, 2022 15:57
@danking danking merged commit 55423ea into hail-is:main Jun 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants