Prevent race condition for from_url(create=True) in case of parallel processing #254
This PR fixes an issue where `from_url(xxx, create=True)`, implemented in #245, can fail under parallel processing.

When the `create=True` option is set, pfio checks whether the directory exists and creates it if it does not. This check-then-create sequence is not atomic, so another process can create the directory between the two steps. The window is wider on NFS, where there can be a relatively large delay before a filesystem operation becomes visible to other processes. Making the whole sequence an atomic filesystem operation is quite difficult, but in this case it is sufficient to pass `exist_ok=True` to `makedirs`.

The same issue can occur on HDFS, so this PR fixes both the local and HDFS filesystems. No change is needed for S3.
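
For illustration, here is a minimal sketch of the difference on a local filesystem, using only the standard library's `os.makedirs`. It is not pfio's actual code: the helper names `ensure_dir_racy`, `ensure_dir_safe`, and `create_in_parallel` are hypothetical, and threads stand in for the parallel processes described above.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir_racy(path):
    # Check-then-create: another worker may create the directory between
    # the check and the call, in which case os.makedirs raises FileExistsError.
    if not os.path.isdir(path):
        os.makedirs(path)


def ensure_dir_safe(path):
    # With exist_ok=True, an already existing directory is not an error,
    # so it does not matter which worker creates it first.
    os.makedirs(path, exist_ok=True)


def create_in_parallel(ensure, path, workers=8):
    # Run the same "ensure" function from several threads at once and
    # collect the exception raised by each call (None on success).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(ensure, path) for _ in range(workers)]
        return [f.exception() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "cache", "dataset")
        errors = create_in_parallel(ensure_dir_safe, target)
        print("failures:", [e for e in errors if e is not None])
```

Passing `ensure_dir_racy` instead may or may not fail on a given run, since the outcome depends on timing; that intermittency is what makes the original behavior hard to reproduce.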