Prevent race condition for from_url(create=True) in case of parallel processing #254

Merged (1 commit) on Jan 11, 2022


@belltailjp belltailjp commented Jan 7, 2022

This PR fixes an issue where from_url(xxx, create=True), implemented in #245, can fail when several processes call it in parallel.

# from_url.py
import pfio
pfio.v2.from_url('non-existent-dir', create=True)
> mpiexec -n 4 python from_url.py
Traceback (most recent call last):
  File "from_url.py", line 5, in <module>
    pfio.v2.from_url('non-existent-dir', create=True)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 359, in from_url
    fs = _from_scheme(scheme, dirname, kwargs, bucket=parsed.netloc)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/fs.py", line 371, in _from_scheme
    fs = Local(dirname, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/pfio/v2/local.py", line 56, in __init__
    os.makedirs(self._cwd)
  File "/usr/local/lib/python3.8/os.py", line 223, in makedirs
    mkdir(name, mode)
FileExistsError: [Errno 17] File exists: 'non-existent-dir'

When the create=True option is set, pfio checks whether the directory exists and creates it if it does not. This check-then-create sequence is not atomic, so two processes can both see the directory as missing and both call makedirs; the slower one then fails with FileExistsError. The window is wider on NFS, where there can be a relatively long delay before a filesystem operation becomes visible to other processes.
Making the whole sequence a single atomic filesystem operation is quite difficult, but in this case it is sufficient to pass exist_ok=True to makedirs.
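For illustration, here is a minimal standard-library sketch of the idea, not the actual pfio code. The helper names ensure_dir and run_workers are hypothetical, and threads stand in for the MPI processes used in the reproduction above. Each worker calls os.makedirs with exist_ok=True directly, without a separate existence check, so a worker that loses the race sees a no-op instead of FileExistsError.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def ensure_dir(path):
    """Create path if it is missing; do nothing if it already exists.

    Hypothetical helper for illustration only, not part of pfio.
    """
    # exist_ok=True makes "directory already exists" a no-op, so no
    # separate os.path.exists() check (and no race window) is needed.
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


def run_workers(n_workers=4):
    """Call ensure_dir on the same path from several workers at once."""
    with tempfile.TemporaryDirectory() as base:
        target = os.path.join(base, "non-existent-dir")
        with ThreadPoolExecutor(max_workers=n_workers) as pool:
            # Every worker targets the same directory, as in the MPI run.
            return list(pool.map(ensure_dir, [target] * n_workers))
```

Note that os.makedirs(path, exist_ok=True) still raises FileExistsError when the path exists but is not a directory, so that case remains an error.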

The same issue can occur with HDFS, so this PR fixes both the local and HDFS filesystems (no change is needed for S3).

@kuenishi kuenishi added the cat:bug Bug report or fix. label Jan 11, 2022
@kuenishi

/test

@pfn-ci-bot

Successfully created a job for commit 1e373c9:

@kuenishi kuenishi added this to the 2.1.2 milestone Jan 11, 2022
@kuenishi kuenishi merged commit 7fb3698 into pfnet:master Jan 11, 2022