Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FIX] Also allow errno.EBUSY during emptydirs on NFS #3357

Merged
merged 5 commits into from
Oct 13, 2021

Conversation

HippocampusGirl
Copy link
Contributor

Summary

Running a workflow with nipype 1.6.1, I sometimes get the following exception:

│     Traceback (most recent call last):
│       File "/usr/local/miniconda/lib/python3.7/site-packages/halfpipe/plugins/multiproc.py", line 71, in run_node
│         result["result"] = node.run(updatehash=updatehash)
│       File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 493, in run
│         emptydirs(outdir, noexist_ok=True)
│       File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 794, in emptydirs
│         raise ex
│       File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 781, in emptydirs                                                                                           │         shutil.rmtree(path)
│       File "/usr/local/miniconda/lib/python3.7/shutil.py", line 494, in rmtree
│         _rmtree_safe_fd(fd, path, onerror)
│       File "/usr/local/miniconda/lib/python3.7/shutil.py", line 452, in _rmtree_safe_fd
│         onerror(os.unlink, fullname, sys.exc_info())
│       File "/usr/local/miniconda/lib/python3.7/shutil.py", line 450, in _rmtree_safe_fd
│         os.unlink(entry.name, dir_fd=topfd)
│     OSError: [Errno 16] Device or resource busy: '.nfs0000000002273a940000254d'
│     During handling of the above exception, another exception occurred:
│     Traceback (most recent call last):
│       File "/usr/local/miniconda/lib/python3.7/concurrent/futures/process.py", line 239, in _process_worker
│         r = call_item.fn(*call_item.args, **call_item.kwargs)
│       File "/usr/local/miniconda/lib/python3.7/site-packages/halfpipe/plugins/multiproc.py", line 74, in run_node
│         result["result"] = node.result
│       File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 217, in result
│         op.join(self.output_dir(), "result_%s.pklz" % self.name)
│       File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 291, in load_resultfile
│         raise FileNotFoundError(results_file)
│     FileNotFoundError

As far as I can tell, if a file on an NFS mount is still open somewhere, when it is deleted, NFS will rename that file to a hidden file ("placeholder") in the same directory so that the data stays available to the open file handle (according to https://unix.stackexchange.com/a/348678). When shutil discovers that hidden file, it will try to delete it, and we get an OSError with errno.EBUSY

List of changes proposed in this PR (pull-request)

  • This pull request catches that error number in addition to errno.ENOTEMPTY, which is already handled by current code

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

@effigies
Copy link
Member

Might need to merge/rebase master to fix the dipy issue.

- Can occur if a file is still open somewhere, so NFS will rename it to
  a hidden file in the same directory
- When `shutil` tries to delete that hidden file, we get an `OSError`
  with `errno.EBUSY`
- I forgot that `os.listdir` also lists hidden files in the previous
  commit
- With mock for NFS silly-rename (yes, it's really called that) behavior
@HippocampusGirl
Copy link
Contributor Author

HippocampusGirl commented Jul 30, 2021

I just noticed that I had this error, because I had another instance of the workflow open in a different terminal, still open in the debugger. Closing that instance of Python solved the issue. As a result, I think the issue I had comes down to user error. Nevertheless, it's probably still nice to keep this code, I can imagine other cases of people opening and keeping open files while the workflow is still running (like inspecting an image).

@codecov
Copy link

codecov bot commented Jul 30, 2021

Codecov Report

Merging #3357 (eafdb93) into master (7080ef9) will increase coverage by 0.00%.
The diff coverage is 66.66%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #3357   +/-   ##
=======================================
  Coverage   65.11%   65.12%           
=======================================
  Files         307      307           
  Lines       40373    40373           
  Branches     5326     5327    +1     
=======================================
+ Hits        26288    26291    +3     
+ Misses      13014    13010    -4     
- Partials     1071     1072    +1     
Flag Coverage Δ
unittests 64.83% <66.66%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
nipype/utils/filemanip.py 73.17% <66.66%> (+0.70%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7080ef9...eafdb93. Read the comment docs.

Copy link
Contributor Author

@HippocampusGirl HippocampusGirl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to try commiting like this, let's see if it works :-)

nipype/utils/tests/test_filemanip.py Outdated Show resolved Hide resolved
Handle mock test case when no `dir_fd` is passed
Comment on lines +677 to +682
if dir_fd is None:
path = Path(pathlike)
deleted = path.with_name(".nfs00000000")
path.rename(deleted)
else:
os.rename(pathlike, ".nfs1111111111", src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like your idea of avoiding branching. What if we just construct the target correctly?

Suggested change
if dir_fd is None:
path = Path(pathlike)
deleted = path.with_name(".nfs00000000")
path.rename(deleted)
else:
os.rename(pathlike, ".nfs1111111111", src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
target = os.path.join(os.path.dirname(pathlike), ".nfs00000000")
os.rename(pathlike, target, src_dir_fd=dir_fd, dst_dir_fd=dir_fd)

Demo:

>>> [os.path.join(os.path.dirname(pathlike), ".nfs00000000")
...  for pathlike in ["path/subdir/busyfile", "busyfile"]]
['path/subdir/.nfs00000000', '.nfs00000000']

Copy link
Member

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Green check in any case. Happy to merge as-is, or if you want to make that change.

@effigies effigies merged commit 4d915a8 into nipy:master Oct 13, 2021
@HippocampusGirl HippocampusGirl deleted the fix-emptydirs branch October 13, 2021 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants