Add retry to storagePosixPathRemove(). #1786
Conversation
This looks good to me. Sorry for the noise/initial comment -- I definitely need more caffeine this morning. I completely misread the left-hand side of the diffs in git.
```c
    result = false;
    removed = true;
}
// Else if not empty but recursion requested then remove all sub paths/files
```
I'm a bit confused by the comment and the name of the function. Should the function be named `storagePosixErrorIsNotEmpty`?
It certainly should. Fixed in 5c5f599.
```c
// Delete the path
if (rmdir(strZ(path)) == -1)
do
```
This would be a great place to leave a comment: 1) keep trying until the directory is fully removed, and 2) why it might not work the first time.
Can we guarantee it can't go into an infinite loop? Perhaps a limit on how many times we retry?
Comment updated in 8fcb15c and b36d5af.
> Can we guarantee it can't go into an infinite loop? Perhaps a limit on how many times we retry?
It seems unlikely, but not impossible. A retry limit would be good here -- that's pretty standard practice elsewhere. It requires shimming something, but that's not a big deal.
Adding and deleting files while reading a directory with readdir() can be a bit unpredictable. The retry approach used here is robust and should work across local and remote file systems, so I'd say go with it. For some of the situations mentioned earlier, perhaps scandir() could be of help. It tends to be more predictable, especially when the directory is changing as we're scanning it. The memory requirements might be a showstopper.
I'm still not entirely happy with this approach. I think we'd have more control if we moved readdir() within this function, or better yet created an abstraction for readdir() that can be used at will in the module. I'm doing other work in this area so I'll think about that.
I agree a readdir() abstraction is a good idea. One of the properties we might want from the abstraction is taking a consistent "snapshot" of the directory contents. I don't think it matters for local Linux file systems, but it may matter for remote ones. I'm only hypothesizing, but the directory locking mechanisms used internally on local file systems don't exist for remote file systems such as NFS. Perhaps scanning should be part of the storage interface so we can tailor it as needed. Except, oops, NFS is scanned under the Posix interface. Sigh. There must be a way. At any rate, this topic doesn't impact the current PR. I'm good to go.
We decided on a path snapshot approach as implemented by #1805. By not removing files during a scan we should be able to achieve the same functionality without the possible side effects of a retry approach, e.g. succeeding even when another process is actively writing into the path we are trying to remove.
Add retries to handle the case where readdir() skips over existing files when files are simultaneously being deleted.
This is a bit of a band-aid. There are lots of other places where this readdir() issue could bite us.
NOTE: this could use better testing. There is 100% coverage because of the way the code is written, but a regression of the retry would likely not be detected.