Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add retry to storagePosixPathRemove(). #1786

Closed
wants to merge 3 commits into from

Conversation

dwsteele
Copy link
Member

Add a retries to handle the case where readdir() skips over existing files when files are simultaneously being deleted.

This is a bit of a band aid. There are lots of other places where this readdir() issue could bite us.

NOTE: this could use better testing. There is 100% coverage because of the way to code is written, but a regression of the retry would likely not be detected.

@dwsteele dwsteele added this to the 2.40 milestone Jun 15, 2022
@dwsteele dwsteele self-assigned this Jun 15, 2022
@dwsteele dwsteele linked an issue Jun 15, 2022 that may be closed by this pull request
Copy link
Contributor

@jreidthompson jreidthompson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Sorry for the noise/initial comment. I definitely need more caffeine this morning, I completely misread the left hand side of the diffs in git

result = false;
removed = true;
}
// Else if not empty but recursion requested then remove all sub paths/files

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by the comment and the name of the function. Should the function be named "storagePosixErrorIsNotEmpty"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It certainly should. Fixed in 5c5f599.


// Delete the path
if (rmdir(strZ(path)) == -1)
do

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a great place to leave a comment 1) keep trying until the directory is fully removed, and 2) why it might not work the first time.

Can we guarantee it can't go into an infinite loop? Perhaps a limit on how many times we retry?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment updated in 8fcb15c and b36d5af.

Can we guarantee it can't go into an infinite loop? Perhaps a limit on how many times we retry?

It seems unlikely, but not impossible. A retry limit would be a good here -- that's pretty standard practice elsewhere. That requires shimming something, but not that big a deal.

@precision-software
Copy link

precision-software commented Jun 30, 2022

Adding and deleting files while reading a directory with readdir() can be a bit unpredictable. The retry approach used here is robust and should work across local and remote file systems, so I'd say go with it.

For some of the situations mentioned earlier, perhaps scandir() could be of help. It tends to be more predictable, especially when the directory is changing as we're scanning it. The memory requirements might be a showstopper.

@dwsteele
Copy link
Member Author

I'm still not entirely happy with this approach. I think we'd have more control if we moved readdir() within this function, or better yet created an abstraction for readdir() that can be used at will in the module. I'm doing other work in this area so I'll think about that.

@precision-software
Copy link

precision-software commented Jun 30, 2022

I agree a readdir abstraction a good idea. One of the properties we might want from the abstraction is to take a consistent "snapshot" of the directory contents. I don't think it matters for local Linux file systems, but it may matter for remote ones. I'm only hypothesizing, but the directory locking mechanisms used internally on local file systems don't exist for remote file systems such as NFS. Perhaps scanning should be part of the storage interface so we can tailor it as needed.

Except, oops, NFS is scanned under the Posix interface. Sigh. There must be a way.

At any rate, this topic doesn't impact the current PR. I'm good to go.

@dwsteele
Copy link
Member Author

dwsteele commented Jul 8, 2022

We decided on a path snapshot approach as implemented by #1805. By not removing files during a scan we should be able to achieve the same functionality without the possible side effects of a retry approach, e.g succeeding even when another process is actively writing into the path we are trying to remove.

@dwsteele dwsteele closed this Jul 8, 2022
@dwsteele dwsteele deleted the dws-remove-cig branch July 22, 2022 19:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

A way to store backups in a non-cloud storage?
3 participants