feat(io/default): Try to clean up during idle updates #182

Merged: 2 commits merged into master from idle_cleanup on Jun 17, 2024

ketiltrout
Member

This adds an attempt to clean up a DefaultIONode during an idle update by:

  • looking for .placeholder files and deleting them
  • attempting to remove acq directories

Because this routine runs only when the node is idle (i.e. when there's no other I/O occurring), no placeholders should be on the node. Any that are found must be spurious leftovers from prior crashes.
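For illustration, the placeholder sweep could look roughly like the sketch below. This is not the code from this PR: the function name `remove_placeholders` is hypothetical, and it assumes a placeholder is any file whose name ends in `.placeholder`.

```python
import os
from pathlib import Path


def remove_placeholders(root: Path) -> list[Path]:
    """Delete stray placeholder files under `root`; return the paths removed.

    Hypothetical helper: assumes placeholders are files whose names end
    in ".placeholder" and that the caller has checked the node is idle.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".placeholder"):
                continue
            path = Path(dirpath, name)
            try:
                path.unlink()
            except OSError:
                # Best effort: skip anything that cannot be removed
                continue
            removed.append(path)
    return removed
```

The sweep is best effort on purpose: a file that cannot be deleted is skipped, so one bad entry does not abort the rest of the idle cleanup.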

I've also implemented a check to reduce how often it runs. It will always run at start-up (when I suspect most uncleanliness would be found), and then once every 100 times the node transitions from not-idle to idle. Not really sure how often is appropriate. It might even be sufficient to only run it on start-up.
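A rough sketch of that throttling logic, under the assumption that a simple counter is enough. The class name `IdleCleanupThrottle` and its `period` parameter are illustrative only and are not taken from this PR.

```python
class IdleCleanupThrottle:
    """Decide whether cleanup should run on a given idle transition.

    Hypothetical helper: runs on the first call (start-up), then once
    every `period` subsequent not-idle to idle transitions.
    """

    def __init__(self, period: int = 100) -> None:
        self.period = period
        # None means cleanup has never run, i.e. we are at start-up
        self._since_last_run = None

    def should_run(self) -> bool:
        """Call once each time the node goes from not-idle to idle."""
        if self._since_last_run is None:
            self._since_last_run = 0
            return True
        self._since_last_run += 1
        if self._since_last_run >= self.period:
            self._since_last_run = 0
            return True
        return False
```

Setting `period` very high approximates the "start-up only" option mentioned above, so the two policies can be compared without changing the calling code.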

While implementing this, I discovered that the code that was deleting acq dirs wasn't stopping at the StorageNode.root, meaning there was a potential to delete the node directory itself (plus anything above that)!

In practice, on DefaultIO nodes, this couldn't happen because all such nodes have an ALPENHORN_NODE file at the top level, but that's not necessarily true for other I/O classes which still use DefaultIO's delete function (for example, the LustreHSM I/O class).

I've fixed this bug while moving the directory deletion code out of delete_async into its own function in ioutil, because the cleanup task now uses it too.
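The essence of the fix is bounding the upward walk at the node root. A minimal sketch of that idea follows, assuming directories are only ever removed when empty. The name `remove_empty_parents` is hypothetical and this is not the actual function added to ioutil.

```python
from pathlib import Path


def remove_empty_parents(dirname: Path, root: Path) -> list[Path]:
    """Remove `dirname` and its empty ancestors, stopping below `root`.

    Hypothetical helper: `root` itself and anything above it are never
    removed, even if they happen to be empty.
    """
    root = root.resolve()
    current = dirname.resolve()
    removed = []
    # Continue only while `current` is strictly inside `root`
    while root in current.parents:
        try:
            current.rmdir()  # raises OSError unless the directory is empty
        except OSError:
            break
        removed.append(current)
        current = current.parent
    return removed
```

Checking `root in current.parents` before every removal means the guard does not depend on a marker file keeping the root non-empty, which is the case that matters for I/O classes without one.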

Also removed the submission of an unnecessary job which was deleting zero file copies.

ketiltrout requested review from ljgray and rikvl June 15, 2024 00:21
ketiltrout merged commit 742165e into master Jun 17, 2024
3 checks passed
ketiltrout deleted the idle_cleanup branch June 17, 2024 21:36