InterProcessMutex leaves empty node #29
This is actually an inherent problem with ZooKeeper. The desired behavior is for the parent node to be deleted when it has no children, but there is no way to do this atomically with the deletion of the last child, at least not from the client. Server support is needed for this. There is a feature enhancement being discussed here: https://issues.apache.org/jira/browse/ZOOKEEPER-723 FYI, it's my understanding that these garbage nodes don't add much overhead.
I've experienced an issue with Curator where, after many InterProcessMutex locks on unique paths, the leftover garbage nodes caused a very large snapshot file to be generated. This causes leader election in ZK to fail while transferring the snapshot between servers, unless initLimit and syncLimit in the ZK configs are raised considerably. The fix I've implemented in our code is simply to delete the node that is created to support the lock (right above the leaf nodes) once release() is called. This way, locks no longer leave behind garbage that accumulates over time. Does this approach make sense? If so, is it something I should try to implement in Curator?
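To make the "delete the lock's parent on release()" idea concrete, here is a minimal sketch. `ZnodeTree` is a hypothetical in-memory stand-in for a znode tree, not Curator or ZooKeeper API; real code would issue a ZooKeeper `delete()` and tolerate `NotEmptyException`/`NoNodeException` from racing processes.

```java
import java.util.*;

// Hypothetical in-memory stand-in for a znode tree, used only to
// illustrate deleting a lock's parent node on release() when it has
// no remaining children.
class ZnodeTree {
    private final Map<String, Set<String>> children = new HashMap<>();

    void create(String path) {
        children.putIfAbsent(path, new HashSet<>());
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            String parent = path.substring(0, slash);
            create(parent);                    // ensure the parent chain exists
            children.get(parent).add(path);
        }
    }

    // Best-effort cleanup: delete only if empty, mirroring how a real
    // implementation would back off if another process re-created a child.
    boolean deleteIfEmpty(String path) {
        Set<String> kids = children.get(path);
        if (kids == null || !kids.isEmpty()) {
            return false;                      // in use (or gone); leave it alone
        }
        children.remove(path);
        int slash = path.lastIndexOf('/');
        if (slash > 0) {
            children.get(path.substring(0, slash)).remove(path);
        }
        return true;
    }

    boolean exists(String path) { return children.containsKey(path); }
}
```

A release() would first delete the leaf lock node and then attempt `deleteIfEmpty` on its parent; if another waiter still holds a child, the delete is simply skipped.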
I'd like to see what you've done. From what I've tried, there's no way to safely remove the parent, as other processes might be assuming that it exists.
What I've implemented to account for this is some simple retry logic. Consider the following situation:
The simple solution would be for Process A to keep calling acquire() until it succeeds (or until X retries) in the event of a NoNodeException (I see no other circumstance in which this exception can arise in InterProcessMutex). Does this seem like an adequate solution on our end? In terms of implementing something like this in Curator, here is my (untested) idea:
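The poster's actual snippet was not captured in this transcript. As a rough illustration of the retry-on-NoNodeException pattern being discussed, here is a hedged, self-contained sketch; `Lock`, `NoNodeException`, and `MAX_RETRIES` are hypothetical stand-ins, not Curator API.

```java
// If acquire() fails because a sibling process deleted the (empty)
// parent node between our pre-check and our create, just retry a
// bounded number of times and let the recipe recreate the parent.
class NoNodeException extends Exception {}

interface Lock {
    void acquire() throws NoNodeException;
}

class RetryingAcquire {
    static final int MAX_RETRIES = 3;   // illustrative bound

    static boolean acquireWithRetry(Lock lock) {
        for (int i = 0; i < MAX_RETRIES; i++) {
            try {
                lock.acquire();
                return true;            // success
            } catch (NoNodeException e) {
                // parent was deleted under us; loop and try again
            }
        }
        return false;                   // gave up after MAX_RETRIES
    }
}
```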
That's an interesting idea to use the ACL. I'm not sure a general-purpose library like Curator can do that, though, as the client may have its own ACL for that node. I'll think about that a bit. Another avenue: I've been thinking recently that EnsurePath and the create method creatingParentsIfNeeded() are a bit backwards. Instead of pre-checking for the parent paths, they should be reactive, i.e. only create the paths on exception. That way, deleting the parent node is safe, as subsequent locks will get a NoNodeException, which will cause them to re-create the parents.
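The "reactive" create described above can be sketched as follows. The `Store` interface here is a hypothetical stand-in for the ZooKeeper client, used only to show the optimistic-create-then-repair shape.

```java
// Instead of pre-checking that parents exist (the EnsurePath approach),
// try the create first and only build the parent chain when it fails.
class NoParentException extends Exception {}

interface Store {
    void create(String path) throws NoParentException;
    void createParents(String path);
}

class ReactiveCreate {
    static void createReactively(Store store, String path) throws NoParentException {
        try {
            store.create(path);            // optimistic: usually succeeds
        } catch (NoParentException e) {
            store.createParents(path);     // parent was deleted (e.g. by a reaper)
            store.create(path);            // retry once; parents now exist
        }
    }
}
```

The design point is that the common case pays no extra round trips; the cost of rebuilding parents is only paid on the rare failure.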
After attempting to implement the solution posted above (to fix things on my end), I found that EnsurePath.ensure() only does work the first time it is called on a given instance (subsequent calls are NOPs). Why is this the case? Wouldn't it be safe to assume that a path may need to be created more than once? If this can be changed, might it be a good idea for ensure() to be called in both circumstances (i.e. both preemptively and reactively)? If this (or the reactive-only version) can be implemented, I think the only missing piece would be for InterProcessMutex, or the client, to manually remove unneeded znodes on lock release.
The reason is performance. It would be expensive to check for the parent paths every time.
Right, makes sense. But what happens in the event of something like this?
Thank you for your help, by the way!
Guys,
artemip - I'd end up either not using EnsurePath or heavily modifying it. Let me try some ideas and I'll report back here.
Awesome, thank you.
I don't see any way to do this with good guarantees, so a workaround occurred to me: why not have a reaper thread that periodically checks registered nodes and deletes them if they are empty? Here's what I'm thinking of: https://gist.github.com/2970233 Thoughts? Should I add this?
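The linked gist contains the actual proposal; as a rough, self-contained illustration of the reaper idea, here is a sketch. `childCount` and `deleter` are hypothetical hooks standing in for ZooKeeper's getChildren and delete calls.

```java
import java.util.*;
import java.util.concurrent.*;
import java.util.function.*;

// Background task that periodically visits registered paths and
// deletes any that currently have no children. Deletion is best
// effort: a concurrent create simply wins the race.
class Reaper {
    private final Set<String> paths = ConcurrentHashMap.newKeySet();
    private final Function<String, Integer> childCount;
    private final Consumer<String> deleter;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    Reaper(Function<String, Integer> childCount, Consumer<String> deleter) {
        this.childCount = childCount;
        this.deleter = deleter;
    }

    void addPath(String path) { paths.add(path); }

    void start(long periodMillis) {
        scheduler.scheduleWithFixedDelay(this::reapOnce,
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void reapOnce() {
        for (String path : paths) {
            if (childCount.apply(path) == 0) {
                deleter.accept(path);   // best effort; a racing create loses
            }
        }
    }

    void close() { scheduler.shutdownNow(); }
}
```

Callers register only the paths they want swept (as described later in the thread), so directories that are legitimately empty but still needed are never touched.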
It still has the potential race condition where another process taking that lock might have the directory yanked out from under it, right? (It greatly reduces the probability, though, I think.) @ph1lm that's an interesting approach too. It shouldn't really be done in Curator, of course.
As to the race: if the recipes/usages are written correctly, there should be no race. FYI, I rewrote creatingParentsIfNeeded() as I discussed above. So, if the Reaper deletes the parent, the lock recipes in Curator will be fine, as they will just re-create the parents when the error is caught.
Oh, OK. I thought you were going to create the reaper instead of the above change. My bad.
So, to be clear, Curator users will need to create/start the Reaper. Curator won't do it by default. If you like, I can have it in tomorrow's release.
@artemip ?
Didn't get to it today. I tried, I swear ;) I'll have it by Monday, hopefully.
I think this is a great solution. Would the reaper be configurable to watch and clean up only certain directories in ZK? It would be annoying if it consistently removed nodes that are meant to be temporarily empty.
Yes - you give the Reaper the paths to check.
Thank you very much for the fix - exactly what we needed. I do have some concerns though:
Again, thank you very much for your help. Let me know if there's anything I can do to assist you with this.
Good ideas. Please try to do the implementation and submit a pull request.
Pull request submitted: #102. Thanks!
Below is the Groovy code I used for testing:
After execution, I just connect to ZK using the CLI:
The node /some/node still exists even after the session disconnects.
That may cause a problem: if we don't clear child nodes, the parent node's child list may grow until it exceeds the size limit (responses to getChildren are subject to ZooKeeper's jute.maxbuffer, 1 MB by default).
Jars used:
curator-client-1.1.0.jar
curator-framework-1.1.0.jar
curator-recipes-1.1.0.jar
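For reference, those jars can be pulled in via Maven. A sketch, assuming the pre-Apache `com.netflix.curator` group id used by the 1.x line (curator-recipes brings in curator-framework and curator-client transitively):

```xml
<dependency>
  <groupId>com.netflix.curator</groupId>
  <artifactId>curator-recipes</artifactId>
  <version>1.1.0</version>
</dependency>
```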