
InterProcessMutex leaves empty node #29

Closed
ph1lm opened this issue Feb 14, 2012 · 23 comments

@ph1lm

ph1lm commented Feb 14, 2012

Below is groovy code I used for testing:

// Imports added for completeness (assuming Curator 1.x, Netflix-era packages):
import com.netflix.curator.framework.CuratorFramework
import com.netflix.curator.framework.CuratorFrameworkFactory
import com.netflix.curator.framework.recipes.locks.InterProcessMutex
import com.netflix.curator.retry.RetryNTimes

CuratorFramework framework = CuratorFrameworkFactory.builder().
        connectString(System.getProperty('zookeeper.ensemble', config?.zookeeper?.ensemble ?: 'localhost:2281')).
        sessionTimeoutMs(config?.zookeeper?.session?.timeout ?: 2 * 5000).
        connectionTimeoutMs(config?.zookeeper?.connect?.timeout ?: 2 * 5000).
        retryPolicy(new RetryNTimes(0, 0)).
        build();
framework.start();

def getInteger(property, defaultValue) {
    System.getProperty(property, "${defaultValue}") as Integer
}

getInteger('zookeeper.lock.quantity', 1).times {
    def mutex = new InterProcessMutex(framework, '/some/test');
    mutex.acquire()
    try {
        Thread.sleep(getInteger('zookeeper.lock.sleep', 0))
    } finally {
        mutex.release()
    }
}

framework.close()

After execution, I connect to ZK using the CLI:

[zk: localhost:2281(CONNECTED) 7] ls2 /some/test
[]
cZxid = 0x100000004
ctime = Tue Feb 14 17:20:05 EET 2012
mZxid = 0x100000004
mtime = Tue Feb 14 17:20:05 EET 2012
pZxid = 0x100000006
cversion = 2
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 0
numChildren = 0

The node /some/test still exists even after the session disconnects.
That may cause a problem: if these leftover nodes are never cleared, their parent may accumulate so many children that it exceeds the size limit.

Jars used:
curator-client-1.1.0.jar
curator-framework-1.1.0.jar
curator-recipes-1.1.0.jar

@Randgalt
Contributor

This is actually an inherent problem with ZooKeeper. The desired behavior is for the parent node to be deleted once it has no children, but there is no way to do that atomically with the deletion of the last child - at least not from the client. Server support is needed for this; a feature enhancement is being discussed here: https://issues.apache.org/jira/browse/ZOOKEEPER-723
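To illustrate, here is a minimal sketch of the race against the raw ZooKeeper API (assuming a connected org.apache.zookeeper.ZooKeeper handle named zk):

// Check-then-delete from the client is not atomic: between getChildren()
// and delete(), another client that already verified the parent exists
// may be about to create a lock child under it.
def children = zk.getChildren('/some/test', false)
if (children.isEmpty()) {
    // If the other client's create() loses this race it fails with
    // NoNodeException; if it wins, this delete() fails with
    // NotEmptyException. Either way, the cleanup cannot be made safe
    // without server-side support.
    zk.delete('/some/test', -1)   // -1 = ignore the node's version
}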

FYI - it's my understanding that these garbage nodes don't add much overhead.

@artemip
Contributor

artemip commented Jun 19, 2012

I've experienced an issue with Curator where, after many InterProcessMutex locks for unique paths, the leftover garbage nodes have caused a very large snapshot file to be generated. This causes leader election in ZK to fail while trying to transfer the snapshot between servers, unless initLimit and syncLimit in ZK configs are bumped up considerably.

The fix I've implemented in our code is to simply delete the node that is created to support the lock (right above the leaf nodes) once release() is called. This way, locks no longer leave behind garbage that accumulates over time.
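Roughly, a sketch of the idea (releaseAndCleanup is just an illustration, not Curator API):

import com.netflix.curator.framework.CuratorFramework
import com.netflix.curator.framework.recipes.locks.InterProcessMutex
import org.apache.zookeeper.KeeperException

// After releasing, opportunistically delete the lock's parent znode.
// NotEmptyException means another process is using the lock again;
// NoNodeException means someone else already cleaned up. Both are
// safe to ignore.
def releaseAndCleanup(InterProcessMutex mutex, CuratorFramework client, String lockPath) {
    mutex.release()
    try {
        client.delete().forPath(lockPath)
    } catch (KeeperException.NotEmptyException ignore) {
        // lock is in use - leave the parent alone
    } catch (KeeperException.NoNodeException ignore) {
        // already gone
    }
}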

Does this approach make sense? If so, would it be something that I should try to implement in Curator?

@Randgalt
Contributor

I'd like to see what you've done. From what I've tried, there's no way to safely remove the parent as other processes might be assuming that it exists.

@artemip
Contributor

artemip commented Jun 19, 2012

What I've implemented to account for this is some simple retry logic.

Consider the following situation:

Process A: Taking the lock
Process B: Releasing the lock

A: Check to see that /namespace/#{lock_id} exists, and if not, create it. (Let's say it already exists.)
B: Delete the ephemeral node /namespace/#{lock_id}/FOO
B: Delete /namespace/#{lock_id}, since it is now empty
A: Try to create /namespace/#{lock_id}/BAR
--> NoNodeException with message "/namespace/#{lock_id} does not exist."

The simple solution would be for Process A to keep retrying acquire() until success (or up to X retries) in the event of a NoNodeException (I see no other circumstances in which this exception can arise in InterProcessMutex). Does this seem like an adequate solution on our end?
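A sketch of that retry loop (ensureLockParent is a hypothetical callback standing in for whatever recreates the parent):

import com.netflix.curator.framework.recipes.locks.InterProcessMutex
import org.apache.zookeeper.KeeperException

// Keep retrying acquire() when the lock's parent is deleted out from
// under us, recreating it between attempts.
def acquireWithRetry(InterProcessMutex mutex, int maxRetries, Closure ensureLockParent) {
    for (int i = 0; i < maxRetries; i++) {
        try {
            mutex.acquire()
            return
        } catch (KeeperException.NoNodeException e) {
            ensureLockParent()   // hypothetical: recreate /namespace/#{lock_id}
        }
    }
    throw new IllegalStateException('lock parent kept disappearing')
}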

In terms of implementing something like this in Curator, here is my (untested) idea:

  1. Process A calls mutex.acquire()
  2. In or after the 'ensurePath' step in LockInternals (which makes sure that the path exists, creating nodes as appropriate), set the ACL of /namespace/#{lock_id} to disallow DELETE globally (a rough sketch follows this list)
  3. Upon successful acquisition, remove the NO-DELETE flag. If another process (B) has set the flag after step 2 (i.e., tried to acquire the lock), skip this step; only the last process calling acquire() should be able to remove the flag. (Not sure how to go about this.)
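For step 2, the raw ZooKeeper call might look like the sketch below (untested, and assuming open ACLs to begin with). One wrinkle: in ZooKeeper, the DELETE permission on a node governs deleting its children, so the no-delete flag would actually have to go on the parent of #{lock_id}:

import org.apache.zookeeper.ZooDefs
import org.apache.zookeeper.data.ACL

// Permission mask with everything except DELETE, applied to /namespace
// so that its children (the #{lock_id} nodes) cannot be deleted.
int perms = ZooDefs.Perms.ALL & ~ZooDefs.Perms.DELETE
def noDelete = [new ACL(perms, ZooDefs.Ids.ANYONE_ID_UNSAFE)]
zk.setACL('/namespace', noDelete, -1)   // -1 = ignore aclVersion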

@Randgalt Randgalt reopened this Jun 21, 2012
@Randgalt
Contributor

That's an interesting idea, using the ACL. I'm not sure a general-purpose library like Curator can do that, though, as the client may have its own ACL for that node. I'll think about it a bit.

Another avenue: I've been thinking recently that EnsurePath and the create method creatingParentsIfNeeded() are a bit backwards. Instead of pre-checking for the parent paths, they should be reactive, i.e. only create the paths on exception. This way, deleting the parent node is safe, as subsequent locks will get a NoNodeException, which will cause them to re-create the parents.
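In Curator terms, the reactive variant would behave like this sketch (path and mode are illustrative):

import org.apache.zookeeper.CreateMode

// With a reactive creatingParentsIfNeeded(), a deleted parent is simply
// rebuilt when the create fails, instead of being pre-checked on every
// acquire.
String ourPath = client.create()
                       .creatingParentsIfNeeded()
                       .withMode(CreateMode.EPHEMERAL_SEQUENTIAL)
                       .forPath('/some/test/lock-')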

@artemip
Contributor

artemip commented Jun 21, 2012

After attempting to implement the solution posted above (to fix things on my end), I found that for any instance of EnsurePath, ensure() only does work the first time it is called (subsequent calls are no-ops).
Because of this, I ended up implementing something along the lines of 'ZookeeperClient.newNamespaceAwareEnsurePath(path).ensure(ZookeeperClient)' in the retry logic for the acquisition step (pretty much exactly the same solution you posted above, but outside of Curator).
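Concretely, each retry builds a fresh EnsurePath so the parent really is recreated (sketch, assuming Curator 1.x):

import org.apache.zookeeper.KeeperException

try {
    mutex.acquire()
} catch (KeeperException.NoNodeException e) {
    // a fresh EnsurePath instance each time, since a used one is a no-op
    client.newNamespaceAwareEnsurePath('/namespace/lock_id')
          .ensure(client.zookeeperClient)
    mutex.acquire()   // retry now that the parent exists again
}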

Why is this the case? Wouldn't it be safe to assume that a path may need to be created more than once?

If this can be changed, might it be a good idea for ensurePath.ensure() to be called in both circumstances (i.e. both preemptive and reactive)?

If this (or the reactive-only version) can be implemented, I think the only missing piece would be for the InterProcessMutex, or the client, to manually remove unneeded znodes on lock release.

@Randgalt
Contributor

The reason is performance: it would be expensive to check for the parent paths on every call.

@artemip
Contributor

artemip commented Jun 21, 2012

Right, makes sense.

But, what happens in the event of something like this?

Processes A, B, C:

A: Release lock on #{lock_id}
A: Delete the ephemeral node /namespace/#{lock_id}/FOO
A: Delete /namespace/#{lock_id}, since it is now empty

B: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure()

C: Acquire, then release lock
C: Delete /namespace/#{lock_id}, since it is now empty

B: Try to get lock for /namespace/#{lock_id}. NoNodeException occurs, so run EnsurePath.ensure(), which is now a NOP. #{lock_id} doesn't get created, and process B loops eternally (until another lock on #{lock_id} is attempted by another process).

Thank you for your help, by the way!

@ph1lm
Author

ph1lm commented Jun 21, 2012

Guys,
Just in case you don't find a solution for this:
We solved the issue by using a hash of the lock key string instead of the key string itself, thus limiting the lock space to a reasonable, bounded number of nodes. e.g.:
def lockPath = "/our/lock/items/${item.hashCode() % maxNumberOfLocks}"
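One caveat: hashCode() can be negative, so the bucket should be normalized, e.g.:

// keep buckets in [0, maxNumberOfLocks) even for negative hashCodes
def bucket = ((item.hashCode() % maxNumberOfLocks) + maxNumberOfLocks) % maxNumberOfLocks
def lockPath = "/our/lock/items/${bucket}"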

@Randgalt
Contributor

@artemip - I'd end up either not using EnsurePath or heavily modifying it. Let me try some ideas and I'll report back here.

@artemip
Contributor

artemip commented Jun 21, 2012

Awesome, thank you.

@Randgalt
Contributor

I don't see any way to do this with good guarantees. So, a workaround occurred to me: why not have a reaper thread that periodically checks the registered nodes and deletes any that are empty? Here's what I'm thinking of: https://gist.github.com/2970233

Thoughts? Should I add this?

@jlaban

jlaban commented Jun 22, 2012

It still has the potential race condition where another process taking that lock might have the directory yanked out from under it, right? (Though I think it greatly reduces the probability.)

@ph1lm that's an interesting approach too. Shouldn't really be done in Curator of course.

@Randgalt
Contributor

As to the race - if the recipes/usages are written correctly, there should be no race. FYI - I rewrote creatingParentsIfNeeded() as I talked about above. So if the Reaper deletes the parent, the lock recipes in Curator will be fine, as they will just re-make the parents when the error is caught.

@jlaban

jlaban commented Jun 22, 2012

Oh ok, I thought you were going to create the reaper instead of the above change. My bad.

@Randgalt
Contributor

So, to be clear, Curator users will need to create/start the Reaper. Curator won't do it by default. If you like, I can have it in tomorrow's release.

@jlaban

jlaban commented Jun 23, 2012

@artemip ?

@Randgalt
Contributor

Didn't get to it today. I tried, I swear ;) I'll have it by Monday hopefully.

@artemip
Contributor

artemip commented Jun 23, 2012

I think this is a great solution. Would the Reaper be configurable to only listen to and clean up certain directories in ZK? It would be annoying if it consistently removed nodes that are meant to be temporarily empty.

@Randgalt
Contributor

Yes - you give the Reaper the paths to check.
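Usage would look roughly like this (sketch; the threshold value is illustrative):

import com.netflix.curator.framework.recipes.locks.Reaper

// The application owns the Reaper's lifecycle: create it, start it,
// register the lock parents to watch, and close it on shutdown.
def reaper = new Reaper(framework, 60 * 1000)   // check roughly every minute
reaper.start()
reaper.addPath('/some/test')
// ... later, on shutdown:
reaper.close()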

@artemip
Contributor

artemip commented Jul 4, 2012

Thank you very much for the fix - exactly what we needed.

I do have some concerns though:

  • The currently available Reaper modes do not account for the scenario where a path is removed before the Reaper gets a chance to delete it. Would it be possible to implement a mode like REAP_UNTIL_GONE (or similar)? With this mode, the Reaper would only remove nodes that exist, and would forget paths that do not (as opposed to REAP_UNTIL_DELETE, where the Reaper only forgets a path after it successfully deletes it); a usage sketch follows this list. If this is something I can do myself and send you a pull request for, that would be great.
  • There seems to be a possible memory leak with the 'activePaths' variable. In our application, we never know when a path should be removed from the Reaper, so this set would grow indefinitely unless the application explicitly tells the Reaper to remove paths once it knows they no longer need to be monitored. Ideally, once a path is deleted, the Reaper should drop all references to it. Is this something that can be made configurable as well?
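For the first point, usage of the proposed mode might look like this (hypothetical until implemented):

// Proposed REAP_UNTIL_GONE: forget the path as soon as it is observed to
// be missing, instead of retrying until this Reaper deletes it itself.
reaper.addPath('/some/test', Reaper.Mode.REAP_UNTIL_GONE)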

Again, thank you very much for your help.

Let me know if there's anything I can do to assist you with this.

@Randgalt
Contributor

Randgalt commented Jul 4, 2012

Good ideas. Please try to do the implementation and submit a pull request.

@Randgalt Randgalt reopened this Jul 4, 2012
@artemip
Contributor

artemip commented Jul 5, 2012

Pull request submitted: #102

Thanks

@Randgalt Randgalt closed this as completed Jul 6, 2012