Potential Bug with KetamaNodeLocator and DefaultServerPool #113

Open
sirdavidwong opened this Issue · 2 comments

2 participants

@sirdavidwong

I think I've uncovered a bug in the way KetamaNodeLocator and DefaultServerPool interact. My Enyim setup consists of two servers, S1 and S2, with KetamaNodeLocator specified as my locator.
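For context, the setup is roughly the following (a minimal sketch using programmatic configuration; the host names are placeholders):

```csharp
using Enyim.Caching;
using Enyim.Caching.Configuration;
using Enyim.Caching.Memcached;

// minimal sketch of the two-server setup (host names are placeholders)
var config = new MemcachedClientConfiguration();
config.AddServer("S1:11211");
config.AddServer("S2:11211");
config.NodeLocator = typeof(KetamaNodeLocator);

var client = new MemcachedClient(config);
```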

The bug occurs when I simulate servers S1 and S2 going offline and coming back online. If I start out with S1 and S2 online and take down S1, the keys on S1 redistribute to S2. However, when S1 comes back online, its keys don't map back to S1. In fact, if I take S1 down again, the server pool won't even detect it. At this point all keys map to S2.

Afterwards, if I take S2 down, S2 is marked as dead and all keys begin to map to S1. If I bring S2 back online, again, keys do not map correctly to S2. All keys map to S1. If S2 is taken down again, the server pool won't detect it.
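To make the sequence concrete, the repro looks roughly like this (a sketch reusing the client from the setup above; key names and counts are arbitrary):

```csharp
// rough repro sketch (key names/counts are arbitrary)
for (int i = 0; i < 1000; i++)
    client.Store(StoreMode.Set, "key" + i, i);   // keys spread across S1 and S2

// 1. stop memcached on S1  -> keys remap to S2 (as expected)
// 2. start S1 again        -> keys should move back, but everything stays on S2
// 3. stop S1 again         -> the pool no longer detects the failure
// (the per-server distribution can be checked with memcached's own `stats` command)
```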

I believe this is a bug relating to how the DefaultServerPool.NodeFail method and KetamaNodeLocator.Initialize method interact.

KetamaNodeLocator.Initialize builds its ring of servers only once; any subsequent call simply returns.
https://github.com/enyim/EnyimMemcached/blob/master/Enyim.Caching/Memcached/Locators/KetamaNodeLocator.cs#L59
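In other words, the locator boils down to a one-time guard along these lines (a paraphrase of the linked code, not a verbatim copy; the field name is illustrative):

```csharp
// paraphrase of the initialize-once guard; field name is illustrative
private bool isInitialized;

public void Initialize(IList<IMemcachedNode> nodes)
{
    if (this.isInitialized) return;   // every later call is silently ignored

    // ... build the ketama hash ring from exactly the nodes passed in ...
    this.isInitialized = true;
}
```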

DefaultServerPool.NodeFail creates a new instance of a locator after a node failure is detected. It initializes the locator with only the servers that are currently alive:
https://github.com/enyim/EnyimMemcached/blob/master/Enyim.Caching/Memcached/DefaultServerPool.cs#L159
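So the failure path works roughly as follows (again a paraphrase; the member names are illustrative, not taken from the repo):

```csharp
// paraphrase of the failure path; member names are illustrative
private void NodeFail(IMemcachedNode failedNode)
{
    // a brand-new locator instance is created...
    IMemcachedNodeLocator locator = new KetamaNodeLocator();

    // ...and initialized with only the nodes that are alive right now,
    // so the failed node is not part of the new ring at all
    locator.Initialize(this.allNodes.Where(n => n.IsAlive).ToList());

    this.nodeLocator = locator;
}
```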

In the case of the bug, when S1 is taken offline, a ketama node locator is created and initialized with only S2.

When DefaultServerPool.rezCallback detects that a server has come back online and tries to reinitialize the locator, the KetamaNodeLocator will just return.
https://github.com/enyim/EnyimMemcached/blob/master/Enyim.Caching/Memcached/DefaultServerPool.cs#L118
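Putting the two together, the resurrection path ends up being a no-op for Ketama (paraphrased; names are illustrative):

```csharp
// paraphrase of the resurrection path; names are illustrative
private void rezCallback(object state)
{
    // the pool notices a dead server answering again and tries to put it
    // back into rotation by reinitializing the current locator...
    this.nodeLocator.Initialize(this.allNodes);

    // ...but if that locator is a KetamaNodeLocator that was already
    // initialized (e.g. by NodeFail), the call silently returns and the
    // revived server never rejoins the ring
}
```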

So when S1 comes back online, there's no way for the already initialized locator to know it's available. When I take S2 down, NodeFail is called again, but this time a new ketama node locator is created with only S1. The same logic applies when S2 comes back online -- it is ignored.

Please let me know what you think about this. One other question: is there a configurable option for the KetamaNodeLocator to choose whether or not to redistribute keys to other servers when a server is dead?
https://github.com/enyim/EnyimMemcached/blob/master/Enyim.Caching/Memcached/Locators/KetamaNodeLocator.cs#L150

In some cases, it'd be helpful to turn this off.
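To make the question concrete, here is a toy consistent-hash ring (not Enyim code, just an illustration of what a "redistribute on failure" switch could mean):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// toy consistent-hash ring, illustration only (not Enyim code)
class ToyRing
{
    private readonly SortedDictionary<uint, string> ring = new SortedDictionary<uint, string>();
    private readonly HashSet<string> dead = new HashSet<string>();

    // the hypothetical switch being asked about
    public bool RedistributeOnFailure { get; set; } = true;

    public void AddNode(string node)
    {
        for (int i = 0; i < 100; i++)                 // 100 virtual points per node
            ring[Hash(node + "-" + i)] = node;
    }

    public void MarkDead(string node) { dead.Add(node); }

    public string Locate(string key)
    {
        uint h = Hash(key);

        // walk the ring clockwise starting at the key's position
        foreach (var entry in ring.Where(e => e.Key >= h).Concat(ring))
        {
            if (!dead.Contains(entry.Value)) return entry.Value;  // alive owner found
            if (!RedistributeOnFailure) return null;              // opt-out: miss instead of moving the key
        }

        return null;  // no alive node at all
    }

    private static uint Hash(string s)
    {
        // FNV-1a style toy hash, good enough for the illustration
        return s.Aggregate(2166136261u, (h, c) => (h ^ c) * 16777619u);
    }
}
```

With the switch off, a key whose owner is down would simply miss instead of silently moving to another server.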

@sirdavidwong

I can attach the NLog Debug output if you'd like.

@hgear

Was there a resolution to your issue? We are seeing the same thing in production, where we have two nodes and 10 instances of a web application. The clients seem to heavily favor one node over the other. When we turn node1 off, node2 ends up getting all the traffic; once node1 is brought back online, the traffic stays on node2.

We are creating an instance of the client for every get/set request. I assume the internal implementation of the Enyim client is a singleton, so it should be able to manage the pools and distribute correctly between them.
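Roughly, the usage pattern is this (illustrative only; whether the pool is actually shared across client instances is exactly what I'm unsure about):

```csharp
using Enyim.Caching;
using Enyim.Caching.Memcached;

// (a) what we do today: a new client per get/set request
using (var client = new MemcachedClient())     // reads the enyim.com/memcached config section
{
    client.Store(StoreMode.Set, "key", "value");
}

// (b) the alternative: one long-lived client reused for all requests,
//     which would not rely on any internal singleton behavior
// static readonly MemcachedClient SharedClient = new MemcachedClient();
```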
