I think I've uncovered a bug in the way KetamaNodeLocator and DefaultServerPool interact. My Enyim setup consists of two servers, S1 and S2, with KetamaNodeLocator specified as the locator.
The bug occurs when I simulate S1 and S2 going offline and coming back online. Starting with both S1 and S2 online, if I take S1 down, the keys on S1 redistribute to S2. However, when S1 comes back online, its keys don't map back to S1. In fact, if I take S1 down again, the server pool won't even detect the failure. At this point all keys map to S2.
If I then take S2 down, S2 is marked as dead and all keys begin to map to S1. If I bring S2 back online, keys again fail to map back to S2; all keys map to S1. And if S2 is taken down again, the server pool won't detect it.
I believe this is a bug in how the DefaultServerPool.NodeFail and KetamaNodeLocator.Initialize methods interact.
KetamaNodeLocator.Initialize builds its ring of servers only once; any subsequent call returns immediately without rebuilding the ring.
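The initialize-once behavior described above can be modeled roughly like this (a simplified Python sketch, not the actual C# source; the class name, the 160-points-per-server count, and the internal flag are assumptions):

```python
import bisect
import hashlib


class KetamaLocatorSketch:
    """Simplified model of a ketama-style locator whose ring is
    built only once, mirroring the reported behavior."""

    def __init__(self):
        self.initialized = False
        self.ring = []  # sorted list of (hash, server) points

    def initialize(self, servers):
        # Subsequent calls are no-ops: the ring built from the first
        # server list is kept forever, which is the crux of the bug.
        if self.initialized:
            return
        for server in servers:
            for i in range(160):  # many points per server, ketama-style
                h = int.from_bytes(
                    hashlib.md5(f"{server}-{i}".encode()).digest()[:4], "big")
                self.ring.append((h, server))
        self.ring.sort()
        self.initialized = True

    def locate(self, key):
        h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

With this model, a second `initialize(["S2"])` call after `initialize(["S1", "S2"])` leaves the ring untouched, which is exactly the symptom reported.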
DefaultServerPool.NodeFail creates a new locator instance after a node failure is detected, and initializes it with only the servers that are currently alive.
In the case of this bug, when S1 is taken offline, a new KetamaNodeLocator is created and initialized with only S2.
When DefaultServerPool.rezCallback detects that a server has come back online and tries to reinitialize the locator, KetamaNodeLocator.Initialize just returns.
So when S1 comes back online, there's no way for the already initialized locator to know it's available. When I take S2 down, NodeFail is called again, but this time a new ketama node locator is created with only S1. The same logic applies when S2 comes back online -- it is ignored.
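The whole failure sequence can be reproduced with a toy pool and locator (again a hedged Python sketch of the behavior described above; method and class names are simplified stand-ins for the C# ones):

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class Locator:
    """Ketama-style locator whose initialize() only works once."""

    def __init__(self):
        self.ring = None

    def initialize(self, servers):
        if self.ring is not None:  # already initialized: silently ignore
            return
        self.ring = sorted(
            (_hash(f"{s}-{i}"), s) for s in servers for i in range(160))

    def locate(self, key):
        idx = bisect.bisect(self.ring, (_hash(key), "")) % len(self.ring)
        return self.ring[idx][1]


class Pool:
    """Simplified server pool: node_fail replaces the locator with one
    built from the still-alive servers; rez_callback only re-calls
    initialize() on the existing locator (a no-op after the first call)."""

    def __init__(self, servers):
        self.alive = set(servers)
        self.locator = Locator()
        self.locator.initialize(sorted(self.alive))

    def node_fail(self, server):
        self.alive.discard(server)
        self.locator = Locator()                      # fresh locator...
        self.locator.initialize(sorted(self.alive))   # ...alive nodes only

    def rez_callback(self, server):
        self.alive.add(server)
        self.locator.initialize(sorted(self.alive))   # ignored: already built


pool = Pool(["S1", "S2"])
pool.node_fail("S1")                # ring now contains only S2
assert pool.locator.locate("key1") == "S2"
pool.rez_callback("S1")             # S1 is back, but the ring is unchanged
assert pool.locator.locate("key1") == "S2"  # keys never map back to S1
```

The second assertion is the bug in miniature: after S1 recovers, the locator still only knows about S2.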
Please let me know what you think about this. One other question: is there a configurable option for the KetamaNodeLocator to control whether keys are redistributed to other servers when a server is dead? In some cases it would be helpful to turn redistribution off.
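To make the request concrete, here is a hypothetical sketch of what a non-redistributing locator could look like (this is not an existing Enyim option; everything here is illustrative): the ring always contains every configured server, and a key whose owning node is dead simply misses instead of falling through to the next live node.

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:4], "big")


class NonRedistributingLocator:
    """Hypothetical locator: keys owned by a dead node return None
    (a cache miss) rather than being redistributed to a live node."""

    def __init__(self, servers):
        self.dead = set()
        self.ring = sorted(
            (_hash(f"{s}-{i}"), s) for s in servers for i in range(160))

    def mark_dead(self, server):
        self.dead.add(server)

    def mark_alive(self, server):
        self.dead.discard(server)

    def locate(self, key):
        idx = bisect.bisect(self.ring, (_hash(key), "")) % len(self.ring)
        owner = self.ring[idx][1]
        return None if owner in self.dead else owner
```

Because the ring is never rebuilt, a recovering node regains exactly the keys it owned before, which also sidesteps the reinitialization problem described above.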
I can attach the NLog Debug output if you'd like.
Was there a resolution to your issue? We are seeing the same thing in production, where we have two nodes and 10 instances of the web application. The clients heavily favor one node over the other. When we turn node1 off, node2 ends up getting all the traffic, and once node1 is brought back online the traffic stays on node2.
We are creating an instance of the client for every get/set request. I assume the internal implementation of the Enyim client is a singleton, so it should be able to manage the pools and distribute correctly between them.