ISPN-1000 - Block new transactions while rehash is in progress #405

danberindei · 2011-06-26T06:02:29Z

https://issues.jboss.org/browse/ISPN-1000
Master only

I updated the rebalance algorithm to block transactions in the entire cluster until all nodes have finished pushing state. The apply state commands will bypass the block since they use the SKIP_LOCKING flag.
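
To make the blocking behaviour concrete, here is a minimal sketch of the idea, not the code in this pull request. `TransactionGate` and all of its method names are hypothetical, and the `skipLocking` boolean only stands in for a command carrying the SKIP_LOCKING flag. It assumes the gate is closed when a rehash starts and reopened once every node has reported that it finished pushing state.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate for new transactions; not an actual Infinispan class. */
final class TransactionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition unblocked = lock.newCondition();
    private boolean rehashInProgress;

    /** Assumed to be called when a rehash starts. */
    void blockNewTransactions() {
        lock.lock();
        try {
            rehashInProgress = true;
        } finally {
            lock.unlock();
        }
    }

    /** Assumed to be called once all nodes have finished pushing state. */
    void unblockNewTransactions() {
        lock.lock();
        try {
            rehashInProgress = false;
            unblocked.signalAll(); // wake every waiting transaction
        } finally {
            lock.unlock();
        }
    }

    /**
     * Returns true if the caller may proceed, false on timeout.
     * Callers with skipLocking set (e.g. state application) never wait.
     */
    boolean enter(boolean skipLocking, long timeout, TimeUnit unit)
            throws InterruptedException {
        if (skipLocking) {
            return true;
        }
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (rehashInProgress) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = unblocked.awaitNanos(nanos);
            }
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch a write that arrives during a rehash waits in `enter(false, ...)` until the gate reopens or the timeout expires, while a state-application call passes straight through with `enter(true, ...)`.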

In an earlier commit for ISPN-1106 I was catching InterruptedException and wrapping it in an IllegalStateException. I have now moved the wait to the RehashControlCommand-related methods only, and they don't catch the InterruptedException.
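
The difference between the two styles can be sketched as follows. This is only an illustration: `JoinWaiter`, its latch and its method names are hypothetical and do not reflect the actual RehashControlCommand handling code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper that waits for a join to start; names are illustrative. */
final class JoinWaiter {
    private final CountDownLatch joinStarted = new CountDownLatch(1);

    void markJoinStarted() {
        joinStarted.countDown();
    }

    /** Earlier style: the interrupt is hidden inside an unchecked exception. */
    boolean waitWrapping(long timeoutMillis) {
        try {
            return joinStarted.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException("Interrupted while waiting for join", e);
        }
    }

    /** Newer style: the method declares the exception and lets it propagate. */
    boolean waitPropagating(long timeoutMillis) throws InterruptedException {
        return joinStarted.await(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
```

With the propagating variant the caller decides how to react to an interrupt, instead of having to unwrap an IllegalStateException to find out that one happened.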

Sanne · 2011-06-26T15:12:02Z

Wasn't it one of the goals of Infinispan to be always available, even during rehashes? I thought that at the London meeting Manik had suggested a design in which the transaction log would be used to apply the changes to the rehashing nodes. I didn't think much about it, but I was assuming a solution without blocking transactions was the goal.
I don't know if this was discussed, I'm just surprised; this might well be the best that can be implemented.

danberindei · 2011-06-27T09:56:12Z

Sanne, you mean even if one or more of the nodes is doing a rehash and is blocked, the rest of the nodes should still respond to requests? You are right, we did lose this possibility with these changes.

If you mean that every node should be available all the time, that was never the case. Before the push-based redesign we did indeed have a much shorter blocking period, only while draining the transaction log. But Manik suggested that we first make the new algorithm work without the non-blocking part, since concurrent state update was a big source of problems with the previous algorithm.

I admit blocking the entire cluster is yet another step back from the availability POV (note that this only applies to write operations, though). I think it is however a bigger step forward from the consistency POV. I have thought about blocking the transactions on just the sending/receiving nodes, but it's very hard for a node to know how long to wait for state or even if it will receive state from another node for a certain view change (JGroups AND our rehashing algorithm will sometimes skip views).

I have started implementing something that would block a single node and still give us consistency, but I couldn't make it work reliably (yet). In the end I think we will want to enable virtual nodes by default and then almost every rehash will involve all the nodes so it won't hurt that much.

I certainly plan to work on this again when we re-implement non-blocking state transfer, or even sooner if I get any good ideas.

Sanne · 2011-06-27T10:06:19Z

@danberindei thanks for the great explanation.

> Sanne, you mean even if one or more of the nodes is doing a rehash and is blocked, the rest of the nodes should still respond to requests? You are right, we did lose this possibility with these changes.

Yes that's what I meant.

> But Manik suggested that we first make the new algorithm work without the non-blocking part

+1000 ;)

So this affects only write operations? That's not that bad after all.

Apply state commands and all commands using the SKIP_LOCKING flag are still allowed while new transactions are blocked. Transactions are now blocked for as long as rehashing is in progress; they are not resumed immediately after pushing state.


Dan Berindei added 2 commits June 27, 2011 13:56

ISPN-1000 - Wait for the join to start only when handling a RehashCon…

3c91e16

…trolCommand

maniksurtani closed this Jun 28, 2011
