Inconsistent system in case of operation retry #7270
If the system retries an operation because a member is leaving the cluster, the invocation can be retried both because of the response and because of the member-left event. In most cases this should not lead to a problem, but it can happen that the invocation is executed twice. This can leave the system permanently inconsistent.
The simplest way I have come up with to deal with this properly is to make one thread responsible for retrying requests: the InvocationMonitorThread. It scans all invocations periodically anyway. When an invocation needs retrying, any thread can set a flag on the invocation (e.g. volatile Object retry). The InvocationMonitorThread can check for this flag and trigger a retry.
The InvocationMonitorThread will also be in charge of modifying the fields of the Invocation/Operation and this should resolve a whole bunch of potential race problems in case of retrying.
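A minimal sketch of the idea, to make the proposal concrete. The classes below are simplified, hypothetical stand-ins that only borrow the names used above; they are not the actual Hazelcast classes, and `requestRetry`, `invokeCount` and `scan` are names assumed for illustration. The point it shows is that two retry requests for the same invocation (one from the response, one from the member-left event) collapse into a single retry, because only the monitor consumes the flag.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-ins; not Hazelcast's real implementation.
final class RetrySketch {

    static final class Invocation {
        final long callId;
        // Any thread may publish a retry request here; only the monitor clears it.
        volatile Object retry;
        // Modified by the monitor thread only, so no extra synchronization is used.
        int invokeCount;

        Invocation(long callId) {
            this.callId = callId;
        }

        // Called by e.g. a response handler or a member-left listener.
        void requestRetry(Object reason) {
            retry = reason;
        }
    }

    static final class InvocationRegistry {
        private final Map<Long, Invocation> invocations = new ConcurrentHashMap<>();

        void register(Invocation invocation) {
            invocations.put(invocation.callId, invocation);
        }

        void deregister(Invocation invocation) {
            invocations.remove(invocation.callId);
        }

        Iterable<Invocation> all() {
            return invocations.values();
        }
    }

    static final class InvocationMonitor {
        private final InvocationRegistry registry;

        InvocationMonitor(InvocationRegistry registry) {
            this.registry = registry;
        }

        // One scan pass, meant to be called periodically by the single monitor thread.
        // Returns the number of invocations that were retried in this pass.
        int scan() {
            int retried = 0;
            for (Invocation invocation : registry.all()) {
                if (invocation.retry != null) {
                    // Clear the flag first, then mutate the invocation and re-invoke.
                    invocation.retry = null;
                    invocation.invokeCount++;
                    retried++;
                }
            }
            return retried;
        }
    }
}
```

In this sketch a request that arrives after the monitor has cleared the flag is treated as a new retry request on the next scan; a real implementation would probably need to tell a stale request for the previous attempt apart from a fresh one, which is not modelled here.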
For this to work properly, all invocations need to be registered in the InvocationRegistry so that the InvocationMonitor can detect the retry request. For the time being this is a problem, since read-only local calls skip the InvocationRegistry for performance reasons, but in 3.7 we'll get an improved InvocationRegistry where registration/deregistration is very cheap and doesn't generate any litter.