My commits for ISPN-1106 that got in 5.0.x were not quite the same as in master, this pull request gets 5.0.x to the same state as master (re: ISPN-1106) #518

Closed
wants to merge 8 commits


Dan Berindei added 8 commits September 7, 2011 16:45
…y were waiting for the default cache to finish rehashing.
Fixed mismatchedTestNames.sh script.
* In LockManagerImpl log the other keys owned by the current transaction.
* In DefaultCacheManager push the cache name to the NDC during cache startup.
* Improved toString() for RehashControlCommand and DistributedExecuteCommand.
* In InboundInvocationHandler log the cache name.
* Log cache start/stop.
* Log the read lock owners in JGroupsDistSync.
…nceTask can invalidate the keys after rehashing is done but before the cache listeners (e.g. KeyAffinityService) know it.
…hash from finishing

The generic scenario involves multiple caches.
Say we have transactions Tx1 and Tx2 spanning caches C1 and C2.
A new node joins the cluster, starting C1 and C2.
With the following sequence of events, rehashing will be blocked for the full lockAcquisitionTimeout.

1. Tx1 prepares on C1 locking K1
2. Tx2 wants to prepare on C2, Tx2 gets the tx lock
3. Tx2 now waits to lock K1 while holding the tx lock on C2
4. Rehash starts on C2 but it can't proceed because Tx2 has the tx lock
5. Tx1 now wants to prepare on C2, but can't acquire the tx lock held by Tx2

This closes the cycle: Tx1 waits for the tx lock held by Tx2, Tx2 waits for K1 held by Tx1, and the rehash on C2 waits for the tx lock as well.

I've implemented a crude "deadlock detection" scheme: a new tx will wait
the full lockAcquisitionTimeout for the tx lock, but a tx that already
has locks acquired will only wait 1/100 of that. So if there is a cycle,
it will break much more quickly and allow rehashing to proceed.
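The timeout-scaling idea might be sketched like this with plain java.util.concurrent. The class and method names (TxLockGuard, acquireForPrepare) are illustrative only, not Infinispan's actual code; the real tx lock is assumed here to behave like the read side of a read/write lock, with rehashing taking the write side:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: transactions share the read side of the tx lock,
// rehashing needs the write side. A tx that already holds key locks
// elsewhere waits only 1/100 of the configured timeout, so a lock cycle
// breaks quickly and the rehash can acquire the write lock.
public class TxLockGuard {
    private final ReentrantReadWriteLock txLock = new ReentrantReadWriteLock();
    private final long lockAcquisitionTimeoutMillis;

    public TxLockGuard(long lockAcquisitionTimeoutMillis) {
        this.lockAcquisitionTimeoutMillis = lockAcquisitionTimeoutMillis;
    }

    public boolean acquireForPrepare(boolean txAlreadyHoldsLocks) {
        // New txs wait the full timeout; txs that already hold locks
        // back off after 1/100 of it to break potential cycles.
        long timeout = txAlreadyHoldsLocks
                ? lockAcquisitionTimeoutMillis / 100
                : lockAcquisitionTimeoutMillis;
        try {
            return txLock.readLock().tryLock(timeout, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public void release() {
        txLock.readLock().unlock();
    }
}
```

When the shortened wait expires, the prepare fails and is retried later, releasing the tx lock long enough for the rehash (or the other transaction) to make progress.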

There is also a simpler variant where the transactions work with a single cache.
In that case, if the remote command can't acquire the tx lock with a 0 timeout, it knows
that the transaction already holds the tx lock on the origin node and is in a deadlock situation.
This is no longer strictly necessary for ISPN-1106, as we now wait
with a shorter timeout on transactions that hold locks, so the rehash
does not block for a very long time.
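The single-cache variant might look roughly like this; again the names are hypothetical and the tx lock is modeled as the read side of a read/write lock:

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Illustrative sketch: a remote prepare tries the tx lock with zero wait.
// If it can't get the lock instantly, the same transaction is assumed to
// hold the tx lock on the origin node already, so waiting here could only
// deadlock; the command fails fast instead of blocking the rehash.
public class RemotePrepareSketch {
    private final ReentrantReadWriteLock txLock = new ReentrantReadWriteLock();

    public boolean tryRemotePrepare() {
        // tryLock() with no timeout: do not wait at all.
        if (!txLock.readLock().tryLock()) {
            // Assume a deadlock with our own tx lock on the origin node
            // and give up immediately so rehashing is not held up.
            return false;
        }
        try {
            // ... perform the actual prepare work here ...
            return true;
        } finally {
            txLock.readLock().unlock();
        }
    }
}
```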

It is recommended, however, to start all caches on application startup,
and this method gives users an easy way to start all their caches.