New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ISPN-8533 Deadlock in pessimistic transaction #6075
Conversation
run performance tests please |
Performance tests didn't finish successfully. @diegolovison, can you review it? Additional info: |
run performance tests please |
Performance tests didn't finish successfully. @diegolovison, can you review it? Additional info: |
@diegolovison ok! thanks! |
run performance tests please |
Perf looks good. |
Some connection here too (like in #6045), restarted build |
Gotta love the CI stability... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry Pedro, I forgot about this yesterday.
I'm still not sure what this change does TBH, I see you've split backupLockReleased
into multiple CFs, but it's not clear how that fixes the problem :)
// TODO: should we check the write skew configuration here? | ||
// TODO: version seen or looked up remote version? | ||
if (lastVersion != null && lastVersion.compareTo(version) != InequalVersionComparisonResult.EQUAL) { | ||
throw log.writeSkewOnRead(key, key, lastVersion, version); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do? The JIRA only talks about pessimistic transactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastVersion
is always null
since getLookedUpRemoteVersion()
is deprecated since 9.0 and always return null
.
@@ -88,6 +88,7 @@ private KeyAwareLockPromise acquireLocalLock(InvocationContext ctx, DataCommand | |||
final TxInvocationContext txContext = (TxInvocationContext) ctx; | |||
Object key = command.getKey(); | |||
txContext.addAffectedKey(key); | |||
((TxInvocationContext) ctx).getCacheTransaction().cleanupBackupLocksForKey(key); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
txContext
//clear the backup locks | ||
context.getCacheTransaction().cleanupBackupLocks(); | ||
keysToLock.removeAll(context.getLockedKeys()); | ||
} | ||
if (command instanceof LockControlCommand) { | ||
//the lock command is only issue if the transaction doesn't have the lock for the keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issued?
And I'd like to see a comment above these 2 blocks explaining why we need to remove backup locks first
} | ||
//avoids the warning about synchronizing in a local variable. | ||
//and allows us to change the CacheTransaction internals without having to worry about it | ||
tx.collectBackupLockKeysForSegments(segments, keyPartitioner, filteredLockedKeys); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here instead I don't think a comment isn't necessary, it's just common sense. However, I would have liked the extracted method to handle both regular locks and backup locks, and I would have preferred a more generic method like forEachLock
.
// we need a synchronized collection to be able to get a valid snapshot from another thread during state transfer | ||
final Set<Object> keys = backupKeyLocks.updateAndGet((value) -> value == null ? Collections.synchronizedSet(new HashSet<>(INITIAL_LOCK_CAPACITY)) : value); | ||
keys.add(key); | ||
if (backupKeyLocks == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment above is no longer true
/** | ||
* testing purpose only! | ||
*/ | ||
@Deprecated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should deprecate the implementation as well, and replace the javadoc with something like @deprecated Since 9.3, please use ...
public class PessimisticDeadlockTest extends MultipleCacheManagersTest { | ||
|
||
public void testDeadlock() throws Exception { | ||
assertEquals(4, managers().length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, maybe you should move createCacheManagers()
so it's obvious without having to assert :)
ConfigurationBuilder builder = getDefaultClusteredCacheConfig(CacheMode.DIST_SYNC, true); | ||
builder.transaction().lockingMode(LockingMode.PESSIMISTIC); | ||
builder.clustering().hash().consistentHashFactory(new ControlledConsistentHashFactory.Default(1, 2)) | ||
.numSegments(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't really need MagicKey
if you only have 1 segment. But this got me thinking, should this test have more scenarios to test merging locks into an existing transaction (if the originator was backup/non-owner and becomes primary owner).
Iterator<Address> iterator = writeOwners.iterator(); | ||
assertEquals(address(1), iterator.next()); | ||
assertEquals(address(2), iterator.next()); | ||
assertFalse(writeOwners.contains(address(0))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about assertEquals(Arrays.asList(address(1), address(2)), writeOwners)
?
@@ -59,7 +61,9 @@ | |||
private final AtomicReference<Set<Object>> lockedKeys = new AtomicReference<>(); | |||
|
|||
/** Holds all the locks for which the local node is a secondary data owner. */ | |||
private final AtomicReference<Set<Object>> backupKeyLocks = new AtomicReference<>(); | |||
@GuardedBy("this") | |||
private Map<Object, CompletableFuture<Void>> backupKeyLocks; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you modify the comment to say what these CompletableFutures actually represent?
if (entry instanceof RepeatableReadEntry) { | ||
return ((RepeatableReadEntry) entry).isRead(); | ||
} else { | ||
return false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When possible, I prefer keeping deprecated methods working as before instead of keeping only a skeleton for compilation reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry I didn't understand what you mean :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean if the method still works, I don't see any reason to replace it with an empty method.
//clear the backup locks | ||
context.getCacheTransaction().cleanupBackupLocks(); | ||
keysToLock.removeAll(context.getLockedKeys()); | ||
} | ||
if (command instanceof LockControlCommand) { | ||
//the lock command is only issued if the transaction doesn't have the lock for the keys. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't know why we have to remove the backup locks here...
} | ||
//avoids the warning about synchronizing in a local variable. | ||
//and allows us to change the CacheTransaction internals without having to worry about it | ||
tx.forEachBackupLock(key -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not forEachLock
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because it isn't related to this PR (?) Do you want to replace all the getLockedKeys()
with a forEachLock()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'll wait for https://issues.jboss.org/browse/ISPN-3927 then :)
* <p> | ||
* A {@link CompletableFuture} is created for each key and it is completed when the backup lock is release for that | ||
* key. A transaction, before acquiring the locks, must wait for all the backup locks (i.e. the {@link | ||
* CompletableFuture}) for all transaction created in the previous topology. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking: is released, all transactions
* Minor deprecation cleanups
Integrated, thanks Pedro! |
https://issues.jboss.org/browse/ISPN-8533