Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[lock] [entry-processor] Transaction couldn't obtain lock for the key #9374

Closed
dmitrymz opened this issue Dec 6, 2016 · 20 comments
Closed

[lock] [entry-processor] Transaction couldn't obtain lock for the key #9374

dmitrymz opened this issue Dec 6, 2016 · 20 comments

Comments

@dmitrymz
Copy link

@dmitrymz dmitrymz commented Dec 6, 2016

we are sporadically experiencing TransactionException: Transaction couldn't obtain lock for the key, while calling TransactionalMap.getForUpdate().

What is sad that the cluster itself looks normal, no exceptions in the logs, and on the client side we have no idea how to handle this exception. Is this partition corruption, data issue...
How as a client we can rely on datastore throwing such generic exceptions...

could you please explain what conditions are leading to this exception? what should be done / best practices dealing with transactions to avoid this situation? Should we retry the request?
how else we can handle this exception?

from the source of TransactionalMapProxySupport:270

            versionedValue = future.get();
            if (versionedValue == null) {
                throw new TransactionException("Transaction couldn't obtain lock for the key: " + toObjectIfNeeded(key));
            }

what does this check for null mean? was the value deleted while trying to obtain the lock?

Caused by: com.hazelcast.transaction.TransactionException: Transaction couldn't obtain lock for the key: HeapData{type=-11, hashCode=-1540458485, partitionHash=-1540458485, totalSize=48, dataSize=40, heapCost=68}
	at com.hazelcast.map.impl.tx.TransactionalMapProxySupport.lockAndGet(TransactionalMapProxySupport.java:270) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.map.impl.tx.TransactionalMapProxySupport.getForUpdateInternal(TransactionalMapProxySupport.java:117) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.map.impl.tx.TransactionalMapProxy.getForUpdate(TransactionalMapProxy.java:124) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.client.impl.protocol.task.transactionalmap.TransactionalMapGetForUpdateMessageTask.innerCall(TransactionalMapGetForUpdateMessageTask.java:43) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.client.impl.protocol.task.AbstractTransactionalMessageTask.call(AbstractTransactionalMessageTask.java:34) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.client.impl.protocol.task.AbstractCallableMessageTask.processMessage(AbstractCallableMessageTask.java:34) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.initializeAndProcessMessage(AbstractMessageTask.java:118) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.client.impl.protocol.task.AbstractMessageTask.run(AbstractMessageTask.java:98) ~[hazelcast-3.6.5.jar:3.6.5]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_101]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-3.6.5.jar:3.6.5]
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[hazelcast-3.6.5.jar:3.6.5]
	at ------ End remote and begin local stack-trace ------.(Unknown Source) ~[na:na]
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.resolveResponse(ClientInvocationFuture.java:128) ~[hazelcast-client-3.6.5.jar:3.6.5]
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.get(ClientInvocationFuture.java:95) ~[hazelcast-client-3.6.5.jar:3.6.5]
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.get(ClientInvocationFuture.java:74) ~[hazelcast-client-3.6.5.jar:3.6.5]
	at com.hazelcast.client.spi.impl.ClientInvocationFuture.get(ClientInvocationFuture.java:37) ~[hazelcast-client-3.6.5.jar:3.6.5]
	at com.hazelcast.client.proxy.txn.ClientTxnProxy.invoke(ClientTxnProxy.java:48) ~[hazelcast-client-3.6.5.jar:3.6.5]
	at com.hazelcast.client.proxy.txn.ClientTxnMapProxy.getForUpdate(ClientTxnMapProxy.java:81) ~[hazelcast-client-3.6.5.jar:3.6.5]
@jerrinot jerrinot added this to the 3.8 milestone Dec 6, 2016
@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 6, 2016

Hi,

This exception means inside the transaction your are trying to lock a key which is already locked by other transactions/threads. If the key is already locked by other resources then getForUpdate will block for the duration equal to transaction-timeout.

Do you lock (calling getForUpdate or any mutating operation on TransctionalMap) multiple keys in a transaction ? If true, I suggest you to order the keys otherwise deadlocks may occur and you get above exception.

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Dec 6, 2016

1 "If the key is already locked by other resources then getForUpdate will block for the duration equal to transaction-timeout." - this is expected behavior. the question is what happens after transaction timeout expires? shouldn't we get TransactionTimedOutException? since I am not getting timeout exception, then this has to be the case 2

2 we are only using getForUpdate() call in our clients to do guaranteed lock.
and yes, we do iterate over multiple keys (unfortunately as there is no bulk operation like getForUpdateAll(keys)). are you suggesting we should process the keys one by transaction?

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 6, 2016

Yes, you are correct about the exception type. We should wrap the exception to TransactionTimedOutException.

I suggest you to sort the keys before calling getForUpdate. This way all transactions will request to lock the key in the same order, so no deadlock

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Dec 7, 2016

does it mean we can only have one call getForUpdate() per transaction or we can have multiple calls for getForUpdate() following by put, ie

Set<> keys = new HashSet<>();
for (Key k : keys) {
 getForUpdate(k);
 modify k;
 put(k);
}

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 7, 2016

you can call getForUpdate multiple times in a transaction, that is not the problem, it is the order of the keys that will cause problem. let's say you have 2 transactions

for (Key k : keys) {
 getForUpdate(k);
...
}

For the first transaction this is the order of the keys: key1,key2,key3
and for the second transaction, the order: key2,key1,key3
Both transactions call getForUpdate for first keys and obtain the corresponding locks
Now for the second round they will not be able to get the locks, each one has the lock for the other.
If you had them sorted with the same order, on of them would get the locks, do the job and unlocks(commit time). Then the other one would do the same.

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Dec 8, 2016

that makes sense.

the last question about merge.
what happens if tx1 gets the value for the key (just basic get without locking),
and tx2 gets the value for the key. tx1 commits. and tx2 commits next.

what happens afterwards? how hazelcast knows what fields to merge and how conflicts resolved?

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 8, 2016

I assume you mean tx1 gets the value and changes it then put it back. In this case some data will be lost. For example both transactions get the value, they increment it and put it back. While putting it hazelcast locks the value, and while committing it does the actual put (and unlock). So first transaction will put value+1 and the other one will block, the second transaction will put value+ too. You've lost an increment, that's why we've introduced getForUpdate

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Dec 8, 2016

I was afraid of that...
in this case, what is the purpose of having transactions without always locking the data on get?
if there is no way for client to know if value underneath has been changed (kind of optimistic locking)

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 9, 2016

it depends on your logic, for example a transaction may just remove some items, or just add them. Another case, put to a queue and map in the same transaction. These scenarios does not need a lock.

@jerrinot
Copy link
Contributor

@jerrinot jerrinot commented Dec 13, 2016

@gurbuzali: is any work needed here? the exception should be wrapped I presume?

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 13, 2016

yes, exception should be wrapped

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Dec 13, 2016

Thanks for explanation. It would be helpful to have them in the docs and in Mastering Hazelcast handbook as it's not obvious.

also, another strange fact: looks like getForUpdate() does not prevent entry processor to update same key. My assumption that the lock should work same way regardless of access, but looks like that's not the case. is this by design?

@gurbuzali
Copy link
Member

@gurbuzali gurbuzali commented Dec 13, 2016

do you have a reproducer ? entry-processor should't be able to update if the key is locked

@bokken
Copy link

@bokken bokken commented Dec 18, 2016

@dmitrymz, depending on your keys, you could address this by sorting the keys before attempting to obtain locks. This way parallel threads always attempt to obtain locks in the same order and do not deadlock each other.

mmedenjak added a commit to mmedenjak/hazelcast that referenced this issue Jan 17, 2017
Previously TransactionTimeout was thrown, now throw TransactionTimedOutException. Update the javadoc to point out that this exception can be thrown from the methods that do mutation.

Affected issue : hazelcast#9374
@mmedenjak mmedenjak self-assigned this Jan 19, 2017
@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Jan 20, 2017

The thrown exception has been changed to TransactionTimedOutException in 3.8
@dmitrymz do you have any details on the entry processor changing a locked key or some code to reproduce it? This should not be happening.

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Jan 21, 2017

I think it was by using getForUpdate() to lock the record, and then not to release lock properly. But I don't have a reproducer

@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Feb 6, 2017

@dmitrymz I have tried the scenario that you have described (running the entry processor for a locked key) and have studied the code but unfortunately I haven't been able to reproduce the issue. It has possibly been fixed by some other change in the code or there is some detail that I haven't been able to notice. If you are still getting the same behaviour, please write details about it here or add a reproducer. Thank you.

@jerrinot jerrinot modified the milestones: 3.9, 3.8 Feb 6, 2017
@mmedenjak mmedenjak changed the title Transaction couldn't obtain lock for the key [lock] Transaction couldn't obtain lock for the key Jul 11, 2017
@mmedenjak mmedenjak changed the title [lock] Transaction couldn't obtain lock for the key [lock] [entry-processor] Transaction couldn't obtain lock for the key Jul 11, 2017
@jerrinot jerrinot modified the milestones: 3.10, 3.9 Jul 21, 2017
@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Aug 1, 2017

@dmitrymz we have detected that the issue is when you are running an entry processor for multiple keys. These entry processors are not blocked by locked keys. Since 3.9 there is a workaround. We have added a LockAware interface and the IMap interface describes it's usage:

  • There are however following methods that run the EntryProcessor on more than one entry. These operations are not lock-aware.
  • The EntryProcessor will process the entries no matter if they are locked or not.
  • The user may however check if an entry is locked by casting the {@link java.util.Map.Entry} to ]
  • {@link LockAware} and invoking the {@link LockAware#isLocked()} method.

Does this work for you? Can we close the issue? I am afraid further improvements will need to wait at least until 3.10.

@dmitrymz
Copy link
Author

@dmitrymz dmitrymz commented Aug 1, 2017

I solved this issue in a different way - completely abandoned using transactions. There were so many issues with locking so EntryProcessor seems to be the proper way. I am glad that hazelcast started investing in improving EntryProcessor infrastructure such as offloadable, read-only interfaces; and of course dynamic code loading. Would be great to add more flexibility for IMap as well (I opened another request for that).

@mmedenjak
Copy link
Contributor

@mmedenjak mmedenjak commented Aug 2, 2017

Ok great, I'm closing this one then.
I think this LockAware is a good start, the alternative would be parking the operation and rescheduling it once the locked key was unlocked, or collecting all locked keys and making new entry processor operations for those but all of that introduces much more complexity. Maybe we add something like this in a future release.
As for the Offloadable and ReadOnly, I think they are great interfaces and we would like to see them included in other parts of hazelcast, not just entry processors or map. But then again, it's a process, we will need time to get there...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants