Deadlock when executing entry processor within entry processor #3146
Currently accessing multiple entries in an …
@mdogan IMHO, in order to fix this you don't need a PartitionProcessor; you need a more granular lock structure (per key) for partitions instead. The current one-thread-per-partition model is unnecessarily hampering performance, IMHO.
"A more granular lock structure", or even a lock-free concurrent model per key or per data structure per partition, is definitely required for performance reasons. But using an EntryProcessor for multiple entries and/or data structures is a very bad idea, very bad API design. I think we definitely need a PartitionProcessor, or a similar, more intuitive API, to solve this issue.
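To make the idea concrete, here is a hypothetical sketch of what such an API could look like. Everything below — `PartitionProcessor`, `PartitionContext`, and the in-memory stand-in — is invented for illustration and does not exist in Hazelcast; the point is only that one callback gets atomic access to every data structure in a partition:

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class PartitionProcessorSketch {

    /** Hypothetical: hands the processor every data structure in one partition. */
    interface PartitionContext {
        <K, V> Map<K, V> getMap(String name);
    }

    /** Hypothetical: the single callback that runs atomically on the partition. */
    interface PartitionProcessor<R> extends Serializable {
        R process(PartitionContext ctx);
    }

    /**
     * Minimal in-memory stand-in: one lock guards all maps of the partition,
     * so the processor observes and mutates them atomically.
     */
    static class InMemoryPartition implements PartitionContext {
        private final Map<String, Map<Object, Object>> maps = new HashMap<>();

        @SuppressWarnings("unchecked")
        public synchronized <K, V> Map<K, V> getMap(String name) {
            return (Map<K, V>) maps.computeIfAbsent(name, n -> new HashMap<>());
        }

        synchronized <R> R execute(PartitionProcessor<R> processor) {
            return processor.process(this);
        }
    }

    public static void main(String[] args) {
        InMemoryPartition partition = new InMemoryPartition();
        // Move an entry between two maps of the same partition atomically.
        String moved = partition.execute(ctx -> {
            Map<String, String> orders = ctx.getMap("orders");
            Map<String, String> archive = ctx.getMap("archive");
            orders.put("o1", "pending");
            archive.put("o1", orders.remove("o1"));
            return archive.get("o1");
        });
        System.out.println(moved); // prints "pending"
    }
}
```

Because the processor owns the whole partition for the duration of the call, the nested-invocation pattern that triggers this issue never arises: there is nothing to call back into.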
I'd consider this a bug rather than a request for a new API. If you agree this is a bug, it could be fixed by one of the following:
A PartitionProcessor might solve some problems, but solving this particular problem efficiently is a different matter: Google "Partition-Level Operations" for what I mean.
This is a bug, because a deadlock happens. But note that I know the current threading model has problems; we tried different models during development of 3.x, and we are planning to develop/introduce a better model in future releases, but not in 3.3 or 3.4. I understand that your threading-model suggestions would also fix this issue, but in the current situation what @torkeld tries to achieve is not supported and (AFAIK) not documented anywhere. We are looking at the issue from different perspectives. You see this as a threading problem, and I agree it is also a threading problem. But since what he's trying to do is process multiple data structures atomically, the main solution here is to introduce a better/working/useful API, not to solve the problems of a hacky usage. I know Coherence users are used to …
@mdogan Well, I was working with the assumption that #964 was confirmed as feasible. We have explained what we were trying to do to all Hazelcast representatives we have met over the past year and have not received this feedback before. This unit test seems to indicate that this is intentional too: https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/test/java/com/hazelcast/map/EntryProcessorTest.java It would be great if what I want could be achieved with a nice API; it would be interesting to see what you would propose. These would be my thoughts on requirements for the API:
As an alternative to acquiring locks in steps 2 and 3, EPs for those keys could be executed with their implicit lock. Any unrelated key must be fair game for any other concurrently executing EP.
Yes, I was naively thinking that operations falling in the same partition could be allowed to implement a multi-…
Nevertheless, we would really appreciate a "short-term" fix or workaround for this behavior. How can I tune the thread count to be one thread per partition?
Hi @ChristerF, if you really want to have one thread per partition (so 271 partitions -> 271 threads), you can have a look at the 'hazelcast.operation.thread.count' property in GroupProperties. If you set that to the same value as the partition count, then each partition will have its own thread. I think what will eventually happen is that we use a different thread pool for entry processors; you don't want them to hog the operation thread. Any form of alien functionality should not be allowed to execute on the partition thread.
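A sketch of that workaround, assuming Hazelcast 3.x (where the default partition count is 271 and the property is read from GroupProperties at startup); treat the thread count as a stopgap, since 271 threads is far above typical core counts:

```java
// Workaround sketch (Hazelcast 3.x): match the operation thread count to
// the partition count so every partition gets a dedicated thread.
Config config = new Config();
config.setProperty("hazelcast.operation.thread.count", "271");
HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);

// Equivalent JVM flag: -Dhazelcast.operation.thread.count=271
```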
@mdogan, what should we do with this issue? I think it requires some fundamental rearchitecting and won't fit in the 3.3 release.
Yeah, right. I forgot to postpone the milestone...
@pveentjer The fundamental design flaw is that Hazelcast treats partition operations and map operations as the same type of operation. To achieve better performance you need more granular locking. Even the current assignment of a thread to a partition is suboptimal compared to a lock per partition plus a thread pool executing operations. Lots of room for improvement here... Having a separate pool for EPs is the wrong way of thinking about it, IMHO. A separate pool for any map operation is what you need; that, plus per-key locking, might make Hazelcast competitive from a performance standpoint.
Hi @ChristerF, I agree with you to a certain level. I do not believe in fine-grained locking in combination with high performance. You can get better fairness if different threads can process a partition over time, but you will get higher latency and lower throughput due to additional synchronization overhead. Imagine you use a regular executor with a blocking queue: there will be a lot of contention on the head and tail of that queue. So I have a hard time believing that this solution will provide better performance (whatever that may be), and lower latency is something that needs to be benchmarked, since you will add a lot more access conflicts. The advantage of the current approach is that there are no central components, and therefore it will scale a lot better than a central executor (with a contended queue).

I'm even willing to say that we are not going far enough: by providing exclusive access within a partition, you can get rid of ALL concurrency control during the execution of an operation and do a huge amount of work in a single thread without wasting time on synchronization primitives (even something as 'simple' as a CAS). So instead of taking the speed out of a CPU with locks/CAS etc., it can run everything at full speed. I would like to get rid of the operation queue completely and replace it with a ringbuffer; queues suck due to contention. Perhaps we'll be able to add this in 3.4 when backpressure is added. I'm sure we'll move towards a disruptor approach.

Per-key locking is an idea, but it has drawbacks. Certain operations, like partition migration, need exclusive access to a partition, while your entry processor could run in parallel with a get, so access can be shared. You would end up with some kind of read/write lock, and this can cause a lot of contention. Personally I believe that offloading to a dedicated thread pool is the best thing; we'll apply the locking behind the scenes.
But most of the operations are just basic reads/writes, so let's optimize for them...
Currently the main factor determining performance is the synchronous nature of calls. We can't get the buffers full, so threads are stalling all the time. In a POC (Project-X) I wrote a month ago, I'm doing millions of asynchronous map.put calls a second on a 4-node cluster, and I'm sure I can get it above 10M ops/second. It is fast because the buffers are full, so there is a lot of batching that amortizes IO/concurrency overhead, and there is no concurrency control apart from setting the request on the connection to be sent later. Also, there is almost no litter. In sync mode, however, the POC is roughly 50% faster than regular HZ... just because of the sync calls. And with sync mode I'm stuck at roughly a million ops/second. The point I'm trying to make: if you can avoid synchronization, you can have huge throughput.
This issue is related to the use case described in #964.
When an entry processor on one map executes an entry processor on a related map in the same partition (using partition-aware keys), there is a chance of deadlock while the backup processor executes, since that is a blocking operation that runs on the processor thread. This happens if two different entries are being processed on two different nodes, A and B, and their keys, K1 and K2 respectively, are mapped to the same thread. The processor on node A will execute the inner processor on key K1 (same partition -> local call), and its backup processor will then be executed on node B on the same thread on which the entry processor is currently executing on key K2.
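The interaction can be simulated outside Hazelcast with plain executors. In this minimal sketch (class and method names are invented for illustration), one single-threaded executor stands in for each node's partition thread, and each "entry processor" blocks its thread while waiting for its backup to run on the other node's equally busy thread:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class PartitionThreadDeadlock {

    /**
     * Returns true when the nested-wait pattern deadlocks: each "node" has a
     * single operation thread, and each entry processor holds that thread
     * while synchronously waiting for its backup to run on the other node's
     * (equally busy) thread.
     */
    public static boolean deadlocks() throws Exception {
        ExecutorService nodeA = Executors.newSingleThreadExecutor();
        ExecutorService nodeB = Executors.newSingleThreadExecutor();
        CountDownLatch bothBusy = new CountDownLatch(2);

        Callable<String> epOnK1 = () -> {
            bothBusy.countDown();
            bothBusy.await();                       // node B's thread is busy too
            Future<String> backup = nodeB.submit(() -> "backup of K1");
            return backup.get();                    // blocks node A's only thread
        };
        Callable<String> epOnK2 = () -> {
            bothBusy.countDown();
            bothBusy.await();
            Future<String> backup = nodeA.submit(() -> "backup of K2");
            return backup.get();                    // blocks node B's only thread
        };

        Future<String> a = nodeA.submit(epOnK1);
        Future<String> b = nodeB.submit(epOnK2);
        boolean stuck;
        try {
            a.get(2, TimeUnit.SECONDS);             // backups are queued behind
            b.get(2, TimeUnit.SECONDS);             // the blocked processors
            stuck = false;
        } catch (TimeoutException expected) {
            stuck = true;
        } finally {
            nodeA.shutdownNow();
            nodeB.shutdownNow();
        }
        return stuck;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(deadlocks() ? "deadlock" : "completed");
    }
}
```

The latch plays the role of unlucky timing: it guarantees both threads are occupied before either submits its backup, which is why the simulation deadlocks deterministically while the real reproducer only fails when the keys land on different nodes.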
A reproducer can be found here: https://github.com/torkeld/hazelcast-issue-reproducer/blob/master/src/test/java/test/DeadlockTest.java
This reproducer will fail 50% of the time, when the two keys end up on different nodes (I wasn't able to find a way to deterministically make them run on different nodes).
The root cause of this seems to be that Hazelcast locks not only the partition, but the whole set of partitions mapped to the same thread, when executing an entry processor. This way two keys in different partitions (running on different nodes) can end up on the same thread when the backup processor is executed. Running backup processors in a separate thread pool would probably solve the issue; however, a better solution would be to send all backup processors in a batch after the processor has finished executing and released the thread.
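A rough sketch of the batching idea, again using plain executors as stand-ins (`runEntryProcessor` and the deferred-backup list are illustrative, not Hazelcast internals): the processor records its backup work instead of blocking on it, and the backups are dispatched only after the partition thread has been released, so it can never be caught waiting on another busy thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class DeferredBackups {

    /**
     * The entry processor records its backup operations instead of waiting
     * for them inline; they are sent in a batch only after the partition
     * thread has been released.
     */
    public static String runEntryProcessor(ExecutorService localThread,
                                           ExecutorService remoteThread)
            throws Exception {
        List<Runnable> deferredBackups = new ArrayList<>();

        Future<String> result = localThread.submit(() -> {
            // Record the backup instead of executing it synchronously.
            deferredBackups.add(() -> { /* apply backup on the remote node */ });
            return "processed";
        });
        String value = result.get(2, TimeUnit.SECONDS); // partition thread released

        // Dispatch the batch afterwards; even if the remote thread is busy,
        // nothing is holding our own partition thread anymore.
        for (Runnable backup : deferredBackups) {
            remoteThread.submit(backup).get(2, TimeUnit.SECONDS);
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService nodeA = Executors.newSingleThreadExecutor();
        ExecutorService nodeB = Executors.newSingleThreadExecutor();
        // Both directions complete, unlike the nested synchronous case.
        System.out.println(runEntryProcessor(nodeA, nodeB)); // prints "processed"
        System.out.println(runEntryProcessor(nodeB, nodeA)); // prints "processed"
        nodeA.shutdownNow();
        nodeB.shutdownNow();
    }
}
```

The trade-off, left implicit in the comment above, is that the backup becomes visible slightly later than in the synchronous scheme; the operation's own thread, however, is never held hostage to another node's scheduling.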