
Deadlock when executing entry processor within entry processor #3146

Open
torkeld opened this issue Aug 4, 2014 · 15 comments

Comments

@torkeld

torkeld commented Aug 4, 2014

This issue is related to the use case described in #964.
When an entry processor on one map executes an entry processor on a related map in the same partition (using partition-aware keys), there is a chance of a deadlock when the backup processor runs, since that is a blocking operation executed on the partition thread. This happens when two different entries are being processed on two different nodes, A and B, and their keys, K1 and K2 respectively, are mapped to the same thread. The processor on node A executes the inner processor on key K1 (same partition, so a local call), and its backup processor is then executed on node B on the same thread that is currently running the entry processor for key K2.
A reproducer can be found here: https://github.com/torkeld/hazelcast-issue-reproducer/blob/master/src/test/java/test/DeadlockTest.java
This reproducer fails about 50% of the time, namely when the two keys end up on different nodes (I wasn't able to find a way to deterministically make them run on different nodes).

The root cause seems to be that, when executing an entry processor, Hazelcast locks not just the partition but the whole set of partitions mapped to the same thread. That way two keys in different partitions (being processed on different nodes) can end up on the same thread when the backup processor is executed. Running backup processors in a separate thread pool would probably solve the issue, but a better solution would be to send all backup processors in a batch after the processor has finished executing and released the thread.
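
For readers who don't want to open the reproducer, the pattern that triggers this is roughly the following. A minimal sketch using the 3.x EntryProcessor API, with hypothetical map names ("orders" and "customers") and a simplified way of getting the local HazelcastInstance; the linked DeadlockTest is the authoritative version:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IMap;
import com.hazelcast.map.AbstractEntryProcessor;

import java.util.Map;

public class NestedEntryProcessorSketch {

    // Inner processor, executed from inside the outer one on a key in the same partition.
    static class InnerProcessor extends AbstractEntryProcessor<String, Long> {
        @Override
        public Object process(Map.Entry<String, Long> entry) {
            entry.setValue(entry.getValue() == null ? 1L : entry.getValue() + 1);
            return null;
        }
    }

    // Outer processor: executeOnKey() blocks on the partition thread until the inner
    // processor and its backup complete. If node B's partition thread is blocked the
    // same way for another key mapped to the same thread, the two nodes deadlock.
    static class OuterProcessor extends AbstractEntryProcessor<String, Long> {
        private final String relatedKey; // partition-aware key on the related map

        OuterProcessor(String relatedKey) {
            this.relatedKey = relatedKey;
        }

        @Override
        public Object process(Map.Entry<String, Long> entry) {
            HazelcastInstance hz = Hazelcast.getAllHazelcastInstances().iterator().next();
            IMap<String, Long> related = hz.getMap("customers");
            related.executeOnKey(relatedKey, new InnerProcessor());
            entry.setValue(entry.getValue() == null ? 1L : entry.getValue() + 1);
            return null;
        }
    }
}
```

Usage would then be something like `hz.getMap("orders").executeOnKey(orderKey, new OuterProcessor(customerKey))` from two nodes concurrently, with the two order keys mapped to the same operation thread.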

@mdogan
Contributor

mdogan commented Aug 14, 2014

Currently, accessing multiple entries in an EntryProcessor is not officially supported. Although it's not forbidden to access entries in the same partition, it can cause issues like this one. We are planning to introduce a new type of processor to support accessing multiple entries and data structures within the same partition: #3259

@mdogan mdogan added this to the 3.3 milestone Aug 14, 2014
@mdogan mdogan self-assigned this Aug 14, 2014
@ChristerF

@mdogan IMHO, in order to fix this you don't need a PartitionProcessor; you need a more granular lock structure (per key) instead of per partition. The current one-thread-per-partition model hampers performance unnecessarily, IMHO.

@mdogan
Contributor

mdogan commented Aug 18, 2014

"a more granular lock structure", even a lock free concurrent model per key or per data structure per partition is definitely required for performance reasons.

But using an EntryProcessor for multiple entries and/or data structures is a very bad idea, very bad api design. I think we definitely need a PartitionProcessor or a similar, more intuitive api to solve this issue.

@ChristerF

I'd consider this a bug rather than a request for a new API. If you agree this is a bug, it could be fixed by one of the following:

  1. Have a dedicated thread per partition. (One thread dedicated to more than one partition is the real issue here.)
  2. Make a lock per partition and have a thread pool working on partitions (sketched below).
  3. Make a per-key lock and have a thread pool working on keys simultaneously regardless of partition (this would make partition transfers more complicated, but it's probably worth it).

A PartitionProcessor might solve some problems, but solving this particular problem efficiently is a different matter: Google "Partition-Level Operations" for what I mean.
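
Purely as an illustration of option 2 above (not a claim about how Hazelcast is or should be structured internally), the idea is a lock per partition plus a shared worker pool, so a blocked operation only holds up its own partition; all names below are hypothetical:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: one lock per partition, a shared pool of worker threads.
public class PartitionLockingExecutor {

    private final ReentrantLock[] partitionLocks;
    private final ExecutorService pool;

    public PartitionLockingExecutor(int partitionCount, int threads) {
        partitionLocks = new ReentrantLock[partitionCount];
        for (int i = 0; i < partitionCount; i++) {
            partitionLocks[i] = new ReentrantLock();
        }
        pool = Executors.newFixedThreadPool(threads);
    }

    // Any free worker may run an operation, but only one operation runs per
    // partition at a time; an operation blocked in partition P does not pin
    // any other partition to the same thread.
    public void execute(final int partitionId, final Runnable operation) {
        pool.submit(new Runnable() {
            @Override
            public void run() {
                ReentrantLock lock = partitionLocks[partitionId];
                lock.lock();
                try {
                    operation.run();
                } finally {
                    lock.unlock();
                }
            }
        });
    }
}
```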

@mdogan
Contributor

mdogan commented Aug 19, 2014

This is a bug, because a deadlock happens. But note that EntryProcessor is meant to process a single entry per call, so processing multiple entries and data structures in an EntryProcessor is not supported, and we should forbid it internally.

I know the current threading model has problems; we tried different models during the development of 3.x, and we are planning to develop/introduce a better model in future releases, but not in 3.3 or 3.4.

I understand that your threading model suggestions would also fix this issue, but in the current situation what @torkeld is trying to achieve is not supported and (AFAIK) not documented anywhere.

We are looking at the issue from different perspectives. You see this as a threading problem, and I agree it is also a threading problem. But since what he's trying to do is process multiple data structures atomically, the main solution here is to introduce a better/working/useful API, not to solve the problems of a hacky usage. I know Coherence users are used to using EntryProcessors like this, but I still find it counter-intuitive. Also, I'm not saying we won't solve the threading model issues, but that's something else.

@ChristerF

@mdogan Well, I was working with the assumption that #964 was confirmed as feasible. We have explained what we were trying to do to all Hazelcast representatives we have met over the past year and have not received this feedback before. This unit test seems to indicate that this is intentional too: https://github.com/hazelcast/hazelcast/blob/master/hazelcast/src/test/java/com/hazelcast/map/EntryProcessorTest.java

It would be great if what I want could be achieved with a nice API; it would be interesting to see what you would propose. These would be my thoughts on requirements for the API:

  1. While in an EP or equivalent:
  2. Lock other related keys in the same partition. Attempts to lock keys from other partitions should fail; if a lock is not immediately available, the call blocks until it is.
  3. Modify the locked values.
  4. Store the modifications as one set of modifications, with one backup for all modifications.

As an alternative to acquiring locks in steps 2 and 3, EPs for those keys could be executed under their implicit lock.

Any unrelated key must be fair game for any other concurrently executing EP.
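
Purely as a strawman for discussion (nothing like this exists in Hazelcast, and every name below is hypothetical), those requirements might translate into an interface along these lines:

```java
import java.io.Serializable;

// Hypothetical sketch only; none of these types are part of Hazelcast.
public interface PartitionEntryProcessor<K, V> extends Serializable {

    interface PartitionContext<K, V> {
        // Locks a related key in the same partition and returns its current value.
        // Keys from other partitions are rejected; if the lock is held, the call
        // blocks until it becomes available.
        V lockAndGet(String mapName, K key);

        // Records a modification; all recorded modifications are applied and
        // backed up together, as one set, when process() returns.
        void set(String mapName, K key, V value);
    }

    // Invoked with the primary entry's key; related entries in the same
    // partition are accessed through the context.
    Object process(K key, PartitionContext<K, V> context);
}
```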

@mdogan
Contributor

mdogan commented Aug 19, 2014

Yes, I naively thought that allowing operations falling into the same partition could be a way to implement a multi-EntryProcessor, and I added that test too. But over time we saw that this is wrong and that we need to introduce a nice API for this. Hence we never officially said it's supported or allowed.

@ChristerF

Nevertheless, we would really appreciate a "short term" fix or workaround for this behavior. How can I tune the thread count to get one thread per partition?

@ChristerF

@mdogan a related question: if I used the approach outlined in #3003, would backups be sent for potentially modified records?

@pveentjer
Contributor

Hi @ChristerF, if you really want to have one thread per partition (so 271 partitions -> 271 threads), you can have a look at the 'hazelcast.operation.thread.count' property in GroupProperties. If you set it to the same value as the partition count, each partition will have its own thread.
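
Concretely, a minimal sketch of that workaround, assuming the default partition count of 271 ('hazelcast.partition.count' is the standard property controlling the partition count); note that 271 operation threads is a lot of threads:

```java
import com.hazelcast.config.Config;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class OneThreadPerPartition {
    public static void main(String[] args) {
        Config config = new Config();
        // Keep partition count and operation thread count equal so that
        // every partition is served by its own operation thread.
        config.setProperty("hazelcast.partition.count", "271");
        config.setProperty("hazelcast.operation.thread.count", "271");
        HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
    }
}
```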

I think what will eventually happen is that we'll use a different thread pool for entry processors; you don't want them to hog the operation thread. Any form of alien functionality should not be allowed to execute on the partition thread.

@pveentjer
Contributor

@mdogan what should we do with this issue? I think it requires some fundamental rearchitecting and won't fit in the 3.3 release.

@mdogan
Contributor

mdogan commented Aug 26, 2014

Yeah, right, I forgot to postpone the milestone...

@mdogan mdogan modified the milestones: 3.3, 3.4 Aug 26, 2014
@mdogan mdogan removed their assignment Aug 26, 2014
@ChristerF

@pveentjer The fundamental design flaw is that Hazelcast treats partition operations and map operations as the same type of operation. To achieve better performance you need more granular locking. Even the current assignment of a thread to a partition is suboptimal compared to having a lock per partition and a thread pool executing operations. Lots of room for improvement here...

Having a separate pool for EPs is the wrong way of thinking about it, IMHO. A separate pool for any map operation is what you need; that, plus per-key locking, might make Hazelcast competitive from a performance standpoint.

@pveentjer
Contributor

Hi @ChristerF

I agree with you to a certain level. I do not believe in fine-grained locking in combination with high performance. You can get better fairness if different threads can process a partition over time, but you will get higher latency and lower throughput due to the additional synchronization overhead. Imagine you use a regular executor with a blocking queue: there will be a lot of contention on the head and tail of that queue. So I have a hard time believing that this solution will provide better performance (whatever that may be). And lower latency is something that needs to be benchmarked, since you will add a lot more access conflicts. The advantage of the current approach is that there are no central components, and therefore it will scale a lot better than a central executor (with a contended queue).

I'm even willing to say that we are not going far enough: by providing exclusive access within a partition, you can get rid of ALL concurrency control during the execution of an operation, and you can do a huge amount of work in a single thread without wasting time on synchronization primitives (even something as 'simple' as a CAS). So instead of taking the speed out of a CPU with locks/CAS etc., it can run at full speed. I would like to get rid of the operation queue completely and replace it with a ring buffer; queues suck due to contention. Perhaps we'll be able to add this in 3.4 when backpressure is added. I'm sure we'll move towards a disruptor approach.
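
For context, this is roughly what the contention argument looks like in code: a generic single-producer/single-consumer ring buffer (an illustration only, not Hazelcast or Disruptor internals). Each side advances only its own counter, so there is no shared lock or contended queue head/tail as with a blocking queue:

```java
import java.util.concurrent.atomic.AtomicLong;

// Generic SPSC ring buffer sketch: producer and consumer each write only
// their own counter, avoiding the contended head/tail of a blocking queue.
public final class SpscRingBuffer<E> {

    private final Object[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read (consumer only)
    private final AtomicLong tail = new AtomicLong(); // next slot to write (producer only)

    public SpscRingBuffer(int capacityPowerOfTwo) {
        buffer = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Producer thread only; returns false when full so the caller can apply backpressure.
    public boolean offer(E item) {
        long t = tail.get();
        if (t - head.get() == buffer.length) {
            return false;
        }
        buffer[(int) (t & mask)] = item;
        tail.lazySet(t + 1); // ordered write publishes the element without a full fence
        return true;
    }

    // Consumer thread only; returns null when empty.
    @SuppressWarnings("unchecked")
    public E poll() {
        long h = head.get();
        if (h == tail.get()) {
            return null;
        }
        int slot = (int) (h & mask);
        E item = (E) buffer[slot];
        buffer[slot] = null; // allow the element to be collected
        head.lazySet(h + 1);
        return item;
    }
}
```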

Per-key locking is an idea, but it has drawbacks. Certain operations, like partition migration, need exclusive access to a partition, while your entry processor could run in parallel with a get (reads can be shared). So you would end up with some kind of read/write lock, and that can cause a lot of contention. Personally I believe that offloading to a dedicated thread pool is the best thing; we'll apply the locking behind the scenes. But most of the operations are just basic reads/writes, so let's optimize for them...

@pveentjer
Contributor

Currently the main factor determining performance is the synchronous nature of calls. We can't keep the buffers full, so threads are stalling all the time. In a POC (Project-X) I wrote a month ago, I'm doing millions of asynchronous map.put calls per second on a 4-node cluster, and I'm sure I can get it above 10M ops/second. It is fast because the buffers are full, so there is a lot of batching that amortizes the IO/concurrency overhead, and there is no concurrency control apart from setting the request on the connection to be sent later. Also, there is almost no litter (garbage).

In sync mode, however, the POC is roughly 50% faster than regular HZ... just because of the sync calls. And in sync mode I'm stuck at roughly a million ops/second.

The point I'm trying to make is: if you can avoid synchronization, you can have huge throughput.
