
ISPN-14115 Potential deadlock in index writer threads under heavy load #10292

Merged
3 commits merged on Sep 8, 2022

Conversation

@fax4ever (Contributor) commented Sep 7, 2022

https://issues.redhat.com/browse/ISPN-14115

Without commit ac8ab84, the reproducer hangs when I run the following script on my local machine:

#!/bin/bash

# Build the query module and the modules it depends on, skipping tests
mvn clean install -pl query -am -DskipTests
# Build the query module again with tests enabled (this runs the reproducer)
mvn clean install -pl query

Let me know if it is the same for you.

  • to express the idea that this is true if and only if we're replacing the type of the entity
  • At this stage we know that the remove and the add are performed on different indexes, so we can execute them concurrently
@fax4ever (Contributor, Author) commented Sep 7, 2022

@yrodiere please review it when you have time

@yrodiere (Contributor) left a comment

LGTM, see comments below.

Comment on lines +432 to +434
// We don't need to wait for a possible removeFromIndexes operation,
// since if it exists, the oldValue is UNKNOWN or replacedWithADifferentEntity is true,
// which implies that the delete and the add operations are related to different indexes
Contributor:

Interesting. Did you get confirmation that this was indeed the case for the person who reported the problem?

@fax4ever (Contributor, Author) commented Sep 7, 2022

From my test and from looking at the code (in particular, see commit 3c8c832), it seems so.
I asked the original reporter on the Jira issue.

Member:

So, given this change, it makes me think that the removeFromIndexes operation cannot complete successfully until the updateIndexes invocation has occurred. Can we not just invoke updateIndexes first and then call removeFromIndexes?

Contributor:

it makes me think that the removeFromIndexes operation cannot complete successfully until the updateIndexes invocation has occurred

I'm not sure why you think that?

The two operations are unrelated since they apply to two different indexes.
They literally affect two different files on the filesystem.

They can execute in any order, so I'm not sure how what you're suggesting would change anything.

Member:

It makes me think that because having the second index update wait on the first causes neither of them to complete. If they were truly independent operations, not interdependent on each other, it shouldn't matter in what order they wait on each other. It just seems fishy to me.

Member:

Before the change, the only operation submitted to the indexer is removeFromIndexes. The other operation is never submitted, because removeFromIndexes never finishes. If that hangs by itself, how would adding additional operations cause it not to hang? The only way is if the operations were interdependent. That is why I am saying something must be going on between the two operations; either that, or this change wouldn't fix the issue.

@yrodiere (Contributor) commented Sep 9, 2022

The thread dumps provided in the original issue support the theory that the problem has to do with the queue of operations, not the execution of operations themselves.

My theory (explained in the JIRA ticket) was that this happens:

  • Client thread: needs to delete an existing value of type A to replace it with a value of the same type
  • Client thread: submits delete some entity of type A to the queue for type A.
  • Background thread for type A: picks delete some entity of type A from the queue
  • Background thread for type A: executes delete some entity of type A
  • Background thread for type A: marks the future as complete
  • Background thread for type A: executes thenCompose
  • Background thread for type A: adds the update some entity of type A to the queue for type A... but the queue is full, because someone else added many other operations! (let's assume the server is under heavy load)
  • Background thread for type A: blocks, waiting for a free slot in the queue. But this background thread is the only thread able to consume the queue!
  • Deadlock: the background thread is waiting for an operation that only itself can perform.

If that's actually what happens, then no, the order of operations is not the problem. What matters is that we're introducing an artificial dependency between the two operations via thenCompose.
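
To make that scenario concrete, here is a minimal standalone sketch in plain Java (hypothetical class, queue size, and thread names; this is not the actual Infinispan or Hibernate Search code). In practice it will almost always hang, which is the point: the only thread able to drain the bounded queue blocks while trying to enqueue the follow-up operation onto that same, full queue.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the actual Infinispan/Hibernate Search code: one bounded queue
// with a single consumer thread, where the consumer itself tries to enqueue the follow-up
// operation (the thenCompose continuation) onto its own queue while that queue is full.
public class SelfDeadlockSketch {

   public static void main(String[] args) throws InterruptedException {
      BlockingQueue<Runnable> queueForTypeA = new ArrayBlockingQueue<>(1); // bounded, like the index queue

      // The "delete some entity of type A" operation; its continuation enqueues the
      // follow-up "update" onto the SAME queue, and it runs on the background thread.
      queueForTypeA.put(() -> {
         try {
            // Runs on the background thread. If the queue is full here (very likely, since
            // the main thread below keeps refilling it), this put() blocks forever: the only
            // consumer of queueForTypeA is the very thread executing this code.
            queueForTypeA.put(() -> System.out.println("update some entity of type A"));
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      });

      // Single background thread draining the queue, like the per-index writer thread.
      Thread backgroundA = new Thread(() -> {
         try {
            while (true) {
               queueForTypeA.take().run();
            }
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      }, "background-type-A");
      backgroundA.start();

      // Heavy load: other clients keep the queue full, so the nested put() above gets stuck.
      while (true) {
         queueForTypeA.put(() -> { /* some unrelated indexing operation */ });
      }
   }
}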

That being said, you're right that there's something suspicious: if, like Fabio suggested, the two operations are related to two different entity types, then they should be submitted to two different queues (I just checked the Hibernate Search code: they really should). So my theory wouldn't hold if Fabio is right.

In that case, I suppose it would be a slightly more complicated scenario with two background threads:

  • Client thread 1: needs to delete an existing entry of type A to replace it with a value of type B
  • Client thread 1: submits delete some entity of type A to the queue for type A
  • Background thread for type A: picks delete some entity of type A from the queue for type A
  • Background thread for type A: starts executing delete some entity of type A
  • Client thread 2: needs to delete an existing entry of type B to replace it with a value of type A
  • Client thread 2: submits delete some entity of type B to the queue for type B
  • Background thread for type B: picks delete some entity of type B from the queue for type B
  • Background thread for type B: starts executing delete some entity of type B
  • Background thread for type A: finishes executing delete some entity of type A
  • Background thread for type A: marks the future as complete
  • Background thread for type A: executes thenCompose
  • Background thread for type A: adds the update some entity of type B to the queue for type B... but the queue is full, because someone else added many other operations! (let's assume the server is under heavy load)
  • Background thread for type A: blocks, waiting for a free slot in the queue for type B.
  • Background thread for type B: finishes executing delete some entity of type B
  • Background thread for type B: marks the future as complete
  • Background thread for type B: executes thenCompose
  • Background thread for type B: adds update some entity of type A to the queue for type A... but the queue is full, because someone else added many other operations! (let's assume the server is under heavy load)
  • Background thread for type B: blocks, waiting for a free slot in the queue for type A.
  • Deadlock: background thread for type B is waiting for an operation that only the background thread for type A can perform, and vice versa.

While complex, I believe this scenario is possible, and gets more likely as an Infinispan node gets hammered with many requests that change the type of values.
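
The previous sketch can be extended to this two-queue variant (again with hypothetical names and queue sizes, not the real code): each background thread's thenCompose-style continuation enqueues the follow-up operation onto the other type's queue, and under heavy load both consumers end up blocked on a put() that only the other consumer could unblock.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch of the two-queue scenario, not the actual code.
public class CrossQueueDeadlockSketch {

   public static void main(String[] args) throws InterruptedException {
      BlockingQueue<Runnable> queueA = new ArrayBlockingQueue<>(1);
      BlockingQueue<Runnable> queueB = new ArrayBlockingQueue<>(1);

      // delete(A) whose continuation enqueues update(B), and delete(B) whose continuation enqueues update(A).
      queueA.put(continuationThatEnqueues(queueB, "update some entity of type B"));
      queueB.put(continuationThatEnqueues(queueA, "update some entity of type A"));

      startConsumer("background-type-A", queueA);
      startConsumer("background-type-B", queueB);

      // Heavy load: other clients keep both queues full, so the cross puts above get stuck.
      while (true) {
         queueA.put(() -> { /* unrelated operation for type A */ });
         queueB.put(() -> { /* unrelated operation for type B */ });
      }
   }

   // Returns the "delete" operation whose continuation submits the follow-up to the other queue.
   private static Runnable continuationThatEnqueues(BlockingQueue<Runnable> otherQueue, String followUp) {
      return () -> {
         try {
            // Blocks if the other queue is full; its only consumer may itself be blocked
            // trying to enqueue onto our queue, so the two background threads deadlock.
            otherQueue.put(() -> System.out.println(followUp));
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      };
   }

   // Single background thread draining one queue, like the per-index writer thread.
   private static void startConsumer(String name, BlockingQueue<Runnable> queue) {
      Thread thread = new Thread(() -> {
         try {
            while (true) {
               queue.take().run();
            }
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
         }
      }, name);
      thread.start();
   }
}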

And the fix is simple: since operations applied to type A and B require different resources (different indexes), we don't need to use thenCompose, which introduces an artificial (and unnecessary) inter-dependency. We can just execute the operations in parallel, and then there's no chance of deadlock.
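
A minimal sketch of that idea (the method names removeFromIndexes and updateIndexes stand in for the real calls, and the signatures are assumptions, not the actual patch): start both operations independently and combine them with allOf, instead of chaining the add onto the delete with thenCompose.

import java.util.concurrent.CompletableFuture;

// Sketch of the idea behind the fix, with placeholder methods and assumed signatures.
final class ParallelIndexingSketch {

   static CompletableFuture<Void> replaceWithDifferentType(Object oldValue, Object newValue) {
      // Before the fix: an artificial dependency that can deadlock under heavy load.
      // return removeFromIndexes(oldValue).thenCompose(ignored -> updateIndexes(newValue));

      // After the fix: two independent operations on two independent indexes run concurrently.
      CompletableFuture<Void> remove = removeFromIndexes(oldValue);
      CompletableFuture<Void> update = updateIndexes(newValue);
      return CompletableFuture.allOf(remove, update);
   }

   // Placeholders for the real indexing calls.
   static CompletableFuture<Void> removeFromIndexes(Object oldValue) {
      return CompletableFuture.completedFuture(null);
   }

   static CompletableFuture<Void> updateIndexes(Object newValue) {
      return CompletableFuture.completedFuture(null);
   }
}

With allOf, a slow or blocked delete no longer prevents the add from being submitted to its own queue, and vice versa.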

@fax4ever , you confirm that you weren't able to reproduce the problem when using only one type? And that your patch did fix the problem exhibited in your test:

@fax4ever (Contributor, Author):

@fax4ever , you confirm that you weren't able to reproduce the problem when using only one type? And that your patch did fix the problem exhibited in your test

Yes, I confirm, and yes, the patch did fix it.

@@ -159,6 +160,21 @@ public void testOverwriteIndexedValue() {
StaticTestingErrorHandler.assertAllGood(cache);
}

@Test
public void testOverwriteIndexedValue_heavyLoad() {
Contributor:

I didn't check this locally, so I'll trust that you checked this failed before your patch.

@fax4ever (Contributor, Author):

Actually, it does not fail; it hangs.

@fax4ever (Contributor, Author) commented Sep 7, 2022

LGTM, see comments below.

Thanks for the review @yrodiere!

@tristantarrant (Member)

Merged, thanks

@fax4ever (Contributor, Author) commented Sep 8, 2022

Thank you!

@fax4ever fax4ever deleted the ISPN-14115 branch September 8, 2022 16:52