This repository was archived by the owner on Jan 13, 2021. It is now read-only.

try to avoid deadlock #280

Merged: 1 commit merged into python-hyper:development on Jun 10, 2017

Conversation

@plucury (Contributor) commented Sep 19, 2016

Ref #275

To avoid a possible deadlock, I added an _OrderedRLock object which acquires its inner lock whenever you try to acquire it. That is to say, if you hold _write_lock you also hold _read_lock, and if you hold _lock you hold _write_lock and _read_lock too.

I'm not sure if this is overdoing it a little bit. What do you think about this? @Lukasa
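A minimal sketch of the idea, assuming the wrapper simply takes its inner lock before its own (the names mirror the patch, but the body here is illustrative, not the actual diff):

    import threading

    class _OrderedRLock(object):
        """Re-entrant lock that acquires an inner lock first, so that
        holding this lock always implies holding the inner one."""

        def __init__(self, inner_lock):
            self._inner_lock = inner_lock
            self._lock = threading.RLock()

        def acquire(self):
            # Taking the inner lock first fixes a global acquisition
            # order, which rules out lock-order-inversion deadlocks.
            self._inner_lock.acquire()
            self._lock.acquire()

        def release(self):
            self._lock.release()
            self._inner_lock.release()

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.release()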

    except TimeoutError:
        assert False

    self.tear_down()
Review comment (Member):

I'd love to see a comment on this test that points out that it's fundamentally probabilistic: it'll only fail transiently, and that we need to accept that. That will help us take it more seriously if it does transiently fail.
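The requested note might look something like this (the wording is just a suggestion):

    # NOTE: this test is fundamentally probabilistic. It tries to provoke
    # a deadlock between racing threads, so it can pass even when the bug
    # is present, and any failure will be transient. Treat a transient
    # failure here as a real bug, not as flakiness to ignore.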

self._read_lock = threading.RLock()
self._write_lock = _OrderedRLock(self._read_lock)
self._lock = _OrderedRLock(self._write_lock)
Review comment (Member):

I think this might be backwards. The rule is that taking _read_lock requires that you already hold _write_lock, having acquired it first, but this code inverts that. The same is true for _lock, which always needs to be held first.

This means that I think the correct declaration is:

self._lock = threading.RLock()
self._write_lock = _OrderedRLock(self._lock)
self._read_lock = _OrderedRLock(self._write_lock)
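With that ordering, acquiring any lock implicitly takes everything above it in the hierarchy. A hypothetical usage sketch (attribute names are illustrative):

    # Taking _read_lock transparently acquires _lock, then _write_lock,
    # then the read lock itself, always in that fixed order:
    with self._read_lock:
        data = self._sock.recv(65535)  # hypothetical read under the lock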

self._read_lock = threading.RLock()
self._write_lock = _OrderedRLock(self._read_lock)
self._lock = _OrderedRLock(self._write_lock)
Review comment (Member):

There's a bigger problem here that we need to address: this necessarily means that to take the _read_lock you now have to take all the locks. Our outer lock, which normally exists to act as the minimum level of contention, is now impossible to hold while someone is holding the write lock. That rather defeats the point of having this level of lock granularity.

I think we need another approach here. =(

@Lukasa (Member) left a review:

Changes requested inline, but the biggest concern I have is that this destroys our lock granularity.

@plucury (Contributor, Author) commented Sep 20, 2016

@Lukasa Yes, granularity is what concerns me too. I declared the locks exactly as you suggested above when I started this PR, but then I thought the order could perhaps be reversed.

lock acquired    forward                        backward
read_lock        hold all three locks           hold read_lock only
write_lock      hold both write_lock and lock   hold write_lock and read_lock
lock             hold lock only                 hold all three locks

According to the table above, if we only consider the final state of the acquire operation, the backward declaration comes closer to the original intent of these locks.

However, I think the key point is that when we try to acquire a lock, we can't know whether its outer lock will also be needed. For example, when a thread acquires the read_lock to read from the socket, it doesn't yet know whether it will need the write_lock, since that depends on what it reads. So we can't always avoid acquiring read_lock before write_lock, and maybe it's a good idea to make sure a thread can acquire read_lock before it acquires write_lock?
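That unknowable ordering is the classic inversion that deadlocks. A standalone illustration with two plain locks (not hyper code):

    import threading
    import time

    read_lock = threading.Lock()
    write_lock = threading.Lock()

    def reader():
        # Takes read_lock first, then discovers it needs write_lock.
        with read_lock:
            time.sleep(0.1)        # widen the race window
            with write_lock:       # blocks: writer holds write_lock
                pass

    def writer():
        # Takes the locks in the opposite order.
        with write_lock:
            time.sleep(0.1)
            with read_lock:        # blocks: reader holds read_lock
                pass

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=writer)
    # t1.start(); t2.start()  # uncommenting this deadlocks both threads:
    # each one holds the lock the other is waiting for.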

@Lukasa (Member) commented Sep 20, 2016

Annoyingly, there are effectively four locks here that need synchronising. Frankly, at a certain point this all starts getting quite tricky. @plucury, are you proposing that the lock inversion shown in this patch is sufficient?

@plucury (Contributor, Author) commented Sep 20, 2016

@Lukasa Yes. I don't think we can ensure these locks are always acquired in order, but we can avoid deadlock even when they are acquired in the wrong order.

@Lukasa (Member) commented Sep 21, 2016

Hrm, I feel like the lock inversion here is an awkward middle ground between having fine-grained locks and just choosing to have a single lock that protects the entire connection object. If we're going to have the complexity of managing a three-level lock hierarchy, I think we should do our best to avoid needing to acquire all three locks any time we want to read. On the other hand, if we do require that then we should consider whether we just want each connection to be locked behind a single lock, which has the advantage of giving us much stronger guarantees on behaviour.

@Lukasa (Member) commented Sep 21, 2016

Oh, and @tbetbetbe may well have opinions here, I'll tag him in!

@plucury (Contributor, Author) commented Oct 14, 2016

@Lukasa I've just made another attempt. Instead of using a special lock type, this time I try to keep the locks acquired in order. What do you think?

@@ -266,8 +266,9 @@ def request(self, method, url, body=None, headers=None):
# If threads interleave these operations, it could result in messages
# being sent in the wrong order, which can lead to the out-of-order
# messages with lower stream IDs being closed prematurely.
stream_id = self.putrequest(method, url)
self.connect()
Review comment (Member):

Sadly, this won't work either, as noted in the comment above.

putrequest obtains a stream ID, but doesn't send a HEADERS frame. That means that hyper thinks the stream ID is used, but the remote peer doesn't think that yet.

This whole problem is sufficiently troublesome that I suspect it reflects a more profound problem with the way this library is structured. It is simply not amenable to being made thread-safe except at an extremely coarse level.

I wonder if we need to investigate rewriting this library with threads. =(

Review comment (Contributor, Author):

@Lukasa Ah, I see. That's true. So how about simply using the single-lock model for the moment?

Review comment (Member):

I think the single lock model is likely to be the most usable approach at this time.
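A sketch of what that single-lock model looks like on the connection (class and attribute names here are illustrative, not hyper's actual API):

    import threading

    class _Connection(object):
        """One re-entrant lock serialises every operation on the
        connection: coarse, but trivially deadlock-free."""

        def __init__(self):
            self._lock = threading.RLock()
            self._next_stream_id = 1

        def request(self, method, url):
            # Allocating the stream ID and sending HEADERS happen under
            # the same lock, so threads can never interleave them and
            # put stream IDs on the wire out of order.
            with self._lock:
                stream_id = self._next_stream_id
                self._next_stream_id += 2  # client stream IDs are odd
                # ... build and send the HEADERS frame here ...
                return stream_id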

@plucury force-pushed the development branch 2 times, most recently from 50bb3d2 to de8a82d on November 18, 2016
@plucury (Contributor, Author) commented Nov 18, 2016

@Lukasa I've just converted the connection to the single-lock model, but I'm not sure why it breaks the tests. Would you please have a look? Thanks a lot!

@nateprewitt (Member) commented:

@plucury this is a bug in hyper-h2==2.5.0 (#291), which has been patched (python-hyper/h2#382) but isn't in a released version yet. I don't believe it's related to your changes.

@plucury (Contributor, Author) commented Nov 18, 2016

Ah, I see. Thanks for the information, @nateprewitt.

@vinnyspb commented May 30, 2017

Hi guys,

Is there any reason not to merge this pull request with the single lock? I can confirm that the deadlock issue is absolutely real; we've hit this problem in production.

Also, I can confirm that the single lock fixes the problem.

Please let me know if I can help somehow.

@Lukasa (Member) commented May 30, 2017

@vinnyspb Right now the reason is "because it breaks the tests" =)

@plucury force-pushed the development branch 2 times, most recently from de8a82d to f5b69b3 on June 10, 2017
@plucury (Contributor, Author) commented Jun 10, 2017

@Lukasa I've just brought it up to date. Please review again.

@Lukasa (Member) left a review:

Cool, LGTM.

@Lukasa Lukasa merged commit 23a1555 into python-hyper:development Jun 10, 2017