-
Notifications
You must be signed in to change notification settings - Fork 195
try to avoid deadlock #280
Conversation
except TimeoutError: | ||
assert False | ||
|
||
self.tear_down() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to see a comment on this test that points out that it's fundamentally probabilistic: it'll only fail transiently, and that we need to accept that. That will help us take it more seriously if it does transiently fail.
hyper/http20/connection.py
Outdated
self._read_lock = threading.RLock() | ||
self._write_lock = _OrderedRLock(self._read_lock) | ||
self._lock = _OrderedRLock(self._write_lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be backwards. The rule is that _read_lock
requires that you hold _write_lock
and acquire _write_lock
first, but this code gets it backwards. The same is true for _lock
, which always needs to be held first.
This means that I think the correct declaration is:
self._lock = threading.RLock()
self._write_lock = _OrderedRLock(self._lock)
self._read_lock = _OrderedRLock(self._write_lock)
hyper/http20/connection.py
Outdated
self._read_lock = threading.RLock() | ||
self._write_lock = _OrderedRLock(self._read_lock) | ||
self._lock = _OrderedRLock(self._write_lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a bigger problem here, which we need to address. That's the fact that this necessarily means that now to take the _read_lock
you have to take all the locks: that means that our outer lock, which normally exists to act as the minimum level of contention, is now impossible to hold when someone is holding the write lock. That rather defeats the point of having this level of lock granularity.
I think we need another approach here. =(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested inline, but the biggest concern I have is that this destroys our lock granularity.
@Lukasa Yes, granularity, that's what I concern too. I declared locks just like what you suggested above when I start this PR, then I thought maybe it can be backward.
According to above table, If we only consider about the final state of acquiring operation, you will find the backward declaration is more approach the original idea of these locks. However, I think the key point is that we can't know if it needs acquire its outer lock when we try to acquiring a lock. For example, when a process acquiring the read_lock to read socket, it doesn't know if it is necessary to acquiring the write_lock at that time since it depends on what it read. Thus we can't always avoid acquiring read_lock before write_lock and maybe it's a good idea to make sure a process can acquire read_lock before it acquiring write_lock? |
Annoyingly, there are sort of four locks here which need synchronising. Frankly, at a certain point we start getting altogether quite tricky. @plucury are you proposing that the lock inversion shown in this patch is sufficient? |
@Lukasa Yes, I think we can't ensure these locks always be acquired in order. But we can avoid deadlock even they were acquired in wrong order. |
Hrm, I feel like the lock inversion here is an awkward middle ground between having fine-grained locks and just choosing to have a single lock that protects the entire connection object. If we're going to have the complexity of managing a three-level lock hierarchy, I think we should do our best to avoid needing to acquire all three locks any time we want to read. On the other hand, if we do require that then we should consider whether we just want each connection to be locked behind a single lock, which has the advantage of giving us much stronger guarantees on behaviour. |
Oh, and @tbetbetbe may well have opinions here, I'll tag him in! |
@Lukasa I just have another try. Instead of using specific lock, I'm trying to keep acquiring locks in order this time. What do you think about it? |
hyper/http20/connection.py
Outdated
@@ -266,8 +266,9 @@ def request(self, method, url, body=None, headers=None): | |||
# If threads interleave these operations, it could result in messages | |||
# being sent in the wrong order, which can lead to the out-of-order | |||
# messages with lower stream IDs being closed prematurely. | |||
stream_id = self.putrequest(method, url) | |||
self.connect() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly, this won't work either, as noted in the comment above.
putrequest
obtains a stream ID, but doesn't send a HEADERS frame. That means that hyper thinks the stream ID is used, but the remote peer doesn't think that yet.
This whole problem is sufficiently troublesome that I suspect it reflects a more profound problem with the way this library is structured. It is extremely not amenable to being thread-safe except at an extremely coarse level.
I wonder if we need to investigate rewriting this library with threads. =(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lukasa Ah, I see. That's true. So how about simply use the single lock model at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the single lock model is likely to be the most usable approach at this time.
50bb3d2
to
de8a82d
Compare
@Lukasa I just transfer connection to single lock model. But I'm not sure why it breaks tests. Would you please have a look? Thanks a lot! |
@plucury this is a bug in hyper-h2==2.5.0 (#291) which has been patched (python-hyper/h2#382) but isn't in a released version. I don't believe that it's related to your changes. |
Ah, I see. Thanks for your information @nateprewitt . |
Hi guys, Any reason not to merge this pull request with single lock? Also I can confirm that the single lock fixes the problem. Please let me know if I could help somehow. |
@vinnyspb Right now the reason is "because it breaks the tests" =) |
de8a82d
to
f5b69b3
Compare
@Lukasa Just make it up to date. Please review again. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, LGTM.
Ref #275
For avoid possible deadlock, I just add a
_OrderedRLock
object which will hold its inner lock when you try to acquire it. That's to say if you hold_write_lock
you hold_read_lock
too or if you hold_lock
you hold_write_lock
and_read_lock
too.I'm not sure if this is overdoing it a little bit. What do you think about this? @Lukasa