This repository was archived by the owner on Jan 13, 2021. It is now read-only.

try to avoid deadlock #280

Merged: 1 commit merged into python-hyper:development on Jun 10, 2017

Conversation

@plucury (Contributor) commented Sep 19, 2016

Ref #275

To avoid a possible deadlock, I added an _OrderedRLock object which acquires its inner lock whenever you try to acquire it. That is to say, if you hold _write_lock you also hold _read_lock, and if you hold _lock you hold _write_lock and _read_lock too.

I'm not sure if this is overdoing it a little bit. What do you think about this? @Lukasa
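A minimal sketch of the idea, assuming the wrapper simply takes its inner lock before its own (the names mirror the patch, but the body here is illustrative, not the actual diff):

    import threading

    class _OrderedRLock(object):
        """Re-entrant lock that acquires an inner lock first, so that
        holding this lock always implies holding the inner one."""

        def __init__(self, inner_lock):
            self._inner_lock = inner_lock
            self._lock = threading.RLock()

        def acquire(self):
            # Taking the inner lock first fixes a global acquisition
            # order, which rules out lock-order-inversion deadlocks.
            self._inner_lock.acquire()
            self._lock.acquire()

        def release(self):
            self._lock.release()
            self._inner_lock.release()

        def __enter__(self):
            self.acquire()
            return self

        def __exit__(self, exc_type, exc_value, traceback):
            self.release()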

    except TimeoutError:
        assert False

    self.tear_down()
Review comment (Member):

I'd love to see a comment on this test that points out that it's fundamentally probabilistic: it'll only fail transiently, and that we need to accept that. That will help us take it more seriously if it does transiently fail.
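The requested note might look something like this (the wording is just a suggestion):

    # NOTE: this test is fundamentally probabilistic. It tries to provoke
    # a deadlock between racing threads, so it can pass even when the bug
    # is present, and any failure will be transient. Treat a transient
    # failure here as a real bug, not as flakiness to ignore.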

self._read_lock = threading.RLock()
self._write_lock = _OrderedRLock(self._read_lock)
self._lock = _OrderedRLock(self._write_lock)
Review comment (Member):

I think this might be backwards. The rule is that taking _read_lock requires that you already hold _write_lock, having acquired it first, but this code inverts that. The same is true for _lock, which always needs to be held first.

This means that I think the correct declaration is:

self._lock = threading.RLock()
self._write_lock = _OrderedRLock(self._lock)
self._read_lock = _OrderedRLock(self._write_lock)
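With that ordering, acquiring any lock implicitly takes everything above it in the hierarchy. A hypothetical usage sketch (attribute names are illustrative):

    # Taking _read_lock transparently acquires _lock, then _write_lock,
    # then the read lock itself, always in that fixed order:
    with self._read_lock:
        data = self._sock.recv(65535)  # hypothetical read under the lock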

self._read_lock = threading.RLock()
self._write_lock = _OrderedRLock(self._read_lock)
self._lock = _OrderedRLock(self._write_lock)
Review comment (Member):

There's a bigger problem here that we need to address: this necessarily means that to take the _read_lock you now have to take all the locks. Our outer lock, which normally exists to act as the minimum level of contention, is now impossible to hold while someone is holding the write lock. That rather defeats the point of having this level of lock granularity.

I think we need another approach here. =(

@Lukasa (Member) left a review:

Changes requested inline, but the biggest concern I have is that this destroys our lock granularity.

@plucury (Contributor, Author) commented Sep 20, 2016

@Lukasa Yes, granularity is what concerns me too. I declared the locks exactly as you suggested above when I started this PR, but then I thought the order could perhaps be reversed.

lock acquired    forward                        backward
read_lock        hold all three locks           hold read_lock only
write_lock      hold both write_lock and lock   hold write_lock and read_lock
lock             hold lock only                 hold all three locks

According to the table above, if we only consider the final state of the acquire operation, the backward declaration comes closer to the original intent of these locks.

However, I think the key point is that when we try to acquire a lock, we can't know whether its outer lock will also be needed. For example, when a thread acquires the read_lock to read from the socket, it doesn't yet know whether it will need the write_lock, since that depends on what it reads. So we can't always avoid acquiring read_lock before write_lock, and maybe it's a good idea to make sure a thread can acquire read_lock before it acquires write_lock?
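That unknowable ordering is the classic inversion that deadlocks. A standalone illustration with two plain locks (not hyper code):

    import threading
    import time

    read_lock = threading.Lock()
    write_lock = threading.Lock()

    def reader():
        # Takes read_lock first, then discovers it needs write_lock.
        with read_lock:
            time.sleep(0.1)        # widen the race window
            with write_lock:       # blocks: writer holds write_lock
                pass

    def writer():
        # Takes the locks in the opposite order.
        with write_lock:
            time.sleep(0.1)
            with read_lock:        # blocks: reader holds read_lock
                pass

    t1 = threading.Thread(target=reader)
    t2 = threading.Thread(target=writer)
    # t1.start(); t2.start()  # uncommenting this deadlocks both threads:
    # each one holds the lock the other is waiting for.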

@Lukasa (Member) commented Sep 20, 2016

Annoyingly, there are effectively four locks here that need synchronising. Frankly, at a certain point this all starts getting quite tricky. @plucury, are you proposing that the lock inversion shown in this patch is sufficient?

@plucury (Contributor, Author) commented Sep 20, 2016

@Lukasa Yes. I don't think we can ensure these locks are always acquired in order, but we can avoid deadlock even when they are acquired in the wrong order.

@Lukasa (Member) commented Sep 21, 2016

Hrm, I feel like the lock inversion here is an awkward middle ground between having fine-grained locks and just choosing to have a single lock that protects the entire connection object. If we're going to have the complexity of managing a three-level lock hierarchy, I think we should do our best to avoid needing to acquire all three locks any time we want to read. On the other hand, if we do require that then we should consider whether we just want each connection to be locked behind a single lock, which has the advantage of giving us much stronger guarantees on behaviour.

@Lukasa (Member) commented Sep 21, 2016

Oh, and @tbetbetbe may well have opinions here, I'll tag him in!

@plucury (Contributor, Author) commented Oct 14, 2016

@Lukasa I've just made another attempt. Instead of using a special lock type, this time I try to keep the locks acquired in order. What do you think?

@@ -266,8 +266,9 @@ def request(self, method, url, body=None, headers=None):
# If threads interleave these operations, it could result in messages
# being sent in the wrong order, which can lead to the out-of-order
# messages with lower stream IDs being closed prematurely.
stream_id = self.putrequest(method, url)
self.connect()
Review comment (Member):

Sadly, this won't work either, as noted in the comment above.

putrequest obtains a stream ID, but doesn't send a HEADERS frame. That means that hyper thinks the stream ID is used, but the remote peer doesn't think that yet.

This whole problem is sufficiently troublesome that I suspect it reflects a more profound problem with the way this library is structured. It is simply not amenable to being made thread-safe except at an extremely coarse level.

I wonder if we need to investigate rewriting this library with threads. =(

Review comment (Contributor, Author):

@Lukasa Ah, I see. That's true. So how about simply using the single-lock model for the moment?

Review comment (Member):

I think the single lock model is likely to be the most usable approach at this time.
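A sketch of what that single-lock model looks like on the connection (class and attribute names here are illustrative, not hyper's actual API):

    import threading

    class _Connection(object):
        """One re-entrant lock serialises every operation on the
        connection: coarse, but trivially deadlock-free."""

        def __init__(self):
            self._lock = threading.RLock()
            self._next_stream_id = 1

        def request(self, method, url):
            # Allocating the stream ID and sending HEADERS happen under
            # the same lock, so threads can never interleave them and
            # put stream IDs on the wire out of order.
            with self._lock:
                stream_id = self._next_stream_id
                self._next_stream_id += 2  # client stream IDs are odd
                # ... build and send the HEADERS frame here ...
                return stream_id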

@plucury force-pushed the development branch 2 times, most recently from 50bb3d2 to de8a82d on November 18, 2016
@plucury (Contributor, Author) commented Nov 18, 2016

@Lukasa I've just converted the connection to the single-lock model, but I'm not sure why it breaks the tests. Would you please have a look? Thanks a lot!

@nateprewitt (Member) commented:

@plucury this is a bug in hyper-h2==2.5.0 (#291), which has been patched (python-hyper/h2#382) but isn't in a released version yet. I don't believe it's related to your changes.

@plucury (Contributor, Author) commented Nov 18, 2016

Ah, I see. Thanks for the information, @nateprewitt.

@vinnyspb commented May 30, 2017

Hi guys,

Is there any reason not to merge this pull request with the single lock? I can confirm that the deadlock issue is absolutely real; we've hit this problem in production.

Also, I can confirm that the single lock fixes the problem.

Please let me know if I can help somehow.

@Lukasa (Member) commented May 30, 2017

@vinnyspb Right now the reason is "because it breaks the tests" =)

@plucury force-pushed the development branch 2 times, most recently from de8a82d to f5b69b3 on June 10, 2017
@plucury (Contributor, Author) commented Jun 10, 2017

@Lukasa I've just brought it up to date. Please review again.

@Lukasa (Member) left a review:

Cool, LGTM.

@Lukasa Lukasa merged commit 23a1555 into python-hyper:development Jun 10, 2017