Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and implement Murmur locking strategy #2276

Closed
wants to merge 0 commits into from

Conversation

@mkrautz
Copy link
Member

commented May 16, 2016

[I am ready to have a few more eyes on this PR now... Don't mind the refactoring commits -- I can split them out later, but for now, the later locking changes depend on them. Also don't mind the CMakeLists.txt -- it is only there to help you if you want to review using CLion... It will be removed before landing. It's not in a state where it can be landed at the moment.]

This PR documents Murmur's locking strategy in docs/MurmurLocking.md.

It also implements the locking scheme described in docs/MurmurLocking.md in places where Murmur was not living up to the requirements.

The best way to approach this for review is to read docs/MurmurLocking.md first.

Then, look through each commit that adds locking to a part of Murmur, and ensure it is correct according to the MurmurLocking.md document.

Let us, as an example, use the commit that adds a write lock to addChannel/removeChannel/link/unlink in the main thread. To review this commit, you would ensure that all code paths in the main thread that write to the data used by those methods take a write lock on qrwlVoiceThread. You could, for example, use "Find usages" in CLion, or equivalents from other IDEs.

To properly review the whole thing, you would need to:

  • Read the docs/MurmurLocking.md document, and note the methods that the voice thread consists of.
  • Go through all methods uesd in Murmur's voice thread, and take notes about what kind of data the methods access that is potentially a data race. (My notes are at https://gist.github.com/mkrautz/952fddc88e764a4b7179efd4be95f7b6 -- this is what became MurmurLocking.md -- but it's more helpful if you make your own, in case I left something out.)
  • Check that all the data accesses that you found are properly documented in docs/MurmurLocking.md. If not, raise it as an issue in the PR. That document is meant to cover everything that can be touched by both the voice thread and and the main thread at the same time.
  • Then, go through all data mentioned in docs/MurmurLocking.md and ensure that the locking for that data is correct.
@mkrautz

This comment has been minimized.

Copy link
Member Author

commented May 16, 2016

docs/MurmurLocking.md is missing some crucial details about how the voice thread uses qrwlVoiceThread: it holds a read lock to it during most of its operation.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented May 16, 2016

@hacst @Kissaki @fwaggle

Anyone up for reviewing this?

We'll probably need to do multiple iterations...

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch 2 times, most recently from 7f5beeb to 475dd1e May 17, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented May 17, 2016

docs/MurmurLocking.md is missing some crucial details about how the voice thread uses qrwlVoiceThread: it holds a read lock to it during most of its operation.

Fixed.

(u->qsName.toLower() == uSource->qsName.toLower())) {
uOld = u;
break;
foreach(ServerUser *u, qhUsers) {

This comment has been minimized.

Copy link
@hacst

hacst May 22, 2016

Member

Seems like indendation got a bit messed up here.

This comment has been minimized.

Copy link
@mkrautz

mkrautz Jun 3, 2016

Author Member

Fixed.

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch 2 times, most recently from cf3ed46 to ef18c56 Jun 3, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2016

efbd24d has a lot of unintended whitespace changes... I will fix. (I blame CLion.)

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch 3 times, most recently from 29f75aa to 5f5f85f Jun 3, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 3, 2016

Fixed the aforementioned whitespace issues.

PTAL.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 6, 2016

Anything in particular I can do to help to make this easier to review? :(

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 6, 2016

Nag me more? Honestly. It is know to work on me ;)

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2016

OK! :-)

@hacst Ping!

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 7, 2016

It's starting to work. Keep at it!!11 ;) Let's set me a deadline for this weekend. I do realize that made me delay you landing this for ages but I won't be able to get a big enough block of free time to really tackle such mindbeding stuff till then. Hope that's ok.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 7, 2016

Sure thing!

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 11, 2016

@hacst Ping?

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 12, 2016

Hi. Sorry for the wait. The only things that stood out to me locking/threading wise were:

  • I think bRunning in Server.h ought to be an atomic as it is read from the voice thread (::run) but set from the main thread. In theory (though unlikely in practice) I think this could confuse the optimizer.
  • In Server.cpp there is one case where the voice thread takes the qrwlVoiceThread for writing using qrwlVoiceThread.lockForWrite(). This confused me a bit since from the documentation I assumed only the main thread is allowed to do that. The use itself looked safe to me though.

Apart from those I couldn't find anything violating the rules you set/derived (which sound good to me).

Note that I didn't look too closely at the GRPC related code. I haven't built murmur with that yet and it confused CLion when I tried to enable it. I assume this isn't that big of a deal since it's somewhat separate and not built in by default at this time. However it does create its own thread so it might warrant a closer look before we ship it as a "stable" thing.

TL;DR: LGTM apart from the two bullet points above ;)

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

Regarding bRunning: Yep, that's true. But that's also true for Mumble. We use it a lot in Mumble.

Qt 5 has its own atomic implementation in QThread (isInterruptionRequested, requestInterruption), but I really think we should do something about those as well...

Thanks for the review.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

@hacst isn't the case where qrwlVoiceThread is locked for writing in the voice thread covered by 5f5f85f#diff-a39263b917c72193a89943cd67f5e226R165?

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 12, 2016

Hm. The location is unrelated to qmTargetCache though? Isn't it. I'm talking about the code around foreach(ServerUser *usr, qhHostUsers.value(ha)).

Re-reading my comment I forgot an important part about that use of qrwlVoiceThread.lockForWrite(). I couldn't see where it was released in case a race had occured and the user was no longer present.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

Hm. The location is unrelated to qmTargetCache though? Isn't it. I'm talking about foreach(ServerUser *usr, qhHostUsers.value(ha)).

Thanks!

This section:
5f5f85f#diff-a39263b917c72193a89943cd67f5e226R129

is missing qhHostUsers and qhPeerUsers.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

Re-reading my comment I forgot an important part about that use of qrwlVoiceThread.lockForWrite(). I couldn't see where it was released in case a race had occured and the user was no longer present.

Do you mean this code?

https://github.com/mumble-voip/mumble/blob/master/src/murmur/Server.cpp#L1106-L1113

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

Re-reading my comment I forgot an important part about that use of qrwlVoiceThread.lockForWrite(). I couldn't see where it was released in case a race had occured and the user was no longer present.

Do you mean this code?

https://github.com/mumble-voip/mumble/blob/master/src/murmur/Server.cpp#L1106-L1113

I guess you mean this:

https://github.com/mumble-voip/mumble/blob/master/src/murmur/Server.cpp#L828

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 12, 2016

^- that case does look suspicious... I'll have a look tomorrow.

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch from 5f5f85f to b289c17 Jun 19, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 19, 2016

@hacst PTAL

I have not fixed the bRunning issue, as it also has ties to Mumble. I think we should fix that separately.

I have added qhPeerUsers and qhHostUsers to MurmurLocking.md (amended to b289c17)
I have added a commit to fix the issue you raised regarding locking in the unknown peer path in Server::run: 3dc8b56

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 20, 2016

The classification for qhPeerUsers and qhHostUsers still looks wrong to me. qhPeerUsers is shared data as it is inserted into from Server::run using qhPeerUsers.insert(key, u); and Server::connectionClosed removes from it from the main thread. For qhHostUsers while the container itself is (hopefully, hashmap auto-creation semantics are dangerous) not modified by the voice thread it does modify the contained data which assume ought to share the ownership? In that case it would also be shared data.

The unlock seems safe now.

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 20, 2016

Agreed. I put them in the wrong section...

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch from b289c17 to b56522d Jun 23, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 23, 2016

Moved them to shared section.

PTAL @hacst.

@hacst

This comment has been minimized.

Copy link
Member

commented Jun 23, 2016

LGTM.

@mkrautz mkrautz closed this Jun 23, 2016

@mkrautz mkrautz force-pushed the mkrautz:murmur-locking-v12 branch from b56522d to 18ccbf1 Jun 23, 2016

@mkrautz

This comment has been minimized.

Copy link
Member Author

commented Jun 23, 2016

Thanks for the review!

I removed the CMake commit and merged it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants
You can’t perform that action at this time.