Use Go in a more idiomatic way #126
Conversation
There was no point in requesting the key frame by sending a message to the conference: the only thing the conference did was call a single function on the peer, so it was just wasting CPU time. Instead, we can call this function directly and write to the peer connection. The function is thread-safe, since all mutable state is protected by a mutex and the peer connection has a mutex inside.
This solves the problem of managing the queue size on the receiver.
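As a rough illustration of what such a direct, thread-safe call can look like with Pion (the `Peer` type and method below are simplified stand-ins, not the project's exact code):

```go
package peer

import (
	"sync"

	"github.com/pion/rtcp"
	"github.com/pion/webrtc/v3"
)

// Peer is a sketch only: the real type holds more state. The point is that
// the conference can call RequestKeyFrame directly instead of messaging itself.
type Peer struct {
	mu             sync.Mutex
	peerConnection *webrtc.PeerConnection // has its own internal locking
}

// RequestKeyFrame asks the remote side for a key frame by writing a PLI
// straight to the peer connection. Safe to call from any goroutine.
func (p *Peer) RequestKeyFrame(ssrc webrtc.SSRC) error {
	p.mu.Lock()
	defer p.mu.Unlock()

	return p.peerConnection.WriteRTCP([]rtcp.Packet{
		&rtcp.PictureLossIndication{MediaSSRC: uint32(ssrc)},
	})
}
```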
It's OK if the Matrix SDK blocks until we're ready to accept new messages. Use unbuffered channels for that. We can't quite use this for multi-conference setups, as a single conference would affect the others, but currently we don't have that problem.
The `common` package is an antipattern.
This is a temporary measure until we can work around the bugs in Pion that lead to deadlocks.
I think this looks good, but this is a huge PR with a lot of unrelated changes, making it very hard to review. Moving things out of common, in particular, feels like it doesn't belong here.
Yeah, sorry. I tried to separate the changes by commit, so it's easier to review the PR by checking each individual commit, as each commit does one single thing (I rebased them before creating the PR so that it's easier to follow the logic and the changes). I.e. if you review it commit by commit, I think it should be more manageable, as they are mostly small.
cmd/sfu/main.go
Outdated
// Start matrix client sync. This function will block until the sync fails.
matrixClient.RunSyncing(func(e *event.Event) {
	routerChannel <- e
	matrixEvents <- e
})
matrixClient.RunSyncing runs endlessly until a panic happens. Is that right?
We could avoid this if we change the signature like this:
matrixEvents, errc := s.matrixClient.RunSyncing()
routing.RunRouter(matrixClient, connectionFactory, matrixEvents, config.Conference)
if err := <-errc; err != nil {
	log(err)
}
Ah, I see. Do I get it right that the advantage is that we don't panic from inside a package, but do it in main instead?
We could decide that based on the kind of error that occurred. I think we should avoid panicking if it's not needed.
Yeah, makes sense, though we only have one type of error. But please confirm that my understanding is correct: the reason we're doing this is that it would allow us to avoid panicking inside a package? If so, I agree, but couldn't we solve it more simply by just returning this error instead of panicking? I.e. I can slightly change `RunSyncing()` so that it does not panic once syncing has stopped, but instead just returns the error back to the caller.
`RunSyncing()` acts as the main loop. `errc` is a kind of done channel that can interrupt the loop gracefully. Additionally, `errc` contains the reason for the termination: with `nil` it was an intentional abort, and with an error it was an abort because of an issue.
Right. But we could simply return an error instead of a panic, which would have the same effect, right?
Anyway, I pushed a couple of new changes, so that:
- We don't panic anymore from inside the signaling package; we return an error instead (to be consistent with the API of the Matrix SDK there; it seems like a separate `errc` is not required in that particular case, but it could be added on top of it if we ever need it?).
- I don't store the `done` inside the conference state anymore (as you pointed out, we only use it from the `processMessages()` loop to close it once the loop is over), so I just pass it as a local variable. That way no other function can [by accident] close a channel that it's not supposed to close 👍
PTAL 🙂
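To illustrate the first point, here is a minimal sketch of the error-returning shape, with hypothetical types standing in for the real signaling package:

```go
package signaling

import (
	"errors"
	"fmt"
)

// Hypothetical sketch only: the real types differ. The point is that the
// package reports the failure to the caller instead of panicking.

var ErrSyncStopped = errors.New("matrix syncing stopped")

type MatrixClient struct {
	sync func() error // stand-in for the underlying Matrix SDK sync loop
}

// RunSyncing blocks until syncing stops and returns the reason to the caller.
// main can then treat a nil error as an intentional shutdown and anything
// else as a failure worth logging (or exiting over).
func (c *MatrixClient) RunSyncing() error {
	if err := c.sync(); err != nil {
		return fmt.Errorf("%w: %v", ErrSyncStopped, err)
	}
	return nil
}
```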
	return ErrSinkSealed
case s.messageSink <- messageWithSender:
	return nil
}
You do not know which of the two cases is executed at runtime. With `default` you make sure that the sealed case is checked first.
select {
case <-s.sealed:
	return ErrSinkSealed
default:
	s.messageSink <- messageWithSender
	return nil
}
But is it not different semantically? I.e. what if the sender gets to the `s.messageSink <- messageWithSender` part and blocks there, waiting for the reader to get ready? We must ensure that once the reader is not ready to accept new messages (e.g. it is not interested in them anymore), we don't continue waiting on the channel. Essentially that's what I tried to describe; semantically I wanted this:
select {
case <-done:
	return OkRecipientDoesNotExpectNewMessages
case s.messageSink <- messageWithSender:
	return nil
}
Though your concern is valid, I tried to catch this case by checking the atomic variable at the beginning of the function (though this does not guarantee that those who called `Send()` just before `Close()` won't send a value).
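A self-contained sketch of that send path, with assumed field and method names (e.g. `Seal` standing in for the `Close()` mentioned above):

```go
package sink

import (
	"errors"
	"sync/atomic"
)

// ErrSinkSealed mirrors the error used in the snippet above.
var ErrSinkSealed = errors.New("the message sink is sealed")

// MessageSink is a sketch with assumed field names, not the repo's exact type.
type MessageSink[M any] struct {
	messageSink chan M
	sealed      chan struct{} // closed exactly once by Seal
	sealedFlag  atomic.Bool
}

// Send refuses new messages once the sink is sealed. The atomic check at the
// top cheaply rejects most late senders; the select covers the race where the
// sink gets sealed while we are blocked waiting for the receiver.
func (s *MessageSink[M]) Send(message M) error {
	if s.sealedFlag.Load() {
		return ErrSinkSealed
	}

	select {
	case <-s.sealed:
		return ErrSinkSealed
	case s.messageSink <- message:
		return nil
	}
}

// Seal marks the sink as closed for new messages. It does not close
// messageSink itself, so a concurrent Send never writes to a closed channel.
func (s *MessageSink[M]) Seal() {
	if s.sealedFlag.CompareAndSwap(false, true) {
		close(s.sealed)
	}
}
```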
OK, got it. Better to send a message than to block.
pkg/conference/start.go
Outdated
}

participantID := participant.ID{UserID: userID, DeviceID: inviteEvent.DeviceID, CallID: inviteEvent.CallID}
if err := conference.onNewParticipant(participantID, inviteEvent); err != nil {
	return nil, err
	return nil, nil
}

// Start conference "main loop".
go conference.processMessages()
Maybe if processMessages has its own done channel that it closes, then we can wait here for processMessages to be done and clean up if necessary.
go func() {
	defer close(done) // close conference after processMessages is done
	pmDone := conference.processMessages()
	<-pmDone
	// ... do some clean ups, currently not needed
}()
I thought of it only because the conference `done` is distributed to the sinks. And it would have the advantage that we don't have to bind `done` to the conference structure. What do you think?
I think your point about storing `done` inside the conference is valid. This could be solved if I pass `done` to `processMessages` (so that we won't need to store it in the conference) and rename it to `signalDone` instead of `done`, so that it does not give the impression that it's something we need to listen to.
The rest I think is identical (we start a goroutine for `processMessages` that then cleans things up), though we don't need to wait on `done` inside `StartConference()`, since that `done` channel is returned to the upstream jobs so that they know when the conference is closed.
I.e. we return the `done`, so that the caller which sends messages to the conference knows when to stop producing new values (when to stop sending values to the conference).
That way, when the sender sends a Matrix message to the conference, the sender's code looks like this:
select {
case conferenceSink <- message:
	// message sent to the conference, cool
case <-conferenceDone:
	// conference closed, ok, so we don't send messages to it anymore
	close(conferenceSink)
}
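Putting both comments together, the overall shape could look roughly like this (all names are illustrative stand-ins, not the repo's exact API):

```go
package conference

// Illustrative types only.
type PeerMessage struct {
	Sender  string
	Content string
}

type Conference struct {
	peerMessages chan PeerMessage
}

// StartConference passes the done channel into processMessages instead of
// storing it on the conference, and returns it so that the upstream router
// knows when the conference has ended and must stop producing messages.
func StartConference(peerMessages chan PeerMessage) <-chan struct{} {
	conference := &Conference{peerMessages: peerMessages}

	signalDone := make(chan struct{})
	go conference.processMessages(signalDone)

	return signalDone
}

// processMessages is the only owner of signalDone: nothing else can close it
// by accident, because the channel never leaves this call chain.
func (c *Conference) processMessages(signalDone chan struct{}) {
	defer close(signalDone)

	for message := range c.peerMessages {
		c.handleMessage(message)
	}
}

func (c *Conference) handleMessage(PeerMessage) {
	// React to the message; omitted in this sketch.
}
```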
I agree. At this time we are not running into a panic because the conference message channel will never be closed. With the cleanup we could first close the sinks and then close the message channel from the conference.
At this time we are not running into a panic because the conference message channel will never be closed.
Sorry, I did not quite get that. The channel between the router and a conference gets closed by the sender (the router) once the sender spots that the Conference closed the `done` channel. The `done` channel is closed by the conference when the conference ends, to inform the upstream stage (the router) that no more new messages are expected. When the router spots that, it closes the message channel that it used to send messages to the conference and removes the conference from the list.
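For illustration, the router side of that contract could look roughly like this (the types are assumptions):

```go
package router

// Sketch of the contract described above: the conference closes its done
// channel, and only the router (the sender) ever closes the message channel,
// so nobody sends on a closed channel.

type Message struct {
	ConferenceID string
	Payload      string
}

type conferenceHandle struct {
	sink chan Message    // owned by the router; only the router closes it
	done <-chan struct{} // closed by the conference when it ends
}

type Router struct {
	conferences map[string]conferenceHandle
}

func (r *Router) dispatch(msg Message) {
	conf, ok := r.conferences[msg.ConferenceID]
	if !ok {
		return // no such conference (anymore)
	}

	select {
	case conf.sink <- msg:
		// Delivered.
	case <-conf.done:
		// The conference ended: close our end and drop it from the list.
		close(conf.sink)
		delete(r.conferences, msg.ConferenceID)
	}
}
```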
I meant that this select construct ("select to send to conference") could lead to a panic if the `conference.peerMessages` channel were closed, because every `Sink` writes to this channel.
Here in `NewSink` the channel `conference.peerMessages` is handed over. I couldn't find a place where we close `conference.peerMessages`.
Should we decide to close `conference.peerMessages`, then here would be a good place, after all sinks and `processMessages` have been closed.
Ah, I see, you mean a different place. Yes, this is a single leftover that is still not refactored (I mentioned it in the description), i.e. ideally that part should be rewritten as a fan-in, where we have a single sender for each peer and then a `merge` function. Though we would need to use `Select` from the `reflect` package, since the number of peers changes over time.
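For reference, a dynamic fan-in over a changing set of peer channels could be sketched with `reflect.Select` like this (simplified, not the planned implementation):

```go
package fanin

import "reflect"

// merge forwards values from a dynamic set of input channels into a single
// output channel. reflect.Select is needed because the number of inputs is
// only known at runtime and changes as peers join and leave.
func merge[M any](inputs []<-chan M) <-chan M {
	out := make(chan M)

	go func() {
		defer close(out)

		cases := make([]reflect.SelectCase, len(inputs))
		for i, ch := range inputs {
			cases[i] = reflect.SelectCase{
				Dir:  reflect.SelectRecv,
				Chan: reflect.ValueOf(ch),
			}
		}

		for len(cases) > 0 {
			chosen, value, ok := reflect.Select(cases)
			if !ok {
				// This input was closed: drop its case and keep going.
				cases = append(cases[:chosen], cases[chosen+1:]...)
				continue
			}
			out <- value.Interface().(M)
		}
	}()

	return out
}
```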
👍
So it better reflects the semantics (that the function does not block).
This is a result of working on #120, which did not quite work as expected due to some bugs and issues with Pion. So currently it's more a preparation to properly implement #120 and to get ready for #117.
This PR changes the following things:
- Gets rid of the `common` package, as it seems to be an antipattern that is used to overcome Go limitations and ends up containing everything that does not belong to other packages.

Note that there is one significant change in behavior: currently, the router processes the messages from the Matrix SDK and sends them to the conference via an unbuffered channel. This means that if one conference freezes, it will affect all other conferences! It's not a problem for now, since we primarily use a single conference and the conferences should never freeze (so it helps to check that our implementation is sound), but ideally we probably need to get the buffer back for the conference, otherwise we won't be able to make them all independent.
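To make that trade-off concrete, here is a toy sketch (illustrative values only): with a buffer, a slow or frozen consumer only blocks the producer once the buffer fills up, whereas an unbuffered channel blocks it immediately.

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// A buffered channel: the producer (the router) is only blocked once the
	// buffer is full, so one slow or frozen consumer (a conference) does not
	// immediately stall everyone else. The size 8 is an arbitrary example.
	events := make(chan int, 8)
	done := make(chan struct{})

	go func() {
		defer close(done)
		for e := range events {
			time.Sleep(50 * time.Millisecond) // a slow conference
			fmt.Println("handled", e)
		}
	}()

	for i := 0; i < 8; i++ {
		events <- i // does not block while the buffer has room
	}
	close(events)
	<-done
}
```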