Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OnWriteComplete triggers after automatic track republish #423

Open
SijmenHuizenga opened this issue Mar 15, 2024 · 1 comment
Open

OnWriteComplete triggers after automatic track republish #423

SijmenHuizenga opened this issue Mar 15, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@SijmenHuizenga
Copy link
Contributor

SijmenHuizenga commented Mar 15, 2024

I am observing that the LocalReaderTrack ReaderTrackWithOnWriteComplete callback is triggering after the room has successfully restarted. This is unexpected, because the LocalReaderTrack has not finished writing, it just restarted writing.

My current theory of why this happens is as follows:

  1. Connection is restarted
  2. New connection is established, livekit sdk re-publishes the tracks
  3. pion webrtc call track.OnBind, calling LocalTrack.StartWrite, calling LocalTrack.writeWorker
  4. writeWorker stops the previous writeWorker context (see cancelWrite). This means that the defer onComplete() triggers
  5. The onComplete is linked to ReaderTrackWithOnWriteComplete
@SijmenHuizenga
Copy link
Contributor Author

To verify the hypothesis, I added some extra logging in the sdk and had it run on production for a few days.

diff --git a/localtrack.go b/localtrack.go
index 8d5a1d7..29a4f46 100644
--- a/localtrack.go
+++ b/localtrack.go
@@ -526,7 +526,9 @@ func (s *LocalTrack) rtcpWorker(rtcpReader interceptor.RTCPReader) {
 }

 func (s *LocalTrack) writeWorker(provider SampleProvider, onComplete func()) {
+       logger.Infow("LocalTrack.writeWorker()")
        if s.cancelWrite != nil {
+               logger.Infow("LocalTrack writeWorker() cancelWrite")
                s.cancelWrite()
        }
        var ctx context.Context
@@ -534,7 +536,10 @@ func (s *LocalTrack) writeWorker(provider SampleProvider, onComplete func()) {
        ctx, s.cancelWrite = context.WithCancel(context.Background())
        s.lock.Unlock()
        if onComplete != nil {
-               defer onComplete()
+               defer func() {
+                       logger.Infow("LocalTrack writeWorker() onComplete")
+                       onComplete()
+               }()
        }

this results in the following logs. Note that the logs with "ts"=" originate from livekit-sdk. Other logs are from our application.

2024/03/18 23:01:34 INF [livekit] [room robot-zora-02] Livekit room reconnecting
2024/03/18 23:01:34 INF [livekit] "ts"="2024-03-18 23:01:34.811670" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="restarting connection..." "reconnectCount"=0
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.190951" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="published track" "name"="front" "source"="CAMERA"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.213022" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="published track" "name"="back" "source"="CAMERA"
2024/03/18 23:01:36 INF [livekit] [room robot-zora-02] Livekit room reconnected
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.408438" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack.writeWorker()"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.408510" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack writeWorker() cancelWrite"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.408651" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack.writeWorker()"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.408699" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack writeWorker() cancelWrite"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.814192" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack writeWorker() onComplete"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.814344" "caller"={"file"="logger.go" "line"=362} "msg"="could not write sample" "error"="io: read/write on closed pipe"
2024/03/18 23:01:36 INF [livekit] "ts"="2024-03-18 23:01:36.814433" "caller"={"file"="logger.go" "line"=351} "level"=0 "msg"="LocalTrack writeWorker() onComplete"
2024/03/18 23:01:36 INF [livekit] [room robot-zora-02] [back] unpublishing track TR_VCkwvNJLtzP2Cc
2024/03/18 23:01:36 INF [livekit] [room robot-zora-02] [back] track finished

The problem we see is that onComplete is called after full reconnect. This bubbles up into ReaderTrackWithOnWriteComplete, which is wrong because the track writer not complete.

Our application code uses the callback of ReaderTrackWithOnWriteComplete to unpublishes any tracks with the trackname, as an extra cleanup step. So in this case, the newly published track is immediately unpublished.

@dennwc dennwc added the bug Something isn't working label Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants