keep publish segment file to cloud even source stream dropped #105

Closed
wants to merge 6 commits into from

Conversation

khmak3

@khmak3 khmak3 commented Sep 12, 2022

Keep publishing the egress output when the source disconnects.
Also change the state to completed instead of leaving it active.

@frostbyte73
Member

hey @khmak3, can you describe the scenario/problem that this addresses? If a participant leaves, we would normally want a track/track composite egress to end (explicit egress stop is not required)

@khmak3
Author

khmak3 commented Sep 13, 2022

This diff addresses the following cases, where a participant shares video and a track egress is running on that video:

  1. The connection between the participant and LiveKit drops and the participant rejoins (reproducible by refreshing the browser session at https://example.livekit.io). The egress sometimes fails and no file is published.
  2. There are network errors (WebRTC packet drops) between the participant and LiveKit. The egress fails instead of completing (reproducible by using netem with 0.001% packet drop between the participant and LiveKit).

For the second case, I believe it is better for the egress to end as complete on a stream error than to fail.
The egress user can then decide to record again, and they can do some manual work to fix the errors in the video ES afterward.

This may introduce behavior that differs from track egress with a container, since some container packetization may fail on a video access unit that contains errors.
But if we keep the original video elementary stream file (with some errors), the user can fix the errors later (for example by trimming out the damaged part). That should be the more desirable behavior for any use case, since if we drop the egress file we can't fix anything in the future.
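
The trade-off described above can be sketched as a small decision function. This is a minimal illustration in Go, not the code from this PR: `finalStatus`, `Result`, `errStreamingStopped`, and the status names are hypothetical. Only the idea comes from the comment: keep and publish whatever was written when the source stream errors out, and fail only when nothing usable exists.

```go
package main

import (
	"errors"
	"fmt"
)

// Status is a hypothetical final state for an egress job.
type Status string

const (
	StatusComplete Status = "COMPLETE"
	StatusFailed   Status = "FAILED"
)

// errStreamingStopped stands in for a pipeline error (illustrative only).
var errStreamingStopped = errors.New("streaming stopped")

// Result describes what to do with the output when the pipeline stops.
type Result struct {
	Status Status
	Upload bool  // whether the (possibly partial) file should be published
	Err    error // the pipeline error, kept for reporting
}

// finalStatus decides how a job ends. bytesWritten is the size of the
// elementary stream file on disk; pipelineErr is nil on a clean stop.
func finalStatus(bytesWritten int64, pipelineErr error) Result {
	if pipelineErr == nil {
		return Result{Status: StatusComplete, Upload: bytesWritten > 0}
	}
	if bytesWritten > 0 {
		// The source dropped or the stream was damaged: keep what we have
		// so the user can repair or trim it later.
		return Result{Status: StatusComplete, Upload: true, Err: pipelineErr}
	}
	// Nothing was written, so there is nothing worth publishing.
	return Result{Status: StatusFailed, Upload: false, Err: pipelineErr}
}

func main() {
	r := finalStatus(1<<20, errStreamingStopped)
	fmt.Println(r.Status, r.Upload, r.Err)
}
```

Carrying the error in the result even on completion lets a caller report that the file may be damaged without discarding it.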

@khmak3
Author

khmak3 commented Sep 14, 2022

Since multiqueue was introduced, there are some cases my diff can't handle. I will add more changes to handle those.

@khmak3
Author

khmak3 commented Sep 15, 2022

@frostbyte73 PR updated to handle the case where the multiqueue pipeline freezes

@frostbyte73
Member

frostbyte73 commented Sep 15, 2022

@khmak3 thanks for the update. From your scenarios I think this means there's something wrong with our appsrc or appwriter (they shouldn't be disconnecting that quickly), and it should definitely be able to handle 0.001% packet loss. I'll be able to dig in more tomorrow

@khmak3
Author

khmak3 commented Sep 16, 2022

I don't think anything is wrong in the LiveKit code at 0.001% loss. The percentage is not what matters to a video codec: a single dropped packet in an I-frame that carries the resolution information is enough. It triggers an error inside the video ES parser in the GStreamer webmmux plug-in. The parser thinks the resolution has changed, and WebM can't encapsulate that in the file, so it throws an error. The previous egress code treats this as a fatal error and fails the job (throwing away the egress file). I think we should still publish the file, even with errors. The user may fix it (crop out the damaged part) afterward.
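
To illustrate why the percentage matters less than it seems: even at a 0.001% loss rate, a long recording sends enough packets that at least one drop becomes likely, and one drop in the wrong keyframe is enough. The following is a rough sketch, assuming independent losses and a hypothetical rate of 100 packets per second. Neither assumption comes from this thread.

```go
package main

import (
	"fmt"
	"math"
)

// probAtLeastOneLoss returns the probability that at least one of n packets
// is dropped, assuming each packet is lost independently with probability p.
func probAtLeastOneLoss(p float64, n int) float64 {
	return 1 - math.Pow(1-p, float64(n))
}

func main() {
	const lossRate = 0.001 / 100 // 0.001% expressed as a fraction
	const packetsPerSecond = 100 // assumed, for illustration only

	for _, minutes := range []int{1, 10, 60} {
		n := packetsPerSecond * 60 * minutes
		fmt.Printf("%3d min (%d packets): P(at least one loss) = %.3f\n",
			minutes, n, probAtLeastOneLoss(lossRate, n))
	}
}
```

Under these assumptions an hour-long recording almost certainly sees at least one dropped packet, which is why the outcome depends on where the drop lands rather than on the rate itself.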

@frostbyte73
Member

If that's the case, this should already be fixed in v1.4.1 - can you see if you can reproduce it on the latest version?
That update includes a gstreamer fork to allow caps changes in webmmux: https://github.com/livekit/gstreamer/commit/b7b1187a7ac132e25f16c74c1b9404c4b7521932

@khmak3
Author

khmak3 commented Sep 19, 2022

After updating to 1.4.1, the issue caused by the error stream is gone. :)
On the other hand, if the participant drops the connection while the egress is active,
the pipeline may throw either a "streaming stopped" or a "Caps change are not supported" error.
The egress session then ends as failed and no file is published.
With my code diff, the egress session ends as EGRESS_COMPLETE.
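
The behavior described here amounts to classifying pipeline errors before choosing the end state. Below is a minimal sketch of that idea in Go. It is not the code from this PR: `looksLikeSourceLoss` and `endState` are hypothetical names, the state names are plain strings rather than the real protocol enum, and the message substrings are copied from the comment above, so the exact text GStreamer emits may differ.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// sourceLossMessages holds substrings of the errors quoted in the thread.
// They are placeholders; verify against real pipeline output before use.
var sourceLossMessages = []string{
	"streaming stopped",
	"Caps change are not supported",
}

// looksLikeSourceLoss reports whether an error message appears to be caused
// by the source participant disconnecting rather than by an egress fault.
func looksLikeSourceLoss(msg string) bool {
	for _, s := range sourceLossMessages {
		if strings.Contains(msg, s) {
			return true
		}
	}
	return false
}

// endState maps a pipeline error to a final state name.
func endState(pipelineErr error) string {
	if pipelineErr == nil || looksLikeSourceLoss(pipelineErr.Error()) {
		// Clean stop or source dropped: finish and publish the file.
		return "EGRESS_COMPLETE"
	}
	return "EGRESS_FAILED"
}

func main() {
	for _, err := range []error{
		nil,
		errors.New("pipeline: streaming stopped, reason not-linked"),
		errors.New("disk full"),
	} {
		fmt.Printf("%v -> %s\n", err, endState(err))
	}
}
```

Matching on message text is brittle. It is used here only because the thread identifies the errors by their messages.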

@khmak3
Author

khmak3 commented Sep 23, 2022

@frostbyte73 Hi, any concerns about this PR? Or does LiveKit want the egress job to fail when a participant drops?
Thanks

@nikrainev

I think this is very helpful. I currently use livekit-egress to create an HLS stream for viewers (one-to-many). When the streamer loses the connection (for example by refreshing the page), the egress ends. If I create a new egress when the streamer reconnects, chunks are written starting from the first index again (new chunks overwrite old ones) and the playlist is reset too, so the stream history is lost.
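
One way to avoid the overwrite described here is for a restarted egress to resume numbering after the highest segment already on disk. This is a minimal sketch under stated assumptions: the `<prefix>_<index>.ts` naming scheme and the `nextSegmentIndex` helper are hypothetical and are not taken from the egress code.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// segmentName matches names like "stream_00012.ts". The naming scheme is an
// assumption for this sketch, not necessarily the one egress uses.
var segmentName = regexp.MustCompile(`^(.+)_(\d+)\.ts$`)

// nextSegmentIndex returns the index a restarted egress should start from so
// that it does not overwrite segments already listed in names.
func nextSegmentIndex(names []string, prefix string) int {
	next := 0
	for _, name := range names {
		m := segmentName.FindStringSubmatch(name)
		if m == nil || m[1] != prefix {
			continue // not a segment of this stream
		}
		idx, err := strconv.Atoi(m[2])
		if err != nil {
			continue // index too large to parse; skip it
		}
		if idx+1 > next {
			next = idx + 1
		}
	}
	return next
}

func main() {
	existing := []string{
		"stream_00000.ts", "stream_00001.ts", "stream_00002.ts", "playlist.m3u8",
	}
	fmt.Println(nextSegmentIndex(existing, "stream"))
}
```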

@frostbyte73
Member

This should be fixed in our sdk and gstreamer in v1.4.2

@khmak3
Author

khmak3 commented Sep 29, 2022

I've tried v1.4.2.
Track egress still fails in the following cases:

  1. The participant leaves the room before the egress is stopped.
  2. The participant reconnects after a network disconnect (connect using https://example.livekit.io/#/ with video, then refresh the browser instance).

This code diff still fixes both.

@Shadowfaxenator

> I think this is very helpful. I currently use livekit-egress to create an HLS stream for viewers (one-to-many). When the streamer loses the connection (for example by refreshing the page), the egress ends. If I create a new egress when the streamer reconnects, chunks are written starting from the first index again (new chunks overwrite old ones) and the playlist is reset too, so the stream history is lost.

I have the same issue with LiveKit Cloud. Could you please help to solve it?
When a streamer disconnects due to a network issue (or another short interruption), the egress keeps running, but the HLS files or MP4 recording become corrupted after the stream is stopped.

@nikrainev

nikrainev commented Oct 31, 2022

> I have the same issue with LiveKit Cloud. Could you please help to solve it? When a streamer disconnects due to a network issue (or another short interruption), the egress keeps running, but the HLS files or MP4 recording become corrupted after the stream is stopped.

I run livekit egress on my own server, so I made my own build of egress in which I changed the logic for appending new HLS segments to the playlist. Egress checks whether a playlist already exists in the current folder; if it does, egress does not create a new playlist but appends new segments to the existing one. So if egress crashes, I start a new one. In your issue #160 you said that RoomCompositeEgressRequest works correctly. Do you have one streamer or many? In my case I have one streamer, and when he disconnects the egress stops.
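
The append-if-exists logic described above might look roughly like the following. This is a sketch, not the commenter's build: `openPlaylist`, `playlistWriter`, and `demo` are hypothetical names, and writing an `#EXT-X-DISCONTINUITY` tag when resuming is an addition of this sketch to signal the break to players. It also assumes the earlier run did not end the playlist with `#EXT-X-ENDLIST`.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// playlistWriter appends entries to an HLS media playlist file.
type playlistWriter struct{ f *os.File }

// openPlaylist creates the playlist with a header if it does not exist, or
// reopens an existing one for appending and marks a discontinuity.
func openPlaylist(path string, targetDuration int) (*playlistWriter, error) {
	_, statErr := os.Stat(path)
	if statErr != nil && !os.IsNotExist(statErr) {
		return nil, statErr
	}
	exists := statErr == nil

	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return nil, err
	}
	preamble := "#EXT-X-DISCONTINUITY\n" // resuming after an interrupted run
	if !exists {
		preamble = fmt.Sprintf("#EXTM3U\n#EXT-X-VERSION:3\n"+
			"#EXT-X-TARGETDURATION:%d\n#EXT-X-MEDIA-SEQUENCE:0\n", targetDuration)
	}
	if _, err := f.WriteString(preamble); err != nil {
		f.Close()
		return nil, err
	}
	return &playlistWriter{f: f}, nil
}

// Append writes one segment entry.
func (w *playlistWriter) Append(uri string, seconds float64) error {
	_, err := fmt.Fprintf(w.f, "#EXTINF:%.3f,\n%s\n", seconds, uri)
	return err
}

// Close closes the underlying file.
func (w *playlistWriter) Close() error { return w.f.Close() }

// demo simulates two egress runs writing to the same playlist and returns
// the resulting file contents.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "hls")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "playlist.m3u8")

	for run := 0; run < 2; run++ {
		w, err := openPlaylist(path, 6)
		if err != nil {
			return "", err
		}
		if err := w.Append(fmt.Sprintf("seg_%d.ts", run), 6); err != nil {
			w.Close()
			return "", err
		}
		if err := w.Close(); err != nil {
			return "", err
		}
	}
	data, err := os.ReadFile(path)
	return string(data), err
}

func main() {
	out, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Opening with `O_APPEND` means the second run never truncates the first run's entries, which is the property the comment relies on.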

@Shadowfaxenator

> I have the same issue with LiveKit Cloud. Could you please help to solve it? When a streamer disconnects due to a network issue (or another short interruption), the egress keeps running, but the HLS files or MP4 recording become corrupted after the stream is stopped.

> I run livekit egress on my own server, so I made my own build of egress in which I changed the logic for appending new HLS segments to the playlist. Egress checks whether a playlist already exists in the current folder; if it does, egress does not create a new playlist but appends new segments to the existing one. So if egress crashes, I start a new one. In your issue #160 you said that RoomCompositeEgressRequest works correctly. Do you have one streamer or many? In my case I have one streamer, and when he disconnects the egress stops.

I have only one streamer. If he manually stops streaming, then yes, the egress stops; but if he has a temporary problem (network, etc.), the egress waits for him to return.


4 participants