Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add option to issue full reconnect on a publication error. #1214

Merged
merged 2 commits into from
Dec 6, 2022

Conversation

boks1971
Copy link
Contributor

@boks1971 boks1971 commented Dec 6, 2022

Leaving the publication error timeout at 30 seconds as there are some publications taking long. Also, there are cases where the peer connection fails after 30 seconds. The peer connection failure happens after publication error is detected. But, 30 seconds is a good amount of time for publication to establish.

Leaving the publication error timeout at 30 seconds as there
are some publications taking long. Also, there are cases
where the peer connection fails after 30 seconds. The peer
connection failure happens after publication error is detected.
But, 30 seconds is a good amount of time for publication to establish.
@boks1971 boks1971 requested review from davidzhao and cnderrauber and removed request for davidzhao December 6, 2022 08:55
@boks1971 boks1971 merged commit 6bd5504 into master Dec 6, 2022
@boks1971 boks1971 deleted the raja_reconnect_on_pub_failure branch December 6, 2022 09:17
Codeacious added a commit to Invisv-Privacy/livekit that referenced this pull request Dec 7, 2022
* Split out mediatransportutil (livekit#1071)

* Cleanup pass through logging (livekit#1073)

* added filtering for noisy pion logs
* demoted some logs to debug
* using consistent trackID / participant / publisher / subscriber terminology
* removed ice candidate log lines, deferring to combined log

* Allow TCP fallback on multiple failures. (livekit#1077)

* Update protocol (livekit#1075)

* update protocol

* auto egress ensure unique filename

* fix tests

* update pion/ice webrtc for udpmux fix (livekit#1081)

* Cache RTPStats and seed on re-use (livekit#1080)

* Cache RTPStats and seed on re-use

When a cached down track is re-used, RTPStats was not cached.
This caused sender reports getting out-of-sync with the remote side.
Cache RTPStats and seed it on re-use.

* staticcheck

* Fix simulcast codec block track close (livekit#1082)

* Update module github.com/urfave/cli/v2 to v2.19.2 (livekit#1076)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update github.com/livekit/mediatransportutil digest to 7440725 (livekit#1072)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* v1.2.4 (livekit#1084)

* Add default handler to 404 (livekit#1088)

* Support track level stereo and red setting (livekit#1086)

* Support track level stereo and red setting

* fix test client

* Prevent multiple transport fallback in same session. (livekit#1090)

* Update module github.com/urfave/cli/v2 to v2.20.2 (livekit#1089)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Log Service API requests (livekit#1091)

* Refine nat 1to1 mapping setting (livekit#1094)

Now only set mapping when user_external_ip enabled or node_ip is
explicitly set. If multiple local address resolved to same external
ip, only the first one will be mapped to external, avoid candidate
conflict between different clients.

* Add server-sdk-kotlin to docs (livekit#1095)

* remove record check on CreateRoom (livekit#1096)

* update protocol (livekit#1097)

* Misc clean up (livekit#1099)

* Use pointers for packetMeta in sequencer to save memory. (livekit#1100)

Padding packets do not need the full structure. They just
need a placeholder in the sequencer array. So, use pointers
(with padding slots filled by nil) to save some memory.

Also, don't need padding for audio (yet). As padding packets
are used only for probing and we do not probe using audio tracks (yet).

* Warn on out-of-order RTCP RR. (livekit#1101)

Have been seeing a few instances of "too many packets expected in delta"
when trying to generate RTCP SR on down track. Actual sequence numbers
indicate that start is after the end.

As down track RTPStats are driven by receiver report, wondering if we
are getting RTCP_RR out-of-order somehow causing this to happen.
Cannot find any other reason for this.

So, accepting RTCP_RR based update only if the sequence number is higher
than existing and also logging a warning with sequence numbers if they
look out-of-order.

* Accept same highest sequence number as a puased track sends that (livekit#1102)

* Log level should be `warn` instead of `warning` (livekit#1104)

* Flag to bypass strict config enforcement (livekit#1105)

Useful with the Helm charts, where other keys can be declared together.

* v1.2.5 (livekit#1106)

scrapped 1.2.4 release and will release 1.2.5 instead

* Some misc changes (livekit#1107)

- ticker.Stop always
- clean up timer func (if they are added) on participant close
- sequencer test enhancement to add a real packet after a pdding packet

* Allocate packetMeta up front to reduce number of allocations. (livekit#1108)

* Do not log duplicate packet error. (livekit#1116)

Also, use bucket.Err* as it has been moved to mediatransportutil

* Notify max layer taking into account overshoot. (livekit#1117)

* Notify max layer taking into account overshoot.

An attempt to handle case of FF stopping layer 0, but not layer 1.
When max subscribed layer is layer 0, server allows overshoot to layer 1
to ensure continued streaming when the channel is not congested.

But, dynacast could have reported maximum subscribed layer as layer 0.

This is a very simple attempt to address that by taking overshoot
into account. Needs testing if this works well or not.

NOTE: When subsriber/down track is unmuted, it will report
the max subscribed layer as the max required layer. In those cases,
if the client does not start layer 0, there will still be an issue.
IOW, server is not keeping track of client behaviour that the client
has stopped layer 0 and publishing only layer 1. Server is just
accommodating overshoot with this change.

* Fix a couple of tests and change reflect.DeepEqual -> require.Equal as
much as possible

* Populate memory load in node stats. (livekit#1121)

* Add fps calculator for VP8 and DependencyDescriptor (livekit#1110)

* Add fps calculator for VP8 and DependencyDescriptor

* clean code

* unit test

* clean code

* solve comment

* Consolidate getMemoryStats (livekit#1122)

* Consolidate getMemoryStats

* Avoid divide-by-0

* Remove named returns from room service. (livekit#1124)

* Update module github.com/urfave/cli/v2 to v2.20.3 (livekit#1115)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Generated CLI Flags (livekit#1112)

* web egress (livekit#1126)

* baseFlags do not need to be set (livekit#1127)

* Use ingress specific grants (livekit#1125)

* Remove DD extension when AV1 not preferred (livekit#1129)

* Seed snapshots (livekit#1128)

* Seed snapshots

- For one cycle after seeding, delta snap shot can get a huge gap
because of snapshot iitializing from start if not present. Not
a huge deal sa it should not affect functionality, but saving/restoring
(at least with down track) snap shot is a big deal. So just do it.
- Have been seeing a bunch of cases of delta stats getting a lot of
packets due to out-of-order (what seems like) receiver report. So,
save the receiver report and log it when out-of-order is detected
to understand if they are closely spaced or something else could be
happening.

* Remove comment that does not apply anymore

* log current time and RR

* fix activeRecording (livekit#1132)

* fix activeRecording

* also check if p is nil

* experiment fallback to tcp when udp unstable (livekit#1119)

* fallback to tcp when udp unstable

* Don't collect external address for ip filterd out (livekit#1135)

* don't assume uname in /usr/bin/ (livekit#1138)

uname exists in /bin on Ubuntu.  As such, the installation fails.               
                                                                                
The script determines the OS without specifying the complete path to uname.  Do 
the same in order to determine ARCH.

* Update module golang.org/x/sync to v0.1.0 (livekit#1136)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/urfave/cli/v2 to v2.23.0 (livekit#1133)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Add red encoding for opus only publisher (livekit#1137)

* Add red encodings for opus only publisher

* Add test case for red receiver

* Recover lost packet from red encoding for opus only client (livekit#1139)

* Recover lost packet from red encoding for opus only client

* Avoid divide-by-zero (livekit#1141)

* revert canclose method of red receivers (livekit#1142)

* Update module github.com/urfave/cli/v2 to v2.23.2 (livekit#1144)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/prometheus/client_golang to v1.13.1 (livekit#1140)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/urfave/cli/v2 to v2.23.4 (livekit#1145)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Make sure client get participant info before track fired (livekit#1147)

* Set track level codec settings for all pending tracks (livekit#1148)

* Use the redis.UniversalClient interface instead of *redis.Client when interacting with go-redis (livekit#1149)

* Use the redis.UniversalClient interface instead of *redis.Client when interacting with go-redis

* Update protocol to v1.2.1

* RedisStore: make UnlockRoom atomic (livekit#1044)

Co-authored-by: David Zhao <dz@livekit.io>

* Update module github.com/prometheus/client_golang to v1.14.0 (livekit#1151)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Version 1.3.0 (livekit#1152)

Co-authored-by: David Zhao <dz@livekit.io>

* Fix YAML failure when logging config is declared (livekit#1154)

* Fix YAML failure when logging config is declared

* version bump

* CreateRoom API to actually create the room on an RTC node (livekit#1155)

Previously, CreateRoom only created the room in storage, but did not
hydrate it on an RTC node. This has caused strange behaviors such as
emptyTimeout not working correctly (livekit#1109).

Also reduced room reap worker to consistently reap rooms. Fixes livekit#241

* Confirm room creation prior to returning from CreateRoom (livekit#1157)

* Demote some stable logs to Debugw (livekit#1158)

* Demote some stable logs to Debugw

* Add 'discard message from' to ignore list

* A few more log tweaks (livekit#1159)

* Update module github.com/urfave/cli/v2 to v2.23.5 (livekit#1153)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Check for ignored with fully formatted log message (livekit#1163)

* Lmit failed send to specific error type (livekit#1164)

* fix memory leak on long run room/participant (livekit#1169)

* Fix memory leak

* update pion

* clean code

* Do not append the stream key to the ingress URL for rtmp (livekit#1167)

* command to show help with hidden generated flags (livekit#1171)

* Force full reconnect when there is no previous answer during migration. (livekit#1168)

* Fix potential ssrc collision between participants (livekit#1173)

* Update module github.com/livekit/protocol to v1.2.3 (livekit#1176)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Prevent rtx buffer and forwarding path colliding (livekit#1174)

* Prevent rtx buffer and forwarding path colliding

Received packets are put into RTX buffer which is
a circular buffer and the packet (sequence number) is
queued for forwarding. If the RTX buffer fills up
and cycles before forwarding happens, forwarding
would pick the wrong packet (as it is holding a
reference to a byte slice in the RTX buffer) to forward.

Prevent it by moving reading from RTX buffer just
before forwarding. Adds an extra copy from RTX buffer
-> temp buffer for forwarding, but ensures that forwarding
buffer is not used by another go routine.

* Revert some changes from previous commit

Details:
- Do all forward processing as before.
- One difference is not load raw packet into ExtPacket.
- Load raw packet into provided buffer when module that reads
using ReadExtended calls that function. If the packet is
not there in the retransmission buffer, that packet will be
dropped. This is the case we are trying to fix, i. e. the RTX
buffer has cycled before ReadExtended could pull the packet.
This makes a copy into the provided buffer so that the data
does not change underneath.

* Remove debug comment

* Oops missed a function call

* rename hidden-help to help-verbose (livekit#1180)

* Set forceRelay to unset by default (livekit#1184)

Setting it to `DISABLED` could cause clients to override user preference to use relay.

* Update module github.com/pion/ice/v2 to v2.2.12 (livekit#1183)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Support redis cluster mode (livekit#1181)

* use redisConfig of protocol instead of redisConfig and use redis of protocol to create redis client to support redis cluster mode too

* Prevent subscription after participant close. (livekit#1182)

* Add enable loopback candidate option (livekit#1185)

* Force 'allowed; even on subscription.' (livekit#1187)

Also not holding lock while resolving by sid. Not pretty.
But, we should deprecate SID based permissions.

Also, wire changes, possibly due to redis cluster PR?

* set stereo max bitrates to 510000 (livekit#1188)

* Promote dyancast logs for debugging (livekit#1190)

* Check forwarder started when seeing. (livekit#1191)

When switching from local -> remote or remote -> local,
the forwarder state is cached and restored after the switch
to ensure continuity in sequence number /time stamp.
But, if the forwarder had not started before the switch,
the sequence number always starts at 1 because of seeding.
So, do not see unless forwarder was started before the switch.

* Do not forward media till peer connection is connected. (livekit#1194)

There were some failures with missing media. The only thing I could
see between working and non-working case is when media forwarding
starts. So, delay media forwarding till peer connection is connected.

Also, add a subscribe op only if a subscribe/unsubscribe queuing is
successful. There was a recent change to not queue a subscribe when
the participant is closed/disconnected. This got the subscribe op
counter out of whack.

* Fix rtcp lost for downtrack used incorrect buffer factory (livekit#1195)

* Fix rtcp lost for downtrack used incorrect buffer factory

In buffer factory change(livekit#1173), every pariticipant has its own
buffer factory, can't use publisher's bufferfactory to create
DownTrack

* clean code

* Encoding primary packet only if red encoding don't have enough space (livekit#1196)

* Encoding primary packet only if red encoding don't have enough space

* clean code

* node type prometheus metric labels (livekit#1197)

* Add stats for data channel and signal (livekit#1198)

* Add stats for data channel and signal

* Solve comment

* Suppress a few additional Pion logs (livekit#1199)

* Suppress a few additional Pion logs

* remove dupe

* Update module go.uber.org/zap to v1.24.0 (livekit#1200)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/hashicorp/golang-lru to v0.6.0 (livekit#1165)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/twitchtv/twirp to v8.1.3+incompatible (livekit#1120)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Revert "Update module go.uber.org/zap to v1.24.0 (livekit#1200)"

This reverts commit bf0bbc3.

* Improve logging messages with RoomService (livekit#1203)

* Configurable RoomService execution timeout (livekit#1206)

* API execution timeout is now configurable

In certain environments, it can take longer than the default 2s to
fully execute API requests. Making execution timeout a configurable option.

* do not expose api to YAML. internal for now.

* Create response channel before sending StartSession (livekit#1208)

* Fixed single-node routing breakage. (livekit#1209)

* Fixed single-node routing breakage.

Due to a regression of a previous change, Redis was always enabled even
when no configuration was provided.

* updated go modules

* Update module github.com/urfave/cli/v2 to v2.23.6 (livekit#1207)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/pion/turn/v2 to v2.0.9 (livekit#1201)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update module github.com/pion/transport to v0.14.1 (livekit#1202)

Generated by renovateBot

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Added logging fields for Ingress & Egress services (livekit#1205)

* Don't output error log if upstream closed while binding (livekit#1211)

* Add option to issue full reconnect on a publication error. (livekit#1214)

* Add option to issue full reconnect on a publication error.

Leaving the publication error timeout at 30 seconds as there
are some publications taking long. Also, there are cases
where the peer connection fails after 30 seconds. The peer
connection failure happens after publication error is detected.
But, 30 seconds is a good amount of time for publication to establish.

* prevent recursive lock

* Close migration muted track which is not fired (livekit#1215)

Co-authored-by: Raja Subramanian <raja.gobi@tutanota.com>
Co-authored-by: David Zhao <dz@livekit.io>
Co-authored-by: David Colburn <xero73@gmail.com>
Co-authored-by: cnderrauber <zengjie9004@gmail.com>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Samuel Humeau <samuel.humeau@gmail.com>
Co-authored-by: davidliu <davidliu@deviange.net>
Co-authored-by: Mathew Kamkar <578302+matkam@users.noreply.github.com>
Co-authored-by: Benjamin Pracht <benjamin@livekit.io>
Co-authored-by: Mohit Agarwal <mohit@sdf.org>
Co-authored-by: MaxnSter <maxnster20@gmail.com>
Co-authored-by: Tom Xiong <tomxiongzh@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants