Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed incorrect state with resume then reconnect #595

Merged
merged 3 commits into from
Mar 3, 2023

Conversation

davidzhao
Copy link
Member

When a resume sequence fails, a full reconnect is attempted. There were a few issues there with that sequence

  • We did not always fire EngineEvent.Restarting, so Room missed tearing down existing participants
  • With selective subscriptions, when existing isDesired isn't cleared, it will not send a subscribe request when requested
  • New tracks were not republished successfully when reconnected (due to sender not being reset early enough)

When a resume sequence fails, a full reconnect is attempted. There were
a few issues there with that sequence
- We did not always fire EngineEvent.Restarting, so Room missed tearing down existing participants
- With selective subscriptions, when existing `isDesired` isn't cleared, it will not send a subscribe request when requested
- New tracks were not republished successfully when reconnected (due to sender not being reset early enough)
@changeset-bot
Copy link

changeset-bot bot commented Mar 3, 2023

🦋 Changeset detected

Latest commit: 3d8169e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
livekit-client Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

delay,
);
};

private async attemptReconnect(signalEvents: boolean = false, reason?: ReconnectReason) {
private async attemptReconnect(reason?: ReconnectReason) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for signalEvents was really difficult to follow. The reason why it was there is so that Room doesn't have to fire Reconnecting multiple times.

It makes more sense to enforce at the Room layer, instead of here.

RTCEngine always fires Resuming or Restarting events, so that we do not miss the opportunity to unwind

@@ -883,6 +879,11 @@ export default class RTCEngine extends (EventEmitter as new () => TypedEventEmit
}
this.emit(EngineEvent.SignalResumed);

if (this.shouldFailNext) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for testing with simulateScenario('resume-reconnect')

trackInfo.sender = undefined;
}
}
track.simulcastCodecs.clear();
}
} catch (e) {
log.warn('failed to unpublish track', { error: e, method: 'unpublishTrack' });
} finally {
await this.engine.negotiate();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this await caused track.sender to not get unset in time. instead other logic ran and threw an exception using the wrong track sender.

@@ -246,7 +245,7 @@ export default class RemoteParticipant extends Participant {
(publishedTrack) => publishedTrack.source === publication?.source,
);
if (existingTrackOfSource && publication.source !== Track.Source.Unknown) {
log.warn(
log.debug(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible to use tracks of multiple Source, so we'll stop sounding the alarm bells about it.

@davidzhao
Copy link
Member Author

davidzhao commented Mar 3, 2023

Tagging everyone here (my apologies for the wide distribution). since this is a pretty tricky case and important to get right (resume and reconnect attempts), wanted to make sure we get this right here as well as across other SDKs.

Copy link
Contributor

@lukasIO lukasIO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great findings!

Copy link
Contributor

@davidliu davidliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 !

Copy link
Member

@theomonnom theomonnom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I'll check that for the Rust SDK 👌
Is it the issue I had at our last meeting?

@davidzhao davidzhao merged commit 75776b8 into main Mar 3, 2023
@davidzhao davidzhao deleted the dz-fix-resume-reconnect branch March 3, 2023 17:35
@github-actions github-actions bot mentioned this pull request Mar 3, 2023
@davidzhao
Copy link
Member Author

This PR actually had a regression as well. addressed in a follow up #600

lukasIO added a commit that referenced this pull request Mar 16, 2023
* Allow for background timers to be overriden (#556)

* Allow for background timers to be overriden for platform specific implementations

* Add changeset

* Formatting

* More timers converted and renamed to CriticalTimers

* Avoid using class scope for timers (#558)

* avoid using class scope for timers

* fix copy paste error

* protocol

* Update dependency ua-parser-js to v1.0.33 (#559)

* Update dependency ua-parser-js to v1.0.33 [SECURITY]

* Create tall-cycles-judge.md

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: lukasIO <mail@lukasseiler.de>

* Version Packages (#554)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Export CheckStatus type (#566)

* Export CheckStatus

* Create afraid-plants-grab.md

---------

Co-authored-by: lukasIO <lukas.seiler@neesh.de>

* Add getDisplayMedia support check (#562)

* Add getDisplayMedia support check

* Create friendly-trainers-yawn.md

---------

Co-authored-by: lukasIO <lukas.seiler@neesh.de>

* Handle unknown signal messages from the server (#568)

When we switched to using oneof in generated protobufs, it broke the ability to handle unknown signal messages from the server. This is due to the protobuf parser returning undefined for message field, versus a new value for $case.

* Drop local participant updates with wrong sid (#570)

* drop participant updates with wrong sid

* Create smooth-students-poke.md

* Remove redundant sid check (#571)

* Update devDependencies (non-major) (#563)

* Update devDependencies (non-major)

* rerun prettier after update

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Lukas <mail@lukasseiler.de>

* Emit `Participant.PermissionChanged` event also for remote participants (#569)

* emit Participant.PermissionChanged event also for remote participants

* fix typo

* cleanup

* update protocol with source permissions

* better compare

* Create angry-pugs-repair.md

* Version Packages (#567)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add reconnect reason and signal rtt calculation (#573)

* Add reconnect reason and signal rtt caculation

* changeset

* Reset ping when receiving pong instead of clearing (#578)

* Provide more context to ConnectionError when connecting to a room (#575)

* Provide more context to ConnectionError when connecting to a room

* Fix prettier issues

* Create mighty-boats-wonder.md

---------

Co-authored-by: lukasIO <lukas.seiler@neesh.de>

* Only restart track after permission change if not muted (#581)

* only restart track after permission change if not muted

* fix implicit restart case

* Create rotten-bottles-work.md

* restart audio track on unmute if readyState is ended

* Version Packages (#577)

* Update docstring for RecordingStatusChanged (#583)

* Prevent concurrent setParameter call to avoid exception (#585)

if multiple get/setParameter are called concurrently, certain timing of
events could lead to the browser throwing an exception in `setParameter`,
due to a missing `getParameter` call.

* Version Packages (#591)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update devDependencies (non-major) (#586)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update dependency rollup-plugin-filesize to v10 (#588)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Update dependency gh-pages to v5 (#587)

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>

* Disable red encoding by default for stereo track (#593)

* Disable red encoding by default for stereo track

* changeset

* prettier

* Update src/room/track/options.ts

Co-authored-by: David Zhao <dz@livekit.io>

* Update src/room/track/options.ts

Co-authored-by: David Zhao <dz@livekit.io>

---------

Co-authored-by: David Zhao <dz@livekit.io>

* Track.streamState defaults to active (#596)

StreamState is used to communicate when congestion controller pauses.
It does not make sense to initialize this value to paused since in most
cases, congestion controller has not paused the track and we'd want it to
avoid flickering.

* Fixed incorrect state with resume then reconnect (#595)

When a resume sequence fails, a full reconnect is attempted. There were
a few issues there with that sequence
- We did not always fire EngineEvent.Restarting, so Room missed tearing down existing participants
- With selective subscriptions, when existing `isDesired` isn't cleared, it will not send a subscribe request when requested
- New tracks were not republished successfully when reconnected (due to sender not being reset early enough)

* Always fire EngineEvent.Resuming (#600)

* always fire EngineEvent.Resuming

* changed RTCEngine.connectedServerAddress to an async getter getConnectedServerAddress()

* Version Packages (#594)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Add support for experimental getDisplayMedia options (#592)

* add support for experimental getDisplayMedia options

* reset formatting

* Create .changeset/bright-dancers-worry.md

* Replace terser plugin with official rollup terser plugin (#603)

* Update to ReconnectReason enum (#606)

* update REASON to RR enum

* update protocol

* update proto stubs

* Update outdated slack community link (#607)

* Receive remote participant disconnected updates while reconnecting (#605)

* Add support for topics on data messages (#597)

* Add support for setting topics on data messages

* make destination comment clearer

* update protocol to current main

* update REASON to RR enum

* Create .changeset/metal-otters-give.md

* Defer publishing of tracks during reconnection (#608)

* defer track publishing during reconnection

* keep track of pending publications

* Create .changeset/cuddly-pumpkins-tie.md

* make sure pending promise is always deleted

* guard against multiple reconnect events emitted

* Version Packages (#601)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Reject publish future if engine disconnects (#609)

* Reject publish future if engine disconnects

* Create .changeset/gold-tips-complain.md

* set future to undefined

* Fix wrong videoSimulcastLayer default in docstring (#613)

* Allow to specify exact constraint for device switch (#612)

* Only trigger AudioPlaybackFailed when it's `NotAllowed` (#615)

Other errors could be raised when playing back an audio track.
For example: a user could attach to another track, aborting the current
attempt.

In those cases, we do not want to fire AudioPlaybackFailed incorrectly.

* dont ratchet automatically within cryptor

* fix import key

* add salt as constant

* Version Packages (#610)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* use string key

* map rtp payload type to codec

* try to detect h264 frames and fallback to vp8

* avoid unnecessary steps

* log sdp answer

* use key material in keyprovider

* expose method to ratchet key on keyProvider

* move worker files into dedicated folder

* build e2ee worker in separate build step

---------

Co-authored-by: davidliu <davidliu@deviange.net>
Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: otakueadwine <otaku.eadwine@gmail.com>
Co-authored-by: Daniel Kilimnik <mail@kilimnik.de>
Co-authored-by: David Zhao <dz@livekit.io>
Co-authored-by: cnderrauber <zengjie9004@gmail.com>
Co-authored-by: Herman Bilous <herman.belous@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants