Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix peer creation when wrong user list was sent to the other peer #1533

Conversation

danxuliu
Copy link
Member

@danxuliu danxuliu commented Feb 13, 2019

In some circumstances (for example, if the signaling server provides an incorrect list of users) the peer that created the offer can disconnect from the peer that received the offer without causing the peer that received the offer to remove the peer that created the offer.

In that situation, if the peer that created the offer then tried to connect again with the other peer the other peer created a new Peer object that was used instead of the old one (because the peer that created the offer removed its original "Peer" object and created a new one, so although the id of the peer is the same, its sid is not). However, as the old Peer object was not removed, its video element (which was no longer updated) was kept visible, hiding the video element linked to the new "Peer" object (due to the new video element being prepended to the container) and making it look frozen.

It is not possible to ensure that the peer that received the offer will always remove the peer that created the offer if the peer that created the offer disconnects from the peer that received the offer, so this is handled instead by removing the stale Peer objects when new offers are received.

The bug just described (and fixed in the first commit) happens even if the peer that received the offer disconnects from, but does not remove, the peer that created the offer. The next bug (fixed in the second commit) only happens if the peer that created the offer does not disconnect from the peer that received the offer when the peer that received the offer disconnects from the peer that created the offer.

With latest browser versions when a peer closes the connection with other peer the ICE connection of that other peer is changed to the closed state, so those browsers are not affected by this second bug (although they are by the first). However, with older browser versions (at least in Firefox 52 and Chromium 62; I have not checked which versions changed the behaviour) when a peer closes the connection with other peer the other peer is not aware of that; no event or state change happens in the connection, nor the media stream, nor the data channel... anywhere (at least, as far as I could see).

A peer only tried to connect with another peer if its session ID was larger than the session ID of the other peer; otherwise it just waited for the other peer to start the connection instead. This is done that way to prevent overloading a peer joining a room.

However, in the scenario above, the peer that received the offer just waited for the peer that created the offer to start the connection again. As the peer that created the offer was not aware of the disconnection it did not try to connect again, and thus they never reconnected.

It is not possible to ensure that the peer that created the offer will always disconnect from the peer that received the offer if the peer that received the offer disconnects from the peer that created the offer, so this is handled instead by making that peers with smaller session IDs also start a connection if the peer with the larger session ID did not in a reasonable time.

Besides all those things I have noticed a bug (video is frozen) with pure ICE reconnections (without creating new peer objects) in Firefox (which would happen if testing the second scenario with latest Firefox release; the bug fixed by the second commit would not happen, but this other one would); however, I suspect that it may be related to the shim used by SimpleWebRTC, so I will dig into it again once the change to the new library is done.

@mario @Ivansss You may want (or maybe not :-P ) to check in the apps the forced misbehaviours of the signaling server below to ensure that they work as expected.

How to test (scenario 1):

	$sessionId = $this->session->getSessionForRoom($room->getToken());
	if ($participant->getSessionId() < $sessionId && !rand(0, 3)) {
		continue;
	}

This will make that, sometimes, the participant that sent the WebRTC offer to another participant disconnects from that participant

  • Join a call with user A
  • Join the same call with user B
  • Wait

Result with this pull request:
After several minutes, the peers will have disconnected and reconnected again several times, but the call will still be working.

Result without this pull request:
After several minutes, the peers will have disconnected and reconnected again several times; the video for one of the peers will appear to be frozen, but if the DOM is checked it can be seen that there are instead several video elements. Removing all except the first one will show again a working video.

How to test (scenario 2):

	$sessionId = $this->session->getSessionForRoom($room->getToken());
	if ($participant->getSessionId() > $sessionId && !rand(0, 3)) {
		continue;
	}

This will make that, sometimes, the participant that received the WebRTC offer from another participant disconnects from that participant (opposite case from scenario 1)

  • Join a call with user A
  • Join the same call with user B
  • Wait

Result with this pull request:
After several minutes, the peers will have disconnected and reconnected again several times, but the call will still be working.

Result without this pull request:
After several minutes, one of the peers will have disconnected from the other peer; the peer that disconnected will be waiting for the other peer to reconnect, and the other peer will still be "connected" to that peer but with a frozen video and audio.

In some circumstances (for example, if the signaling server provides an
incorrect list of users) the peer that created the offer can disconnect
from the peer that received the offer without causing the peer that
received the offer to remove the peer that created the offer.

In that situation, if the peer that created the offer then tried to
connect again with the other peer the other peer created a new "Peer"
object that was used instead of the old one. However, as the old one was
not removed, its video element (which was no longer updated) was kept
visible, hiding the video element linked to the new "Peer" object and
making it look frozen.

It is not possible to ensure that the peer that received the offer will
always remove the peer that created the offer if the peer that created
the offer disconnects from the peer that received the offer, so this is
handled instead by removing the stale "Peer" objects when new offers are
received.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
In some circumstances (for example, if the signaling server provides an
incorrect list of users) the peer that received the offer can disconnect
from the peer that created the offer without causing the peer that
created the offer to disconnect from the peer that received the offer
too.

A peer only tried to connect with another peer if its session ID was
larger than the session ID of the other peer; otherwise it just waited
for the other peer to start the connection instead. This is done that
way to prevent overloading a peer joining a room.

However, in the scenario above, the peer that received the offer just
waited for the peer that created the offer to start the connection
again. As the peer that created the offer was not aware of the
disconnection it did not try to connect again, and thus they never
reconnected.

It is not possible to ensure that the peer that created the offer will
always disconnect from the peer that received the offer if the peer that
received the offer disconnects from the peer that created the offer, so
this is handled instead by making that peers with smaller session IDs
also start a connection if the peer with the larger session ID did not
in a reasonable time.

Signed-off-by: Daniel Calviño Sánchez <danxuliu@gmail.com>
@danxuliu
Copy link
Member Author

/backport to stable15

@nickvergessen nickvergessen merged commit 8331d43 into master Feb 13, 2019
@nickvergessen nickvergessen deleted the fix-peer-creation-when-wrong-user-list-was-sent-to-the-other-peer branch February 13, 2019 14:42
@backportbot-nextcloud
Copy link

backport to stable15 in #1542

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants