
Trickle Example #763

Merged (2 commits merged into master on Jul 31, 2019)

Conversation

Sean-Der (Member)

This cherry-picks @hugoArregui's work and adds an example.

I had to update the gathering code so that it only starts once signaling is in a stable enough state. If someone emits candidates before the other side is ready, it causes Chrome/Pion to fail.

codecov-io commented on Jul 31, 2019

Codecov Report

Merging #763 into master will increase coverage by 67.83%.
The diff coverage is 60%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #763       +/-   ##
===========================================
+ Coverage     8.6%   76.44%   +67.83%     
===========================================
  Files          58       58               
  Lines        4090     4097        +7     
===========================================
+ Hits          352     3132     +2780     
+ Misses       3699      702     -2997     
- Partials       39      263      +224
Impacted Files             Coverage Δ
ice_go.go                  88.88% <ø> (+88.88%) ⬆️
peerconnection.go          72.9% <16.66%> (+72.9%) ⬆️
icecandidate.go            70.58% <78.57%> (+70.58%) ⬆️
pkg/rtcerr/errors.go       10% <0%> (+10%) ⬆️
rtptransceiver.go          20% <0%> (+20%) ⬆️
stats_go.go                27.77% <0%> (+27.77%) ⬆️
internal/util/util.go      80% <0%> (+33.33%) ⬆️
internal/mux/endpoint.go   62.96% <0%> (+37.03%) ⬆️
settingengine.go           44% <0%> (+44%) ⬆️
internal/mux/muxfunc.go    55.88% <0%> (+55.88%) ⬆️
... and 43 more

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1464ad4...9be2a9f. Read the comment docs.

Sean-Der force-pushed the trickle-example branch 2 times, most recently from 6832627 to 209b5da on July 31, 2019 08:13
Hugo Arregui and others added 2 commits on July 31, 2019:
  - With this change we can always exchange ICECandidateInit when signaling
  - Demonstrate how to use Trickle ICE when communicating pion-to-pion
hugoArregui merged commit 7124c22 into master on Jul 31, 2019
hugoArregui deleted the trickle-example branch on July 31, 2019 12:44
hugoArregui referenced this pull request on Sep 4, 2019: "With this change we can always exchange ICECandidateInit when signaling"
cohosh (Contributor) commented on Sep 4, 2019

Hey!

I was pointed to this PR while debugging some problems we recently started having that were caused by commit 7c18bbc.

The changes there delay the gathering of ICE candidates for the trickle method until the peer is preparing an SDP answer. The reason your trickle example works is that you're relying on the other peer to send its candidates to you directly: 9be2a9f#diff-f70c62dbae11cfb1979e93d74673e9c3R73
and not on the gathering of ICE candidates provided by your library. In this example, only one side is actually trickling candidates; the other is adding them manually.

In fact, if you were to modify your example to rely on STUN, the offering peer would never connect, because Gather is never called on its side of the connection.

I think this is problematic not just for us, but also for others who are trying to use WebRTC with STUN or TURN. We rely on STUN and TURN for the design of our system, and on a centralized signaling broker. Without the ability for a client to gather candidates for an offer, we'd have to add an additional round trip, which would mean a lot of extra engineering and headache.

I'd like to suggest a fix for this example that we've also implemented in our code to prevent premature sharing of trickled candidates: we wait for the ICE gathering state to change to completed before sending candidates to peers: https://github.com/cohosh/snowflake/blob/pion/client/lib/webrtc.go#L183
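
A minimal sketch of that pattern, assuming pion/webrtc v2 with trickle enabled and using the convention that OnICECandidate fires with nil once gathering finishes (the snowflake code linked above watches the gathering state directly; waitAndSend and send are illustrative names, not part of the library):

```go
package trickle

import "github.com/pion/webrtc/v2"

// waitAndSend buffers every locally gathered candidate and hands the whole
// batch to send only after gathering has finished, which pion signals by
// invoking OnICECandidate with a nil candidate. send is a placeholder for
// whatever transport carries candidates to the remote peer.
func waitAndSend(pc *webrtc.PeerConnection, send func([]webrtc.ICECandidateInit)) {
	var gathered []webrtc.ICECandidateInit
	pc.OnICECandidate(func(c *webrtc.ICECandidate) {
		if c == nil {
			send(gathered) // gathering complete: safe to share everything at once
			return
		}
		gathered = append(gathered, c.ToJSON())
	})
}
```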

Sean-Der (Member, Author) commented on Sep 5, 2019

Hey @cohosh

So I think the right thing to do here is to match whatever libwebrtc/Chromium does (I am not exactly sure what they do, or where I might have accidentally diverged).

The nice thing is we have WASM tests! So we can write Go code and test it against libwebrtc, which makes compliance a lot easier :)


For Trickle to work, each side should set OnICECandidate and send the candidates to the remote peer. You can get away with having only one side send, because WebRTC will create peer-reflexive candidates.
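
A sketch of that symmetric setup, under the same pion/webrtc v2 assumption; wireTrickle, sendToRemotePeer, and the returned handler are illustrative names standing in for the application's signaling plumbing:

```go
package trickle

import "github.com/pion/webrtc/v2"

// wireTrickle forwards every locally gathered candidate to the remote peer as
// soon as it is available, and returns a handler that applies candidates the
// remote peer trickles back to us. Both peers run the same code.
func wireTrickle(pc *webrtc.PeerConnection, sendToRemotePeer func(webrtc.ICECandidateInit)) func(webrtc.ICECandidateInit) error {
	pc.OnICECandidate(func(c *webrtc.ICECandidate) {
		if c == nil {
			return // gathering finished; nothing more to send
		}
		sendToRemotePeer(c.ToJSON())
	})

	// Call the returned function for each candidate received from the remote
	// peer, after the remote description has been applied.
	return func(init webrtc.ICECandidateInit) error {
		return pc.AddICECandidate(init)
	}
}
```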

If you are waiting for ICEGathererStateComplete, I don't think trickle is actually providing you any value. When trickle is off, we basically don't return until ICEGathererStateComplete.


Hopefully I am understanding the issue! Sorry, it has been a little while since I was in this area, so I may be missing a few points. Please fire away and I will make sure to get everything fixed up ASAP.

cohosh (Contributor) commented on Sep 5, 2019

> So I think the right thing to do here is to match whatever libwebrtc/Chromium does (I am not exactly sure what they do, or where I might have accidentally diverged).
>
> The nice thing is we have WASM tests! So we can write Go code and test it against libwebrtc, which makes compliance a lot easier :)

Agreed :) I'll take a look at the issue you reported above and see if I can figure out why sending candidates was breaking things. Thanks for pointing out these tests; I'll be sure to use them when creating a patch.

> For Trickle to work, each side should set OnICECandidate and send the candidates to the remote peer. You can get away with having only one side send, because WebRTC will create peer-reflexive candidates.
>
> If you are waiting for ICEGathererStateComplete, I don't think trickle is actually providing you any value. When trickle is off, we basically don't return until ICEGathererStateComplete.

Yeah, we use it in a weird way here. The reason is that one of the callbacks we rely on has not been implemented in this library for the old non-trickle method. Perhaps the best way for us to move forward is to submit a patch implementing it. You can see the relevant comments here: https://trac.torproject.org/projects/tor/ticket/28942#comment:28

Regardless, I think there's a better way to solve the issues you were having than the changes that were merged in this PR, given the breakage outlined above where peers can no longer collect ICE candidates while crafting offers. I'll take a look at the bugs you saw with Chromium and see if I can suggest a fix that also lets you use the trickle method as intended.

Thanks again for the quick response and your help with this!

cohosh mentioned this pull request on Sep 6, 2019
cohosh (Contributor) commented on Sep 6, 2019

TL;DR: the problem you were seeing with your example before merging ab0c239 was not the result of bugs in the implementations of SetLocalDescription and SetRemoteDescription, but rather the result of the example allowing the answering peer to add ICE candidates before receiving an initial offer.


Okay, after digging into this further, it seems that the problems you were seeing stem from the pion-to-pion-trickle example itself, which is implemented in a way that is unlikely to occur in practice. The issue you referenced above is caused by the offering peer sending its ICE candidates to the answering peer before an initial offer has been provided, and by the answering peer accepting those candidates before accepting an offer. This (rightfully) throws an error, because the answering peer calls AddICECandidate before making a call to SetRemoteDescription with the initial offer.

The code in the answer program specifically allows this behaviour, which should probably not happen in practice. Here's the IETF draft for Trickle ICE: https://tools.ietf.org/html/draft-ietf-mmusic-trickle-ice-02

Note:

   Trickled Candidates:  Candidates that a trickle ICE agent is sending
      subsequently to but within the context defined by an offer or an
      answer.  Trickled candidates can be sent in parallel with
      candidate harvesting and connectivity checks

The answering peer has to process an initial offer, and therefore call SetRemoteDescription(offer), before subsequently receiving trickled candidates and adding them via AddICECandidate. More information on initial offers is here: https://tools.ietf.org/html/draft-ietf-mmusic-trickle-ice-02#section-5
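
A sketch of that ordering on the answering side, again assuming pion/webrtc v2; the answerer type and its field names are illustrative, not taken from the example:

```go
package trickle

import (
	"sync"

	"github.com/pion/webrtc/v2"
)

// answerer buffers remote candidates that arrive before the initial offer, so
// that AddICECandidate is never called ahead of SetRemoteDescription.
type answerer struct {
	mu      sync.Mutex
	pc      *webrtc.PeerConnection
	pending []webrtc.ICECandidateInit
	ready   bool // true once the remote offer has been applied
}

// handleRemoteCandidate is called for every trickled candidate we receive.
func (a *answerer) handleRemoteCandidate(init webrtc.ICECandidateInit) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if !a.ready {
		a.pending = append(a.pending, init) // too early: hold on to it
		return nil
	}
	return a.pc.AddICECandidate(init)
}

// handleOffer applies the initial offer, then flushes any buffered candidates.
func (a *answerer) handleOffer(offer webrtc.SessionDescription) error {
	if err := a.pc.SetRemoteDescription(offer); err != nil {
		return err
	}
	a.mu.Lock()
	defer a.mu.Unlock()
	a.ready = true
	for _, init := range a.pending {
		if err := a.pc.AddICECandidate(init); err != nil {
			return err
		}
	}
	a.pending = nil
	return nil
}
```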

In what I suspect would be a typical use case of Trickle ICE, the answering peer would wait for an initial offer before creating a PeerConnection, and would record some kind of session id for that offer. It could then make sure that trickled ICE candidates are only accepted after the offer has been received, and compare the session id to make sure they come from the same remote peer. This example instead assumes there is exactly one connecting peer, and in fact running the offer program twice causes it to fail.

The root of the problem for this example is therefore in the application, not in the implementation of the WebRTC API, and so it did not warrant the changes above. There are two problems:

  1. The answering peer should not add trickled candidates before receiving an offer, and
  2. The offering peer should hold off sending trickled candidates until it has POSTed its offer.

I've included a patched version of the pion-to-pion-trickle example that makes use of synchronization primitives to solve the issues you were seeing here: Fix763 #816
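
In that spirit, a rough sketch (not the actual #816 patch) of holding candidates on the offering side until the offer has been delivered; offerer, signal, and offerDelivered are illustrative names:

```go
package trickle

import (
	"sync"

	"github.com/pion/webrtc/v2"
)

// offerer delays sending gathered candidates until its offer has been
// delivered to the signaling server.
type offerer struct {
	mu        sync.Mutex
	pending   []webrtc.ICECandidateInit
	offerSent bool
	signal    func(webrtc.ICECandidateInit) // placeholder signaling transport
}

// wire hooks candidate gathering up to the signaling transport.
func (o *offerer) wire(pc *webrtc.PeerConnection) {
	pc.OnICECandidate(func(c *webrtc.ICECandidate) {
		if c == nil {
			return // gathering finished
		}
		o.mu.Lock()
		defer o.mu.Unlock()
		if !o.offerSent {
			o.pending = append(o.pending, c.ToJSON()) // hold until the offer is out
			return
		}
		o.signal(c.ToJSON())
	})
}

// offerDelivered is called once the POST of the offer has succeeded; any
// candidates gathered in the meantime are flushed to the remote peer.
func (o *offerer) offerDelivered() {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.offerSent = true
	for _, init := range o.pending {
		o.signal(init)
	}
	o.pending = nil
}
```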

Going back to your earlier point that we should do whatever the libwebrtc implementation does, you'll note that in the Google libwebrtc implementation:

Your comment above, that our method of signaling is not the intended use case for Trickle ICE, is true; I'm going to look into changing back to the old method of gathering candidates. However, in the meantime I think it's important to revert the changes you've made to the core API in this PR, for the reasons outlined above. Because of our use case, Snowflake requires a fair amount of stability in the API. Should we expect to see fundamental changes like this being made for the purpose of single use cases (in this case a constructed example)? As far as I can tell, this example uses the WebRTC API in a very unusual way that violates the specification of Trickle ICE.

Sean-Der (Member, Author) commented on Sep 6, 2019

Hey @cohosh, traveling today! I will merge and give you a full answer as soon as I am in my hotel.

> Because of our use case, Snowflake requires a fair amount of stability in the API. Should we expect to see fundamental changes like this being made for the purpose of single use cases (in this case a constructed example)? As far as I can tell, this example uses the WebRTC API in a very unusual way that violates the specification of Trickle ICE.

This was a mistake on my part. I didn't intend to break compatibility with WebRTC (I actually thought my commit was fixing it!). I didn't do enough research or write tests, and that caused this. In the future I will make sure to be more deliberate! It is never OK for us to break downstream, so I really want to make this always work for you!

I am really grateful Tor is using Pion. My intention is to have zero divergence from W3C/IETF WebRTC; any time we do diverge, it is entirely accidental.

Sean-Der added a commit that referenced this pull request on Sep 7, 2019:

  Assert that SetLocalDescription always starts ICE candidate gathering.
  #763 broke this behavior (and diverged from the RFC) by starting
  gathering in SetRemoteDescription in some cases.

  Fixes #763
cohosh (Contributor) commented on Sep 8, 2019

Thank you for all your help and for the feedback on our use of the library. I'm going to take a look at your comments on our Pion trac ticket and make some fixes on our end. I think you're right that we shouldn't be using trickle. I'll get back to you if we have PRs for anything else we might need there.

BTW, nice GopherCon talk!
