ICE Connection with relay candidates using coTURN failed #65

Closed · enobufs opened this issue Jun 7, 2019 · 13 comments

enobufs (Member) commented Jun 7, 2019

Your environment.

  • Version: pion/webrtc@v2.0.19
  • Browser: n/a
  • Other Information - stacktraces, related issues, suggestions how to fix, links for us to have context

What did you do?

  • Attempted to establish one data channel between two nodes using coTURN (relay)
  • Set ICETransportPolicy to webrtc.ICETransportPolicyRelay, assuming this restricts ICE to relay candidates only (see the sketch after this list)
  • coTURN version: Coturn-4.5.0.3 'dan Eider'
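
For context, a minimal sketch of the kind of configuration being used here (pion/webrtc v2 API; the TURN URL and credentials below are placeholders, not the real ones, and error handling is trimmed):

package main

import (
	"log"

	"github.com/pion/webrtc/v2"
)

func main() {
	// NOTE: the TURN URL and credentials are placeholders for illustration.
	config := webrtc.Configuration{
		ICEServers: []webrtc.ICEServer{{
			URLs:       []string{"turn:turn.example.com:3478"},
			Username:   "user",
			Credential: "pass",
		}},
		// Only gather relay (TURN) candidates locally.
		ICETransportPolicy: webrtc.ICETransportPolicyRelay,
	}

	pc, err := webrtc.NewPeerConnection(config)
	if err != nil {
		log.Fatal(err)
	}

	// A single data channel between the two nodes.
	if _, err := pc.CreateDataChannel("data", nil); err != nil {
		log.Fatal(err)
	}
}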

What did you expect?

Both ends would connect with each other through a relay candidate

What happened?

Gathering relay (TURN) candidates succeeds, but the ICE connection state goes to "failed" in about 10 seconds

Logs

Answerer side

$ SGRTC_LOG_TRACE=all go run main.go -epID server
is acceptor
sgrtc DEBUG: 23:01:33.468397 api.go:163: factory.Writer: &{0xc0000ac000}
sgrtc DEBUG: 23:01:33.468891 api.go:164: factory.DefaultLogLevel: Trace
sgrtc DEBUG: 23:01:33.468905 api.go:165: factory.ScopeLevels: map[]
signaling INFO: 2019/06/07 23:01:33 NewSignaling for server
sgrtc INFO: 2019/06/07 23:01:33 new endpoint: server
sgrtc DEBUG: 23:01:33.469448 api.go:317: Signaling started on server
sgrtc INFO: 2019/06/07 23:01:33 api run loop started
signaling DEBUG: 23:01:33.742738 signaling.go:96: sig(server): ev status
signaling DEBUG: 23:01:33.742766 signaling.go:109: sig(server): ev - PNConnectedCategory
sgrtc TRACE: 23:01:33.742797 api.go:997: signaled to select, but no waiter
signaling DEBUG: 23:01:54.668238 signaling.go:116: sig(server): ev message
signaling DEBUG: 23:01:54.668262 signaling.go:121: sig(server) received on: server
sgrtc DEBUG: 23:01:54.668394 api.go:797: signal received: from=client to=server
sgrtc DEBUG: 23:01:54.668405 api.go:798: signal received: body={"type":"offer","sdp":"v=0\r\no=- 572401240 1559973714 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=fingerprint:sha-256 C5:75:D0:B0:E0:CD:73:0B:F3:70:DD:FD:2C:38:7D:83:39:BC:97:41:CE:E9:3A:BB:02:B8:49:71:44:67:3B:DD\r\na=group:BUNDLE 0\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\na=setup:active\r\na=mid:0\r\na=sendrecv\r\na=sctpmap:5000 webrtc-datachannel 1024\r\na=ice-ufrag:dCZLYxUikOOSvHee\r\na=ice-pwd:ddqUmulNHWQRLRsqAZTXBUtbqpwpZVdJ\r\na=candidate:foundation 1 udp 16777215 18.237.97.24 10134 typ relay raddr 10.0.0.135 rport 51583 generation 0\r\na=candidate:foundation 2 udp 16777215 18.237.97.24 10134 typ relay raddr 10.0.0.135 rport 51583 generation 0\r\na=end-of-candidates\r\na=setup:actpass\r\n"}
sgrtc DEBUG: 23:01:54.668415 api.go:856: api: new session detected
sgrtc TRACE: 23:01:54.668422 api.go:997: signaled to select, but no waiter
sgrtc DEBUG: 23:01:54.685386 api.go:412: Accept() on socket 0
sgrtc DEBUG: 23:01:54.685429 api.go:320: Endpoint server exists
session INFO: 2019/06/07 23:01:54 using ICE server: turn:ec2-18-237-97-24.us-west-2.compute.amazonaws.com (1560060033, YbOw+gzFEZYE87Yo5nS1nKR/xnw=)
session INFO: 2019/06/07 23:01:54 new session from server to peer client
pc INFO: 2019/06/07 23:01:54 signaling state changed to have-remote-offer
session INFO: 2019/06/07 23:01:54 signaling state has changed: have-remote-offer
ice DEBUG: 23:01:54.793277 agent.go:367: Started agent: isControlling? false, remoteUfrag: "dCZLYxUikOOSvHee", remotePwd: "ddqUmulNHWQRLRsqAZTXBUtbqpwpZVdJ"
ice INFO: 2019/06/07 23:01:54 Setting new connection state: Checking
ice TRACE: 23:01:54.793692 selection.go:205: pinging all candidates
pc INFO: 2019/06/07 23:01:54 ICE connection state changed: checking
session INFO: 2019/06/07 23:01:54 ICE connection state has changed: checking
ice TRACE: 23:01:54.794043 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
pc INFO: 2019/06/07 23:01:54 signaling state changed to stable
session INFO: 2019/06/07 23:01:54 signaling state has changed: stable
signaling DEBUG: 23:01:54.794223 signaling.go:165: signal sent: from=server to=client
signaling DEBUG: 23:01:54.794240 signaling.go:166: signal sent: body={"type":"answer","sdp":"v=0\r\no=- 609153572 1559973714 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=fingerprint:sha-256 7D:26:06:62:77:0D:26:7F:ED:D3:75:6B:55:D6:E9:63:1B:C9:E2:A5:BD:F4:2C:38:B3:9E:72:F8:A0:29:46:50\r\na=group:BUNDLE 0\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\na=setup:active\r\na=mid:0\r\na=sendrecv\r\na=sctpmap:5000 webrtc-datachannel 1024\r\na=ice-ufrag:UhTUjahXcZjlMKwx\r\na=ice-pwd:MvLRRbgxrDUeulFtGkaNVixfVfOxspQV\r\na=candidate:foundation 1 udp 16777215 18.237.97.24 13077 typ relay raddr 10.0.0.135 rport 49389 generation 0\r\na=candidate:foundation 2 udp 16777215 18.237.97.24 13077 typ relay raddr 10.0.0.135 rport 49389 generation 0\r\na=end-of-candidates\r\n"}
ice TRACE: 23:01:56.794084 selection.go:205: pinging all candidates
ice TRACE: 23:01:56.794176 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:01:58.793789 selection.go:205: pinging all candidates
ice TRACE: 23:01:58.793869 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:00.793577 selection.go:205: pinging all candidates
ice TRACE: 23:02:00.793652 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:02.793638 selection.go:205: pinging all candidates
ice TRACE: 23:02:02.793723 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:04.793656 selection.go:205: pinging all candidates
ice TRACE: 23:02:04.793741 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:06.793512 selection.go:205: pinging all candidates
ice TRACE: 23:02:06.793599 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:08.793586 selection.go:205: pinging all candidates
ice TRACE: 23:02:08.793678 agent.go:706: ping STUN from relay 18.237.97.24:13077 related 10.0.0.135:49389 to relay 18.237.97.24:10134 related 10.0.0.135:51583
ice TRACE: 23:02:10.793532 selection.go:205: pinging all candidates
ice TRACE: 23:02:10.793724 agent.go:423: max requests reached for pair prio 72057589759737855 (local, prio 16777215) relay 18.237.97.24:13077 related 10.0.0.135:49389 <-> relay 18.237.97.24:10134 related 10.0.0.135:51583 (remote, prio 16777215), marking it as failed

Offerer side

$ SGRTC_LOG_TRACE=all go run main.go -epID client -peerID server
is initiator
sgrtc DEBUG: 23:01:54.223865 api.go:163: factory.Writer: &{0xc0000b4000}
sgrtc DEBUG: 23:01:54.224410 api.go:164: factory.DefaultLogLevel: Trace
sgrtc DEBUG: 23:01:54.224433 api.go:165: factory.ScopeLevels: map[]
signaling INFO: 2019/06/07 23:01:54 NewSignaling for client
sgrtc INFO: 2019/06/07 23:01:54 new endpoint: client
sgrtc DEBUG: 23:01:54.224869 api.go:317: Signaling started on client
sgrtc INFO: 2019/06/07 23:01:54 api run loop started
signaling DEBUG: 23:01:54.478936 signaling.go:96: sig(client): ev status
signaling DEBUG: 23:01:54.478972 signaling.go:109: sig(client): ev - PNConnectedCategory
sgrtc TRACE: 23:01:54.479024 api.go:997: signaled to select, but no waiter
session INFO: 2019/06/07 23:01:54 using ICE server: turn:ec2-18-237-97-24.us-west-2.compute.amazonaws.com (1560060033, YbOw+gzFEZYE87Yo5nS1nKR/xnw=)
session INFO: 2019/06/07 23:01:54 new session from client to peer server
session DEBUG: 23:01:54.492700 session.go:234: [client] adding data channel info for label 'data'
session DEBUG: 23:01:54.492734 session.go:306: ep(client): dc label=data reliable ordered
pc INFO: 2019/06/07 23:01:54 signaling state changed to have-local-offer
session INFO: 2019/06/07 23:01:54 signaling state has changed: have-local-offer
signaling DEBUG: 23:01:54.565490 signaling.go:165: signal sent: from=client to=server
signaling DEBUG: 23:01:54.565522 signaling.go:166: signal sent: body={"type":"offer","sdp":"v=0\r\no=- 572401240 1559973714 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=fingerprint:sha-256 C5:75:D0:B0:E0:CD:73:0B:F3:70:DD:FD:2C:38:7D:83:39:BC:97:41:CE:E9:3A:BB:02:B8:49:71:44:67:3B:DD\r\na=group:BUNDLE 0\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\na=setup:active\r\na=mid:0\r\na=sendrecv\r\na=sctpmap:5000 webrtc-datachannel 1024\r\na=ice-ufrag:dCZLYxUikOOSvHee\r\na=ice-pwd:ddqUmulNHWQRLRsqAZTXBUtbqpwpZVdJ\r\na=candidate:foundation 1 udp 16777215 18.237.97.24 10134 typ relay raddr 10.0.0.135 rport 51583 generation 0\r\na=candidate:foundation 2 udp 16777215 18.237.97.24 10134 typ relay raddr 10.0.0.135 rport 51583 generation 0\r\na=end-of-candidates\r\na=setup:actpass\r\n"}
signaling DEBUG: 23:01:54.913179 signaling.go:116: sig(client): ev message
signaling DEBUG: 23:01:54.913216 signaling.go:121: sig(client) received on: client
sgrtc DEBUG: 23:01:54.913290 api.go:797: signal received: from=server to=client
sgrtc DEBUG: 23:01:54.913302 api.go:798: signal received: body={"type":"answer","sdp":"v=0\r\no=- 609153572 1559973714 IN IP4 0.0.0.0\r\ns=-\r\nt=0 0\r\na=fingerprint:sha-256 7D:26:06:62:77:0D:26:7F:ED:D3:75:6B:55:D6:E9:63:1B:C9:E2:A5:BD:F4:2C:38:B3:9E:72:F8:A0:29:46:50\r\na=group:BUNDLE 0\r\nm=application 9 DTLS/SCTP 5000\r\nc=IN IP4 0.0.0.0\r\na=setup:active\r\na=mid:0\r\na=sendrecv\r\na=sctpmap:5000 webrtc-datachannel 1024\r\na=ice-ufrag:UhTUjahXcZjlMKwx\r\na=ice-pwd:MvLRRbgxrDUeulFtGkaNVixfVfOxspQV\r\na=candidate:foundation 1 udp 16777215 18.237.97.24 13077 typ relay raddr 10.0.0.135 rport 49389 generation 0\r\na=candidate:foundation 2 udp 16777215 18.237.97.24 13077 typ relay raddr 10.0.0.135 rport 49389 generation 0\r\na=end-of-candidates\r\n"}
pc INFO: 2019/06/07 23:01:54 signaling state changed to stable
session INFO: 2019/06/07 23:01:54 signaling state has changed: stable
ice DEBUG: 23:01:54.946044 agent.go:367: Started agent: isControlling? true, remoteUfrag: "UhTUjahXcZjlMKwx", remotePwd: "MvLRRbgxrDUeulFtGkaNVixfVfOxspQV"
ice INFO: 2019/06/07 23:01:54 Setting new connection state: Checking
ice TRACE: 23:01:54.946098 selection.go:90: pinging all candidates
pc INFO: 2019/06/07 23:01:54 ICE connection state changed: checking
session INFO: 2019/06/07 23:01:54 ICE connection state has changed: checking
ice TRACE: 23:01:54.946148 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:01:56.950079 selection.go:90: pinging all candidates
ice TRACE: 23:01:56.950190 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:01:58.949974 selection.go:90: pinging all candidates
ice TRACE: 23:01:58.950056 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:02:00.950202 selection.go:90: pinging all candidates
ice TRACE: 23:02:00.950325 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:02:02.949902 selection.go:90: pinging all candidates
ice TRACE: 23:02:02.949993 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:02:04.949762 selection.go:90: pinging all candidates
ice TRACE: 23:02:04.949901 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:02:04.950027 selection.go:35: check timeout reached and no valid candidate pair found, marking connection as failed
ice INFO: 2019/06/07 23:02:04 Setting new connection state: Failed
pc INFO: 2019/06/07 23:02:04 ICE connection state changed: failed
session INFO: 2019/06/07 23:02:04 ICE connection state has changed: failed
ice TRACE: 23:02:06.947587 selection.go:90: pinging all candidates
ice TRACE: 23:02:06.947659 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389
ice TRACE: 23:02:08.945718 selection.go:90: pinging all candidates
ice TRACE: 23:02:08.945797 agent.go:706: ping STUN from relay 18.237.97.24:10134 related 10.0.0.135:51583 to relay 18.237.97.24:13077 related 10.0.0.135:49389

coTURN log

89459: handle_udp_packet: New UDP endpoint: local addr 0.0.0.0:3478, remote addr 98.207.205.254:51583
89459: session 001000000000000028: realm <sgcoturn.xxx.com> user <>: incoming packet message processed, error 401: Unauthorized
89459: IPv4. Local relay addr: 172.31.23.244:10134
89459: session 001000000000000028: new, realm=<sgcoturn.xxx.com>, username=<1560060033>, lifetime=600
89459: session 001000000000000028: realm <sgcoturn.xxx.com> user <1560060033>: incoming packet ALLOCATE processed, success
89459: handle_udp_packet: New UDP endpoint: local addr 0.0.0.0:3478, remote addr 98.207.205.254:49389
89459: session 001000000000000029: realm <sgcoturn.xxx.com> user <>: incoming packet message processed, error 401: Unauthorized
89459: IPv4. Local relay addr: 172.31.23.244:13077
89459: session 001000000000000029: new, realm=<sgcoturn.xxx.com>, username=<1560060033>, lifetime=600
89459: session 001000000000000029: realm <sgcoturn.xxx.com> user <1560060033>: incoming packet ALLOCATE processed, success
89459: session 001000000000000029: peer 0.0.0.0 lifetime updated: 300
89459: session 001000000000000029: realm <sgcoturn.xxx.com> user <1560060033>: incoming packet CREATE_PERMISSION processed, success
89460: session 001000000000000028: peer 0.0.0.0 lifetime updated: 300
89460: session 001000000000000028: realm <sgcoturn.xxx.com> user <1560060033>: incoming packet CREATE_PERMISSION processed, success
enobufs (Member, Author) commented Jun 8, 2019

I have updated the logs above (they now correctly show TRACE-level output).

With the -V option passed to turnserver, I can see:

73: session 001000000000000002: realm <sgcoturn.xxx.com> user <1560060033>: incoming packet SEND processed, success

This must be the "Send Indication" request received on port 3478. The log says "processed, success", but the remote end never sees the packet.

@Sean-Der, now, I have two questions:

  1. Can allocated ports on the SAME coTURN server running on EC2 communicate with each other? These ports are on the public IP address assigned to the EC2 instance. Do these public ports support hairpinning?
  2. Even when ICETransportPolicyRelay is used, shouldn't the connectivity check also be performed from non-relay local candidates if the remote candidate is of type relay?

Sean-Der (Member) commented Jun 8, 2019

1.) That makes sense to me, but I think that is something that will need to be solved on the coTURN side. I don't think WebRTC clients can do anything to influence this behavior (not 100% sure, though).

2.) https://www.w3.org/TR/webrtc/#rtcicetransportpolicy-enum

ICETransportPolicyRelay affects only local gathering, so the only candidates that will exist locally are relay. The remote is still free to add any candidate types it wants.

enobufs (Member, Author) commented Jun 8, 2019

Right, (1) is purely a server-side issue, specifically an AWS NAT issue. If hairpin routing is not possible, then I believe we have to make sure the two ends use different relay servers.

enobufs (Member, Author) commented Jun 8, 2019

@Sean-Der I was able to run coTURN on my local PC. The data channel connection was successfully established using ICETransportPolicyRelay!

But... the answerer side (which acts as an echo server) does not seem to receive data (dc.OnMessage does not appear to be called), even though the handshake at the SCTP level and below succeeded (there is bidirectional connectivity up to the SCTP layer, specifically the DCEP layer).

If I turn off the ICETransportPolicyRelay option, the echo comes back. That does not really make sense to me; do you have any thoughts on this? (How could the ICETransportPolicyRelay option affect the OnMessage callback?)

Note: I am using today's latest webrtc, ice, and turnc.
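
For reference, a minimal sketch of the echo pattern the answerer side is described as using, against the pion/webrtc v2 API (not the actual test code; peer connection setup and signaling are omitted):

package main

import (
	"log"

	"github.com/pion/webrtc/v2"
)

// echoOnDataChannel wires up an incoming data channel so that every
// received message is echoed straight back to the sender.
func echoOnDataChannel(pc *webrtc.PeerConnection) {
	pc.OnDataChannel(func(dc *webrtc.DataChannel) {
		dc.OnMessage(func(msg webrtc.DataChannelMessage) {
			// This is the callback that never fires in the failing case above.
			if err := dc.Send(msg.Data); err != nil {
				log.Println("echo failed:", err)
			}
		})
	})
}

func main() {} // peer connection setup and signaling omitted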

Sean-Der (Member) commented Jun 8, 2019

@enobufs nice!

That is really strange; I was unable to reproduce it, unfortunately :( I patched pion-to-pion like this

If you do a tcpdump, do you see the messages coming through?
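
(For example, assuming coturn is listening on the default UDP port 3478, a capture along these lines on either end would show whether the relayed packets arrive; adjust the interface and port as needed.)

$ sudo tcpdump -i any -n udp port 3478 -w turn.pcap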

enobufs (Member, Author) commented Jun 9, 2019

@Sean-Der

  • The pion-to-pion example works with a local coturn
  • The pion-to-pion example also fails with coturn on EC2 (definitely a server-side issue)

Open issues:

  • Why my test's OnMessage() does not get called when ICETransportPolicyRelay is used
  • The issue with coturn on EC2 (a non-pion issue)

enobufs (Member, Author) commented Jun 9, 2019

@Sean-Der quick question... could you tell me why we have two a=candidate lines with the same transport address?

v=0
o=- 12549196 1560120562 IN IP4 0.0.0.0
s=-
t=0 0
a=fingerprint:sha-256 C7:1F:12:BC:27:D0:D3:E8:FE:F2:45:DF:AD:F7:65:27:7C:AC:FA:7B:6C:60:9C:BD:B5:7A:48:B6:B5:D0:0A:63
a=group:BUNDLE 0
m=application 9 DTLS/SCTP 5000
c=IN IP4 0.0.0.0
a=setup:active
a=mid:0
a=sendrecv
a=sctpmap:5000 webrtc-datachannel 1024
a=ice-ufrag:BhcimdpcpjPYygCW
a=ice-pwd:dmUZPNqUevdzpNCniScTulhdeJdjmcAG
a=candidate:foundation 1 udp 16777215 192.168.86.220 15864 typ relay raddr 127.0.0.1 rport 63492 generation 0
a=candidate:foundation 2 udp 16777215 192.168.86.220 15864 typ relay raddr 127.0.0.1 rport 63492 generation 0
a=end-of-candidates

The above is from the pion-to-pion example.

enobufs (Member, Author) commented Jun 9, 2019

I am still searching for the reason why my test does not receive application data via a data channel. I added printfs around a.netConn.Read(buffer) in sctp's association.go.

In my test, both ends' ICE connection states go to "connected", and the DCEP handshake then finishes successfully. But after that, the Read() method never returns even though the sending side keeps writing to the data channel (OnMessage() does not get called because SCTP's readLoop stops receiving data)... so weird.
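
(Illustrative only: the instrumentation above was ad-hoc printfs inside pion/sctp, but a similar effect can be had by wrapping the net.Conn handed to the association, roughly as in this hypothetical sketch.)

package main

import (
	"fmt"
	"net"
)

// loggingConn wraps a net.Conn and logs every Read, similar in spirit to
// the printf instrumentation described above (not the actual pion/sctp patch).
type loggingConn struct {
	net.Conn
}

func (c *loggingConn) Read(b []byte) (int, error) {
	n, err := c.Conn.Read(b)
	fmt.Printf("Read returned n=%d err=%v\n", n, err)
	return n, err
}

func main() {
	// Demonstrate the wrapper with an in-memory pipe instead of a real TURN conn.
	a, b := net.Pipe()
	go func() {
		b.Write([]byte("hello"))
		b.Close()
	}()

	wrapped := &loggingConn{Conn: a}
	buf := make([]byte, 1500)
	for {
		if _, err := wrapped.Read(buf); err != nil {
			return
		}
	}
}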

Sean-Der (Member) commented:
I fixed a bug last night in master where DTLS/SCTP would improperly stop. Might be relevant?

Is ICE getting the incoming data? The readLoop for that is in candidate_base.go

I will set up coturn on AWS tonight and start debugging! I am doing errands today, sorry.

enobufs (Member, Author) commented Jun 10, 2019

@Sean-Der I think that is not related. I found something interesting.

When a data channel message is larger than 888 bytes, the remote does not receive it. I am still not sure whether it is dropped by turnc or by coturn. I modified the pion-to-pion example to send text longer than 888 bytes, and the problem reproduces.

FYI, messages longer than 888 bytes do reach CandidateRelay's permission.Write().
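
(A minimal sketch of the kind of size probe used to narrow this down, assuming an already-open pion/webrtc v2 data channel; this is not the actual pion-to-pion modification.)

package main

import (
	"strings"

	"github.com/pion/webrtc/v2"
)

// sendSizeProbe sends messages of increasing size over an open data channel.
// With the behavior described above, payloads beyond roughly 888 bytes never
// arrive at the remote side when relaying through TURN.
func sendSizeProbe(dc *webrtc.DataChannel) error {
	for _, size := range []int{100, 500, 888, 889, 1000} {
		if err := dc.SendText(strings.Repeat("a", size)); err != nil {
			return err
		}
	}
	return nil
}

func main() {} // peer connection wiring omitted; see the pion-to-pion example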

enobufs (Member, Author) commented Jun 10, 2019

Looking at Wireshark, a 1000-byte message is transmitted to coTURN as a "Send Indication" packet (containing 1065 bytes of DATA), and coTURN then returns a "Data Indication" packet (with the identical DATA) toward the remote candidate. So coTURN is relaying the large data, but I do not see any log indicating that the data is received on the receiving end...

Sean-Der (Member) commented:
We fixed the small-MTU issue when using TURN in b28205a.

@enobufs is there any other research/fixing we should do, or is this good to close?

enobufs (Member, Author) commented Jun 10, 2019

Good to close! Let me close...

enobufs closed this as completed on Jun 10, 2019