
Issues with TCP ICE #1356

Closed
jech opened this issue Aug 8, 2020 · 20 comments

@jech
Contributor

jech commented Aug 8, 2020

Here are the results of my experiments with TCP ICE in the Unnamed SFU.

  1. When the browser is the offerer, and all media traffic is in the client->server direction, the connection works and then switches to either the "disconnected" or "failed" state after a few seconds (I'm not sure which). No data traffic goes through. The console fills with "sendRR: failed to send packet: io: read/write on closed pipe", which is the expected behaviour when ICE has failed. (@Sean-Der, by the way, I find this error value surprising.)

  2. When the server is the offerer, the connection works if UDP is functional. If I firewall away all UDP traffic, then the connection never reaches a stable state. Perhaps it's a dependency on DNS (by firewalling UDP, I also disable DNS).

@jeremija
Member

jeremija commented Aug 9, 2020

@jech regarding issue no 2, could you check if you see any candidates gathered? I find it easiest to look at about:webrtc in Firefox:
(screenshot: about:webrtc in Firefox)

@Sean-Der
Member

Sean-Der commented Aug 9, 2020

@jech Yes, we REALLY need to fix errors bubbling up (pion/ice#252). It happened as a side effect of the ICE Agent being able to go to failed: before, the sockets would never be closed, so we didn't have these issues.

I will hopefully be able to fix this next week! I am on vacation and promised my wife no programming :p If you want to send a PR I would love to merge it! I am just on my phone for a while.

IMO the API should always let you push video (even if you are disconnected) and then you respond to NACK/Receiver Reports/ICE Connection State.
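As a rough illustration of that pattern (a minimal sketch using the public Pion API; the pushVideo helper, track IDs, and packet channel are assumptions, not code from this project):

import (
	"log"

	"github.com/pion/rtp"
	"github.com/pion/webrtc/v3"
)

// pushVideo keeps writing RTP regardless of the ICE state; the state change
// callback is used only as a signal (e.g. to trigger an ICE restart), never
// to stop the writer.
func pushVideo(pc *webrtc.PeerConnection, packets <-chan *rtp.Packet) error {
	videoTrack, err := webrtc.NewTrackLocalStaticRTP(
		webrtc.RTPCodecCapability{MimeType: webrtc.MimeTypeVP8}, "video", "sfu")
	if err != nil {
		return err
	}
	if _, err = pc.AddTrack(videoTrack); err != nil {
		return err
	}

	pc.OnICEConnectionStateChange(func(state webrtc.ICEConnectionState) {
		log.Println("ICE connection state:", state)
	})

	for pkt := range packets {
		// A failed write (e.g. "io: read/write on closed pipe") is logged, not fatal.
		if err := videoTrack.WriteRTP(pkt); err != nil {
			log.Println("WriteRTP:", err)
		}
	}
	return nil
}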

@jech
Contributor Author

jech commented Aug 11, 2020

I'm unable to get anything to work with Firefox. Perhaps it's due to testing with private IPv4 addresses (I'm testing on my LAN).

When trying to receive a stream from the server (the browser is the answerer), I get three connected TCP pairs (IPv4) and a bunch of failed TCP pairs (for IPv6).

When trying to send a stream to the server (the browser is the offerer), I get a single IPv4 pair which is in the state inprogress, nominated=false, selected=false, plus a bunch of IPv6 pairs, all of them failed. After a few minutes, the IPv6 pairs switch to inprogress.

@jech
Contributor Author

jech commented Aug 11, 2020

(comment removed, this is no longer true)

@jech
Contributor Author

jech commented Aug 11, 2020

Testing with the remote server, I'm seeing loss rates on the order of 75% over TCP (in both directions). (This causes the congestion controller to back off until the stream is unusable, and occasional ICE disconnects.)

@jech
Contributor Author

jech commented Aug 11, 2020

Oh, and just in case I'm doing something wrong, here's the code I use to enable TCP:

	if tcpListener != nil {
		// Multiplex all ICE-TCP candidates over the single passive TCP
		// listener (nil logger = default, 8 = read buffer size).
		mux := webrtc.NewICETCPMux(nil, tcpListener, 8)
		s.SetICETCPMux(mux)
		// Offer both UDP and TCP candidate types, for IPv4 and IPv6.
		s.SetNetworkTypes([]webrtc.NetworkType{
			webrtc.NetworkTypeUDP4,
			webrtc.NetworkTypeUDP6,
			webrtc.NetworkTypeTCP4,
			webrtc.NetworkTypeTCP6,
		})
	}
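For completeness, the tcpListener above is just a plain standard-library TCP listener. A minimal sketch of how it might be created (the port number and error handling here are assumptions, not part of the snippet above):

// Uses the standard "net" and "log" packages. The resulting *net.TCPListener
// satisfies the net.Listener interface expected by webrtc.NewICETCPMux.
tcpListener, err := net.ListenTCP("tcp", &net.TCPAddr{Port: 8081}) // assumed port
if err != nil {
	log.Fatalf("failed to create ICE-TCP listener: %v", err)
}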

@jeremija
Member

jeremija commented Aug 15, 2020

Hi @jech, I had a busy week. Wanted to try the test link you provided, but I see you removed it. I assume you're still having issues with Firefox?

One thing I noticed when using Firefox locally is that I sometimes have to use IP address 127.0.0.1 instead of localhost for the ICE candidates to show up, but this might be unrelated to your problem.

When the server is the offerer, the connection works if UDP is functional. If I firewall away all UDP traffic, then the connection never reaches a stable state. Perhaps it's a dependency on DNS (by firewalling UDP, I also disable DNS).

Can you reproduce this on https://peercalls.com ? Here offers are only ever created by the SFU, both UDP and TCP network types are enabled, and the host TCP candidates currently have a higher priority (this bug was fixed in #1358), so I'm just curious if you'll be able to connect using TCP?

@jech
Contributor Author

jech commented Aug 15, 2020

Wanted to try the test link you provided, but I see you removed it.

Sorry. I made quite a few changes to the SFU over the last few weeks (notably, I implemented ICE restarts), and I've deployed the master branch instead of the ice-tcp branch. I'm on holiday and on very slow Internet right now; I'll see if I find the time to deploy ice-tcp tomorrow.

I assume you're still having issues with Firefox?

The main reason I cannot merge ice-tcp into master is the amount of packet loss: it's on the order of 30% to 70%, which makes the congestion controller fall back to rates that are too low to be usable. The loss rate is similar in both directions (client->server and server->client), and Chrome gives me similar numbers, so it's not an issue in my loss rate estimator. It's not an issue with my network either, since TURN over TCP has 0% packet loss (and the TURN server is colocated with the SFU).

@jech
Contributor Author

jech commented Aug 16, 2020

I've now put the ICE-TCP branch of the SFU on https://vps-63c87489.vps.ovh.net:8444/. (The master branch is running on port 8443.)

I'm using the following firewall configuration for testing:

#!/bin/sh
# Drop all UDP from the server and any TCP coming from its TURN port,
# so that only ICE-TCP remains available.
ipv4=51.210.14.2
ipv6=2001:41d0:404:200::62ef
turn=3479

iptables -I INPUT -s $ipv4 -p udp -j DROP
ip6tables -I INPUT -s $ipv6 -p udp -j DROP
iptables -I INPUT -s $ipv4 -p tcp --sport $turn -j DROP
ip6tables -I INPUT -s $ipv6 -p tcp --sport $turn -j DROP

The symptoms are less visible now that ICE restarts are implemented. However, you should easily be able to check the TCP flow's extremely high drop rate.

@jeremija
Member

jeremija commented Aug 16, 2020

Thanks, I'm going to check this out later today.

In the meantime, would you mind trying to reproduce this issue on peercalls.com?

Can you reproduce this on https://peercalls.com ? Here offers are only ever created by the SFU, both UDP and TCP network types are enabled, and the host TCP candidates currently have a higher priority (this bug was fixed in #1358), so I'm just curious if you'll be able to connect using TCP?

@jech
Contributor Author

jech commented Aug 17, 2020

In the meantime, would you mind trying to reproduce this issue on peercalls.com?

I'm not at home this week, and I'm connected over an IPv4-only 4G link. FWIW, it looks like Chromium is connecting over TCP while Firefox is connecting over UDP.

@jeremija
Member

jeremija commented Aug 18, 2020

Hmm, very weird. I blocked all outgoing UDP traffic except DNS and all outgoing TCP traffic to STUN/TURN servers:

I currently only have a MacBook handy, so this is the pfctl config I'm using:

block out proto udp all
block out proto tcp to any port 3479
block out proto tcp to any port 5349

pass out proto udp to any port 53

On peercalls.com I'm able to get a local prflx candidate in Firefox with my public IP, whereas I don't see this candidate on your SFU: I only see the candidates with my LAN IP address.

Here are the candidate pairs from peercalls.com:

(screenshot: about:webrtc candidate pairs on peercalls.com)

And here's what I get on your SFU:

(screenshot: about:webrtc candidate pairs on the SFU)

Eventually all of these inprogress candidates change their state to failed. I'm 99% sure that this is the reason for the connectivity issues in Firefox, I'm just not sure why there is no prflx candidate. Actually, I'm confused about where the prflx candidate even comes from, since I have blocked the STUN/TURN ports.

@jech
Contributor Author

jech commented Aug 18, 2020

Eventually all of these inprogress candidates change their state to failed.

Given the packet loss I'm seeing, I'm not surprised that ICE connectivity checks eventually fail.

Actually I'm confused about where does the prflx candidate even come from since I have blocked the STUN/TURN ports.

I don't fully understand ICE, but isn't prflx a candidate obtained when the peer acts as a STUN server? A candidate obtained from the STUN server acting as a STUN server (duh) is srflx, I think.

@jeremija
Member

jeremija commented Aug 19, 2020

After reading RFC 5245 and some other docs, I think I finally understand it. The prflx candidates are obtained during connectivity checks. So I guess it all starts with the host candidates, and then the server should realize the peer's external (NAT) address once it receives (I assume) the initial Binding Request from the peer, and let the peer know of its external address. It's my understanding that this is how a prflx candidate is created. If anyone knows better, please correct me!

I found this article informative: http://wiki.innovaphone.com/i.php?8246
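Conceptually (an illustrative sketch with made-up types, not pion/ice internals), the controlled side derives the prflx candidate from the observed source address of a connectivity check:

package icesketch

import "net"

// Minimal made-up types, purely for illustration.
type candidate struct {
	addr net.Addr
	typ  string
}

type agent struct {
	remotes map[string]candidate // remote candidates, keyed by address
}

// handleBindingRequest shows where a prflx candidate comes from: it is the
// observed source address of a STUN Binding Request that does not match any
// remote candidate learned through signaling.
func (a *agent) handleBindingRequest(src net.Addr) {
	if _, known := a.remotes[src.String()]; !known {
		// The peer was never signaled from this address, but it is sending us
		// valid checks: record it as a peer-reflexive (prflx) candidate so a
		// candidate pair can be formed and checked.
		a.remotes[src.String()] = candidate{addr: src, typ: "prflx"}
	}
	// The Binding Success response would carry XOR-MAPPED-ADDRESS = src, which
	// is how the sender learns its own reflexive address in turn.
}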

@jech Do you perhaps have any special firewall rules set on your server? I assume your server is directly exposed to the internet?

Have you tried to enable ICE debug logging?

@jech
Contributor Author

jech commented Aug 19, 2020 via email

@jeremija
Member

jeremija commented Aug 19, 2020

I analyzed the traffic with wireshark and I see a lot of Binding Requests from the browser, but never seem to receive a Binding Response from the server. However, I see a bunch of Binding Requests from the server, and about the same amount of Binding Success Responses from the browser.

Can you share the arguments you provided to NewICETCPMux?

@jech
Contributor Author

jech commented Aug 19, 2020 via email

@jech
Contributor Author

jech commented Oct 2, 2020

I've just tried with beta.6, and passive TCP is still not useful.

@jech
Contributor Author

jech commented Aug 6, 2021

I confirm that ICE-TCP is still not working for me in v3.0.32. If I enable logging, I get a continuous stream of

ice WARNING: 2021/08/06 22:27:32 Discarded message from [xxxx:xxxx...]:8081, not a valid remote candidate

where the value between brackets is the local IPv6 address. If I disable IPv6 on the remote host, the problem persists. If I disable IPv6 on both the remote and the local host, the problem goes away, but the connection establishment in the server->client direction takes a very long time (on the order of 10s).

Setting the listener to use "tcp4" does not fix the issue.
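(For reference, restricting the listener to IPv4 in Go means passing the "tcp4" network string, roughly as below; the port is an assumption.)

// "tcp4" makes the standard-library listener reject IPv6 connections entirely.
tcpListener, err := net.ListenTCP("tcp4", &net.TCPAddr{Port: 8081})
if err != nil {
	log.Fatalf("tcp4 listen failed: %v", err)
}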

@Sean-Der Sean-Der added this to the 3.2.0 milestone Jan 23, 2022
Sean-Der added a commit to pion/ice that referenced this issue Feb 20, 2022
TCPMux before would create one internal connection per ufrag. This could
cause remote IPv6 traffic to be dispatched to a local IPv4 handler (or
the inverse). The ice.Agent would then discard the traffic since a
candidate pair must be the same IP version.

This commit now creates two connections per ufrag. When requesting a
connection for a ufrag the user must specify if they want IPv4 or IPv6.

Resolves pion/webrtc#2125
Resolves pion/webrtc#1356
Sean-Der added a commit to pion/ice that referenced this issue Feb 21, 2022
A controlled Agent would discard incoming Binding Requests if it didn't
cause the pair to be selected. For UDP Candidate this would be
interpreted as packet loss. For TCP Candidates not responding with a
Binding Success could be interpreted as a failure.

Firefox's ICE Agent would disconnect TCP Candidates because of this
behavior.

Resolves pion/webrtc#2125
Resolves pion/webrtc#1356
See https://bugzilla.mozilla.org/show_bug.cgi?id=1756460
@Sean-Der
Member

ICE-TCP now works on Firefox. Getting this to work required fixing two distinct issues.

Make TCPMux IPv6 Aware

Before, the TCPMux would create one virtual candidate and pass all traffic through it. pion/ice.Agent was unable to handle IPv6 traffic with this design: it would appear that a local IPv4 candidate had received an IPv6 request, and pion/ice would discard the traffic. The TCPMux now creates both an IPv4 and an IPv6 candidate, and everything works.
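Conceptually (an illustrative sketch with made-up types, not the actual pion/ice TCPMux API), the fix amounts to keying the mux's internal connections by both ufrag and IP family:

// Made-up sketch of the idea behind the fix: one connection per (ufrag, family)
// instead of one per ufrag, so IPv6 checks never land on an IPv4 handler.
type connKey struct {
	ufrag  string
	isIPv6 bool
}

type tcpMuxSketch struct {
	conns map[connKey]*packetConnSketch
}

type packetConnSketch struct{} // placeholder for the real per-candidate connection

func (m *tcpMuxSketch) connFor(ufrag string, isIPv6 bool) *packetConnSketch {
	key := connKey{ufrag: ufrag, isIPv6: isIPv6}
	if c, ok := m.conns[key]; ok {
		return c
	}
	c := &packetConnSketch{}
	m.conns[key] = c
	return c
}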

Don't drop Binding Requests in Controlled Agent

A controlled Agent would discard incoming Binding Requests if they didn't cause the pair to be selected. For UDP candidates this would be interpreted as packet loss. For TCP candidates, not responding with a Binding Success could be interpreted as a failure.

Firefox's ICE Agent would correctly disconnect TCP candidates because of this behavior.

The controlled Agent now correctly responds with a Binding Success message.
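As a sketch of what responding looks like at the STUN level (illustrative only; pion/ice has its own internal code for this, and the helper below is not part of its API):

import (
	"net"

	"github.com/pion/stun"
)

// buildBindingSuccess echoes the request's transaction ID and reports the
// sender's observed address back in XOR-MAPPED-ADDRESS, authenticated with
// the local ICE password, so the check is not interpreted as lost.
func buildBindingSuccess(req *stun.Message, srcIP net.IP, srcPort int, localPwd string) (*stun.Message, error) {
	return stun.Build(
		stun.NewTransactionIDSetter(req.TransactionID),
		stun.BindingSuccess,
		&stun.XORMappedAddress{IP: srcIP, Port: srcPort},
		stun.NewShortTermIntegrity(localPwd),
		stun.Fingerprint,
	)
}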


To test these behaviors I created https://github.com/Sean-Der/ice-tcp-test. It provides a Pion WebRTC server that supports IPv4/IPv6 ICE-TCP. You can then connect to it as an offerer or answerer. If anyone is trying to reproduce issues, this is a good place to start.
