TCP Fastopen (TFO) doesn't work reliably in China Mobile cellular network #1669

Open
rlei opened this Issue Sep 7, 2017 · 13 comments

rlei commented Sep 7, 2017

What version of shadowsocks-libev are you using?

Client: ss-libev Android 4.1.8
Server: 3.0.7

What operating system are you using?

Phone: Pixel XL with Android 8.0
Server: A KVM VPS running Ubuntu 16.04.3 LTS x64 with kernel 4.12.6-041206-generic

What did you do?

  1. enable two way TFO with sudo sysctl net.ipv4.tcp_fastopen=3 at the server side
  2. start ss-libev, logs observed:

2017-09-07 06:55:45 INFO: using tcp fast open
2017-09-07 06:55:45 INFO: UDP relay enabled
2017-09-07 06:55:45 INFO: initializing ciphers... salsa20
2017-09-07 06:55:45 INFO: using nameserver: 8.8.8.8
2017-09-07 06:55:45 INFO: using nameserver: 8.8.4.4
2017-09-07 06:55:45 INFO: tcp server listening at 0.0.0.0:nnnn
2017-09-07 06:55:45 INFO: udp server listening at 0.0.0.0:nnnn

  3. disable WiFi on the phone
  4. connect to the ss server in the ss-libev app on the phone; the app shows "Connected, tap to check connection"
  5. tap "... tap to check connection"

What did you expect to see?

The ss-libev Android app should display "Success: xxxms latency".

What did you see instead?

The app displayed "Internet Unavailable" and "Failed to detect internet connection: SSL handshake timed out".

What is your config in detail (with all sensitive info masked)?

  • client:
    • TCP Fast Open: On (it's on by default on Pixel XL and is unmodifiable)
    • Route: Bypass LAN & mainland China, GFW List (both tested)
  • server:
    • TFO enabled in ss-server
    • two way TFO enabled via sysctl net.ipv4.tcp_fastopen=3
  • Encrypt method: SALSA20, CHACHA20-IETF-POLY1305 (both tested)

More related details

This issue can only be reproduced with the following combination:

  • China Mobile cellular data
  • Pixel XL (which has TFO on by default)
  • net.ipv4.tcp_fastopen set to 3 on the server

I also tested other combinations and they all seem to work, e.g.:

  • Pixel XL + China Mobile cellular data + net.ipv4.tcp_fastopen=1
  • Pixel XL + China Telecom/Unicom broadband + net.ipv4.tcp_fastopen=3
  • Nexus 6P + China Telecom cellular data + net.ipv4.tcp_fastopen=3
  • Nexus 6P + China Telecom/Unicom broadband + net.ipv4.tcp_fastopen=3

Please also note Nexus 6P has TFO off at the phone side so the value of net.ipv4.tcp_fastopen doesn't matter at all.
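
For anyone comparing the combinations above: net.ipv4.tcp_fastopen is a bitmask, where bit 0x1 enables TFO for outgoing (client) connections and bit 0x2 enables it for listening (server) sockets. With the value 1 the server does not accept TFO on incoming connections, which would explain why the phone simply falls back to a regular handshake in that case. A minimal sketch of decoding the value follows; the helper names describe_tfo and read_tfo_sysctl are made up for illustration, and only the two low bits are covered.

```python
# Bit values of the Linux net.ipv4.tcp_fastopen sysctl (only the two low bits).
TFO_CLIENT = 0x1  # use TFO on outgoing connections
TFO_SERVER = 0x2  # accept TFO on listening sockets


def describe_tfo(value: int) -> dict:
    """Decode the client/server bits of a tcp_fastopen sysctl value."""
    return {
        "client": bool(value & TFO_CLIENT),
        "server": bool(value & TFO_SERVER),
    }


def read_tfo_sysctl(path: str = "/proc/sys/net/ipv4/tcp_fastopen") -> int:
    """Read the current value on a Linux host (hypothetical helper)."""
    with open(path) as f:
        return int(f.read().strip())
```

On the server, describe_tfo(read_tfo_sysctl()) shows at a glance whether incoming TFO is enabled.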

I did a quick tcpdump, and it shows that with net.ipv4.tcp_fastopen=3 on the server, ss-server did send data back to the phone correctly. The phone just never receives those packets.

So it looks like China Mobile's firewall has issues with TFO cookies.

I'm fully aware that this might not be an ss-libev problem at all, but maybe this kind of quirk should be documented somewhere so that there are fewer "ss is being detected/blocked" complaints?

madeye added the "not a bug" label Sep 7, 2017

madeye commented Sep 7, 2017

It's expected that TFO won't work behind some broken NAT devices, e.g. China Mobile's WAN gateway.

I marked this issue as "not a bug" in case anyone else reports the same problem in the future.

wongsyrone commented Sep 7, 2017

Does the connection test on Android go out over UDP via ss-tunnel?

madeye commented Sep 7, 2017

@wongsyrone Nope. It just counts the time of accessing www.google.com.

wongsyrone commented Sep 7, 2017

OK, never mind then. Upstream has a fix for UDP checksums; you can pick it up if you're interested: ambrop72/badvpn@ffd16e2

ankino17 commented Mar 1, 2018

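Server: libev 3.1.3
Client: android 4.4.6
On my Xperia (net.ipv4.tcp_fastopen=1), I can't capture any packets carrying a TFO cookie on either China Unicom 4G or China Telecom broadband.

You may want to read this article:
http://blog.donatas.net/blog/2017/03/09/tfo/

With tcp_fastopen_blackhole_timeout_sec set to 0, the client takes a fairly long time to get data back, and it falls back to a regular connection.

However, between two servers outside China I did capture the cookie, and the client included it when establishing the second connection, which shows that ss-libev itself is fine.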

triaqu commented Apr 2, 2018

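On my China Telecom 4G connection I also have a problem with DNS responses not coming back when TFO is on; over Wi-Fi it works normally. I had to turn TFO off on the server.

I tried changing net.ipv4.tcp_fastopen to 1: the test passes, but latency is somewhat higher than with TFO turned off.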

PantherJohn commented Jun 1, 2018

Has anyone tested TCP Fast Open on Alibaba Cloud ECS yet? I have exactly the same problem here:
Client: ss-tunnel (modified: https://github.com/RethinkMax/shadowsocks-libev)
Server: ssserver (also as a sniproxy client: https://github.com/PantherJohn/sniproxydpl)
With fast_open enabled in the config and net.ipv4.tcp_fastopen set to 3 on both sides, the fo cookie is requested but never received by the remote side:

# ssserver
    TCPFastOpenActive: 7472
    TCPFastOpenActiveFail: 5147
    TCPFastOpenBlackhole: 8

Note that the increase in TCPFastOpenActive is caused by outbound connections to Google, YouTube, etc. (even local loopback can do that, because all requests are forwarded by ssserver to sniproxy). On the client side I merely had:

TCPFastOpenCookieReqd: 29
<serveraddr> age 544.728sec rtt 162750us rttvar 44000us cwnd 18 metric_5 1302359 metric_6 176213 fo_mss 1460 fo_syn_drops 2/609.658sec ago fo_cookie 5b34b885cb57aecf

It's no surprise that NAT (probably?) is doing something nasty, dropping inbound packets with unknown TCP options as well as fo cookies.

# SNIPROXY Server deployed on vultr
    TCPFastOpenActive: 38
    TCPFastOpenPassive: 31
    TCPFastOpenCookieReqd: 4
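
The counters quoted above look like the TCPFastOpen* fields of the TcpExt section in /proc/net/netstat on Linux. For anyone who wants to compare both ends of a connection the same way, here is a rough sketch that pulls them out of that file's header-line/value-line format. The function names are invented for this example, and the sample text in the usage note reuses the numbers from this comment.

```python
def parse_netstat(text: str) -> dict:
    """Parse /proc/net/netstat-style text into a {counter_name: value} dict.

    The file consists of line pairs: a header line with counter names and a
    value line with the matching numbers, both prefixed by e.g. "TcpExt:".
    """
    counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for header, values in zip(lines[0::2], lines[1::2]):
        _, names = header.split(":", 1)
        _, nums = values.split(":", 1)
        for name, num in zip(names.split(), nums.split()):
            counters[name] = int(num)
    return counters


def tfo_counters(counters: dict) -> dict:
    """Keep only the TCP Fast Open related counters."""
    return {k: v for k, v in counters.items() if k.startswith("TCPFastOpen")}
```

Usage on a Linux host would be something like tfo_counters(parse_netstat(open("/proc/net/netstat").read())), run once before and once after a test connection to see which counters moved.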

Middleware Issues

Some middleware, such as firewalls and NAT boxes, may cause issues with the new TCP option. Additionally, because Linux continues to set the TFO option to 254, which is the experimental kind, it may be more likely to be dropped. It's even been reported that some middleware boxes, after detecting the TFO option in the initial SYN packet, drop subsequent SYN packets without the TFO option. Also, if a device is behind a Carrier Grade NAT (CGN) with many constantly changing public IP addresses, a cookie may be invalidated often, reducing the effectiveness of TFO. High-latency mobile devices, which benefit the most from TFO, are also the most likely to be affected by changing public IP addresses due to CGNs. I currently have no data on this.

madeye referenced this issue Jun 11, 2018: TCP Fast Open not work #1825 (Closed)

ankino17 commented Aug 18, 2018

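Packets carrying TFO may be dropped when they pass through routers, and the policy differs between carriers.
If a NAT address pool is configured, the second connection may succeed, depending on the aging time of the NAT table.
A change in the client IP, or in the public IP after NAT, will both affect normal use of TFO.
After receiving an invalid packet, the server can still send a SYN-ACK and fall back to the three-way handshake (3WHS).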
https://www.juniper.net/documentation/en_US/junos/topics/task/configuration/tfo-configuring.html
https://tools.ietf.org/html/draft-cheng-tcpm-fastopen-02#section-5

Network Address Translation (NAT)
The hosts behind NAT sharing same IP address will get the same cookie to the same server. This will not prevent TFO from working. But on some carrier-grade NAT configurations where every new TCP connection from the same physical host uses a different public IP address, TFO does not provide latency benefit. However, there is no performance penalty either as described in Section "Client: Receiving SYN-ACK".
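
To make the NAT point concrete: the server derives the cookie from the client's source IP under a secret key, so a client whose public address changes between connections presents a cookie that no longer validates, and the connection falls back to a regular handshake. The sketch below only illustrates that idea. It uses HMAC-SHA256 truncated to 8 bytes as a stand-in, which is an assumption for the example and not what any particular kernel does; make_cookie and server_accepts_syn_data are hypothetical names.

```python
import hashlib
import hmac
import ipaddress
from typing import Optional


def make_cookie(server_key: bytes, client_ip: str) -> bytes:
    """Derive an 8-byte cookie bound to the client's source IP (illustrative MAC)."""
    ip_bytes = ipaddress.ip_address(client_ip).packed
    return hmac.new(server_key, ip_bytes, hashlib.sha256).digest()[:8]


def server_accepts_syn_data(server_key: bytes, source_ip: str,
                            cookie: Optional[bytes]) -> bool:
    """Return True if data in the SYN may be accepted for this source IP.

    False means the server ignores the SYN data and the connection
    continues as a regular three-way handshake.
    """
    if cookie is None:
        return False
    return hmac.compare_digest(make_cookie(server_key, source_ip), cookie)
```

A cookie issued for one public address is rejected when the same phone later shows up behind a different CGN address, which is the "no latency benefit, but no penalty" case the draft describes.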

Client: Receiving SYN-ACK
The client SHOULD perform the following steps upon receiving the SYN-ACK:

  1. Update the cookie cache if the SYN-ACK has a Fast Open Cookie Option.
  2. Send an ACK packet. Set acknowledgment number to RCV.NXT and include the data after SND.UNA if data is available.
  3. Advance to the ESTABLISHED state.

Note that there is no latency penalty if the server does not acknowledge the data in the original SYN packet. The client can retransmit it in the first ACK packet in step 2. The data exchange will start after the handshake like a regular TCP connection.

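You can set the mode to 1 on the server side alone, or to 0 on the client, without affecting the performance of the server's outbound connections.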

Nick-Hopps commented Oct 6, 2018

I'm using LEDE 17.01.6 with "net.ipv4.tcp_fastopen = 3", and so is my Ubuntu 18.04 server.
That is to say, I've already configured TFO on both the server and the router.
However, when I set TCP Fast Open to true on my router, the log shows:

Sat Oct  6 13:52:19 2018 daemon.info ss-redir[2915]: listening at 0.0.0.0:1234
Sat Oct  6 13:52:19 2018 daemon.info ss-redir[2915]: UDP relay enabled
Sat Oct  6 13:52:19 2018 daemon.info ss-redir[2915]: udp port reuse enabled
Sat Oct  6 13:52:19 2018 daemon.info ss-redir[2915]: TCP relay disabled
Sat Oct  6 13:52:19 2018 daemon.info ss-redir[2915]: running from root user
Sat Oct  6 13:52:22 2018 daemon.err ss-redir[2881]: failed to set TCP_FASTOPEN_CONNECT

By the way, both my router and server run the newest version of shadowsocks-libev.
What's the matter?
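
The "failed to set TCP_FASTOPEN_CONNECT" line in the log suggests a setsockopt call being rejected. As far as I know that socket option was added in Linux 4.11 (value 30 in linux/tcp.h), so an older running kernel would refuse it. The sketch below is not shadowsocks-libev code; it only shows, under those assumptions, how a client could probe for the option and fall back to a plain connect. try_enable_tfo_connect is a hypothetical helper name.

```python
import socket

# Linux value of TCP_FASTOPEN_CONNECT; fall back to the raw number in case
# this Python build does not export the constant (assumption: Linux host).
TCP_FASTOPEN_CONNECT = getattr(socket, "TCP_FASTOPEN_CONNECT", 30)


def try_enable_tfo_connect(sock: socket.socket) -> bool:
    """Try to enable TFO for a later connect(); return False if unsupported."""
    try:
        sock.setsockopt(socket.IPPROTO_TCP, TCP_FASTOPEN_CONNECT, 1)
        return True
    except OSError:
        # The kernel rejected the option; a normal connect() still works,
        # just without data in the SYN.
        return False
```

Calling this right after creating the socket and logging the result once would show whether the router's kernel supports the option at all.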

Nick-Hopps commented Oct 6, 2018

Once I downgraded to version 3.1.3-4, everything was fine.

PantherJohn commented Oct 6, 2018

@Nick-Hopps It seems they have removed TCP_FASTOPEN_CONNECT from the Linux kernel headers. On my machine (CentOS 7.4, Linux kernel 4.18) TCP_FASTOPEN_CONNECT is not defined. Recompiling shadowsocks-libev on your LEDE router may help.

Nick-Hopps commented Oct 6, 2018

@PantherJohn Thank you, though I don't think my little router can withstand compiling anything haha.

PantherJohn commented Oct 6, 2018

@Nick-Hopps @librehat After all, according to your statement, the release is not backward compatible; this is the issue to be fixed.
