tcp_ecn causes my network downloads to fail or become _very_slow #9748

Closed
tYYGH opened this issue Jul 29, 2018 · 34 comments

@tYYGH commented Jul 29, 2018

systemd version the issue has been seen with

239.0

Used distribution

Archlinux

Expected behaviour you didn't see

working downloads

Unexpected behaviour you saw

very slow, or non-working downloads (curl, wget, firefox, git clone/fetch…)

Steps to reproduce the problem
Install systemd 239.0
See here: https://bugs.archlinux.org/task/59473

Steps to fix the problem
sysctl net.ipv4.tcp_ecn=0

@phomes (Contributor) commented Jul 29, 2018

The man page for tcp says

When enabled, connectivity to some destinations could be affected due to older, misbehaving middle boxes along the path, causing connections to be dropped. However, to facilitate and encourage deployment with option 1, and to work around such buggy equipment, the tcp_ecn_fallback option has been introduced.

And it seems that you might be affected by that. Can you share some details about your network equipment and perhaps even check if there is an updated firmware for it?

To work around the problem I would suggest using sysctl net.ipv4.tcp_ecn=2 (edit: updated per the next comment) instead of 0, as that was the default in use before the change in #9143.

@SimonIremonger commented Jul 29, 2018

Correction to the above: sysctl net.ipv4.tcp_ecn=2 [negotiate ECN only on incoming connections] used to be the default, I think.
In any case, I agree tYYGH should sort out which router / further network equipment is in use, and its firmware... Could easily be an issue there!
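
For reference, the three tcp_ecn values mentioned in this thread can be decoded with a small shell helper. This is only a sketch: the function name describe_tcp_ecn is made up for illustration, and the wording of each case paraphrases the kernel's ip-sysctl documentation. The last part prints the live settings, assuming a Linux host where the /proc files are readable.

```shell
#!/bin/sh
# Hypothetical helper: map a net.ipv4.tcp_ecn value to its meaning.
describe_tcp_ecn() {
    case "$1" in
        0) echo "ECN disabled: never request it, never accept it" ;;
        1) echo "ECN requested on outgoing connections and accepted on incoming ones" ;;
        2) echo "ECN accepted on incoming connections only (kernel default)" ;;
        *) echo "unknown value: $1"; return 1 ;;
    esac
}

# Show the live settings when the proc files are readable (Linux only).
for knob in tcp_ecn tcp_ecn_fallback; do
    f="/proc/sys/net/ipv4/$knob"
    if [ -r "$f" ]; then
        echo "$knob = $(cat "$f")"
    fi
done
if [ -r /proc/sys/net/ipv4/tcp_ecn ]; then
    describe_tcp_ecn "$(cat /proc/sys/net/ipv4/tcp_ecn)" || true
fi
```

With systemd 239 the first line printed would be expected to read tcp_ecn = 1, which is the setting this issue is about.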

@poettering (Member) commented Jul 30, 2018

I wonder if we should revert #9143. We shouldn't really turn something on if it breaks people's connectivity. If routers are fucked, then this might very well be out of control of the person who uses the system, and issues like this are really not easy to debug and fix.

@poettering (Member) commented Jul 30, 2018

@SimonIremonger @enihcam @michich opinions on reverting?

@poettering added this to the v240 milestone Jul 30, 2018

@tYYGH (Author) commented Jul 30, 2018

I can confirm the “not easy to debug” part! I literally spent days on Freenode##Networking, where I got great help, and no one came anywhere close to the real cause for this issue. We ran many tests (ping, traceroute, mtr, tcpdump…), and always we were disturbed by the fact that some things worked, and some not…

I had this issue on 2 machines (out of 2!) where I upgraded to latest systemd. I will report on these machines’ specifics (a regular self-assembled PC, and an Udoo X86) as soon as I get home.

Thanks a lot for systemd, by the way! It’s so much better than what we had before :-)

@SimonIremonger commented Jul 30, 2018

I would strongly suggest a 3-pronged approach:

1. Definitely encourage those who HAVE found STILL-faulty network equipment to raise the issue (this NEEDS to happen somewhere!!). Is there a new wiki/location for an ECN hall of shame?

2. Advise the linux-net and bufferbloat communities that the issue still persists in some areas, and ask whether a more aggressive ECN-fallback option can be implemented (like Apple have been doing successfully).

3. Gauge the actual scale of the issue. It may be that reverting for the next systemd revision is worthwhile, but this still "needs raising" somewhere. I would have expected this issue to appear in a FEW places, but I really do get the impression it's increasingly rare. Many of us have had tcp_ecn=1 for a decade now. These few initial complaints might suggest the problem is now more likely in "consumer routers" rather than in "public services" (the latter appears to be much of a non-issue now)...
If the latter point is true, a better adaptive fallback in the kernel would likely alleviate/work around the slow connection establishment, at least.

@SimonIremonger commented Jul 30, 2018

@tYYGH -- we don't need the machines' specifics as much as Router/firmware-on-router that you are connecting through, which is much more likely where the incompatibility lies.

@tYYGH (Author) commented Jul 30, 2018

Here is absolutely all I can tell about this ISP-provided router:

[root@sedentaire ~]# nmap -A 192.168.1.1
Starting Nmap 7.70 ( https://nmap.org ) at 2018-07-30 17:49 CEST
Nmap scan report for bbox.lan (192.168.1.1)
Host is up (0.00045s latency).
Not shown: 997 closed ports
PORT    STATE SERVICE  VERSION
53/tcp  open  domain   dnsmasq 2.75
| dns-nsid: 
|_  bind.version: dnsmasq-2.75
80/tcp  open  http     lighttpd
|_http-server-header: Lighttpd
443/tcp open  ssl/http lighttpd
| ssl-cert: Subject: commonName=Bbox/organizationName=Bouygues Telecom/stateOrProvinceName=France/countryName=FR
| Not valid before: 2013-05-27T08:50:31
|_Not valid after:  2023-05-25T08:50:31
|_ssl-date: TLS randomness does not represent time
MAC Address: D0:84:B0:18:55:FC (Sagemcom Broadband SAS)
Device type: general purpose
Running: Linux 2.6.X
OS CPE: cpe:/o:linux:linux_kernel:2.6
OS details: Linux 2.6.9 - 2.6.27
Network Distance: 1 hop

Apparently, this would be the one:
https://www.bbox-mag.fr/box/fixe/37-presentation-video-de-la-bbox-sensation-ng-sagem-5330b/

Yep, it is, and it has firmware 15.1.2 from 2018-02-13 (probably the latest available).

@tYYGH (Author) commented Jul 30, 2018

In your opinion, is this a serious enough bug in this loaned router to justify Bouygues (my ISP) replacing it with a bug-free model? Or is it just a sadly acceptable nuisance?

@phomes (Contributor) commented Jul 30, 2018

I am a bit surprised that an ISP would not have this fixed. All iOS 11 devices have ECN enabled, and so do 50% of OS X computers, selected at random. So I would expect an ISP to drown in complaints about the router. But maybe our fallback solution is just not as good as Apple's, as @SimonIremonger wrote.

Just to be sure. net.ipv4.tcp_ecn_fallback is set to 1, right?

@tYYGH (Author) commented Jul 30, 2018

@phomes You mean on the router? I have no way of knowing. Seen from my seat, this is a black-box appliance… :-(

On my PC net.ipv4.tcp_ecn_fallback = 1, as you wrote.

@SimonIremonger commented Jul 30, 2018

@tYYGH The issue COULD be further up the chain in the ISP's network, not just the router.
In any case, I DO think it really is an issue the ISP SHOULD be fixing -- these days I do think it is NOT a sadly-acceptable-nuisance. Please do try to raise it with them, I appreciate it can be hard to get through to 2nd-line support, but worth a try. They might in any case get you a 'different type of router' if asked.

@tYYGH (Author) commented Jul 31, 2018

Interesting read!
I believe I had at least the packet-reordering problem:
https://ptpb.pw/fXg1
This is curl trying to download the Tor-Browser using the URL from the project’s web site.

@SimonIremonger commented Jul 31, 2018

@tYYGH
Do find out what you can about an alternate router / ISP tech support... Post to phomes's query:
https://www.reddit.com/r/linux/comments/933vys/is_tcp_ecn_still_a_problem_today/
If problem routers are reported there, I'll put notes in the bufferbloat wiki.

@dtaht commented Aug 3, 2018

a packet cap of the failing connection with ecn on would be revealing.

for the record, I have deeply conflicting feelings about the wide use of ecn. In my mind it is a good idea at very short rtts(sub 2ms), and very long ones(>50ms), and for doing things like protecting video iframes from loss. I use it to protect routing babel protocol packets from being dropped. Etc.

Others (in the bbr, l4s, dctcp communities) want to change the definition of ecn to mean a multi-bit rate reduction and obsolete rfc3168, where a loss is equivalent to a mark and the recommended rate reduction is 1/2. fq_codel, pie, red, and all other deployed ecn capable aqm systems essentially implement rfc3168 behavior and it's what apple's tcp - and linux cubic - and bsd's - and windows - general deployment expects. I had hoped with wider deployment of aqms that dealt with ecn properly that we'd see more tcp's also enabling it... and we'd see tcps evolve to treat aqms doing multiple ecn marks per rtt per rfc3168 more sanely than they do today as it is a stronger signal of congestion than loss, as well as

Instead... well, see bbr, which currently more or less ignores packet loss on its quest to own the link, and currently has no ecn response. There's a thread on the bbr list that talks about how they are leaning towards not respecting that rfc. The l4s folk vehemently defend the idea that some form of dctcp can run outside the datacenter, based on bigbuckbunny based demos, combined with a custom and patented aqm, never tested against wifi or 3g, who create a lot of noise in the ietf, and little else.

Nearly every time I've quit smoking, an ecn debate started me up again, and instead of continuing to deal with it, I left the ietf, leaving the folk there to plot amongst themselves with no actual deployment to deal with. I'm so frustrated with the "make tcp go fast at any cost" people that periodically I fiddle with something called tcp-fu (for users), that has an adjustable response curve to fq_codel's ecn marks from background (torrent-like) to "gentle", rather than "rabid". Despite the enormous success of fq_codel and BQL in eliminating bufferbloat and network latency - I feel bad about essentially obsoleting the entire field of LPCC
(see: https://perso.telecom-paristech.fr/drossi/paper/rossi14comnet-b.pdf ) and would like to do something about it.

I certainly support systemd folk trying to move the state of the art forward and am eternally grateful for their adoption of fq_codel, which now covers the world, and (as a tiny component of the whole system) also has ecn enabled by default. If, in all the world, enabling ecn for cubic tcp only breaks this one box, well, fix that box.

Turning on ecn universally in systemd seems to be an idea that could be a major forcing, political and technical function, towards aiding deployment (fixing busted routers) and keeping the simplicity of an rfc3168-style ecn enabled aqm alive instead of crazed ietf alternatives like l4s + "dualQ". Of course those proponents would point to their lowered tcp delay (in the datacenter) as a forever lost opportunity and split the universe apart.

I used to take great glee in how fq techniques generally beat dctcp's, even in the datacenter... at how even the guy that invented dctcp moved onto fq...

But...

As the accidental co-author of what is now the largest edge-user ecn-enabled fq + aqm deployment in "fq_codel everywhere", + the implementation in the ath9k and athk wifi code... (does anyone have any idea how many users of systemd + fq_codel are?) I lose sleep over the ecn component only. I have a ton of data on it. It's mixed.

What I see with lots of tcp-ecn traffic on a link is that other valuable packets are delayed (slightly) or lost and am always saying "ECN has mass" to anyone that will listen. This is made up for by nearly eliminating retransmits, mostly. I think. It's a huge win for interactive tcp traffic, which is why apple adopted it (and - helpfully - their reno tcp is far less aggressive than linux cubic). I worry about lockout at low speeds and that one day we'd have to mark all packets of all types of flows (including voip and dns) as ecn capable, unless tcps evolve appropriately towards an agreed upon response curve.

One really interesting side-effect of ecn on is that fq_codel, running locally on the server, can self congest and start marking packets locally, thus regulating the behavior of the server better after a rtt (try 128 flows coming out of a short path at a gbit). But local fq_codel induced loss (currently) on the other hand is sometimes not lost there but signals the local stack to immediately reduce cwnd without actually losing the packet. Others might view either behavior as a problem and prefer that the server serve 100s more flows at ever increasing self inflicted local delay (as sch_fq does) until you run out of cpu. (I know this is not a good description of this issue, this is a bug note not a paper).

I do wish we'd come up with a more robust response to overload in fq_codel for ecn, as currently a malicious ecn sender can push fq_codel to its memory limit before being dropped robustly. (pie drops ecn at overload). fq_codel (and now sch_cake) can continue to evolve as can everything else of course!

ECN is the wet paint of the congestion control universe. I hope the systemd deployment goes well and if it does, I'll sleep better, and if it doesn't, and gets reverted with all the reasons why well understood, I'll also sleep better. thx for pushing the limits.

@dtaht commented Aug 3, 2018

@tYYGH - looking at that packet cap is not helpful without the actual .cap file as it lacks IP headers. Secondly, it appears to be using tor. It would not surprise me if some tor clients'/servers' encapsulation of ip packets was wrong (in perpetual draft is this: https://tools.ietf.org/html/draft-ietf-tsvwg-ecn-encap-guidelines-10)

I'm currently not inclined to "blame the router" but the server you are connecting to. This server has ecn enabled. Try downloading this file both "straight" and through tor, with ecn on and off, with wget or curl: http://flent-fremont.bufferbloat.net/~d/losangeles_and_wish_you_were_here.mp3

(for the record, it's me and a friend playing those two songs and is 12MB in size, enjoy)

(If you can setup a time I can tcpdump from my side).

@SimonIremonger - if you can't tell, my position on ecn is nuanced. Also, apple's enormous test tested their clients against their servers primarily, and not things like torrent or tor. It would not surprise me at all if ecn failure modes were higher against many other people's servers, nor if linux's ecn response was insufficiently robust.

@tYYGH (Author) commented Aug 5, 2018

@dtaht - There is no TOR involved in the tcpdump output I posted. This is a plain curl download of the Tor Browser, not through the Tor Browser.

I wouldn’t blame the server either (or at least not only this one), since this issue also prevented me from running any git fetch, for example; and downloading https://f-droid.org/FDroid.apk failed the same way as downloading the Tor Browser did.

As for the usefulness of the capture I posted, I am sorry if it is useless; it is only what I had available at the time. I ran this capture at the direction of someone who understands these things better than I do. I do not know how to use tcpdump, and I do not understand its output, and your long post above is gibberish to me, sadly.

However, I am willing to learn, and to help, and if I can produce any output that is more helpful, please ask! I will do my best to post the information you need.

@dtaht commented Aug 6, 2018

ok, that much info points more at your router.

tcpdump -i yourinterface -w acapture.cap

captures the binary info needed to look that deeply into what was going on at the tcp level.
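
A capture of everything on the interface can get large, so one option is to limit it to the single server being tested. The helper below is a sketch that only prints the command line: capture_cmd, eth0 and ecn-on.cap are placeholder names, and the host is the test server suggested earlier in this thread. It relies only on tcpdump's -i and -w options and a pcap "host" filter.

```shell
#!/bin/sh
# Hypothetical helper: build (but do not run) a tcpdump command line for a
# capture limited to one remote host.
# Arguments: interface, remote host, output file (all chosen by the caller).
capture_cmd() {
    iface=$1
    host=$2
    out=$3
    # -i: interface to listen on; -w: write raw packets to a file
    echo "tcpdump -i $iface -w $out host $host"
}

# Print the command for review; run it as root while reproducing the slow download.
capture_cmd eth0 flent-fremont.bufferbloat.net ecn-on.cap
```

The resulting file can be read back with tcpdump -n -r ecn-on.cap. In tcpdump's flag notation E stands for ECN-Echo and W for CWR, so a SYN that asks for ECN shows up as [SEW].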

@jonathonf commented Aug 12, 2018

Just to weigh in here. Silently "breaking" end-users' network connections isn't a great approach to fixing middleware. You also need to remember that large parts of the world don't have the same level of networking infrastructure so some users won't be able to do anything about this even if they knew the reason behind their issue.

As things stand, there are numerous reports of network connections suddenly becoming "slow" after an update to 239 on rolling release distros. This leads those users to test a different (frozen-pool) distro, find it works as expected (as it has an older systemd version), and then simply move to that distro. Of course, the issue will reappear a few months/years down the line when they upgrade that distro, but as far as they're concerned the original distro was broken.

Now - if I have read #9748 (comment) correctly, it appears that ECN isn't a "magic bullet" that should be blanket-enabled, but is instead a "tweak" or "optimisation" which should be applied by network admins who know what it does and can check its effects. A "successful" test on iOS isn't representative of anything other than iOS devices being used in areas where people can afford iOS devices - Linux isn't only run on high-end/pro-sumer hardware.

I don't know how new features can be rolled out without this sort of friction ("you can't please all the people all the time") but if ECN isn't "perfect" perhaps this change should be reverted? At least until it can be better/more widely tested? However, I'm certainly no expert so am happy to be overruled.

@dtaht commented Aug 13, 2018

If the ecn enablement is causing trouble outside this bug report, I totally agree with reverting it in systemd.

In fact, at this point in time, as desirous as I was in an earlier post towards getting more data about what else can go wrong with it in a fuller ecn deployment, I believe the linux tcps are not ready to be ecn'd in the general case for the general public. So - while I don't mind if you get more data!! - please revert.

@keszybz (Member) commented Aug 20, 2018

Since #9880 is merged, let's close this. We can probably try again when the kernel gets better fallback mode.

@keszybz closed this Aug 20, 2018

@MayeulC commented Oct 5, 2018

For what it's worth, I experienced this issue with exactly the same ISP-provided router as @tYYGH, a Bouygues Telecom ADSL box (F@st5330b, firmware 13.2.2018). Arch + systemd 239 as well. Symptoms include Internet working on Wi-Fi but not Ethernet, and everything working fine on an older distribution. The internet connection is very slow and unreliable: TCP connections start to drop after a while, and downloads stall after a handful of bytes, with some websites working and others not. The symptoms are similar to a window scaling issue (and this router might exhibit that problem as well).
The fact that connections work over Wi-Fi indicates that it's probably a home router configuration issue, and not somewhere else on the link.
I will contact the ISP a bit later; I hope to be able to sort through the phone support maze. Suggestions are welcome.

I still think that this would be a sane default, provided that:

  • The fallback works better
  • The kernel prints to dmesg when it has to rely on the fallback (maybe after a certain threshold).

@tYYGH (Author) commented Oct 5, 2018

Interesting feedback, @MayeulC! I once had a competent technical person on the phone (can you believe it?); unfortunately I had to leave home at that time, and couldn’t investigate things with him 😢

In case that helps you, the way I did it is that I went to a real brick&mortar BBox agency at the mall, and told the people there that I had a technical problem, that I had already done some investigation with network experts from the Internet, that the problem was definitely on Bouygues’ side, and that I wished to be called at home by a technician. And call he did!

@rickco commented Oct 16, 2018

Like @MayeulC and @tYYGH, my Arch Linux machine had an unreliable internet connection through my ISP, Bouygues Telecom (more precisely, the Miami box), after an upgrade.
The issue is not easy to detect, as the internet is kind of working. Even an experienced Linux user like me would never suspect anything was wrong. Only when I got really upset about failing downloads did I start to look more closely, and I found out on a Bouygues forum that it was due to my Arch Linux's latest default config. Nothing in the symptoms makes you think it could come from systemd, so I would advocate for a default config that works everywhere.
Setting net.ipv4.tcp_ecn=2 solved the issue in my case.
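
A setting made with sysctl on the command line is lost at reboot. One way to make it stick is a drop-in file under /etc/sysctl.d, which is read at boot. The sketch below assumes a file name, 99-tcp-ecn.conf, picked so that it sorts after the distribution defaults; the helper name write_ecn_dropin is likewise made up. The demo writes into a scratch directory so nothing on the system changes.

```shell
#!/bin/sh
# Hypothetical helper: write a sysctl.d drop-in that pins tcp_ecn.
# $1: target directory (normally /etc/sysctl.d), $2: value 0, 1 or 2.
write_ecn_dropin() {
    dir=$1
    value=$2
    case "$value" in
        0|1|2) ;;
        *) echo "tcp_ecn must be 0, 1 or 2" >&2; return 1 ;;
    esac
    printf 'net.ipv4.tcp_ecn = %s\n' "$value" > "$dir/99-tcp-ecn.conf"
}

# Demo against a scratch directory; prints the generated file.
demo_dir=$(mktemp -d)
write_ecn_dropin "$demo_dir" 2
cat "$demo_dir/99-tcp-ecn.conf"
```

On a real system this would be run as root as write_ecn_dropin /etc/sysctl.d 2, followed by sysctl --system to reload all drop-ins without rebooting.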

@MayeulC commented Oct 27, 2018

I just had a phone call with a "Bouygues Telecom technical service" employee (who was actually quite knowledgeable, which surprised me as I missed the phone call we planned at the BBox store and had to call back like any customer).

I was told they would investigate and keep me updated. Meanwhile, that person suggested that I post in the self-support forums, so that other people could chime in and report that they were affected as well. @tYYGH, @rickco, could you add a few words to this thread, please (I wrote the post in French, though)?
I hope they will patch their routers and bring us one step closer to enabling this everywhere. I'm not getting my hopes up too high, though, and this won't replace a better in-kernel fallback mechanism.

@rickco commented Oct 27, 2018

thanks MayeulC for the post on Bouygues forum, just replied

@dtaht commented Oct 27, 2018

@MayeulC commented Oct 28, 2018

@dtaht It seems that their router doesn't exhibit this problem when using the Wi-Fi connection (I think they likely fixed that specifically for iOS devices).

halstead pushed a commit to openembedded/openembedded-core that referenced this issue Nov 19, 2018
From upstream:

  Turning on ECN still causes slow or broken network on linux. Our tcp
  is not yet ready for wide spread use of ECN.

systemd/systemd#9748

Signed-off-by: Alex Kiernan <alex.kiernan@gmail.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
wak-google pushed a commit to wak-google/systemd that referenced this issue Dec 12, 2018
Turning on ECN still causes slow or broken network on linux. Our tcp
is not yet ready for wide spread use of ECN.

This reverts commit 9194727.

systemd#9748

Upstream-Status: Backport
Signed-off-by: Alex Kiernan <alex.kiernan@gmail.com>

@KenSharp commented Nov 10, 2019

I don't know if you're collecting data elsewhere but:

$ sudo sysctl -w net.ipv4.tcp_ecn=1
net.ipv4.tcp_ecn = 1
$ curl http://flent-fremont.bufferbloat.net/~d/losangeles_and_wish_you_were_here.mp3 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.9M  100 11.9M    0     0   307k      0  0:00:39  0:00:39 --:--:--  313k

$ sudo sysctl -w net.ipv4.tcp_ecn=2
net.ipv4.tcp_ecn = 2
$ curl http://flent-fremont.bufferbloat.net/~d/losangeles_and_wish_you_were_here.mp3 > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 11.9M  100 11.9M    0     0   860k      0  0:00:14  0:00:14 --:--:-- 1250k

$ uname -a
Linux 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Running Ubuntu 18.04. My router is a Now TV Hub 2 which is a Linux box, but the exact detail I do not know.

speedtest-cli utilizes the connection 100%. I'm guessing the test server doesn't use ECN. I can find out more if it's desirable.

@dtaht commented Nov 10, 2019

@KenSharp commented Nov 10, 2019

> I'd run this twice to eliminate caching effects.

Sorry, I did mean to say that I run them multiple times. These are typical results in both cases.

> (it would be cool if that box had a fq_codel or cake based qos system?)

I honestly have no idea. The box is, sadly, pretty locked down, and it looks like they haven't updated their downloads for a while:
http://sky.com/opensourcesoftware/NowTvHub/NowTv_hub_downloads.html

I get the impression that they employ some sort of QoS, as they're a quadruple-play company, but it's not user-configurable.

I've spoken with their second-line support a few times regarding a bug (which they fixed) and they seem quite helpful. I'm not sure how many "secrets" they'd be willing to give up though.

> The core stuff we use for good network measurements is the "flent" tool

This, I shall have a look at!

@dtaht commented Nov 14, 2019

I'm a big fan of merely asking if they are including fq_codel, cake and sqm support to fix their bufferbloat. And you could steer them at your question, here.

I hope you enjoy flent. It is currently being used big-time to prove out the relative merits of the L4S and SCE proposals in the ietf. https://github.com/heistp/sce-l4s-bakeoff

@dtaht commented Nov 14, 2019

Wow, sky is shipping linux 3.4rt in their SKY-NR801 release. This is more than a few kernel versions prior to where I would have considered fq_codel & htb stable (3.12). In fact, this was probably the most bufferbloated linux kernel of them all, with bufferbloat at every layer in the stack - pre-bql for starters. I can't imagine the sheer number of cves this build must have!? If at all possible I'd run screaming away from this hardware in search of something with reasonable security.

4.14 is the last decent -rt kernel....
