Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sdn: set veth TX queue length to unblock QoS #11126

Merged
merged 1 commit into from
Sep 30, 2016

Conversation

dcbw
Copy link
Contributor

@dcbw dcbw commented Sep 27, 2016

docker/libnetwork appear to create the pod veth interfaces with
an explicitly 0 txqlen instead of leaving the default 1000 that
the kernel would assign. A txqlen of 0 blocks any packets when
QoS is enabled for a port due to the way the Linux HTB shaper
works. Explicitly set the veth txqlen to make QoS work again.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1378000

docker/libnetwork appear to create the pod veth interfaces with
an explicitly 0 txqlen instead of leaving the default 1000 that
the kernel would assign.  A txqlen of 0 blocks any packets when
QoS is enabled for a port due to the way the Linux HTB shaper
works.  Explicitly set the veth txqlen to make QoS work again.

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=1378000
@dcbw
Copy link
Contributor Author

dcbw commented Sep 27, 2016

@openshift/networking @eparis

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but should we open an issue with docker / libnetwork too?

@dcbw
Copy link
Contributor Author

dcbw commented Sep 27, 2016

See https://bugzilla.redhat.com/show_bug.cgi?id=1152231 for background on why veth interfaces now work this way...

@dcbw
Copy link
Contributor Author

dcbw commented Sep 27, 2016

LGTM, but should we open an issue with docker / libnetwork too?

Its pretty murky; the explicit txqlen:0 was done back in 2014 to work around the bug that the kernel change from 1152231 fixes. So maybe it's time to revisit. But since we may not use docker for creating the veth in the future anyway, it probably doesn't matter much to us.

@danwinship
Copy link
Contributor

OK... so...

With old kernels, veths were created by default in such a way that they had bad performance (docker-archive/libcontainer#193). Setting txqueuelen to 0 fixes the performance problem but would cause explicitly-added qdiscs to fail (docker-archive/libcontainer#193 (comment)). Docker went with that approach anyway though, because they decided qdiscs were an advanced feature they didn't need to support (docker-archive/libcontainer#193 (comment)).

Except meanwhile, it turns out that the original theory that setting txqueuelen=0 would cause explicitly-added qdiscs to fail was mostly wrong, and in particular doesn't apply to the "htb" qdisc which we are using (http://www.spinics.net/lists/netdev/msg337586.html) (although it defaults to txqueuelen=2 in this case, not 1000, which seems like it might be... poor?) So with old kernels, OpenShift worked, even when using QoS.

Phil and Jesper eventually fixed the original "veths were created by default in such a way that they had bad performance" bug (https://bugzilla.redhat.com/show_bug.cgi?id=1152231), which means that the original libcontainer "set txqueuelen to 0" workaround is no longer needed (yay!), but in the process they also removed the workaround for txqueuelen==0 in htb (https://www.mail-archive.com/netdev@vger.kernel.org/msg74390.html), which means that the "set txqueuelen to 0" workaround is no longer allowed (boo!). (This seems like it might count as the sort of thing that kernel hackers consider an API break? Or maybe not, depending on exactly how poorly the txqueuelen of 2 worked...)

Anyway... with this patch:

  • old kernel: libcontainer sets txqueuelen to 0 and then brings up the veth, which causes the kernel to create it with no qdisc, which means it has good performance. Your patch then sets txqueuelen to 1000 before adding the htb qdisc, which has the desired effect (which is probably better than the old behavior)
  • new kernel: libcontainer sets txqueuelen to 0 and then brings up the veth, which would have been created with no qdisc anyway, and the txqueuelen has no effect. Your patch then sets txqueuelen to 1000 before adding the htb qdisc, which has the desired effect (which is certainly better than the old behavior of attaching the htb qdisc with txqueuelen 0, which broke everything).

So LGTM. [merge]

@openshift-bot
Copy link
Contributor

[Test]ing while waiting on the merge queue

@openshift-bot
Copy link
Contributor

Evaluated for origin test up to 8a2fbcf

@openshift-bot
Copy link
Contributor

continuous-integration/openshift-jenkins/test FAILURE (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9384/)

@dcbw
Copy link
Contributor Author

dcbw commented Sep 28, 2016

Travis failure is #11015 so re-[merge]

@pravisankar
Copy link

LGTM, @danwinship thx for the nice explanation

@knobunc
Copy link
Contributor

knobunc commented Sep 29, 2016

[merge]

@dcbw
Copy link
Contributor Author

dcbw commented Sep 29, 2016

flake is #11165 re-[merge]

@openshift-bot
Copy link
Contributor

Evaluated for origin merge up to 8a2fbcf

@openshift-bot
Copy link
Contributor

openshift-bot commented Sep 29, 2016

continuous-integration/openshift-jenkins/merge SUCCESS (https://ci.openshift.redhat.com/jenkins/job/test_pr_origin/9526/) (Image: devenv-rhel7_5102)

@openshift-bot openshift-bot merged commit dfab2c1 into openshift:master Sep 30, 2016
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 3, 2016
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] docker-archive/libcontainer#193
[2] openshift/origin#11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
fengguang pushed a commit to 0day-ci/linux that referenced this pull request Nov 8, 2016
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] docker-archive/libcontainer#193
[2] openshift/origin#11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
hohoxu pushed a commit to hohoxu/n5kernel that referenced this pull request Aug 22, 2018
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] docker-archive/libcontainer#193
[2] openshift/origin#11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
krazey pushed a commit to krazey/android_kernel_samsung_exynos9810 that referenced this pull request Nov 17, 2021
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] docker-archive/libcontainer#193
[2] openshift/origin#11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
krazey pushed a commit to krazey/android_kernel_samsung_exynos9810 that referenced this pull request Nov 17, 2021
It is a clear misconfiguration to attach a qdisc to a device with
tx_queue_len zero, because some qdisc's (namely, pfifo, bfifo, gred,
htb, plug and sfb) inherit/copy this value as their queue length.

Why should the kernel catch such a misconfiguration?  Because prior to
introducing the IFF_NO_QUEUE device flag, userspace found a loophole
in the qdisc config system that allowed them to achieve the equivalent
of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
a device.  The loophole on older kernels is setting tx_queue_len=0,
*prior* to device qdisc init (the config time is significant, simply
setting tx_queue_len=0 doesn't trigger the loophole).

This loophole is currently used by Docker[1] to get better performance
and scalability out of the veth device.  The Docker developers were
warned[1] that they needed to adjust the tx_queue_len if ever
attaching a qdisc.  The OpenShift project didn't remember this warning
and attached a qdisc, this were caught and fixed in[2].

[1] docker-archive/libcontainer#193
[2] openshift/origin#11126

Instead of fixing every userspace program that used this loophole, and
forgot to reset the tx_queue_len, prior to attaching a qdisc.  Let's
catch the misconfiguration on the kernel side.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants