Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default qdisc not suitable for non-TCP network interfaces #9194

Open
yope opened this Issue Jun 5, 2018 · 68 comments

Comments

@yope
Copy link

yope commented Jun 5, 2018

Issue was seen on systemd version 237 (ubuntu 18.04) but is also present on latest git by judging from the code.

Used distribution
kubuntu 18.04

Expected behaviour you didn't see
All transmitted CAN frames are actually transmitted on the bus

Unexpected behaviour you saw
TX frames dropped without notice.

Steps to reproduce the problem
(Using a real CAN interface and CAN-utils)

$ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

$ cangen can0 -I i -g 0 -p 20 -n 400
$ ifconfig can0
can0: flags=193<UP,RUNNING,NOARP> mtu 16
unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 txqueuelen 1000 (UNSPEC)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 399 bytes 2351 (2.3 KB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

Note that the TX packet count is 1 less than what the cangen sent.
This particular issue is caused by the default queueing discipline set by sysctl.d/50-default.conf, but the underlying problem might probably be the fact that the assumption that all network interfaces in existence are only ever used for TCP/IP is... not quite accurate ;-)
Having a default setting that is usable only for one of many possible scenarios is not a good idea IMHO, and since it silently breaks user applications, I consider this a bug.

@boucman

This comment has been minimized.

Copy link
Contributor

boucman commented Jun 5, 2018

What default would you recommend, in general and for CAN specifically ?

@poettering poettering added the network label Jun 5, 2018

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Jun 5, 2018

@marckleinebudde, you appear to know CAN. Any idea?

@marckleinebudde

This comment has been minimized.

Copy link
Contributor

marckleinebudde commented Jun 12, 2018

For CAN we need a non dropping queue, pfifo_fast is fine for us.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Aug 30, 2018

At one level, after 6+ years of trying to redteam fq_codel and get it into fuller deployment, finally finding an edge-case where it doesn't work is a goodness. Openwrt ripped support for pfifo_fast out years ago.

After learning about this issue over here: https://lwn.net/Articles/763564/

I agree fq_codel is the wrong qdisc for CAN. At another... well, I have a couple ideas:

@marckleinebudde

  1. I should note that pfifo_fast is also a dropping qdisc, but it doesn't drop til it hits the 1000 packet limit. The original reporter's test ( https://marc.info/?l=linux-can&m=152792909929690&w=2 ) had he burst 1001 packets would have probably encountered the same problem. (well, there are cases where pfifo_fast pushes back, so possibly not)

It worries me a lot that CAN folk might think we never drop packets in their designs as there are plenty of other reasons packets can get dropped (like by being out of memory). I'd strongly recommend they test their code with very short pfifo txqueues and make sure their protocols and apps are robust to loss no matter what qdisc is in play!!!

Some solutions:

A) can devices can specify their own qdisc in the kernel in their device driver.
B) noqueue might be even saner than pfifo_fast but might require a few driver changes here and there.
the linux mac80211 stack switched to noqueue a few releases back where the new fq_codel queues there entered ( https://arxiv.org/pdf/1703.00064.pdf ) - as all the underlying wifi device drivers had sufficient buffering.

C) systemd can explicitly set pfifo in these cases on bringing the can device up. (I note that pfifo_fast is also overkill)

Thank you for finding this very important edge case.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Aug 30, 2018

the virtual can device uses noqueue. Elsewhere I see no-op. Both of those are appropriate for CAN. suspect it may be just the usb-can devices that get a default qdisc.

@yope

This comment has been minimized.

Copy link
Author

yope commented Aug 31, 2018

@dtaht
About us CAN folks thinking you never drop packets... I don't know how much you know about CAN, but the thing is, CAN is different than most other network interfaces in that the data link layer is actually part of the physical layer. CAN assumes that a successfully sent CAN frame is also guaranteed to be received successfully by someone. Under that premise, an application that sends a CAN frame should be able to assume the frame gets on the bus successfully eventually if the send() call was successful. So yes, we CAN folks assume that frames are not dropped, and any system deliberately dropping frames after they are accepted without complaint (no blocking, no error on send()) is flawed and unsuitable for CAN.
What you say about making sure our protocols and apps are robust to loss no matter what qdisc is in play, is somewhat shortsighted. On the lowest level, CAN is a complete protocol and we have no influence on it. The rule is simple: A sent frame is supposed to be delivered always as long as the network itself is operational. Many existing CAN application layers therefor don't even expect a response or anything from the other side, because there is no need. This makes it impossible to deal with loss of frames in the application.
One could say that any queueing discipline is already built in into the PHYSICAL layer of a CAN controller, and that includes handshaking and re-sending of messages on error as well as priority and traffic shaping. The Linux kernel itself has no control about those parts of the network and should thus not even try.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Aug 31, 2018

I knew very little about CAN until yesterday. Having read up on it throroughly in the past 24 hours I agree with almost all of your point (except we will drop packets for various blue-moon reasons).

My larger question was does pfifo_fast drop packets on overload on your stack or not? (easy test - try the original authors' test with a txqueuelen of 400... or 10....). It might not, leaving the stuff in the socket buffer... but I don't know.

As to your last point... I totally agree! the "right" qdisc for CAN appears to be noqueue or no-op to preserve the attributes in it it needs. That seems to be the case for the vcan device at least. I am thinking that perhaps it is just the usb-can device here that is getting the global qdisc setting for some reason, but lack any CAN devices to test at all.

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Nov 30, 2018

Hi,
@marckleinebudde
I tested our company 8Devices socketcan USB2CAN converters and got same problem with lost packets. I lost more time to detect cause of problem ( qdisck fq_codel ). fq_codel silence dropping packet, without any notice to user level. Increase txqueuelen not help.
With command "tc -s q" I can see, that fq_codel dropped packet. It looks that fq_codel buggy (Why it dropping packet if have big txqueue buffer? Not working anyway.) .
After change net.core.default_qdisc=pfifo_fast CAN lost packet problem was resolved. I noticed, that my system (web browsers ) became much stable and faster.
BR
Darius Babrauskas
8Devices

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Dec 1, 2018

@dtaht
"My larger question was does pfifo_fast drop packets on overload on your stack or not? (easy test - try the original authors' test with a txqueuelen of 400... or 10....). It might not, leaving the stuff in the socket buffer... but I don't know."
Yes pfifo_fast drop packets on overload on stack. But her no problems, because user level get error notice. Programs from CAN-UTILS handle this error.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Dec 4, 2018

My conclusion was can devices should use "noqueue" by default, as that matches the anticipation of the protocol. Is that not possible with this usb device? (it's 1 line of code in the device driver) Otherwise they should specify a fifo queue.

fq_codel drops from the head of the queue which is very tcp friendly, nearly eliminating the RTO problem tcp can have with tail-drop and making things like voip work better. It's not "buggy", codel strives for a maximum queue length no matter the bandwidth, of 5ms. It certainly was not designed for can, and can's assumptions.

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Dec 6, 2018

I tested with "noqueue". To replace qdisck I used this command:
sudo tc qdisc replace dev can0 root noqueue
sudo tc qdisc replace dev can1 root noqueue
ip link
User level got error message. So user must handle maybe with select, to decrease load at user level.
But it better than fq_codel, but pfifo better than noqueue, because for pfifo posible increase txqueuelen .
My conclusion was can devices should use "noqueue" by default for OpenWrt, which not support pfifo_fast, at other systems pfifo_fast.

P.S.
@dtaht
fq_codel was developed for wifi. Why it used from kernel 4.12 by default for cable ethernet (enp2s0)? (It seems to me, that now my PC with cable ethernet and with pfifo_fast working better. Yes it subjective)

@dtaht

This comment has been minimized.

Copy link

dtaht commented Dec 6, 2018

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Dec 6, 2018

I tested with "noqueue". To replace qdisck I used this command:
sudo tc qdisc replace dev can0 root noqueue
sudo tc qdisc replace dev can1 root noqueue
ip link
User level got error message.

I'm sorry, I don't understand. noqueue didn't work?

I am sorry, not explained clearly.

"noqueue" working predicted. If CAN driver level is filled with packet (20 packet), then CAN driver send message to higher drivers level "stop not send me more". If user level send next packet, then got back error "network down". This error message is reported, when interface are down state, too (ambiguous state ) .
pfifo_fast , when got message "stop not send me more" , begun fill internal tx queue. When it filled full, and user sent next packet- user will get error message "Buffer full". So pfifo_fast better for me.

To me, the semantics of the can bus, demand an immediate response to a packet that does not succeed.
But I'm no expert. If FIFO works right for you, by all means use it. But aren't there other error
conditions in can that will lose the packet?

I see one error condition. If packets are in buffer, and usb cable are disconnected, then this packets in buffers are lost. At this state User level mus got error message "Network down" or similar.
If packet not delivered to end CAN device, then socketcan layer got error messages back from CAN device, which try sent CAN packet. By CAN bus specification this packet must be repeated forever while not received. So user level got error message at each repeated packet.
Exp.: candump program, show errors like ERRORFRAME and error string. Of course if you sent more packet, not possible detect which message not delivered.
Possible all usb device can't use fq_codel. Because they can stop receiving usb packet if want.

P.S.
Thank you of your detailed answer about fq_codel.

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Feb 26, 2019

so is there anything actionable for us left to do?

@yope

This comment has been minimized.

Copy link
Author

yope commented Feb 27, 2019

Yes, the default qdisc for CAN interfaces should never be one that drops frames without informing the application (send() returning error). From the various opinions in this thread, I understand that pfifo_fast is the preferred candidate. So there should be at very least an exception for the qdisc default value (fq_codel) set in sysctl.d/50-default.conf

@poettering

This comment has been minimized.

Copy link
Member

poettering commented Feb 27, 2019

well, the sysctl is global for the system, so what do you expect us to do on systems that have both can and non-can ifaces?

@fox-mage

This comment has been minimized.

Copy link

fox-mage commented Mar 1, 2019

well, the sysctl is global for the system, so what do you expect us to do on systems that have both can and non-can ifaces?

For example, smart home equipment uses the CAN bus

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 1, 2019

I am concerned that the fact that this sysctl is system-wide, makes too many assumptions. IMHO one should never do this. Most probably the intention is good, and for most (TCP/IP-based) network interfaces it is a good choice, but the fact that it can unexpectedly break legitimate applications makes me think that it needs to be removed entirely.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 2, 2019

my suggestion was that the can devices since they require a fifo or noqueue qdisc, specify that in their driver. The only can device I could check was vcan, which does the right thing.

@boucman

This comment has been minimized.

Copy link
Contributor

boucman commented Mar 2, 2019

kernel bug, then ?

@fox-mage

This comment has been minimized.

Copy link

fox-mage commented Mar 2, 2019

my suggestion was that the can devices since they require a fifo or noqueue qdisc, specify that in their driver. The only can device I could check was vcan, which does the right thing.

I work in a mobile operator and the Internet. We have a lot of equipment NB-IOT sending statistics on UDP once a day, on the principle of sent and forgot. Typically, they are embedded linux based on OpenWRT. Imagine that now packages with logs of energy consumption will drop. I'll get eaten by our chief engineer.

There are many other applications running on a protocol other than TCP or not using flow control. By enabling the Code mechanism (fq_codel, codel, .etc) we cut them off immediately.

The percentage of linux on the desktop is good if it reaches 5% of the total variety of applications including android, IoT, automotive/ aviation/space electronics. All developers urgently rewrite SOFTWARE? With each contact, to explain the reason?

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 2, 2019

Not specifying a noqueue qdisc is a bug in X number of can drivers. I would like confirmation that it limited to these usb devices or a list of can devices it affects.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 2, 2019

I think this message is conflating two things - one - can bus bug on a couple device drivers - two an interpretation of how udp works that concerns me.

my suggestion was that the can devices since they require a fifo or noqueue qdisc, specify that in their driver. The only can device I could check was vcan, which does the right thing.

I work in a mobile operator and the Internet. We have a lot of equipment NB-IOT sending statistics on UDP once a day, on the principle of sent and forgot.

If you are assuming that's reliable delivery at any level of the stack or network, that would be bad. Do you have a protocol design document or src I could l look at for this application?

Usually a stats protocol over unreliable delivery sends a summary since boot, not since the last interval.

Typically, they are embedded linux based on OpenWRT.

Openwrt made fq_codel the default in 2012. This is the first report of a problem. I filed a bug there too:

https://bugs.openwrt.org/index.php?do=details&task_id=1826&string=fq_codel&type%5B0%5D=&sev%5B0%5D=&pri%5B0%5D=&due%5B0%5D=&reported%5B0%5D=&cat%5B0%5D=&status%5B0%5D=open&percent%5B0%5D=&opened=&dev=&closed=&duedatefrom=&duedateto=&changedfrom=&changedto=&openedfrom=&openedto=&closedfrom=&closedto=

The is not a systemd problem but one for anyone that fiddles with the default qdisc. I'm actually far more concerned about what to do in openwrt.

Imagine that now packages with logs of energy consumption will drop. I'll get eaten by our chief engineer.

Any transport that isn't reliable isn't reliable, it's not just the kernel qdisc. If I was relying on udp being reliable I'd get eaten by a lot of chief engineers.

There are many other applications running on a protocol other than TCP or not using flow control.

Sure.

5ms queue. 100ms burst tolerance.

Actually your interpretation is incorrect here. With a strict fifo queue, or a misbehaving application, fifos drop packets too. It appears in the can protocol, you get a notification of that. In the case of fq, pver a network, sparse flows such as your example, see no queue, and don't get dropped. It is generally an improvement on a fifo when multiple forms of traffic exist.

By enabling the Code mechanism (fq_codel, codel, .etc) we cut them off immediately.

No.

in the general case non-tcp transports on a network work better with these qdiscs.

On a can bus, not so much. Can we fix that?

The percentage of linux on the desktop is good if it reaches 5% of the total variety of applications including android, IoT, automotive/ aviation/space electronics. All developers urgently rewrite SOFTWARE? With each contact, to explain the reason?

Yes, I'd rather like a universal solution to all the queue theory problems in the world. Fix one problem, introduce another....

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 2, 2019

8Devices socketcan USB2CAN is the only device documented thus far as "doing the wrong thing". I gotta admit that idea of running a "reliable" can protocol over the already unreliable usb bus with anything other than noqueue (I think fifo is a bad idea too! and pfifo_fast, also has out of order 3 tier semantics which made it a "lucky" default that happened to work) worries me.

https://www.8devices.com/products/usb2can they are 65 dollars, which is less than I've already spent in time debating this, so I ordered one. I'll take a look next week.

Is there a native can interface on some hackerboard I can look at? A can test tool?

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 2, 2019

By CAN bus specification this packet must be repeated forever while not received.

Wow, this means that a single malfunctioning device on the can bus can wedge the whole system? That explains a lot. No timeouts?

Kind of off topic, but interesting in a general context we're in, is this document from the early days of the arpanet: https://apps.dtic.mil/dtic/tr/fulltext/u2/a053450.pdf starting at page 7. They didn't solve it then, and it's not solved now.

I would suggest moving this bug to the can bug tracker if one exists, and working together to solve it there.

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 3, 2019

By CAN bus specification this packet must be repeated forever while not received.

Wow, this means that a single malfunctioning device on the can bus can wedge the whole system? That explains a lot. No timeouts?

No. Please read the CAN specification. The acknowledge can be given by any node on the bus. So can a no-acknowledge. CAN is a real-time bus, so timeouts at the physical and data-link layer make no sense.
What happens is that if one node is broken, it can behave in two ways:

  1. It just doesn't receive correctly, so does not ACK the message. If there are other nodes, those will ACK the message and everything is fine. This is actually a scenario where loss of messages (by the failing node) can get undetected. The chance this occurs where this is not the fault of the failing node are very slim. If this happens while no other nodes are on the bus, then the sending node does not get an ACK and thus resends the message indefinitely at full speed. In this case there is nothing wrong with this behavior because there are no other nodes. BTW, indefinitely is not really true. There are internal error counters on the sending side, and the device will go into "bus-off" mode after a few retries. In practice, software will detect this, possibly wait a few ms and then restart the bus and possibly alert the user.

  2. A misbehaving node actively thrashes the message on the bus each time. In this case there is nothing that can be done. It is the same as on an ethernet hub, where one node misbehaves by thrashing the network. The network is DoSed.

Kind of off topic, but interesting in a general context we're in, is this document from the early days of the arpanet: https://apps.dtic.mil/dtic/tr/fulltext/u2/a053450.pdf starting at page 7. They didn't solve it then, and it's not solved now.

I would suggest moving this bug to the can bug tracker if one exists, and working together to solve it there.

I am not an expert on how Linux network device driver should somehow "magically" specify their own queuing discipline, but if that's what they are supposed to do, then this is IMHO another big reason to just remove this sysctl. Again, it does too many assumptions, and clearly people who implemented it don't have enough knowledge about different networking interfaces and how they work, that such assumptions can safely be made.

Please stop attacking users and take responsibility for breaking things. If there is really a bug on the CAN driver side (which I doubt), then the proper way to proceed would be to eliminate this sysctl, file a bug report, wait for all CAN drivers to get "fixed", and only then come back and implement this.

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 4, 2019

Hi @dtaht,
Can you tell me your order Nb? We want sent you new usb2can model like sample, which preparing for manufacturing. It better reporting errors to PC.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 4, 2019

@babrauskas I apologize, I clicked on it, reached for my credit card, and did not actually finish the order! I'd LOVE a new one, I can place an order via your website, or we can arrange something different via email (I can be reached as dave dot taht at gmail dot com)

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 7, 2019

Hi.

I patched driver. (my test system: 4.15.0-45-generic #48-Ubuntu SMP ..)
First results:

  1. After connecting usb2can device , system automaticaly change defaul qdisc to noqueue. I can see this info with command "ip -details -statistics link show can0"
    Respond:
    11: can0: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 10
    link/can promiscuity 0
    can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
    bitrate 20000 sample-point 0.875
    tq 3125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    usb_8dev: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
    clock 32000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0 0 0 0 0 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes packets errors dropped overrun mcast
    4176 522 0 0 0 0
    TX: bytes packets errors dropped carrier collsns
    4176 522 0 0 0 0

  2. My test scripts working ok with different number of tx test packets.

It seem to me, that something was changed maybe in socketcan layer, possible after ubuntu updates? (maybe additional tx buffer was added). Currently noqueue and pfifo-fast working different. noqueue don't get link-down error like early, when driver buffer became full. pfifo-fast don't get buffer full error, when send packet >txqueuelen.
Anyway, now looks that working nice with noqueue and pfifo-fast.
Can anybody confirm this notices? (Edited: later I found that in driver code internal tx buffer size was not 20 but 1024 )
BR.
Darius Babrauskas

P.S. I tested with fq_codel now and dont get lost packets now!

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 8, 2019

I checked my notes about buggy socketcan tests.
Buggy kernel was 4.15.0-38-lowlatency #41-Ubuntu SMP PREEMPT Wed Oct 10 12:26:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Yesterday was : 4.15.0-45-generic #48-Ubuntu SMP
Today:
4.15.0-46-generic #49-Ubuntu SMP Wed Feb 6 09:33:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
Today I repeated tests with different parameters and with fq_codel too - no problems found.
Only WARNING when do test on speed >=500k without termination resistor on can bus. Tx URB aborted (-2)
dmesg show this:
[ 9811.721823] usb_8dev 1-6:1.0 can0: Tx URB aborted (-2)
[ 9811.721829] WARNING: CPU: 1 PID: 0 at /build/linux-7kdHqT/linux-4.15.0/net/core/skbuff.c:611 skb_release_head_state+0xae/0xc0
[ 9811.721829] Modules linked in: can_raw can btrfs zstd_compress xor raid6_pq ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc overlay aufs pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) input_leds usb_8dev intel_rapl can_dev x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi ghash_clmulni_intel snd_hda_codec_realtek pcbc snd_hda_codec_generic snd_hda_intel snd_hda_codec aesni_intel snd_hda_core snd_hwdep aes_x86_64 crypto_simd glue_helper cryptd snd_pcm intel_cstate intel_rapl_perf snd_seq_midi snd_seq_midi_event
[ 9811.721850] snd_rawmidi snd_seq snd_seq_device snd_timer snd soundcore acpi_pad mei_me shpchp mei intel_pch_thermal kvm_intel mac_hid kvm irqbypass parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_generic usbhid hid i915 i2c_algo_bit drm_kms_helper syscopyarea mxm_wmi sysfillrect sysimgblt fb_sys_fops r8169 i2c_i801 mii ahci drm libahci video wmi
[ 9811.721863] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G W OE 4.15.0-46-generic #49-Ubuntu
[ 9811.721864] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./H110M-DGS, BIOS P7.10 10/19/2016
[ 9811.721865] RIP: 0010:skb_release_head_state+0xae/0xc0
[ 9811.721866] RSP: 0018:ffff99962ed03ba8 EFLAGS: 00010006
[ 9811.721866] RAX: ffffffffaea36e00 RBX: ffff99961d863000 RCX: 0000000000000006
[ 9811.721867] RDX: 0000000000010000 RSI: 0000000000000001 RDI: 0000000000000000
[ 9811.721868] RBP: ffff99962ed03bb0 R08: 0000000000000001 R09: 0000000000002e63
[ 9811.721868] R10: ffff99962ed03d28 R11: 0000000000000000 R12: ffff99961d863000
[ 9811.721869] R13: ffffffffaea506de R14: 0000000000023900 R15: ffff99962ed03c58
[ 9811.721870] FS: 0000000000000000(0000) GS:ffff99962ed00000(0000) knlGS:0000000000000000
[ 9811.721870] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9811.721871] CR2: 000055bb400c8160 CR3: 0000000165e0a003 CR4: 00000000002606e0
[ 9811.721871] Call Trace:
[ 9811.721872]
[ 9811.721873] skb_release_all+0x12/0x30
[ 9811.721874] kfree_skb+0x32/0xa0
[ 9811.721876] enqueue_to_backlog+0x9e/0x230
[ 9811.721877] netif_rx_internal+0x49/0x140
[ 9811.721879] ? netdev_info+0x64/0x80
[ 9811.721880] netif_rx+0x1c/0x70
[ 9811.721881] can_get_echo_skb+0x41/0x60 [can_dev]
[ 9811.721883] usb_8dev_write_bulk_callback+0x7f/0xc0 [usb_8dev]
[ 9811.721884] __usb_hcd_giveback_urb+0xa4/0x140
[ 9811.721886] usb_hcd_giveback_urb+0xb8/0xe0
[ 9811.721887] xhci_giveback_urb_in_irq.isra.40+0x84/0xf0
[ 9811.721888] handle_cmd_completion+0x931/0xfa0
[ 9811.721890] xhci_irq+0x38a/0xc90
[ 9811.721891] ? ttwu_do_activate+0x77/0x80
[ 9811.721892] xhci_msi_irq+0x11/0x20
[ 9811.721894] __handle_irq_event_percpu+0x44/0x1a0
[ 9811.721895] handle_irq_event_percpu+0x32/0x80
[ 9811.721896] handle_irq_event+0x3b/0x60
[ 9811.721898] handle_edge_irq+0x7c/0x190
[ 9811.721898] handle_irq+0x20/0x30
[ 9811.721900] do_IRQ+0x4e/0xd0
[ 9811.721901] common_interrupt+0x84/0x84
[ 9811.721902]
[ 9811.721903] RIP: 0010:cpuidle_enter_state+0xa7/0x2f0
[ 9811.721904] RSP: 0018:ffffb19540cebe68 EFLAGS: 00000246 ORIG_RAX: ffffffffffffffd8
[ 9811.721905] RAX: ffff99962ed22880 RBX: 000008ec783194a8 RCX: 000000000000001f
[ 9811.721906] RDX: 000008ec783194a8 RSI: fffffff96ab7c981 RDI: 0000000000000000
[ 9811.721906] RBP: ffffb19540cebea8 R08: 00000000ffffffff R09: 0000000000000018
[ 9811.721907] R10: ffffb19540cebe38 R11: 0000000000000038 R12: ffff99962ed2c910
[ 9811.721907] R13: 0000000000000002 R14: ffffffffaf771cd8 R15: 0000000000000000
[ 9811.721909] ? cpuidle_enter_state+0x97/0x2f0
[ 9811.721911] cpuidle_enter+0x17/0x20
[ 9811.721912] call_cpuidle+0x23/0x40
[ 9811.721913] do_idle+0x18c/0x1f0
[ 9811.721915] cpu_startup_entry+0x73/0x80
[ 9811.721916] start_secondary+0x1ab/0x200
[ 9811.721917] secondary_startup_64+0xa5/0xb0
[ 9811.721918] Code: 15 00 74 03 5b 5d c3 e8 01 98 a0 ff 5b 5d c3 e8 d9 92 05 00 eb da e8 a2 c2 0c 00 eb 9c 48 83 e7 fe e8 b7 08 02 00 e9 72 ff ff ff <0f> 0b eb aa 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44
[ 9811.721940] ---[ end trace 94446b9da56233ed ]---
This type of messages I no noticed early, but Tx URB aborted (-2) message I remember.

BR.
Darius Babrauskas

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 8, 2019

This last seems a different bug, and the horrifically complex usb state machine the reason why usb->can makes me nervous in general! But probably solvable for you, but it does help to have some test gear that can automate plug/unplug events, simulate low or weird power events, whilst you run through all the operations the usb device is expected to perform for simulated days/weeks/years at a time.

As an aside, I worked on this puppy, which used USB as a bus on multiple devices. In orbit. Where you have SEVs (single event upsets due to radiation) and so on.

https://en.wikipedia.org/wiki/Trailblazer_(satellite)

Getting it to stay up for more than a few hours in a rad chamber took a long time and the thing crashed hard quickly every time we simulated getting through the Van Allen belts.

After that it was decided that a simpler more robust bus than usb, with 3 virtual cpus, and a zillion other features were needed in both the hardware and linux for space-destined apps. There's still a lot from a recent, successful effort ( https://www.planetaryresources.com/2018/01/arkyd-6-is-in-orbit/) I can't yet talk about and wish I could.

Um, er, well, after that launch, I now have a high opinon of "fram microcontrollers", and I wish more people used them in high safety or high altitude designs, and hang the added expense! Let's leave it at that.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 8, 2019

Hi.

I patched driver. (my test system: 4.15.0-45-generic #48-Ubuntu SMP ..)
First results:

1. After connecting usb2can device , system automaticaly change defaul qdisc  to noqueue. I can see this info with command  "ip -details -statistics link show can0"

I am delighted!

   Respond:
   11: can0: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 10

Here, it's not up. I imagine it goes up during the test?

   link/can  promiscuity 0
   can state STOPPED (berr-counter tx 0 rx 0) restart-ms 0
   bitrate 20000 sample-point 0.875
   tq 3125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
   usb_8dev: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
   clock 32000000

What does the clock translate to?

   re-started bus-errors arbit-lost error-warn error-pass bus-off
   0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

I am puzzled as to why you have enormous gso support - these are very small packets, but I figure that's just an artifact of the rest of the system and don't matter.

What triggers restarts of a can bus?

   RX: bytes  packets  errors  dropped overrun mcast
   4176       522      0       0       0       0
   TX: bytes  packets  errors  dropped carrier collsns
   4176       522      0       0       0       0

2. My test scripts working ok with different number of tx test packets.

It seem to me, that something was changed maybe in socketcan layer, possible after ubuntu updates? (maybe additional tx buffer was added). Currently noqueue and pfifo-fast working different. noqueue don't get link-down error like early, when driver buffer became full. pfifo-fast don't get buffer full error, when send packet >txqueuelen.

This was a little hard to read. Were these all correct results? What are the expected results from the spec?

Anyway, now looks that working nice with noqueue and pfifo-fast.

Yea!

I have to stress that if you are wanting strict in-order behavior, "noqueue" or "fifo" are the right thing.

pfifo fast can also re-order - and is not in openwrt at all. fifo, was, at least last I checked.

Still, noqueue (to me at least) should return errors sooner than fifo does and seems most likely the right thing. It eliminates all kinds of issues that could happen with the coupled queue of the usb bus.

Can anybody confirm this notices?
BR.
Darius Babrauskas

P.S. I tested with fq_codel now and dont get lost packets now!

Well, I'm puzzled about that. If your test script floods the queue for more than 100ms worth of data, with fq_codel configured normally) it will start dropping from head, not tail as pfifo does with X number of packets defined in txqueuelen. Ideally a can system (IMHO, I really confess to not knowing enough about can), should not flood the queue at all with interdependent packets and always await a successful delivery before continuing.

The default qlen for fq_codel is 10,000 packets. fifo, 1000. Here, (I think) you have 10, which means it will drop from head in, um... under a few ms.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 8, 2019

Or were you saying that with the default qdisc sysctl set to fq_codel, you successfully overode that with your patch to be noqueue just for the can device?

That was my intent. We get that one liner installed for can devices everywhere that actually need it, and nobody - systemd, openwrt, oe, naive users or future products will have a problem. I am certainly willing to help push the patch into the -stable and openwrt kernel releases.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 8, 2019

Is txqueuelen also overridden in your driver to be 10? or does that come from elsewhere? or am I miss-reading things.

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 10, 2019

Unfortunately I have not yet been able to test the proposed patch with any of our CAN devices, but something that I am interested in knowing is whether:

  1. epoll/select-based send() works correctly. This usually requires a large txqueuelen (1000). If set to non-blocking, send() must never block. Also when epoll/select reports "writable", at least 1 frame must be accepted always without error nor blocking.
  2. It is possible to maintain (near) 100% bus-load transmitting under normal latency conditions.
  3. Errors are handled and reported correctly.
  4. No frame is ever lost without being reported to the user.
    Will get to test that hopefully next week.
@srn-skov-dk

This comment has been minimized.

Copy link

srn-skov-dk commented Mar 10, 2019

Hi @dtaht
With a proprietary board we have we saw CAN telegrams lost.
The workaround we applied was:
'# CONFIG_NET_SCH_FQ_CODEL is not set'
(This was march 2018. so a few kernel releases back)

This is an i.MX6 based board where we use only one of the two CAN interfaces.
The bug manifested only while testing with a high amount of CAN telegrams.
I assume this is the same issue that is discussed here - and this entry is only to tell that the issue has been seen on other devices.
I do NOT suggest our workaround as any kind of sane fix.

The flexcan driver used by i.MX6 have IFF_ECHO, but does not have IFF_NO_QUEUE
I may try to persuade someone here to try it with FQ_CODEL enabled and IFF_NO_QUEUE set.

I can see we use:
'# ip link set can0 txqueuelen 20'
But it is lost why we change this.

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 12, 2019

Hi @dtaht

Or were you saying that with the default qdisc sysctl set to fq_codel, you successfully overode that with your patch to be noqueue just for the can device?

That was my intent. We get that one liner installed for can devices everywhere that actually need it, and nobody - systemd, openwrt, oe, naive users or future products will have a problem. I am certainly willing to help push the patch into the -stable and openwrt kernel releases.

(Edited: later I found that in driver code internal tx buffer size was not 20 but 1024 )
I noticed that socketcan layer working different. I restored old driver (no patched). Then I do all test with default qdisc which is pfifo_fast. (All ok). (with old kernel i have write error, when pfifo_fast buffer with size txqueuelen overload)
After manually changed qdisc with into noqueue command:
sudo tc qdisc replace dev can0 root noqueue
sudo tc qdisc replace dev can1 root noqueue
All test run Ok (with old kernel i have "link down" error when internal driver (20) buffer overload).
#9194 (comment)

Then changed
sudo tc qdisc replace dev can0 root fq_codel
sudo tc qdisc replace dev can1 root fq_codel
All test run Ok again (with old kernel I have silent dropped packet)
(Edited: later I found that in driver code internal tx buffer size was not 20 but 1024
But if packet sent >=1024 then fq_codel can silent drop again.
)

Is txqueuelen also overridden in your driver to be 10? or does that come from elsewhere? or am I miss-reading things.

driver have self small internal 20 packet buffer.
txqueuelen we can change with
ip link set can0 txqueuelen 10
My test script allways change this txqueuelen exactly to packet nb. size of test parameter. I do it, that don't get "buffer overload" error at test time. (with old kernel and pfifo_fast qdisc)

The default qlen for fq_codel is 10,000 packets. fifo, 1000. Here, (I think) you have 10, which means it will drop from head in, um... under a few ms.

txqueuelen have effect on pfifo_fast buffer. But , I think, it don't have effect on fq_codel. For test I using slow CAN bus speed 20000. So all buffers are filled with slow "cansend" util, which execute time 1-2ms/packet. For high speed test, I using "cangen" util , which generate packets very quickly.

   Respond:
   11: can0: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 10

Here, it's not up. I imagine it goes up during the test?

Yes, you right.
At end of test script can0 and can1 are set to down state.

@marckleinebudde
(Edited: later I found that in driver code internal tx buffer size was not 20 but 1024)
Are you know about "new" changes at socketcan layer?
It seems to me, that now all packets with write function are sent successful (except when occurs "link down" error)?
Are socketcan ignored qdisc and using self buffer ?
BR.
Darius Babrauskas

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 12, 2019

Hi @srn-skov-dk, thanks for reporting.

I can see we use:
'# ip link set can0 txqueuelen 20'
But it is lost why we change this.

For CAN devices, you would normally need a high txqueuelen (1000) in order to be sure that non-blocking sockets won't ever block or give write errors when they are supposedly writable (select/epoll).
Maybe this can explain the errors @babrauskas is seeing with noqueue?
@babrauskas, can you try with txqueuelen = 1000 to see if it makes a difference?
The txqueuelen must be larger than SO_SNDBUF which has a minimum size of 1024 (is this still valid?).
You can read more in chapter 3.4 of this (somewhat old) pdf:
https://rtime.felk.cvut.cz/can/socketcan-qdisc-final.pdf

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 13, 2019

Hi,
I found cause of all my CAN test pass. It was my old experimental test driver patch
#define MAX_TX_URBS 1024
original
#define MAX_TX_URBS 20

https://elixir.bootlin.com/linux/v5.0.1/source/drivers/net/can/usb/usb_8dev.c#L39
So internal buffer became big and all qdisc then working ok, but have latency.
@marckleinebudde please ignore my question -about "new" changes at socketcan layer.

For CAN devices, you would normally need a high txqueuelen (1000) in order to be sure that non-blocking sockets won't ever block or give write errors when they are supposedly writable (select/epoll).
Maybe this can explain the errors @babrauskas is seeing with noqueue?
@babrauskas, can you try with txqueuelen = 1000 to see if it makes a difference?
The txqueuelen must be larger than SO_SNDBUF which has a minimum size of 1024 (is this still valid?).
You can read more in chapter 3.4 of this (somewhat old) pdf:

@yope
with noqueue if I set txqueuelen = 1000 and patched driver (netdev->priv_flags |= IFF_NO_QUEUE;)
tc q show
qdisc noqueue 0: dev can0 root refcnt 2
qdisc noqueue 0: dev can1 root refcnt 2

ip -statistics link show can0
16: can0: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

ip -statistics link show can1
17: can1: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 10
link/can
RX: bytes packets errors dropped overrun mcast
3488 436 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0

Test result:
sudo python USB2CAN_TEST.py can1 can0 128 20000

Test loop no: 0

can0 transmiter, can1 receiver
108 **************************************************************************************
''
'write: Network is down\n'
109 **************************************************************************************
''
'write: Network is down\n'
110 **************************************************************************************
''
'write: Network is down\n'
111 **************************************************************************************
''
'write: Network is down\n'
113 **************************************************************************************
''
'write: Network is down\n'
114 **************************************************************************************
''
'write: Network is down\n'
115 **************************************************************************************
''
'write: Network is down\n'
116 **************************************************************************************
''
'write: Network is down\n'
117 **************************************************************************************
''
'write: Network is down\n'
119 **************************************************************************************
''
'write: Network is down\n'
120 **************************************************************************************
''
'write: Network is down\n'
121 **************************************************************************************
''
'write: Network is down\n'
122 **************************************************************************************
''
'write: Network is down\n'
124 **************************************************************************************
''
'write: Network is down\n'
125 **************************************************************************************
''
'write: Network is down\n'
126 **************************************************************************************
''
'write: Network is down\n'
127 **************************************************************************************
''
'write: Network is down\n'
128 **************************************************************************************
''
'write: Network is down\n'
pause_to_receive all: 1.10107400894
Packets sent 128, packets received 110, packets matched 110, lost packet: 18, error frames: 0
TEST FAIL

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 13, 2019

@yope
with noqueue if I set txqueuelen = 1000 and patched driver (netdev->priv_flags |= IFF_NO_QUEUE;)
tc q show
qdisc noqueue 0: dev can0 root refcnt 2
qdisc noqueue 0: dev can1 root refcnt 2

ip -statistics link show can0
16: can0: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000

ip -statistics link show can1
17: can1: <NOARP,ECHO> mtu 16 qdisc noqueue state DOWN mode DEFAULT group default qlen 10
link/can
RX: bytes packets errors dropped overrun mcast
3488 436 0 0 0 0
TX: bytes packets errors dropped carrier collsns
0 0 0 0 0 0

Test result:
sudo python USB2CAN_TEST.py can1 can0 128 20000
[...]
'write: Network is down\n'
128 **************************************************************************************
''
'write: Network is down\n'
pause_to_receive all: 1.10107400894
Packets sent 128, packets received 110, packets matched 110, lost packet: 18, error frames: 0
TEST FAIL

That does not sound good. Can you share the source code of the test tool you use, in order to understand what it does?

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 13, 2019

I will ask tomorow my manager to share test script.
But it simple do 128(cmd parameter) times cansend from CAN utils. Or you can use cangen like in your report.
cangen can0 -I i -g 0 -p 20 -n 128
Internally cansend (or cangen) do write function which return error "Network is down" when driver internal buffer became full. Will be ideally that it return nobuff error like pfifo_fast.
But noqueue better than silent dropping.

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 14, 2019

@yope

That does not sound good. Can you share the source code of the test tool you use, in order to understand what it does?

Write me darius dot b at 8devices dot com. I will sent this test script.
BR
Darius

@babrauskas

This comment has been minimized.

Copy link

babrauskas commented Mar 14, 2019

@yope
This kernel code explain why noqueue return "link down" error when driver internal buffer became full.
https://elixir.bootlin.com/linux/v4.2/source/net/core/dev.c#L3102
if netif_xmit_stopped(txq) then rc = -ENETDOWN;

dmesg
usb_8dev 1-5:1.0 can0: Slow down tx path - this my added debug message show, that driver sent stop message to upper level.
Virtual device can0 asks to queue packet! message from this code line:
https://elixir.bootlin.com/linux/v4.2/source/net/core/dev.c#L3112

dmesg show:
Mar 14 08:59:21 darius-desktop kernel: [ 649.981120] usb_8dev 1-5:1.0 can0: Slow down tx path
Mar 14 08:59:21 darius-desktop kernel: [ 649.982029] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.982966] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.983822] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.984984] usb_8dev 1-5:1.0 can0: Slow down tx path
Mar 14 08:59:21 darius-desktop kernel: [ 649.986318] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.987559] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.988453] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.989712] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.990812] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.991743] Virtual device can0 asks to queue packet!
Mar 14 08:59:21 darius-desktop kernel: [ 649.992815] usb_8dev 1-5:1.0 can0: Slow down tx path

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 16, 2019

@yope

That does not sound good. Can you share the source code of the test tool you use, in order to understand what it does?

Write me darius dot b at 8devices dot com. I will sent this test script.
BR
Darius

Thanks for offering to share the cod privately, but since this is an open discussion, there is little point in looking at code that I cannot paste here. Sorry.
Anyway, I think I already know what your code does, since you mentioned it just executes cansend anyway.

https://github.com/linux-can/can-utils/blob/master/cansend.c

What cansend does is basically bind to the socket and call send(). It does not set the socket to O_NONBLOCK. This implies, that if the send queue was full, the send() call should just block until there is room in the queue. It should never just fail with an error!

So this behavior is obviously wrong, which demonstrates that "IFF_NO_QUEUE" is not a solution.

@poettering, @dtaht, can we finally conclude that the fix needs to come from systemd and not from kernel drivers? AFAICS, the only thing kernel drivers can do is specify IFF_NO_QUEUE, to avoid being given fq_codel by systemd. This is not just a quite questionable workaround for a problem triggered by systemd, it just flat out does not work.

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 16, 2019

I'm glad an indepth discussion has finally taken place. I'm still waiting on hardware. A second device has been identified as a possible problem. I keep saying over and over again, this does not just need to be worked around in systemd.

I'm on my way to prague, do you know if any other can device folk will be at netdevconf or ietf?

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 17, 2019

I don't know, but the netdevconf program looks fairly TCP-centric, so I kinda doubt it.

Please explain why you think fixing systemd is merely a workaround? Because to me it is pretty clear:

  1. The kernel supports network device queueing disciplines for different needs. For some applications, one is better suited than the other, and for some application there are qdiscs that are just unsuitable. So picking a universal default for all networking interfaces is pretty pointless IMHO. And if you really must do so, at least pick one that does no harm (like pfifo_fast?).

  2. Systemd forces a default qdisc on a network interface that does not work correctly with it. How is this not systemd's fault?

  3. There is a way of specifying a default qdisc for all networking devices in the kernel, and by default that is pfifo_fast. This makes systemd's setting a second source of truth. This is bad design. Can you give any good reason why systemd even cares about such a setting at all?

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 19, 2019

I don't know, but the netdevconf program looks fairly TCP-centric, so I kinda doubt it.

well, I'll find someone.

Please explain why you think fixing systemd is merely a workaround? Because to me it is pretty clear:

openwrt patches out pfifo_fast entirely.
any end user can set the default to the wrong thing.

  1. The kernel supports network device queueing disciplines for different needs. For some applications, one is better suited than the other, and for some application there are qdiscs that are just unsuitable. So picking a universal default for all networking interfaces is pretty pointless IMHO. And if you really must do so, at least pick one that does no harm (like pfifo_fast?).

To can.

  1. There is a way of specifying a default qdisc for all networking devices in the kernel, and by default that is pfifo_fast.

It's a kernel compile choice also. Many distros set fq_codel or sch_fq as the kernel default now. My cloud provider does, for example.

This makes systemd's setting a second source of truth. This is bad design. Can you give any good reason why systemd even cares about such a setting at all?

10-100+m installations with this setting with 1 bug reported in 5 years.

We did some good debugging here. There was 3 bugs fixed so far, including one in a workaround. We have a test script. We have a report, with workaround, of a similar sounding bug, on different hw. In your case, too, in your build, you can turn it off. The manufacturer is sending hardware to test with. There's no hurry.

You are the only person here that wants to move the universe to fix what will end up being a 1-3 line patch. I will either have this fixed at the end of ietf, or a few days after hardware and I arrive in the same place. (mid april).

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 19, 2019

I don't know, but the netdevconf program looks fairly TCP-centric, so I kinda doubt it.

well, I'll find someone.

That'll be great. Ask their opinion on this issue.

Please explain why you think fixing systemd is merely a workaround? Because to me it is pretty clear:

openwrt patches out pfifo_fast entirely.
any end user can set the default to the wrong thing.

I know of no openwrt hardware that uses SocketCAN, so I don't know why this is relevant here.
You still failed to explain why you think that fixing systemd is only a workaround.
Yes, end-users can always actively, purposefully screw things up, but in this case it is systemd that does, not the end-user....

  1. The kernel supports network device queueing disciplines for different needs. For some applications, one is better suited than the other, and for some application there are qdiscs that are just unsuitable. So picking a universal default for all networking interfaces is pretty pointless IMHO. And if you really must do so, at least pick one that does no harm (like pfifo_fast?).

To can.

To me "do no harm" means "do not break user-applications". Can you say that pfifo_fast really breaks user applications in the same sense?
I am also not saying that all networking interfaces should be treated equal. Quite on the contrary, the whole problem stems from systemd doing so. It can be fixed in two possible ways:

  1. Systemd stops specifying a default qdisc altogether and leaves that to networking services (where it belongs IMHO), where setup is done per interface, or
  2. Systemd gets smarter about knowing what works for which kind of interface and only sets a qdisc for an interface type it knows of.

Doing this in systemd as it is done now really sounds like the easiest place to just force fq_codel on everyone because "we know it is the holy grail and it solves all problems and you should just shut up and trust us because we know more about networks than you". This is not just arrogant, it is also wrong.

  1. There is a way of specifying a default qdisc for all networking devices in the kernel, and by default that is pfifo_fast.

It's a kernel compile choice also. Many distros set fq_codel or sch_fq as the kernel default now. My cloud provider does, for example.

Again, I doubt cloud providers will ever use SocketCAN (on real interfaces). I am running a desktop PC or an embedded system that has CAN interfaces, and as soon as I build a kernel that includes the fq_codel module, and I run systemd, stuff breaks. This is wrong, and the fact that 5 years have passed of "intensive testing" as you call it, without this being noticed, is most probably because it hasn't been tested enough. Not because it is correct.

This makes systemd's setting a second source of truth. This is bad design. Can you give any good reason why systemd even cares about such a setting at all?

10-100+m installations with this setting with 1 bug reported in 5 years.

It is only about two years or less that this is an issue on Ubuntu desktop installations, and on our embedded systems we run kernels that don't have fq_codel built in. And I reported this bug almost a year ago already.

We did some good debugging here. There was 3 bugs fixed so far, including one in a workaround. We have a test script. We have a report, with workaround, of a similar sounding bug, on different hw. In your case, too, in your build, you can turn it off. The manufacturer is sending hardware to test with. There's no hurry.

You are the only person here that wants to move the universe to fix what will end up being a 1-3 line patch. I will either have this fixed at the end of ietf, or a few days after hardware and I arrive in the same place. (mid april).

A 1-3 line patch where? In systemd? Then why does it take almost a year if it is so simple?

In the meantime I have been studying the net/ code in Linux, and found out exactly the place where things break when using fq_codel or any other qdisc that discards any other frame that is not the one that is currently being sent. I'll try to summarize it here, although I assume you already know this:

can_send() calls dev_queue_xmit() here:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/net/can/af_can.c?h=v5.1-rc1#n216

The line "err = net_xmit_errno(err);" filters out NET_XMIT_CN (congestion notification), so it will only return an error to the user if NET_XMIT_DROP is set. This return code is only set by qdiscs that drop the current packet only. If they do, this will get translated into an -ENOBUF error returned by the send() syscall. This way the application knows that it must retry this frame later.
If the user application uses epoll or select, datagram_poll() is called, which returns POLLOUT if sk_wmem_alloc is smaller than sk->sk_sndbuf/2. This condition will guarantee successful send() of at least one frame.

This works fine if the qdisc never drops a frame without returning NET_XMIT_DROP for the frame it drops, and txqueuelen is sufficiently big, so that wmem_alloc can reach at least half of sk_sndbuf. AFAICS, this also implies that IFF_NO_QUEUE will not work because we cannot specify a queue length obviously.

It also means that qdiscs like fq_codel are not usable because it will drop frames without notifying the user about which frames it dropped. And unfortunately a CAN network cannot function if frames are dropped. The link layer included in the hardware of the CAN device ensures that this is handled correctly there (in hardware).

Please refer to this analysis for any further discussion.

@MarkJonas

This comment has been minimized.

Copy link

MarkJonas commented Mar 20, 2019

I would like to offer a theory why this went without complaints for a long time: I think that most embedded programmers hitting this are doing payed work. In that setup it is often easier to solve the problem for yourself than getting into a lengthy community discussion. And often you even first have to ask you boss whether you are allowed to show up in public.

I am from exactly this setup and we hit the problem roughly a year ago. I also found this discussion during that time. We came to the conclusion that the change was introduced in Systemd 217 and pragmatically patched it back to the original behavior.

diff --git a/sysctl.d/50-default.conf b/sysctl.d/50-default.conf
index e263cf0628..e2ffc9b695 100644
--- a/sysctl.d/50-default.conf
+++ b/sysctl.d/50-default.conf
@@ -30,8 +30,15 @@ net.ipv4.conf.all.accept_source_route = 0
 # Promote secondary addresses when the primary address is removed
 net.ipv4.conf.all.promote_secondaries = 1
 
+# Since Systemd 217, fq_codel is the default qdisc instead of pfifo_fast.
+# The advertised benefit of fq_codel is that it helps fighting network
+# bufferbloat. But that comes with the price that depending on the network
+# load it might drop frames. For embedded dropping frames (Ethernet or CAN) is
+# often not desirable. Thus, we do not set fq_codel as the default qeueuing
+# discipline.
 # Fair Queue CoDel packet scheduler to fight bufferbloat
-net.core.default_qdisc = fq_codel
+#net.core.default_qdisc = fq_codel
 
 # Enable hard and soft link protection
 fs.protected_hardlinks = 1
@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 20, 2019

Thanks, @MarkJonas , I completely agree with your theory. Another solution would be to not build fq_codel in your custom Linux kernel. But these are all just workarounds, and specially on a desktop PC neither are practical to do, let alone maintain.

@marckleinebudde

This comment has been minimized.

Copy link
Contributor

marckleinebudde commented Mar 21, 2019

Hey,

inspired by @dtaht I crafted this patch at netdevconf 0x13. Feel free to test.

Please post the output of tc -s qdisc show.

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index c05e4d50d43d..34bcabc35127 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -646,6 +646,7 @@ static void can_setup(struct net_device *dev)

        /* New-style flags. */
        dev->flags = IFF_NOARP;
+       dev->priv_flags = IFF_FIFO_QUEUE;
        dev->features = NETIF_F_HW_CSUM;
 }

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 26f69cf763f4..f961fa2131f8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1500,6 +1500,7 @@ struct net_device_ops {
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
+ * @IFF_FIFO_QUEUE: device must run with FIFO qdisc attached. skb drop without NET_XMIT_DROP is fatal
  */
 enum netdev_priv_flags {
        IFF_802_1Q_VLAN                 = 1<<0,
@@ -1532,6 +1533,7 @@ enum netdev_priv_flags {
        IFF_FAILOVER                    = 1<<27,
        IFF_FAILOVER_SLAVE              = 1<<28,
        IFF_L3MDEV_RX_HANDLER           = 1<<29,
+       IFF_FIFO_QUEUE                  = 1<<30,
 };

 #define IFF_802_1Q_VLAN                        IFF_802_1Q_VLAN
@@ -1563,6 +1565,7 @@ enum netdev_priv_flags {
 #define IFF_FAILOVER                   IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE             IFF_FAILOVER_SLAVE
 #define IFF_L3MDEV_RX_HANDLER          IFF_L3MDEV_RX_HANDLER
+#define IFF_FIFO_QUEUE                 IFF_FIFO_QUEUE

 /**
  *     struct net_device - The DEVICE structure.
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index a117d9260558..ed73fe455971 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1049,6 +1049,9 @@ static void attach_one_default_qdisc(struct net_device *dev,
        if (dev->priv_flags & IFF_NO_QUEUE)
                ops = &noqueue_qdisc_ops;

+       if (dev->priv_flags & IFF_FIFO_QUEUE)
+               ops = &pfifo_fast_ops;
+
        qdisc = qdisc_create_dflt(dev_queue, ops, TC_H_ROOT, NULL);
        if (!qdisc) {
                netdev_info(dev, "activation failed\n");

@dtaht

This comment has been minimized.

Copy link

dtaht commented Mar 22, 2019

groovy!

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 25, 2019

Nice! This patch works. Just got around to testing it finally.
The following is on an i.MX6 SoC with the flexcan driver:

$ tc -s qdisc show
qdisc noqueue 0: dev lo root refcnt 2 
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc pfifo_fast 0: dev can0 root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 48000 bytes 3000 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
qdisc fq_codel 0: dev eth0 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5.0ms interval 100.0ms memory_limit 32Mb ecn 
 Sent 2783 bytes 37 pkt (dropped 0, overlimits 0 requeues 0) 
 backlog 0b 0p requeues 0
  maxpacket 0 drop_overlimit 0 new_flow_count 0 ecn_mark 0
  new_flows_len 0 old_flows_len 0

As expected: no dropped frames.
There is one concern I have with this solution though: While this solves the issue for all CAN interfaces, it is still potentially breaking stuff for other use-cases. Things like EtherCAT and other protocols over raw ethernet come to mind. @MarkJonas mentions Ethernet in his patch comment above. I wonder what his use-case was, and if this solution would be acceptable for him.

@MarkJonas

This comment has been minimized.

Copy link

MarkJonas commented Mar 25, 2019

@yope We are running a proprietary, UDP based protocol over Ethernet. Therefore, it is essential for us to also avoid fq_codel on Ethernet. That's why we completely disabled it on our system: We removed it from the kernel and don't set it for systemd (sysctl net.core.default_qdisc).

Our protocol can handle dropped frames but it is designed with a local network in mind where packet loss is an exception and not normal like on the internet. Thus, the performance of our protocol will suffer badly in combination with fq_codel.

@marckleinebudde

This comment has been minimized.

Copy link
Contributor

marckleinebudde commented Mar 27, 2019

I've send a patch series to netdev. Feel free to join the discussion.

@yope

This comment has been minimized.

Copy link
Author

yope commented Mar 27, 2019

@marckleinebudde Thanks. Nevertheless, this comment is more concerned with systemd than with the netdev patch, so I'll keep it here:
While this patch will solve the problem for CAN devices, I am still concerned that it will not solve potential(?) problems for other networking protocols, like @MarkJonas stated.
Correct me if I am wrong, but queueing disciplines like fq_codel are specialized for certain network applications/protocols, like TCP/IP (I assume) in this case. Aren't these inherently tied to protocols and not to network interface types?
If we solve this particular issue by setting a default for a particular network interface type, we are still assuming that fq_codel is never going to break applications when combined with any other network interface. What about protocols like EtherCAT for example? IPX? Or other applications based on something else than TCP/IP? What would be the implications of UDP-based NFS for example (not that I'd recommend using it in 2019)?
With your proposed patch we are introducing a change from a "one size fits all" approach to a "one size fits all net devices of a certain type" approach, and I doubt that this is the correct way of solving the issue. Shouldn't it rather be "one size fits all TCP/IP sockets" instead?
Shouldn't network management be in charge of setting the most appropriate qdisc at the moment a network is configured for a certain protocol stack?

@marckleinebudde

This comment has been minimized.

Copy link
Contributor

marckleinebudde commented Mar 27, 2019

I've written a patch, that yes solves the CAN problem, but it's a discussion starter on the netdev mailing list. If there are potential problems with non TCP/IP protocols on Ethernet with default fq_codel then the developer should try to (re)produce the problems and join the discussion on the mailing list.

Shouldn't network management be in charge of setting the most appropriate qdisc at the moment a network is configured for a certain protocol stack?

Network Management as in people administrating that server,yes probably. If some user space daemon or the kernel should do it automatically? I'm not sure. But that sounds like a big research topic 😃

fengguang pushed a commit to 0day-ci/linux that referenced this issue Mar 29, 2019

net: sch_generic: add flag IFF_FIFO_QUEUE to use pfifo_fast as defaul…
…t scheduler

There is networking hardware that isn't based on Ethernet for layers 1 and 2.

For example CAN.

CAN is a multi-master serial bus standard for connecting Electronic Control
Units [ECUs] also known as nodes. A frame on the CAN bus carries up to 8 bytes
of payload. Frame corruption is detected by a CRC. However frame loss due to
corruption is possible, but a quite unusual phenomenon.

While fq_codel works great for TCP/IP, it doesn't for CAN. There are a lot of
legacy protocols on top of CAN, which are not build with flow control or high
CAN frame drop rates in mind.

When using fq_codel, as soon as the queue reaches a certain delay based length,
skbs from the head of the queue are silently dropped. Silently meaning that the
user space using a send() or similar syscall doesn't get an error. However
TCP's flow control algorithm will detect dropped packages and adjust the
bandwidth accordingly.

When using fq_codel and sending raw frames over CAN, which is the common use
case, the user space thinks the package has been sent without problems, because
send() returned without an error. pfifo_fast will drop skbs, if the queue
length exceeds the maximum. But with this scheduler the skbs at the tail are
dropped, an error (-ENOBUFS) is propagated to user space. So that the user
space can slow down the package generation.

On distributions, where fq_codel is made default via CONFIG_DEFAULT_NET_SCH
during compile time, or set default during runtime with sysctl
net.core.default_qdisc (see [1]), we get a bad user experience. In my test case
with pfifo_fast, I can transfer thousands of million CAN frames without a frame
drop. On the other hand with fq_codel there is more then one lost CAN frame per
thousand frames.

As pointed out fq_codel is not suited for CAN hardware, so this patch
introduces a new netdev_priv_flag called "IFF_FIFO_QUEUE" (in contrast to the
existing "IFF_NO_QUEUE").

During transition of a netdev from down to up state the default queuing
discipline is attached by attach_default_qdiscs() with the help of
attach_one_default_qdisc(). This patch modifies attach_one_default_qdisc() to
attach the pfifo_fast (pfifo_fast_ops) if the "IFF_FIFO_QUEUE" flag is set.

[1] systemd/systemd#9194

Cc: Dave Taht <dave.taht@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.