Pi3b+: "kevent 4 may have been dropped" with lan78xx #2447

Closed
jens-maus opened this issue Mar 17, 2018 · 102 comments

@jens-maus

commented Mar 17, 2018

On every second or third boot of a 4.14.26 kernel with LAN78XX support enabled, I get the following messages in dmesg:

...
[   16.345186] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.346150] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.347155] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.348168] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.349165] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.350173] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.351150] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.352149] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.353164] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.354165] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   16.355150] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
...

Sometimes the system still continues and is able to set up eth0 properly. However, every now and then it fails to bring up eth0, and the boot process is interrupted.

Any idea where this might come from or how to debug this situation?

@JamesH65

Contributor

commented Mar 17, 2018

Bugger, thought the kevent issue was ancient history. IIRC, it is because an event is stuck on the system event queue and then another one turns up before it is serviced. But it's not been reported for a couple of years, and that was on the smsc75xx driver. Not sure this is the same thing though, might be a systemd initialisation issue.

@jens-maus

Author

commented Mar 17, 2018

The system I am building the raspberrypi kernel for does not use systemd at all; it is based on Buildroot.

@JamesH65

Contributor

commented Mar 18, 2018

OK, do you get the same problem with Raspbian?

@jens-maus

Author

commented Mar 18, 2018

Haven't tested it with Raspbian yet. However, this is the kernel defconfig I am using:

https://github.com/jens-maus/RaspberryMatic/blob/master/buildroot-external/board/raspberrypi3/kernel_defconfig

@Bilg21

commented Mar 22, 2018

I have the same issue with Compute Module 3. But it looks like a hardware issue. It's not happening on another setup which is identical.

@danijelt

commented Apr 13, 2018

I am also using Buildroot to make my own image, and I managed to resolve this issue by disabling VLAN support in the kernel.
In my case, I do have systemd, and this message appears after systemd starts the hostname service:

[  OK  ] Started Hostname Service.
[   17.326467] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   17.326507] 8021q: adding VLAN 0 to HW filter on device eth0
[   17.338617] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   17.345081] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped

After disabling VLAN support (CONFIG_BRIDGE_VLAN_FILTERING, CONFIG_VLAN_8021Q and CONFIG_VLAN_8021Q_GVRP) in the latest rpi-4.14.y (4.14.33), this message stopped appearing almost completely, and I don't have the lockup bug anymore.

Firmware is ae9a493932e47e08cabb25a2728037298075fd00 and userland is a343dcad1dae4e93f4bfb99496697e207f91027e.
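
To check whether a given build still has the three options named above enabled, the kernel .config can be scanned directly. The following is only a minimal sketch in Python: the function name `enabled_options` and the sample config text are made up for illustration, and only the option names come from the comment above.

```python
import re

# Option names taken from the comment above.
VLAN_OPTIONS = (
    "CONFIG_BRIDGE_VLAN_FILTERING",
    "CONFIG_VLAN_8021Q",
    "CONFIG_VLAN_8021Q_GVRP",
)


def enabled_options(config_text, options=VLAN_OPTIONS):
    """Return the subset of `options` set to y or m in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        # Enabled options look like "CONFIG_FOO=y"; disabled ones appear
        # as "# CONFIG_FOO is not set" and do not match this pattern.
        match = re.match(r"^(CONFIG_\w+)=([ym])$", line.strip())
        if match and match.group(1) in options:
            enabled.add(match.group(1))
    return sorted(enabled)


# Hypothetical .config excerpt, for illustration only.
sample = """\
CONFIG_VLAN_8021Q=m
# CONFIG_VLAN_8021Q_GVRP is not set
CONFIG_USB_LAN78XX=y
"""
print(enabled_options(sample))
```

An empty result would mean none of the three options is built in or built as a module.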

@JamesH65

Contributor

commented Apr 23, 2018

@danijelt @jens-maus We have committed three or four fixes to VLAN support in the lan78xx driver since 4.14.33, so it's probably worth getting the latest kernel and seeing if it has fixed the kevent issue with VLAN enabled. There are also other lan78xx fixes in there, so it is well worth trying to see if the kevent issue has gone.

If you do find it has gone, please close this issue.

@JamesH65

Contributor

commented May 11, 2018

@danijelt @jens-maus If no further reports/updates are forthcoming, I'm inclined to close this report.

@danijelt

commented May 11, 2018

We have released a stable production image without VLAN and I'm currently occupied with another project.

I'll keep this in mind during the next update and report here if the kevent/VLAN issue persists.

@fhunleth

commented Jun 2, 2018

I was able to easily reproduce this issue with the VLAN options disabled when running Linux 4.14.29 at commit c117a8b. I just tried updating to the latest rpi-4.14.y (4.14.44) at commit 4fca48b, and I can no longer reproduce the kevent 4 issue.

@bcutter

commented Jun 6, 2018

I have also seen this a few times on my Pi 3 B+. Right now the Pi is completely dead, and on the console the

"** 1 printk messages dropped ** [ 3808.632569] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped"

messages go wild: about 50 of those lines scroll past every second. The only option is a hard reset of the system, which results in fsck errors and so on.

Raspbian Stretch by the way. 1 Gbit/s ethernet port.
Anything I can do on this? Will there be a fix? A workaround?

@JamesH65

Contributor

commented Jun 6, 2018

@bcutter Are you using the latest kernel? You can use apt update to get an official release, or rpi-update to get the latest bleeding-edge one. You might need rpi-update, but back up first. There have been a number of kernel updates, so it is worth trying.

@bcutter

commented Jun 6, 2018

Currently on 4.14.34-v7+, the latest official release. I try to avoid using rpi-update because of the risk of running into other issues (maybe fix one and maybe grab some others - no perfect chance/risk ratio :-))... maybe my opinion will change temporarily if this kevent 4 keeps on appearing.

@vnd

commented Jun 14, 2018

I'm also experiencing this issue occasionally, 4.14.48-v7+.
upd.: 3B+, error message exactly the same as in the first comment.
upd2.: mostly happened on boot, but probably not limited to boot only (power turns off occasionally, so can't be sure).

@bcutter

commented Jun 14, 2018

Updated to 4.14.48-v7+ recently, will monitor future behaviour.
But @vnd's comment doesn't make me feel positive about this...

@JamesH65

Contributor

commented Jun 14, 2018

Please note, the 3B error reporting kevent 0 is, I think, a different issue - please keep this thread to 3B+.

For those people using the 3B+, do you still see the issue after an rpi-update? One report above suggests it has gone away with the latest kernel, while others say it has not, so I would like to confirm whether it is still present with the latest kernel on the 3B+.

@JamesH65

Contributor

commented Jun 14, 2018

Also, can I just confirm this only happens during startup? Or have people encountered this error in general usage?

@bcutter

commented Jun 14, 2018

I always experienced this during normal usage a few hours or even days after startup.

@fhunleth

commented Jun 14, 2018

I'm no longer able to reproduce it, but when I could, it was by running ip link set dev eth0 down and ip link set dev eth0 up repeatedly.
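
The reproduction described above can be scripted. What follows is a minimal sketch in Python rather than anything posted in this thread: the function name `toggle_link`, the cycle count and the delay are arbitrary assumptions, and only the two `ip link set dev eth0 down/up` commands come from the comment.

```python
import subprocess
import time


def toggle_link(iface="eth0", cycles=10, delay=1.0, run=subprocess.run):
    """Bring `iface` down and up `cycles` times; return the commands issued.

    `run` is injectable so the loop can be exercised without touching a
    real interface. Running it for real requires root privileges.
    """
    issued = []
    for _ in range(cycles):
        for state in ("down", "up"):
            cmd = ["ip", "link", "set", "dev", iface, state]
            run(cmd, check=True)
            issued.append(cmd)
            time.sleep(delay)  # pacing is a guess, not taken from the thread
    return issued
```

Calling `toggle_link()` as root while watching dmesg in another terminal would approximate the manual procedure.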

@JamesH65

Contributor

commented Jun 14, 2018

Just some notes from a quick look at the driver code. kevent 4 is EVENT_LINK_RESET. This is only fired off in two places: first when the driver is opened, and secondly when an interrupt is received (via USB, so not a 'direct line') from the PHY. I believe an interrupt is only ever received when the link needs to be reset, so this makes sense.

The action taken when the kevent is processed is fairly minimal; most of the work is done in lan78xx_link_reset, which does a bunch of PHY/register reads and writes. I have seen nothing there that would take any real amount of time. Note though that register reads all go via USB, so they could be delayed if the other end is busy.
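
How the warning can arise is easier to see in a toy model. The sketch below is Python, not the driver's C code, and it assumes the common pattern of setting an event flag and then scheduling one shared work item, with the warning printed when the work item is already queued. The class and method names are invented for illustration; only EVENT_LINK_RESET being kevent 4 and the message text come from this thread.

```python
EVENT_LINK_RESET = 4  # value given in the comment above


class DeferredEvents:
    """Toy model: event flags plus a single shared work item."""

    def __init__(self):
        self.flags = set()          # events waiting to be handled
        self.work_pending = False   # True while the work item is queued
        self.log = []

    def defer(self, event):
        self.flags.add(event)
        if self.work_pending:
            # The work item is already queued and has not run yet, so
            # scheduling it again fails and the warning is emitted.
            self.log.append(f"kevent {event} may have been dropped")
        else:
            self.work_pending = True

    def run_worker(self):
        """Service all flagged events, as the queued work item would."""
        handled = sorted(self.flags)
        self.flags.clear()
        self.work_pending = False
        return handled


dev = DeferredEvents()
dev.defer(EVENT_LINK_RESET)  # first request: work item queued
dev.defer(EVENT_LINK_RESET)  # worker has not run yet: warning logged
print(dev.log)
print(dev.run_worker())
```

In this model the flag is still set when the worker finally runs, so the event itself is handled once; a flood of warnings would point to the worker not getting to run between requests.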

@JamesH65

Contributor

commented Jun 14, 2018

Been giving the latest kernel some stress testing on a debug-enabled lan78xx driver. Running stress-ng --cpu 4, so loading the CPUs right up, then making the ethernet go up and down as quickly as possible (ip link set dev eth0 down/up), whilst monitoring for any kevents. No errors at all after well over an hour of this. Will leave it running for a few more hours.
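
Monitoring for kevents during such a run can be done by scanning dmesg output for the warning. This is a minimal Python sketch, assuming the message format matches the lines quoted earlier in this thread; the regular expression and the function name `count_kevents` are illustrative and not part of any tool mentioned here.

```python
import re
from collections import Counter

# Matches lines such as:
# [   17.326467] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
KEVENT_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+lan78xx \S+ (?P<iface>\S+): "
    r"kevent (?P<num>\d+) may have been dropped"
)


def count_kevents(dmesg_text):
    """Count dropped-kevent warnings per (interface, kevent number)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = KEVENT_RE.match(line.strip())
        if match:
            counts[(match.group("iface"), int(match.group("num")))] += 1
    return counts


# Lines copied from an earlier comment in this thread.
sample = """\
[  OK  ] Started Hostname Service.
[   17.326467] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   17.326507] 8021q: adding VLAN 0 to HW filter on device eth0
[   17.338617] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   17.345081] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
"""
print(dict(count_kevents(sample)))
```

Feeding it the captured output of dmesg after a test run gives a quick pass/fail signal: an empty result means no warnings were logged.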

@JamesH65

Contributor

commented Jun 15, 2018

Cannot get this to go wrong. Ran test scripts for hours, no issues. Clearly not exercising the right runes. If anyone seeing this has anything unusual in their networking setup, please let me know here. But for the moment I have to look at some other stuff.

@bcutter

commented Jun 15, 2018

The only two "special" things I have are:

  1. A second virtual interface, so:
    eth0
    eth0:0
    where the second one has a different IP configuration.
  2. Various HDDs connected to the Pi via USB, either directly or through a hub - which is quite different from many other setups, e.g. those of friends of mine that I can compare with (and they don’t see this issue at all)

Hope this helps a bit.

@danijelt

commented Jun 20, 2018

I have tried with latest 4.14.50 kernel (commit 3b01f05):

[    0.000000] Linux version 4.14.50 (danijel@dev) (gcc version 5.4.0 (Buildroot 2017.05-git-gd5acf12b1)) #1 SMP Wed Jun 20 10:59:49 CEST 2018

Still having the same problem when I enable VLAN support:

[  OK  ] Started Network Manager Script Dispatcher Service.
         Starting Hostname Service...
[  OK  ] Started Hostname Service.
[   17.853152] 8021q: adding VLAN 0 to HW filter on device eth0
[   17.853178] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped
[   17.853924] lan78xx 1-1.1.1:1.0 eth0: kevent 4 may have been dropped

My setup: Buildroot 2017.05, systemd 232, NetworkManager 1.8.0. The kevent error appears after the systemd-hostnamed service is started and the kernel reports that VLAN 0 has been added.

@JamesH65

Contributor

commented Jun 20, 2018

@danijelt Does this only occur during startup? Does it ever cause any problems or does everything still work OK?

@6by9

Contributor

commented Jun 20, 2018

Configuring the VLAN mask and setting up multicast subscriptions both use worker threads
https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L1036
https://github.com/raspberrypi/linux/blob/rpi-4.14.y/drivers/net/usb/lan78xx.c#L2325

VLAN takes data that is precomputed, and multicast works out the settings in the worker, so being requested twice before having been called for the first time should have no side effect (minor assumption that the flag is cleared before the worker task is executed).
They're independent work queues, so there shouldn't be a conflict between the two. Unless it's indicative of a worker thread having wedged totally, and that is then preventing any other worker thread from running. I don't know enough about how INIT_WORK works when it comes to scheduling.

@danijelt

commented Jun 21, 2018

@JamesH65 Yes, only during startup. Once it boots (if kevent 4 doesn't happen), I don't have any problems anymore.

I don't do anything special with the network, just bring it up at boot and leave it that way. I'm not even touching anything VLAN related, adding VLAN 0 is probably NetworkManager's (or kernel's) default behaviour.

popcornmix added a commit that referenced this issue Mar 12, 2019

lan78xx: Debounce link events to minimize poll storm
The bInterval is set to 4 (i.e. 8 microframes => 1ms) and the only bit
that the driver pays attention to is "link was reset". If there's a
flapping status bit in that endpoint data, (such as if PHY negotiation
needs a few tries to get a stable link) then polling at a slower rate
would act as a de-bounce.

See: #2447
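
The arithmetic in the commit message follows the rule for high-speed USB interrupt endpoints, where the polling period is 2^(bInterval-1) microframes of 125 microseconds each. A small Python sketch of that calculation; the function name `hs_interrupt_period` is made up here, and the slower rate chosen by the fix is not stated in the message, so none is assumed.

```python
MICROFRAME_US = 125  # one high-speed USB microframe lasts 125 microseconds


def hs_interrupt_period(b_interval):
    """Return (microframes, microseconds) for a high-speed interrupt endpoint."""
    if not 1 <= b_interval <= 16:
        raise ValueError("bInterval must be in the range 1..16")
    microframes = 2 ** (b_interval - 1)
    return microframes, microframes * MICROFRAME_US


print(hs_interrupt_period(4))  # (8, 1000): 8 microframes, i.e. 1 ms
```

Each increment of bInterval doubles the period, so a larger value polls the status endpoint less often, which is the de-bounce effect the message describes.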

popcornmix added a commit that referenced this issue Mar 15, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Mar 15, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Mar 15, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Mar 21, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Mar 21, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 2, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 2, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 2, 2019

lan78xx: Debounce link events to minimize poll storm

artynet added a commit to artynet/rpi-linux that referenced this issue Apr 3, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 8, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 8, 2019

lan78xx: Debounce link events to minimize poll storm

Gadgetoid added a commit to Gadgetoid/linux that referenced this issue Apr 10, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 18, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 18, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 18, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 23, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 23, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 23, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 30, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 30, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 30, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue Apr 30, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 7, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 7, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 7, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 13, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 13, 2019

lan78xx: Debounce link events to minimize poll storm

popcornmix added a commit that referenced this issue May 13, 2019

lan78xx: Debounce link events to minimize poll storm