NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out #3192

Closed
shenki opened this issue May 18, 2018 · 7 comments

shenki commented May 18, 2018

Romulus running v2.2-17-gabf205c and 4.13.16-a1340b5c31182ee3e191ac745f8e591badaeb586

[   91.139253] ------------[ cut here ]------------
[   91.144052] WARNING: CPU: 0 PID: 7 at /kernel-source//net/sched/sch_generic.c:316 dev_watchdog+0x22c/0x244
[   91.153715] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out
[   91.160624] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.13.16-a1340b5c31182ee3e191ac745f8e591badaeb586 #1
[   91.170368] Hardware name: Generic DT based system
[   91.175213] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[   91.183110] [<80106bc4>] (show_stack) from [<80581bc4>] (dump_stack+0x20/0x28)
[   91.190370] [<80581bc4>] (dump_stack) from [<80115ed4>] (__warn+0xe0/0x108)
[   91.197349] [<80115ed4>] (__warn) from [<80115f50>] (warn_slowpath_fmt+0x54/0x74)
[   91.204857] [<80115f50>] (warn_slowpath_fmt) from [<804a16a8>] (dev_watchdog+0x22c/0x244)
[   91.213180] [<804a16a8>] (dev_watchdog) from [<80158088>] (call_timer_fn+0x40/0x124)
[   91.220958] [<80158088>] (call_timer_fn) from [<80158220>] (expire_timers+0xb4/0xc0)
[   91.228724] [<80158220>] (expire_timers) from [<80158320>] (run_timer_softirq+0xa4/0x19c)
[   91.236923] [<80158320>] (run_timer_softirq) from [<80101600>] (__do_softirq+0xd0/0x2e4)
[   91.245136] [<80101600>] (__do_softirq) from [<80119c8c>] (run_ksoftirqd+0x34/0x4c)
[   91.252835] [<80119c8c>] (run_ksoftirqd) from [<80137794>] (smpboot_thread_fn+0x11c/0x1bc)
[   91.261132] [<80137794>] (smpboot_thread_fn) from [<801335d4>] (kthread+0x150/0x168)
[   91.268895] [<801335d4>] (kthread) from [<801026f8>] (ret_from_fork+0x14/0x3c)
[   91.276114] ---[ end trace 76f10eba546cb007 ]---

shenki commented May 18, 2018

eth0      Link encap:Ethernet  HWaddr 0C:C4:7A:D5:48:43  
          inet addr:169.254.177.119  Bcast:169.254.255.255  Mask:255.255.0.0
          inet6 addr: fe80::ec4:7aff:fed5:4843%710272/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:62886930 errors:0 dropped:62886930 overruns:0 frame:2287
          TX packets:210 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:4024763520 (3.7 GiB)  TX bytes:14322 (13.9 KiB)
          Interrupt:19 
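
A quick way to read the counters above is sketched below: it pulls the `key:value` fields out of the two counter lines and computes the RX drop ratio. The helper name `parse_counters` is made up for illustration and is not part of any OpenBMC tooling; the two input strings are copied from the ifconfig output in this comment.

```python
import re

def parse_counters(line):
    """Return {name: int} for every 'key:value' pair on one ifconfig counter line."""
    return {key: int(value) for key, value in re.findall(r"(\w+):(\d+)", line)}

# Counter lines copied from the ifconfig output above.
rx = parse_counters("RX packets:62886930 errors:0 dropped:62886930 overruns:0 frame:2287")
tx = parse_counters("TX packets:210 errors:0 dropped:0 overruns:0 carrier:0")

# Guard against division by zero on an interface that has received nothing.
rx_drop_ratio = rx["dropped"] / rx["packets"] if rx["packets"] else 0.0

print(f"RX dropped {rx_drop_ratio:.0%} of {rx['packets']} packets; TX sent {tx['packets']}")
```

With these numbers, the dropped count equals the received count, and only 210 packets were ever transmitted.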

shenki commented May 18, 2018

root@romulus:~# ifconfig eth0 down
[ 5261.344056] ftgmac100 1e660000.ethernet eth0: NCSI interface down
root@romulus:~# ifconfig eth0 up
[ 5263.436681] ftgmac100 1e660000.ethernet eth0: NCSI: No channel found with link
[ 5263.444477] ftgmac100 1e660000.ethernet eth0: NCSI interface down
ifconfig: SIOCSIFFLAGS: No such device

shenki commented May 18, 2018

A few seconds later:

[ 5290.164975] ftgmac100 1e660000.ethernet eth0: NCSI: No channel found with link
[ 5290.172235] ftgmac100 1e660000.ethernet eth0: NCSI interface down
[ 5302.208463] ftgmac100 1e660000.ethernet eth0: NCSI: No channel found with link
[ 5302.216308] ftgmac100 1e660000.ethernet eth0: NCSI interface down
[ 5305.729705] Unable to handle kernel paging request at virtual address d6d70786
[ 5305.736955] pgd = 9ddc4000
[ 5305.739675] [d6d70786] *pgd=00000000
[ 5305.743272] Internal error: Oops: 805 [#1] ARM
[ 5305.747733] CPU: 0 PID: 1217 Comm: dreport Tainted: G        W       4.13.16-a1340b5c31182ee3e191ac745f8e591badaeb586 #1
[ 5305.758591] Hardware name: Generic DT based system
[ 5305.763376] task: 9d6bc120 task.stack: 9dd48000
[ 5305.767919] PC is at rb_erase+0x340/0x3bc
[ 5305.771933] LR is at 0x9e6d0a39
[ 5305.775077] pc : [<8058a480>]    lr : [<9e6d0a39>]    psr: a0000093
[ 5305.781332] sp : 9dd49dd4  ip : d6d70786  fp : 9dd49dec
[ 5305.786549] r10: 80903008  r9 : 9d6bc398  r8 : 8060186c
[ 5305.791767] r7 : 8090e7d8  r6 : 9e49a090  r5 : 8090e7d8  r4 : 9e6c6058
[ 5305.798286] r3 : 9d791118  r2 : 9e6d0a38  r1 : 8090e800  r0 : 9e6c6058
[ 5305.804805] Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
[ 5305.812025] Control: 00c5387d  Table: 9ddc4008  DAC: 00000051
[ 5305.817771] Process dreport (pid: 1217, stack limit = 0x9dd48188)
[ 5305.823862] Stack: (0x9dd49dd4 to 0x9dd4a000)
[ 5305.828227] 9dc0:                                              9e49a098 8090e7d8 9e49a090
[ 5305.836406] 9de0: 9dd49e04 9dd49df0 8013ecec 8058a14c 8090e7a8 9e49a090 9dd49e24 9dd49e08
[ 5305.844585] 9e00: 8013f1a4 8013ecc4 9d6bc120 ffffe000 8090e7a8 00000000 9dd49e74 9dd49e28
[ 5305.852767] 9e20: 80598ca4 8013f0b4 9dd49ea4 9dd49e38 801fa688 801f7654 801f879c 80599154
[ 5305.860949] 9e40: 01200011 6934344a 00000055 ffffe000 9d6bc320 9d6bc120 9dd48000 ffffe000
[ 5305.869128] 9e60: 9d6bc118 80903008 9dd49e8c 9dd49e78 80599154 80598bf4 9dd49ed4 9d6bc320
[ 5305.877310] 9e80: 9dd49ecc 9dd49e90 801182b8 80599118 7e980fd0 9dd49eec 9dd49ef4 6934344a
[ 5305.885491] 9ea0: 8010c588 00000004 00000000 80903008 7e9811f0 00000000 00000000 00000000
[ 5305.893672] 9ec0: 9dd49f2c 9dd49ed0 8011929c 801180c0 0000081f 00000003 00000004 00000000
[ 5305.901853] 9ee0: 00000000 00000000 00000000 00000000 9d6bc120 801174a4 9ddf8274 9ddf8274
[ 5305.910036] 9f00: 00000000 6934344a 00000000 80903008 00000000 00000072 80102804 9dd48000
[ 5305.918216] 9f20: 9dd49fa4 9dd49f30 80119424 80119220 80903008 00000051 00000051 7e98105c
[ 5305.926398] 9f40: 00000014 00000000 00000000 80125f14 00000008 0004e5ac 04000000 76dc6df0
[ 5305.934579] 9f60: 00000000 00000000 00000000 04000000 76dc6df0 00000000 00000000 6934344a
[ 5305.942760] 9f80: 00000000 00000000 000f1d5c 00000000 00000072 80102804 00000000 9dd49fa8
[ 5305.950941] 9fa0: 80102640 8011937c 00000000 000f1d5c ffffffff 7e9811f0 00000000 00000000
[ 5305.959123] 9fc0: 00000000 000f1d5c 00000000 00000072 000f97f0 000f30b8 000f8fc4 000ec724
[ 5305.967304] 9fe0: 000ec16c 7e9811b0 00050c18 76e34d74 60000010 ffffffff 00000000 00000000
[ 5305.975520] [<8058a480>] (rb_erase) from [<8013ecec>] (__dequeue_entity+0x34/0x48)
[ 5305.983113] [<8013ecec>] (__dequeue_entity) from [<8013f1a4>] (pick_next_task_fair+0xfc/0x150)
[ 5305.991738] [<8013f1a4>] (pick_next_task_fair) from [<80598ca4>] (__schedule+0xbc/0x4f0)
[ 5305.999837] [<80598ca4>] (__schedule) from [<80599154>] (schedule+0x48/0xac)
[ 5306.006907] [<80599154>] (schedule) from [<801182b8>] (do_wait+0x204/0x25c)
[ 5306.013882] [<801182b8>] (do_wait) from [<8011929c>] (kernel_wait4+0x88/0x15c)
[ 5306.021113] [<8011929c>] (kernel_wait4) from [<80119424>] (SyS_wait4+0xb4/0xfc)
[ 5306.028444] [<80119424>] (SyS_wait4) from [<80102640>] (ret_fast_syscall+0x0/0x3c)
[ 5306.036021] Code: 1382e001 e582c008 e5802004 e5830004 (158ce000) 
[ 5306.042128] ---[ end trace 76f10eba546cb008 ]---
[ 5306.046744] Kernel panic - not syncing: Fatal exception

shenki commented May 18, 2018

On reboot, this happened straight away:

[   22.235622] ftgmac100 1e660000.ethernet eth0: NCSI: No channel found with link
[   22.242880] ftgmac100 1e660000.ethernet eth0: NCSI interface down
[   86.199809] random: crng init done
[   90.085823] ------------[ cut here ]------------
[   90.090567] WARNING: CPU: 0 PID: 7 at /kernel-source//net/sched/sch_generic.c:316 dev_watchdog+0x22c/0x244
[   90.100358] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out
[   90.107174] CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 4.13.16-a1340b5c31182ee3e191ac745f8e591badaeb586 #1
[   90.116994] Hardware name: Generic DT based system
[   90.121849] [<80109930>] (unwind_backtrace) from [<80106bc4>] (show_stack+0x20/0x24)
[   90.129622] [<80106bc4>] (show_stack) from [<80581bc4>] (dump_stack+0x20/0x28)
[   90.136871] [<80581bc4>] (dump_stack) from [<80115ed4>] (__warn+0xe0/0x108)
[   90.143956] [<80115ed4>] (__warn) from [<80115f50>] (warn_slowpath_fmt+0x54/0x74)
[   90.151481] [<80115f50>] (warn_slowpath_fmt) from [<804a16a8>] (dev_watchdog+0x22c/0x244)
[   90.159695] [<804a16a8>] (dev_watchdog) from [<80158088>] (call_timer_fn+0x40/0x124)
[   90.167459] [<80158088>] (call_timer_fn) from [<80158220>] (expire_timers+0xb4/0xc0)
[   90.175220] [<80158220>] (expire_timers) from [<80158320>] (run_timer_softirq+0xa4/0x19c)
[   90.183524] [<80158320>] (run_timer_softirq) from [<80101600>] (__do_softirq+0xd0/0x2e4)
[   90.191653] [<80101600>] (__do_softirq) from [<80119c8c>] (run_ksoftirqd+0x34/0x4c)
[   90.199346] [<80119c8c>] (run_ksoftirqd) from [<80137794>] (smpboot_thread_fn+0x11c/0x1bc)
[   90.207639] [<80137794>] (smpboot_thread_fn) from [<801335d4>] (kthread+0x150/0x168)
[   90.215400] [<801335d4>] (kthread) from [<801026f8>] (ret_from_fork+0x14/0x3c)
[   90.222620] ---[ end trace e70b1cf0a025b893 ]---

sammj commented May 21, 2018

The queue timeouts are in the same class of problem that we've been seeing on Witherspoon: the Broadcom NIC stops responding when the host is powered down.
In theory these issues are fixed by updating the Broadcom microcode to 5719-v1.43 NCSI v1.4.22.0, although I have not yet had a chance to test this on a Romulus.
Once the NIC is in this state, only a full power cycle recovers it, so the trace in the comment above is expected.

However the
[ 5305.729705] Unable to handle kernel paging request at virtual address d6d70786
trace is a bit worrying - I'll look into this.
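
To separate the expected watchdog timeouts from the more worrying paging-request oops when going through logs like the ones in this thread, something along these lines could help. It is only a sketch: the event names and the `classify` helper are hypothetical, and the patterns are just the message strings quoted in this issue, not an exhaustive list.

```python
import re

# Patterns taken from the kernel messages quoted in this issue (assumed, not exhaustive).
PATTERNS = {
    "tx_timeout": re.compile(r"NETDEV WATCHDOG: (\S+) \((\S+)\): transmit queue (\d+) timed out"),
    "ncsi_no_channel": re.compile(r"NCSI: No channel found with link"),
    "oops": re.compile(r"Unable to handle kernel paging request"),
}
# Leading "[ seconds.micros]" timestamp that the kernel prints on each log line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def classify(lines):
    """Yield (timestamp_seconds, event_name) for each recognised kernel log line."""
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # skip shell prompts and other non-kernel lines
        timestamp, message = float(match.group(1)), match.group(2)
        for name, pattern in PATTERNS.items():
            if pattern.search(message):
                yield timestamp, name
                break

# Sample lines copied from the logs earlier in this thread.
sample = [
    "[   22.235622] ftgmac100 1e660000.ethernet eth0: NCSI: No channel found with link",
    "[   90.100358] NETDEV WATCHDOG: eth0 (ftgmac100): transmit queue 0 timed out",
    "[ 5305.729705] Unable to handle kernel paging request at virtual address d6d70786",
    "root@romulus:~# ifconfig eth0 up",
]
events = list(classify(sample))
for timestamp, name in events:
    print(f"{timestamp:10.3f}s  {name}")
```

Feeding it a full `dmesg` capture gives a timeline of NCSI link loss, TX timeouts and oopses, which makes it easier to see whether an oops follows an interface down/up cycle.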

stale bot commented Mar 28, 2019

This issue has been automatically marked as stale because no activity has occurred in the last 6 months. It will be closed if no activity occurs in the next 30 days. If this issue should not be closed please add a comment. Thank you for your understanding and contributions.

@stale stale bot added the stale label Mar 28, 2019

stale bot commented Apr 27, 2019

This issue has been closed because no activity has occurred in the last 7 months. Please reopen if this issue should not have been closed. Thank you for your contributions.

@stale stale bot closed this as completed Apr 27, 2019