Interrupt collision between smsc95xx and USB storage drivers under heavy load #9

Closed
benosteen opened this Issue Apr 21, 2012 · 115 comments

@benosteen

Steps to reproduce:

  1. Lots of files on a USB drive, plugged in and mounted.
  2. Begin downloading a large file (100MB+ is suggested) to that USB drive.
  3. During the download, access large numbers of files (see suggestions below).

At some indeterminate point this freezes the system, with kernel panics from the USB storage driver ("... not syncing: Fatal exception in interrupt") and kernel errors from the Ethernet driver ("kevent may have dropped the interrupt.").

Suggested ways to reproduce step 3:

If the rootfs is on USB, running apt-get install on a group of packages, apt-cache search and so on are good ways to uncover this collision.
Otherwise, searching or grepping through a reasonable number of files on the USB drive is enough, for example: find . | xargs grep -i "foo"
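
The load from steps 2 and 3 can be scripted so it is easier to repeat. This is a sketch, not a verified reproducer: it assumes bash, wget and GNU find/xargs, and the helper names `read_many_files` and `reproduce`, the URL and the mount point are all hypothetical placeholders.

```bash
#!/usr/bin/env bash
# Sketch of the reproduction load described above (hypothetical helper names).
# Assumes bash, wget and GNU find/xargs; URL and mount point are placeholders.

# Step 3: read lots of files on the USB drive.
# -print0 / -0 keeps file names with spaces intact; prints how many files matched.
read_many_files() {
  local dir=$1 pattern=$2
  find "$dir" -type f -print0 | xargs -0 grep -il -- "$pattern" 2>/dev/null | wc -l | tr -d ' '
}

# Steps 2 and 3 together: download a large file to the drive in the
# background and keep re-reading the drive until the download finishes.
reproduce() {
  local url=$1 mount=$2 pattern=${3:-foo}
  wget -q -O "$mount/large_download.bin" "$url" &
  local dl=$!
  while kill -0 "$dl" 2>/dev/null; do
    read_many_files "$mount" "$pattern" >/dev/null
  done
  wait "$dl"   # returns wget's exit status
}

# Example (placeholders): reproduce http://example.com/large-file /media/usb foo
```

Looping the search for as long as the download runs keeps both kinds of USB traffic overlapping, which is the condition the report describes.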

It is hard to capture this error: kern.log does not get the messages synced to disc, and they scroll past too fast on the tty to read with any clarity.

Reproduced with the latest kernel (UAS built in, new modules) and with the kernel modules from 13/04, both with the rootfs on USB and with the stock rootfs on SD. With the rootfs on SD, however, it is harder to generate the kind of storage load needed to trigger the bug.

@Hexxeh

Hexxeh Apr 21, 2012

I'm seeing this issue too. Possibly a regression, since I don't recall having this problem before, despite having downloaded the same file before. With the latest files, it happens every time I download the file in question.

@shirro

shirro Apr 22, 2012

You might want to put a serial tty on there to capture the errors.

Someone put a screenshot up in the forum of what sounds like the same issue:
http://www.raspberrypi.org/forum/troubleshooting/external-hard-drive-kernel-panic#p67994
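
On the host side, something like the following can log everything the Pi prints over serial, including an oops, to a file instead of losing it off the screen. A sketch assuming a Linux host (GNU `stty -F`), a USB-serial adapter at `/dev/ttyUSB0` and a 115200 baud console; the device path, log name and the `capture_console` function are placeholders.

```bash
#!/usr/bin/env bash
# Log the Pi's serial console on a Linux host (sketch; names are placeholders).
# Usage: capture_console [device] [logfile] [baud]
capture_console() {
  local dev=${1:-/dev/ttyUSB0} log=${2:-pi-console.log} baud=${3:-115200}
  # Line settings only make sense on a character device (a real serial port);
  # a plain file is read as-is, which is handy for replaying a saved capture.
  if [ -c "$dev" ]; then
    stty -F "$dev" "$baud" raw -echo || return 1
  fi
  # Show the output live and append it to the log, so successive crashes
  # accumulate in one file.
  tee -a "$log" < "$dev"
}
```

Stop it with Ctrl-C once the panic has been captured; terminal programs such as screen or minicom can do the same job.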

@popcornmix

popcornmix Apr 22, 2012

Contributor

I have seen a kernel panic from the dwc_otg driver when copying files from the network. Strangely, the same experiment doesn't fail at work (or on the machine of the colleague who knows this driver best). I had serial connected, so I got a call stack. Not sure if this is the same issue.

We need a test case that can be made to fail on my colleague's setup.

[ 528.407851] Unable to handle kernel paging request at virtual address 88ad3e90
[ 528.415071] pgd = c6944000
[ 528.417771] [88ad3e90] *pgd=00000000
[ 528.421347] Internal error: Oops: 5 [#1]
Entering kdb (current=0xc78c6e40, pid 850) Oops: (null)
due to oops @ 0xc0225fb4
Pid: 850, comm: ifplugd
CPU: 0 Not tainted (3.1.9+ #224)
PC is at memcpy+0x114/0x330
LR is at DWC_MEMCPY+0x18/0x1c
pc : [<c0225fb4>] lr : [<c02cf8f0>] psr: 60000193
sp : c7acdb44 ip : 00000002 fp : c7acdb5c
r10: 88ad3e92 r9 : c798f908 r8 : c79f6620
r7 : c798f8c0 r6 : 0000ffff r5 : c68e2560 r4 : c79890c0
r3 : 893bc77f r2 : 3859066a r1 : 88ad3e90 r0 : ffdd0000
Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 00c5387d Table: 06944008 DAC: 00000015
[<c000f6b4>] (show_regs+0x0/0x58) from [<c007a8cc>] (kdb_dumpregs+0x38/0x60)
 r4:c05dbff8 r3:00000001
[<c007a894>] (kdb_dumpregs+0x0/0x60) from [<c007d63c>] (kdb_main_loop+0x56c/0x7ac)
 r6:00000005 r5:c7acdaf8 r4:c05dc204 r3:893bc77f
[<c007d0d0>] (kdb_main_loop+0x0/0x7ac) from [<c007ff50>] (kdb_stub+0x280/0x3f8)
[<c007fcd0>] (kdb_stub+0x0/0x3f8) from [<c0076678>] (kgdb_handle_exception+0x160/0x64c)
[<c0076518>] (kgdb_handle_exception+0x0/0x64c) from [<c00147e0>] (kgdb_notify+0x3c/0x74)
[<c00147a4>] (kgdb_notify+0x0/0x74) from [<c03b8810>] (notifier_call_chain+0x54/0x94)
 r6:00000000 r5:00000000 r4:fffffffc r3:c00147a4
[<c03b87bc>] (notifier_call_chain+0x0/0x94) from [<c03b88a4>] (atomic_notifier_call_chain+0x28/0x30)
 r8:c78c6e40 r7:00000005 r6:c04898cc r5:c7acdaf8 r4:c7acc000 r3:ffffffff
[<c03b887c>] (atomic_notifier_call_chain+0x0/0x30) from [<c03b88ec>] (notify_die+0x40/0x4c)
[<c03b88ac>] (notify_die+0x0/0x4c) from [<c00123a4>] (die+0xb0/0x364)
[<c00122f4>] (die+0x0/0x364) from [<c0018bd4>] (__do_kernel_fault+0x74/0x94)
[<c0018b60>] (__do_kernel_fault+0x0/0x94) from [<c03b8440>] (do_page_fault+0xa4/0x36c)
 r8:c7acc000 r7:00000005 r6:c79944e0 r5:88ad3e90 r4:c7acdaf8 r3:c7acdaf8
[<c03b839c>] (do_page_fault+0x0/0x36c) from [<c03b87b4>] (do_translation_fault+0xac/0xb4)
[<c03b8708>] (do_translation_fault+0x0/0xb4) from [<c0008340>] (do_DataAbort+0x40/0xa8)
 r7:00000005 r6:c057f404 r5:88ad3e90 r4:00000005
[<c0008300>] (do_DataAbort+0x0/0xa8) from [<c03b68dc>] (__dabt_svc+0x3c/0x60)
Exception stack(0xc7acdaf8 to 0xc7acdb40)
dae0: ffdd0000 88ad3e90
db00: 3859066a 893bc77f c79890c0 c68e2560 0000ffff c798f8c0 c79f6620 c798f908
db20: 88ad3e92 c7acdb5c 00000002 c7acdb44 c02cf8f0 c0225fb4 60000193 ffffffff
 r8:c79f6620 r7:c7acdb2c r6:ffffffff r5:60000193 r4:c0225fb4
[<c02cf8d8>] (DWC_MEMCPY+0x0/0x1c) from [<c02c4c1c>] (assign_and_init_hc+0x250/0x58c)
[<c02c49cc>] (assign_and_init_hc+0x0/0x58c) from [<c02c5c5c>] (dwc_otg_hcd_select_transactions+0x11c/0x18c)
[<c02c5b40>] (dwc_otg_hcd_select_transactions+0x0/0x18c) from [<c02c8fbc>] (dwc_otg_hcd_handle_sof_intr+0xb4/0xe4)
[<c02c8f08>] (dwc_otg_hcd_handle_sof_intr+0x0/0xe4) from [<c02ca428>] (dwc_otg_hcd_handle_intr+0xd4/0x120)
 r6:00000008 r5:c798f8c0 r4:00000008 r3:00000000
[<c02ca354>] (dwc_otg_hcd_handle_intr+0x0/0x120) from [<c02c7cfc>] (dwc_otg_hcd_irq+0x1c/0x28)
 r7:00000000 r6:00000001 r5:60000193 r4:c797ddc0
[<c02c7ce0>] (dwc_otg_hcd_irq+0x0/0x28) from [<c02a5c54>] (usb_hcd_irq+0x48/0xc0)
[<c02a5c0c>] (usb_hcd_irq+0x0/0xc0) from [<c0080da8>] (handle_irq_event_percpu+0x68/0x258)
 r6:0000004b r5:0000004b r4:c79825e0 r3:c02a5c0c
[<c0080d40>] (handle_irq_event_percpu+0x0/0x258) from [<c0080fd0>] (handle_irq_event+0x38/0x48)
[<c0080f98>] (handle_irq_event+0x0/0x48) from [<c0082c2c>] (handle_level_irq+0x90/0x108)
 r4:c0586edc r3:00020000
[<c0082b9c>] (handle_level_irq+0x0/0x108) from [<c00806ec>] (generic_handle_irq+0x3c/0x50)
 r4:c0593dac r3:c0082b9c
[<c00806b0>] (generic_handle_irq+0x0/0x50) from [<c000efcc>] (handle_IRQ+0x40/0x94)
[<c000ef8c>] (handle_IRQ+0x0/0x94) from [<c0008470>] (asm_do_IRQ+0x18/0x1c)
 r6:f200b200 r5:60000113 r4:c029b0f4 r3:c057ae94
[<c0008458>] (asm_do_IRQ+0x0/0x1c) from [<c03b6938>] (__irq_svc+0x38/0xc0)
Exception stack(0xc7acdcf0 to 0xc7acdd38)
dce0: 00000004 00000114 00000840 c058b720
dd00: c7a4cb80 c7a4cb80 00000114 00000840 00000001 00000000 bed6e9d0 c7acdd6c
dd20: c7acdd70 c7acdd38 c029b210 c029b0f4 60000113 ffffffff
[<c029b0c8>] (smsc95xx_write_reg+0x0/0xe0) from [<c029b210>] (smsc95xx_mdio_read+0x68/0xe0)
 r7:00000001 r6:c7a4cb98 r5:00000001 r4:c7a4cb80
[<c029b1a8>] (smsc95xx_mdio_read+0x0/0xe0) from [<c029a440>] (mii_link_ok+0x40/0x50)
 r8:c7acc000 r7:c03e731c r6:bed6e9f0 r5:0000000a r4:c7a4cc14
[<c029a400>] (mii_link_ok+0x0/0x50) from [<c029cfcc>] (usbnet_get_link+0x50/0x5c)
 r4:c7a4c800 r3:c029b1a8
[<c029cf7c>] (usbnet_get_link+0x0/0x5c) from [<c031df00>] (dev_ethtool+0x2010/0x25e4)
[<c031bef0>] (dev_ethtool+0x0/0x25e4) from [<c0319e9c>] (dev_ioctl+0x5b4/0x8e4)
[<c03198e8>] (dev_ioctl+0x0/0x8e4) from [<c0302a64>] (sock_ioctl+0xa0/0x280)
[<c03029c4>] (sock_ioctl+0x0/0x280) from [<c00f7518>] (do_vfs_ioctl+0x8c/0x590)
 r7:00000007 r6:c7523380 r5:bed6e9d0 r4:bed6e9d0
[<c00f748c>] (do_vfs_ioctl+0x0/0x590) from [<c00f7a64>] (sys_ioctl+0x48/0x70)
[<c00f7a1c>] (sys_ioctl+0x0/0x70) from [<c000e000>] (ret_fast_syscall+0x0/0x48)
 r7:00000036 r6:01787008 r5:00000007 r4:bed6ead8

@benosteen

benosteen Apr 22, 2012

We may have two separate bugs then, as that doesn't look that familiar.

I'll reconnect the UART and see if I can recreate the USB heavy load one.

(Hexxeh pointed out on IRC that current draw could be a factor. I agree, but I can only measure this if I power the board via the GPIO pins. Does that bypass the polyfuse?)

@Hexxeh

Hexxeh Apr 22, 2012

Pretty sure I recall reading somewhere that it /does/ indeed bypass the polyfuse.

@shirro

shirro Apr 23, 2012

My Pi is in transit, but I thought I would grab the Debian image and run ksymoops on some of the stack traces being posted, only to discover there is no System.map on the image.

It would be REALLY handy to have a System.map included with the default image. Otherwise we all have to compile our own kernels and trigger the crashes ourselves to debug these things.

@popcornmix

popcornmix Apr 23, 2012

Contributor

@shirro Good point. I'll include System.map with the next GitHub update.

I think this is the map from the latest GitHub firmware.
http://dl.dropbox.com/u/3669512/stable/System.map.git

I think this is the map from the latest Debian firmware.
http://dl.dropbox.com/u/3669512/stable/System.map.deb

(I believe they are the same code, but they were built on different machines, so the offsets are slightly different.)

@popcornmix

popcornmix Apr 23, 2012

Contributor

Okay, the top of the stack in the screenshot resolves to
c022bd90 T DWC_MEMCPY
so it looks like the same panic as mine.

If you use kernel_debug.img (from GitHub) instead of kernel.img, you should get a stack trace with function names.
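
Resolving an address by hand against System.map means finding the nearest text symbol at or below it and taking the offset from there. A small sketch of that lookup, assuming a 32-bit kernel and a System.map built from exactly the same kernel (as noted above, builds from different machines have different offsets); `addr2sym` is a hypothetical helper, not an existing tool.

```bash
#!/usr/bin/env bash
# Resolve a kernel address to symbol+offset using a System.map file.
# Usage: addr2sym MAPFILE ADDRESS   (address in hex, with or without 0x)
addr2sym() {
  local map=$1 addr=$2
  awk -v target="$addr" '
    # Convert a hex string (with or without 0x) to a number.
    function hex2dec(h,    i, d) {
      h = tolower(h)
      sub(/^0x/, "", h)
      d = 0
      for (i = 1; i <= length(h); i++)
        d = d * 16 + index("0123456789abcdef", substr(h, i, 1)) - 1
      return d
    }
    BEGIN { t = hex2dec(target); best = -1 }
    # Only text (code) symbols: T/t, plus weak W/w.
    $2 ~ /^[TtWw]$/ {
      a = hex2dec($1)
      if (a <= t && a >= best) { best = a; name = $3 }
    }
    END {
      if (name == "") exit 1
      printf "%s+0x%x\n", name, t - best
    }
  ' "$map"
}
```

For example, with the map of the kernel that produced the trace earlier in this thread, `addr2sym System.map c02cf8f0` would be expected to print `DWC_MEMCPY+0x18`, matching its `LR is at DWC_MEMCPY+0x18/0x1c` line. On the running kernel, /proc/kallsyms uses the same address/type/name columns and should work as the map file too.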

@shirro

shirro Apr 23, 2012

ksymoops looks to have been deprecated since the 2.4 days, as the kernel usually prints the symbols itself these days. I must be getting old. Perhaps we need that on by default? I just grepped the address out of a pastebin mozzwald put on IRC, and it is DWC_MEMCPY as well. Perhaps having the HTML docs in there won't be such a bad thing after all :-)

http://pastebin.com/u4C98Tfq

@abishur

abishur Apr 23, 2012

This thread

http://www.raspberrypi.org/forum/troubleshooting/kernel-panic-on-concurrent-network-and-usb-storage

has a screenshot of the kernel panic. I've uploaded two incidents of the panic, both while transferring data to or from a USB-attached hard drive on the Pi.

@popcornmix

popcornmix Apr 23, 2012

Contributor

Can anyone rule out a 5V power supply issue?
E.g. use a 10W iPad charger with a high-quality USB cable, measure ~5V at the board, and still observe the issue.
I don't think it is this, but it is something that needs ruling out.

@abishur

abishur Apr 23, 2012

I'm using a 5V 1A HTC charger with a high-quality USB cable. Does that count?

@popcornmix

popcornmix Apr 23, 2012

Contributor

If you've measured the voltage between TP1 and TP2 then yes...

@abishur

abishur Apr 23, 2012

4.75V at full load (two USB devices, Ethernet, and HDMI), and the error still occurs.

@mozzwald

mozzwald Apr 23, 2012

Here is the boot log up to the kernel panic while trying to download to a USB hard drive: http://pastebin.com/u4C98Tfq

My current setup is:

  • Latest Debian image on a Transcend 4GB Class 6 card
  • Raspi powered by a 5V 1.5A supply; voltage never goes below 4.84V at the test points, and current draw averages 400mA to 500mA.
  • Powered USB hub with a 5V 2.1A supply
  • USB Powered 2.5" SATA HDD on hub
  • USB Optical mouse on hub
  • USB Keyboard on pi

@benosteen

benosteen Apr 23, 2012

I use a 5V 1A supply, and the voltage measured between TP1 and TP2 is around the 4.84V mark before, during and after the test, fluctuating by 10mV or so under load. Unfortunately, I think some of the voltage drop (0.20V or more) is in the cable itself: the voltage directly at the adapter is around 5.1V, but I only measured that very early on.

What sort of voltage drop would be worrying? 4V? 4.5V? 4.6V?

@abishur

abishur Apr 23, 2012

Swapped to another charger, got 4.8V across TP1/TP2, and the error still occurred.

@popcornmix

popcornmix Apr 23, 2012

Contributor

Well, I believe USB specifies a ±5% tolerance, so 4.75V is the limit. I would expect 4.84V to be fine, so I don't think this is (5V) power related.

The guy who knows most about this driver (although the driver was written by Synopsys, so no one at Broadcom knows much about it) is going to try to reproduce this with an external USB drive. Hopefully he'll be able to see it fail.

I've seen the failure at home (copying from an NFS-mounted drive over the network, with no USB hard drive involved), but running exactly the same test on the network at work didn't fail (and the driver guy couldn't reproduce it). Perhaps the USB drive is a better way of provoking it.
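
The arithmetic behind these readings, as a quick check of a TP1/TP2 measurement against the 4.75 V to 5.25 V window implied by the ±5% figure. A sketch only: `vbus_check` is a hypothetical helper, and a single meter reading says nothing about short dips.

```bash
#!/usr/bin/env bash
# Compare a measured TP1/TP2 voltage with a 5 V +/-5% window (4.75 V to 5.25 V).
# Prints the deviation from nominal and whether the reading is inside the window.
vbus_check() {
  awk -v v="$1" 'BEGIN {
    nominal = 5.0; lo = 4.75; hi = 5.25
    dev = (v - nominal) / nominal * 100
    status = (v >= lo && v <= hi) ? "ok" : "out of spec"
    printf "%.2f V: %+.1f%% (%s)\n", v, dev, status
  }'
}
```

With readings reported in this thread, `vbus_check 4.84` prints `4.84 V: -3.2% (ok)` and `vbus_check 4.8` prints `4.80 V: -4.0% (ok)`; 4.75 V sits exactly on the limit.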

@shirro

shirro commented Apr 23, 2012

I added symbols to the oops from @mozzwald
https://gist.github.com/2471526

@larsth

larsth Apr 23, 2012

To completely rule out PSU issues, maybe add an extra capacitor so the voltage is more stable; 220 µF should be OK and (I guess) won't trip the fuse.

4.8 V is a drop of 200 mV, i.e. -4%, which is close to the edge of a ±5% limit.

Think of it this way: a relatively long, thin trace on the RPi PCB to the BCM2835, plus a large current each time the oscillator produces a clock pulse, means the BCM2835 causes a relatively large voltage drop across the trace. So 4.8 V at the power connector might become 4.6 V at the BCM2835, which is too low.

@mozzwald

mozzwald Apr 23, 2012

Added a 220 µF capacitor to the input power as suggested by @larsth; the problem persists. I also tried powering the Pi from the 5V 2.1A supply and the USB hub from the 5V 1.5A supply; the problem persists.

@asb

asb Apr 23, 2012

Someone on the forums claims that constantly dropping caches works around the issue:

while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done &

http://www.raspberrypi.org/forum/troubleshooting/kernel-panic-on-concurrent-network-and-usb-storage/#p68752
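
To test the workaround in a controlled way, the loop can be wrapped so it runs only for the duration of one command and is stopped afterwards. A sketch assuming bash and root (writing to /proc/sys/vm/drop_caches requires it); `with_cache_dropping` and the `DROP_CACHES` override, used only to dry-run against an ordinary file, are hypothetical.

```bash
#!/usr/bin/env bash
# Run a command while writing 3 to drop_caches once a second (same as the
# forum loop), then stop the loop. Requires root for the real /proc file.
# DROP_CACHES is a hypothetical override for dry runs against a normal file.
with_cache_dropping() {
  local target=${DROP_CACHES:-/proc/sys/vm/drop_caches}
  ( while true; do echo 3 > "$target"; sleep 1; done ) &
  local loop=$!
  "$@"
  local rc=$?
  # Stop the background loop and reap it; its exit status is irrelevant.
  kill "$loop" 2>/dev/null
  wait "$loop" 2>/dev/null || true
  return "$rc"
}

# Example: with_cache_dropping sh -c 'find /mnt/usb -type f | xargs grep -i foo'
```

The kernel's drop_caches documentation suggests running `sync` first, since only clean cache is dropped; the sketch leaves that out so it matches the loop being tested.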

@shirro

shirro Apr 23, 2012

It might work, but that doesn't make it the solution. If I am reading the code correctly, the USB driver does a memcpy to align some data to an 8-byte boundary when DMA is enabled, and sometimes that memcpy accesses memory it should not. The code checks for allocation failure, so perhaps the length is wrong. It needs some printk()s, I think. You could probably load the USB driver as a module with a parameter that disables DMA, which would stop this code from ever running, but that wouldn't really be an answer either. We have the source, so there is no real need to guess.

@rewolff

rewolff Apr 25, 2012

Two things.
Measuring 4.84V or even 5.1V at the RPi test points is no guarantee of "no power supply issues". A multimeter is far too slow to notice sudden short drops in power. Suppose the USB charger has a "bug" that drops power for a millisecond every 10 seconds: the multimeter will not notice. Of course, with a full millisecond of no power the RPi will reset (the capacitor will hold out for about 0.3 ms). For a charger this wouldn't matter; the product would still work fine for charging cellphone batteries, so such a "bug" might go unnoticed.
Of course, that scenario is exaggerated; a full reset would be more obvious to RPi users.
A slightly more realistic scenario is that the RPi suddenly needs more current and the power supply takes a few ms to react to the higher draw.
That said, it is VERY unlikely that such an issue would produce the observed effects. The crashes seem to come from the SAME routine every time.
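
A rough way to put a number on this: a capacitor alone can only bridge a supply dip for about t ≈ CΔV/I. Plugging in, as assumptions, the 220 µF capacitor suggested above, the ~500 mA draw @mozzwald measured, and the 0.09 V between a 4.84 V reading and the 4.75 V limit (ignoring the supply itself and the capacitor's ESR):

```latex
t_{\text{hold}} \approx \frac{C\,\Delta V}{I}
  = \frac{220\,\mu\text{F} \times 0.09\,\text{V}}{0.5\,\text{A}}
  \approx 40\,\mu\text{s}
```

On these assumptions such a capacitor rides through dips of only tens of microseconds, far shorter than anything a multimeter can resolve.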

@popcornmix

popcornmix Apr 25, 2012

Contributor

Can anyone confirm whether:
while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done &

helps? Whilst not a solution, it would be a very useful piece of data if it does work around the problem.

@benosteen

benosteen Apr 25, 2012

I'll have a go; I'm just dd'ing a fresh copy of the Debian image to my SD card.

@benosteen

benosteen Apr 25, 2012

13/04 Debian image, freshly dd'd to a 2GB SD card:

  1. Mounted a USB stick with a sizeable collection of files (a rootfs)
  2. Started the drop_caches loop
  3. wget large_file_from_github
  4. find /mountpoint/of/usb | xargs grep "foo"

Same sort of kernel error
http://www.flickr.com/photos/ben_on_the_move/6967016434/in/photostream

(Also, the serial logging of kernel panics seems to be at 115200 baud, regardless of the cmdline.txt settings. Is this set somewhere else? I can capture the boot-up, but I was using 9600 baud to do so.)

@popcornmix

popcornmix Apr 25, 2012

Contributor

From Gray (not directly in response to you, but this question has been asked before):

An awful lot of what is printed during the boot sequence is output by the kernel during initialization, i.e. during the set-up of devices that are later used to support the operating system implemented on the root file system.

One of the classes of devices that need to be set-up are terminal (tty) devices - so it kind-of follows that the thing being output to during this kernel initialization process isn't really a tty device. The kernel calls it (well them actually) a 'console'. The kernel command line allows you to identify and set the baud rate for these consoles and kernel output goes to them all (e.g. to the HDMI framebuffer console and to the UART console).

Each console normally ends up being presented as a separate tty device in /dev.

Once the operating system gets hold of the devices the kernel has left it, it configures them and uses them as it sees fit. In our case we do the standard thing of running a shell on just about any tty we can find. This is implemented in the file that controls what we do when control is first passed to the operating system - /etc/inittab.

In /etc/inittab each tty is read by a program 'getty' in its own process. This explains why, once you get to a log-on prompt, [1] the baud rate might change; and [2] the output is no longer the same as it is on other console/ttys. (You may have noticed that you can log on separately to a shell over the UART and a different one over the HDMI/keyboard.)

So, in short, edit /etc/inittab and change
/sbin/getty -L ttyAMA0 115200 vt100
to
/sbin/getty -L ttyAMA0 9600 vt100
if you want the operating system to run at 9600 baud.
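The inittab edit above can also be scripted; a minimal sketch, assuming the getty line has exactly the form quoted above. `set_getty_baud` is a hypothetical helper; it keeps a backup and only touches the ttyAMA0 getty line.

```shell
#!/bin/sh
# set_getty_baud FILE BAUD
# Rewrites the ttyAMA0 getty line in an inittab-style FILE to use BAUD,
# e.g. "/sbin/getty -L ttyAMA0 115200 vt100" -> "/sbin/getty -L ttyAMA0 9600 vt100".
# The original file is kept as FILE.bak.
set_getty_baud() {
    file="$1"; baud="$2"
    cp "$file" "$file.bak" || return 1
    sed "s|\(getty -L ttyAMA0 \)[0-9][0-9]*|\1$baud|" "$file.bak" > "$file"
}
```

As root, `set_getty_baud /etc/inittab 9600 && telinit q` makes init re-read inittab; the new rate applies once the getty on ttyAMA0 is restarted (or after a reboot). This only changes the login getty, not the kernel console messages printed before it.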

@mozzwald

mozzwald Apr 25, 2012

while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done &

This actually makes the problem worse for me. Running it and then trying to download a file to the USB device causes a kernel panic instantly. Without it, the file downloads for a while before the kernel panics.

@larsth

larsth Apr 25, 2012

@shirro

memcpy?
Where?

If a device driver in kernel space copies memory from user space with plain C, instead of using the copy_from_user(9) function, then you may have found the bug we are searching for.

Here is a very long list of places where the "copy_from_user" identifier appears in the kernel: http://lxr.free-electrons.com/ident?i=copy_from_user

I know that a large part of the USB stuff is in user space (AFAIK), but some of it is of course in kernel space.

@popcornmix

popcornmix Apr 25, 2012

Contributor

The fault is that the length passed to DWC_MEMCPY is garbage. When I added some logging, the length was 3349608928.
It seems the URB is getting corrupted somewhere...

@larsth

larsth Apr 25, 2012

@popcornmix, any possibility that it could be a pointer to an int, used as an int?
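One quick way to test that idea is to look at the bad length in hex. A sketch: `looks_like_kernel_addr` is a hypothetical helper, and reading the result this way is an assumption, not something established in this thread. On 32-bit ARM kernels with the default 3G/1G split, kernel lowmem starts at PAGE_OFFSET = 0xC0000000, so a "length" at or above that looks more like a kernel virtual address than a byte count.

```shell
#!/bin/sh
# looks_like_kernel_addr VALUE
# Prints VALUE in hex and succeeds if it is at or above 0xC0000000, the default
# PAGE_OFFSET for 32-bit ARM kernels with a 3G/1G user/kernel split.
looks_like_kernel_addr() {
    printf '0x%08x\n' "$1"
    [ "$1" -ge 3221225472 ]   # 3221225472 == 0xC0000000
}
```

For the logged value, `looks_like_kernel_addr 3349608928` prints 0xc7a6f9e0, about 122 MiB above 0xC0000000, which would fall inside the RAM of a 256 MB board. That is consistent with, but does not prove, a pointer ending up where a length was expected.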

@shirro

shirro Apr 25, 2012

Probably, @larsth, but as you may have guessed I don't know much about kernel internals; I know enough to recognise a likely buffer overflow, which looks to be confirmed. The driver would never get into mainline, I know that much. It has a compatibility layer to ease porting, and there is a macro called dwc_memcpy for a function DWC_MEMCPY which wraps a call to memcpy, and that is as far as I went down the rabbit hole. I am due to get a Pi any day now; hopefully there will still be some bugs left. I found a few other DWC drivers referenced on an OpenWRT mailing list, and it looks like some of them are considerably simpler. Since the OTG functionality isn't available on this hardware anyway, I wonder if one of those other drivers would be better?
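Following a wrapper chain like the one described above is mostly a matter of recursive grep; a sketch. `find_symbol` is a hypothetical helper, and the directory to search is whatever holds the dwc_otg driver and its portability layer in your kernel tree (no specific path is assumed here).

```shell
#!/bin/sh
# find_symbol DIR NAME
# Lists every line in the C sources under DIR where NAME appears as a whole
# identifier, with file name and line number. -w means dwc_memcpy and
# DWC_MEMCPY are separate searches, just as they are separate C identifiers.
find_symbol() {
    grep -rnw --include='*.[ch]' -- "$2" "$1"
}
```

For example, `find_symbol path/to/driver/sources dwc_memcpy`, then repeat with each name it expands to until the underlying memcpy call shows up.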

@asb

asb Apr 25, 2012

Hi @shirro, could you give pointers to any dwc drivers you saw? I'm pretty sure I've only seen other releases of the Synopsys code for dwc2. A recent discovery is that the upstream Samsung s3c-hsotg is in fact an instantiation of the dwc. Even better, Samsung devs have been generalising it so it could be used with other versions, and sensibly renamed it 'dwc2'. That code is probably a far better starting point, but unfortunately it only supports peripheral mode at the moment.

See the discussion at http://thread.gmane.org/gmane.linux.usb.general/61676

@shirro

shirro Apr 26, 2012

The OpenWRT dev list seems to refer to several DWC drivers from different places over time and lots of patches. The one I posted on IRC I will put here for everyone: http://permalink.gmane.org/gmane.comp.embedded.openwrt.devel/12602 - there is a link to the source SztupY has taken out of the Samsung Cyanogen Android source. There is also a Fritzbox link. And I have seen mention of others on those lists. If the s3c-hsotg has a good rep I might try and give it a go when I get my Pi. Is there an official repo for it somewhere? I am guessing I probably need to grab it out of an Android kernel source?

@asb

asb Apr 26, 2012

@shirro Check out the replies in the thread I linked to; there are refs there. s3c-hsotg is upstream (but device-only, AFAIK), and there is a recent patchset against it on the linux-usb mailing list. The other potential starting point is APM's version of the dwc code (they got permission to replace the license with the GPL, and it does at least meet kernel coding style). http://article.gmane.org/gmane.linux.usb.general/53348

It would be fantastic if you were able to help look into some of these issues.

@narensankar

narensankar Apr 26, 2012

We actually tried in the past to look at porting other DWC drivers to the pi. But in general the problem is that in every SOC the DWC logic is hooked up differently and once we hack up one of the alternative drivers to match our logic, it makes it impossible to update from the "official" Synopsys sources. Our official support is from Synopsys and if we break it they won't come rushing to our help.

@benosteen

benosteen Apr 26, 2012

Logically, I guess the next series of questions are:

1 - Are Synopsys aware of this showstopper bug?
2 - Would they acknowledge the problem as being theirs?
3 - Is fixing the bug part of the support they will offer?

@shirro

shirro Apr 26, 2012

Sorry if it is redundant but I want to add a me too. Just got mine and was doing a git clone and copying a video from an sshfs mount to usb storage and I got the exact same error.

@asb

asb Apr 26, 2012

For those who are able to reproduce, does adding vm.min_free_kbytes = 12288 to /etc/sysctl.conf or smsc95xx.turbo_mode=N to /boot/cmdline.txt alleviate the issue?
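Both suggested settings can be applied with a small script; a sketch. `apply_usb_workarounds` is a hypothetical helper; vm.min_free_kbytes and the smsc95xx turbo_mode module parameter are the names given above. The two file paths are arguments so it can be tried on copies first, and a setting that is already present is left alone.

```shell
#!/bin/sh
# apply_usb_workarounds SYSCTL_CONF CMDLINE_TXT
# Adds vm.min_free_kbytes = 12288 to SYSCTL_CONF and smsc95xx.turbo_mode=N to
# CMDLINE_TXT, unless each is already there.
apply_usb_workarounds() {
    sysctl_conf="$1"; cmdline="$2"
    # Ask the VM to keep ~12 MB free.
    if ! grep -q '^vm.min_free_kbytes' "$sysctl_conf"; then
        echo 'vm.min_free_kbytes = 12288' >> "$sysctl_conf"
    fi
    # cmdline.txt must stay a single line, so append to the end of line 1.
    if ! grep -q 'smsc95xx.turbo_mode=' "$cmdline"; then
        sed -i '1 s/$/ smsc95xx.turbo_mode=N/' "$cmdline"
    fi
}
```

On the Pi, as root: `apply_usb_workarounds /etc/sysctl.conf /boot/cmdline.txt`. `sysctl -p` applies the sysctl immediately; the turbo_mode change is read from the kernel command line, so it takes effect after a reboot.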

@guisacouto

guisacouto May 30, 2012

Currently I'm running 3.1.9-15 (I noticed now that there is a new one today, -16).
Linux berry 3.1.9-15-ARCH+ #8 PREEMPT Tue May 22 01:15:53 UTC 2012 armv6l GNU/Linux

Do I need to use Hexxeh's firmware update tool on Arch? I thought that tool would be useful for Debian, since it gets fewer updates. Is it needed on Arch?

Another question: for every raspberrypi-firmware update, does the kernel need to be rebuilt?

I will update now to the 3.1.9-16 that came out today and test whether the problem still exists. I think it does, since I didn't see any updates solving this issue.

best regards

@popcornmix

popcornmix May 30, 2012

Contributor

Interesting. That is not a prebuilt kernel of ours from github (we only enabled PREEMPT a couple of days ago).
So we know when it was built, but not from what source, or with what .config options.

So yes, running Hexxeh's updater tool, which will replace it with a known up-to-date kernel, would be worthwhile.

@guisacouto

guisacouto May 30, 2012

Ok tks! Will test this one with the update and if it panics (I'm pretty sure it will), I'll run hexxeh's updater tool.

Will report in any situation; it might help someone else.

edit: already got the kernel panic. Will try hexxeh's update tool now

@pepedog

pepedog May 30, 2012

First, the updater tool should be OK on Arch.
Arch is built with this config:
https://github.com/archlinuxarm/PKGBUILDs/blob/master/core/linux-raspberrypi/config
I asked for the kernel and firmware packages to be rebuilt; that hasn't happened yet for the firmware, which is 9 days old.
I think he is busy; the whisper is he's setting up a build farm.
Lastly, why is this issue in firmware? Surely it belongs in linux?

@guisacouto

guisacouto May 30, 2012

I think it's working now!:D

the firmware+kernel update from the git repository (with rpi-update) worked!

tks!

@pepedog

pepedog May 30, 2012

I should also point out that with Arch, the kernel and modules are not bundled in with the firmware.
The kernel is a separate package. This site is the source, but we use our own config based on the default config, which trips us up occasionally. PREEMPT was on in our package because it was recently built.
The firmware package is going to be rebuilt tonight.
Mostly, just advancing the release version will rebuild the packages; if a config change is needed, I just hope Dom lets me know.

@popcornmix

popcornmix May 30, 2012

Contributor

@pepdog erm, the config has changed...

@guisacouto

guisacouto May 31, 2012

I'm starting to think I'm doomed.
I'm getting kernel panics again while downloading torrents. I was using NTFS on the external hard drive and changed it to ext4 to see if it helped. It didn't. The CPU goes to ~100% while downloading, since the client pulls in as much bandwidth as the Pi can handle (my internet connection can download ~6MB/s; the Pi gets near 2MB/s).

If it helps, the setup is:
wifi rt5370
external hd (wd 320GB)
arch linux arm with kernel and firmware updated with the hexxeh's rpi-update tool

I'm going to test now with the debug kernel img to see the stacktrace.

best regards

@rewolff

rewolff May 31, 2012

I have a GPS receiver on a PL2303 USB serial converter. This crashes more than once a day (I leave in the morning; when I come back it has crashed). So about 300 bytes of USB traffic per second manages to crash things. I don't know if I get a kernel oops; I don't have a screen there. I tried running "netconsole", but that didn't work out: the Ethernet driver doesn't support polling. Next option is the UART for the kernel oops output... :-)
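For oops output on the UART, the kernel needs a console= entry for the serial port on its command line (e.g. console=ttyAMA0,115200). What the running kernel actually received can differ from cmdline.txt, since the Pi firmware can add parameters of its own, so /proc/cmdline is the thing to check. A sketch; `list_consoles` is a hypothetical helper.

```shell
#!/bin/sh
# list_consoles [CMDLINE_FILE]
# Prints each console= parameter on the kernel command line, one per line.
# Defaults to /proc/cmdline, i.e. what the running kernel was booted with.
list_consoles() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep '^console='
}
```

If no `console=ttyAMA0,...` line shows up, kernel messages (including panics) will not reach the serial port regardless of what the getty on ttyAMA0 is set to.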

@rewolff

rewolff May 31, 2012

P.S. Not sure if it's the same issue, of course. But I thought I'd mention it because it might provide a hint as to what's wrong.

@guisacouto

guisacouto May 31, 2012

I don't know why, but I can't run kernel_debug.img... A square with "rainbow" colors appears. I guess it is the GPU not being able to boot the kernel...

@popcornmix

popcornmix May 31, 2012

Contributor

A bad kernel_debug.img was checked in a couple of days ago, and fixed today. Can you update?

@pepedog

pepedog May 31, 2012

I just reviewed the config here: Arch will have problems with the compiled kernel, because devtmpfs is missing.
Debian will be hit too with the latest udev.
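A quick way to check whether a given kernel has devtmpfs is to look for it in /proc/filesystems; a sketch, with `has_devtmpfs` as a hypothetical helper.

```shell
#!/bin/sh
# has_devtmpfs [FILESYSTEMS_FILE]
# Succeeds if devtmpfs is listed in /proc/filesystems (or the given copy),
# i.e. if the running kernel was built with devtmpfs support.
has_devtmpfs() {
    grep -qw devtmpfs "${1:-/proc/filesystems}"
}
```

For example, `has_devtmpfs || echo "no devtmpfs: kernel needs CONFIG_DEVTMPFS=y"`. If the kernel was built with CONFIG_IKCONFIG_PROC, `zcat /proc/config.gz | grep DEVTMPFS` shows the config options directly.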

@guisacouto

guisacouto May 31, 2012

I did update 2 hours ago. Does Hexxeh's tool use raspberrypi@github as its source? If it does, mine should be up to date.
Will do it again anyway

@asb

asb May 31, 2012

No, unfortunately it updates from his own repository so it can lag behind.

@popcornmix

popcornmix May 31, 2012

Contributor

Hexxeh's repo is up to date now.

@guisacouto

guisacouto May 31, 2012

I already updated; however, I think it doesn't boot properly.

Here is an image of where it stops:
http://desmond.imageshack.us/Himg848/scaled.php?server=848&filename=20120531180440.jpg&res=landing

This is all really odd. I guess a kernel panic could be OK if I were out of memory, since there is no swap, but in this case I'm only using ~30MB or so while downloading, before it crashes. Only the CPU goes crazy, working at ~100% to use as much network bandwidth as possible. That should only make things slower, not crash them.

edit: I'm not connected via Ethernet, only wireless, but I think that kernel_debug doesn't load the driver module.

@pepedog

pepedog May 31, 2012

guisacouto
Can you see the line "cannot stat"? That is a symptom of missing devtmpfs.
New kernel and firmware for Arch tomorrow; there is a way to make pacman install a package from an x86 Arch install system.

@guisacouto

guisacouto May 31, 2012

@pepedog

oh ok, will see how kernel_debug goes tomorrow. I hope it gives some clues

@pepedog

pepedog May 31, 2012

If the kernel_debug doesn't have devtmpfs it may fail.
Not sure, think config.txt has to have something set for debug, perhaps someone can advise.
Just google devtmpfs udev

@asb

asb May 31, 2012

@pepedog: thanks for the heads up regarding CONFIG_DEVTMPFS and udev. Even Debian sid is only using udev 175, so that requirement hadn't cropped up. It sounds like it would be worth enabling.

@guisacouto

guisacouto Jun 4, 2012

I'm not sure if this is the same issue or not. Please tell me if it is different.

I've been thinking a bit about this problem while downloading torrents (heavy network + USB storage), and I thought that giving Transmission a higher nice value, so it has a lower priority, might help. That way it wouldn't take all the CPU when some system process needs it.

This did kind of help. Now I'm not getting a kernel panic, and the system keeps running, but "usb-storage" crashes!
The nice value is set to 19 (the lowest priority possible).

The dmesg is here: http://pastebin.com/Y4mnP709
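Starting the client at the lowest CPU priority, as described above, is a one-liner with nice, and renice adjusts a process that is already running. A sketch: `run_lowest_priority` is a hypothetical helper and transmission-daemon is an assumed process name.

```shell
#!/bin/sh
# run_lowest_priority CMD [ARGS...]
# Runs CMD at niceness 19, the lowest CPU priority (nice clamps at 19).
run_lowest_priority() {
    nice -n 19 "$@"
}

# For a client that is already running (process name assumed):
#   renice -n 19 -p $(pidof transmission-daemon)
```

This only lowers CPU priority; it does not change how much USB or network traffic the client generates.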

@mgreeves

mgreeves Jun 5, 2012

rewolff,

Your PL2303 issue is probably different. There are reports that the Prolific PL-2303X has the same vendor ID and product ID as the older PL-2303. I've seen lsusb report a PL-2303 with a MaxPacketSize of 64 and suspect it's a 2303X. Running an x64 3.1.10 kernel, I had a problem where long transfers experienced dropped data. Replacing the device with an FTDI FT232 eliminated the error.
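To see which variant is attached, the device descriptor's bMaxPacketSize0 can be read from `lsusb -v`. A sketch: `max_packet_size0` is a hypothetical helper, 067b:2303 is the Prolific ID both chips share, and, as far as I know, the kernel's pl2303 driver of this era treated a bMaxPacketSize0 of 64 as the newer (HX) variant.

```shell
#!/bin/sh
# max_packet_size0
# Reads `lsusb -v` output on stdin and prints each device's bMaxPacketSize0.
max_packet_size0() {
    awk '$1 == "bMaxPacketSize0" { print $2 }'
}

# Usage (lsusb -v may need root to show every descriptor):
#   lsusb -v -d 067b:2303 | max_packet_size0
```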

@guisacouto

guisacouto Jun 6, 2012

Been doing some research: smsc95xx is just an Ethernet chip, right?
Was the fix submitted to the kernel for this collision between USB storage and heavy network usage made in the smsc95xx driver, or somewhere in the core kernel?
If it was only in the driver, that pretty much explains why it hasn't fixed my problem, since I'm only using Wi-Fi and no Ethernet... So I guess there is still a problem when USB devices are competing for the single USB bus.

@guisacouto

guisacouto Jun 12, 2012

After some updates, I'm getting a different kernel panic (much shorter output) when downloading to USB storage.
Here is a screen: http://img577.imageshack.us/img577/6830/20120610215808.jpg

@rewolff

This comment has been minimized.

Show comment
Hide comment
@rewolff

rewolff Jun 19, 2012

In that case, please check the following:

  • Is your memory really filling up? Where Windows would report lots of
    "free memory", that is actually "wasted memory". Linux tries to use
    it to cache bits and pieces of the hard drive (or in this case the SD
    card) that it has already had to read into memory anyway.
  • Is the "torrent" process filling up the memory?
  • Do you have swap enabled? IIRC, stock Debian comes with swap
    disabled. This means that when the system runs out of memory, it will
    first become very slow and then soon afterwards stop responding
    altogether. If you have swap enabled, the system might become
    sluggish or remain perfectly usable, depending on the swap usage. In
    any case, you have WAY longer to "fix" any problems before the system
    has to be physically rebooted.
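
A minimal Python sketch of the first and third checks, assuming a Linux system where /proc/meminfo is readable. It separates truly free memory from page cache and buffers, and reports whether any swap is configured. The "roughly reclaimable" figure is a crude MemFree + Buffers + Cached approximation, not the kernel's own accounting (dirty or shared pages in the cache may not be reclaimable), and the function names are illustrative only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer kB values."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])  # values are reported in kB
    return info


def summarize(info):
    """Return (free_kb, cache_kb, swap_total_kb) from parsed meminfo."""
    free_kb = info.get("MemFree", 0)
    # Buffers + Cached is memory Linux is using as disk/SD cache.
    cache_kb = info.get("Buffers", 0) + info.get("Cached", 0)
    swap_total_kb = info.get("SwapTotal", 0)
    return free_kb, cache_kb, swap_total_kb


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        free_kb, cache_kb, swap_kb = summarize(parse_meminfo(f.read()))
    print(f"free: {free_kb} kB, page cache/buffers: {cache_kb} kB")
    print(f"roughly reclaimable (crude estimate): {free_kb + cache_kb} kB")
    print("swap enabled" if swap_kb > 0 else "swap disabled (SwapTotal is 0)")
```

A small MemFree next to a large cache figure usually means the memory is in use as cache rather than exhausted; a process that keeps growing can be spotted with top or ps.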

** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
** Delftechpark 26 2628 XH Delft, The Netherlands. KVK: 27239233 **
-- BitWizard writes Linux device drivers for any device you may have! --
The plan was simple, like my brother-in-law Phil. But unlike
Phil, this plan just might work.

volpino Jul 1, 2012

@guisacouto I'm getting the exact same issue with OpenELEC.
I'm running Transmission and XBMC, and I tried different versions of OpenELEC; I even tried recompiling the current one from git. I have a Logilink hub (it's in the working peripherals section of the Raspberry Pi wiki) with an external NTFS HDD, a wifi dongle and an MCE remote.

SamuelDebruyn Jul 19, 2012

More people are running into this issue, and so am I. It was reported, with links to some forum threads, here: raspberrypi/linux#56

volpino Jul 27, 2012

I had the issue only when using wifi. I worked around it with an access point in client mode, which lets me connect the Raspberry Pi via Ethernet while still using wifi.

Dmole Aug 13, 2012

Adding smsc95xx.turbo_mode=N to /boot/cmdline.txt fixes this problem!
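
/boot/cmdline.txt holds the whole kernel command line on a single line, so the parameter has to be appended to that line with a space rather than put on a new line. A small Python sketch of that edit, assuming no parameter value contains quoted spaces; add_kernel_param is a hypothetical helper, and the script only prints the result so the file can be backed up before anything is written.

```python
PARAM = "smsc95xx.turbo_mode=N"


def add_kernel_param(cmdline, param=PARAM):
    """Return cmdline with `param` set, keeping everything on one line.

    Any existing setting for the same key (e.g. smsc95xx.turbo_mode=Y)
    is dropped so the key appears only once.
    Assumes whitespace-separated tokens with no quoted values.
    """
    key = param.split("=", 1)[0]
    tokens = [t for t in cmdline.split() if t.split("=", 1)[0] != key]
    tokens.append(param)
    return " ".join(tokens) + "\n"


if __name__ == "__main__":
    with open("/boot/cmdline.txt") as f:
        current = f.read()
    # Dry run: print the proposed single-line command line; write it
    # back to /boot/cmdline.txt yourself (as root) after a backup.
    print(add_kernel_param(current), end="")
```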

SamuelDebruyn Sep 15, 2012

It doesn't. It's still happening here with turbo mode disabled.
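
When a setting like this seems to have no effect, it may be worth confirming that it was actually applied. A sketch, assuming the smsc95xx driver exports the parameter at /sys/module/smsc95xx/parameters/turbo_mode (the mainline driver declares it with module_param, which should make it appear there, but treat the path as an assumption). It compares what /proc/cmdline requested with the value the loaded driver reports; the helper names are hypothetical.

```python
import os

# Assumed sysfs location of the live module parameter.
SYSFS_PARAM = "/sys/module/smsc95xx/parameters/turbo_mode"


def turbo_mode_state(path=SYSFS_PARAM):
    """Return True/False for the live turbo_mode value, or None if unavailable."""
    if not os.path.exists(path):
        return None  # driver not loaded, or parameter not exported
    with open(path) as f:
        value = f.read().strip()
    return value in ("Y", "y", "1")  # bool module params read as Y/N


def requested_on_cmdline(cmdline):
    """Return the turbo_mode value requested on the kernel command line, if any."""
    for token in cmdline.split():
        if token.startswith("smsc95xx.turbo_mode="):
            return token.split("=", 1)[1]
    return None


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("requested on cmdline:", requested_on_cmdline(f.read()))
    print("live turbo_mode enabled:", turbo_mode_state())
```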

lorenzos Sep 16, 2012

Just to report: I had exactly the same error and solved it with:

  • vm.min_free_kbytes = 32768 in /etc/sysctl.conf
  • smsc95xx.turbo_mode=N in /boot/cmdline.txt

Before these edits, I experienced this issue at least twice an hour while using my Raspberry Pi for lots of network transfers (file sharing at about 250 KB/s) and very frequent SD card reads/writes. I never got a kernel panic, btw.

Since these edits, I have not experienced any problems at all for two days now.
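
A sketch of applying the vm.min_free_kbytes setting, using a hypothetical set_sysctl helper: it rewrites /etc/sysctl.conf text so the key appears exactly once, leaves comment lines (starting with # or ;) alone, and reads the live value from /proc/sys/vm/min_free_kbytes. It prints the new file contents instead of writing them.

```python
def set_sysctl(conf_text, key="vm.min_free_kbytes", value="32768"):
    """Return sysctl.conf text with `key = value` set exactly once.

    Existing uncommented lines for the same key are replaced; comments
    and unrelated settings are left alone.
    """
    out, done = [], False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("#", ";")) and "=" in stripped:
            if stripped.split("=", 1)[0].strip() == key:
                if not done:
                    out.append(f"{key} = {value}")
                    done = True
                continue  # drop any duplicate settings of the same key
        out.append(line)
    if not done:
        out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"


def current_min_free_kbytes(path="/proc/sys/vm/min_free_kbytes"):
    """Read the value the running kernel is currently using."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    print("current vm.min_free_kbytes:", current_min_free_kbytes())
    with open("/etc/sysctl.conf") as f:
        print(set_sysctl(f.read()), end="")
```

/etc/sysctl.conf is applied at boot; running sysctl -p as root loads it immediately.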

popcornmix Jul 20, 2013

Contributor

@benosteen
A lot has been fixed since this report.
Is the panic still happening with latest (rpi-update) kernel?

benosteen Jul 22, 2013

I can't comment on this bug, as I'm not in a position to fire up a RasPi and monitor for any problems. I would suggest closing this, as it was such an early issue, and letting someone reopen it in the unlikely event that the problem still persists.

popcornmix closed this Jul 22, 2013

pelwell referenced this issue Apr 11, 2016

pwm_sdm: fix ring buffer UINT_MAX wraparound bug
See: https://www.raspberrypi.org/forums/viewtopic.php?f=29&t=136445

firmware: IL ISP: Correct RGB to YUV matrices, and ignore code side info

firmware: MJPEG encode: Handle stereoscopic images
See: https://www.raspberrypi.org/forums/viewtopic.php?f=43&t=138325&p=918041

firmware: IL Camera: Change unspecified colour space to being JFIF
See: raspberrypi/userland#78

firmware: OV5647: Option to configure auto lens shading to use potential fix

firmware: arm_loader: Factor out DT support into arm_dt
See: raspberrypi/linux#1394

firmware: arm_ldconfig: Switch to using arm stubs generated from tools/mkimage
firmware: arm_ldconfig: Support loading arm stubs from file
See: #579