Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raspberry Pi freezes with high network load #244

Closed
pl31 opened this issue Mar 9, 2013 · 4 comments
Closed

Raspberry Pi freezes with high network load #244

pl31 opened this issue Mar 9, 2013 · 4 comments
Assignees

Comments

@pl31
Copy link

pl31 commented Mar 9, 2013

The Raspberry Pi freezes under high network load. Netload can be generated by a packet generator or arping using another computer:
arping -w 100000 192.168.0.104

192.168.0.104 would be the IP-address of the Raspberry Pi.

The raspberry freezes, input via keyboard is ignored. When netload drops, the raspberry is reacting normally.

The problem does not occur, when using "dwc_otg.speed=1" in cmdline.txt.

Tested using two different Model B Devices "Made in China" and following systems:

Raspian:
Linux 3.6.11+ #371 PREEMPT Thu Feb 7 16:31:35 GMT 2013 armv6l GNU/Linux
slitaz:
Linux 3.2.27-slitaz #1 PREEMPT Fri Dec 14 15:01:17 CST 2012 armv6l GNU/Linux
tinycorelinux:
Linux 3.2.27 #4 PREEMPT Sun Sep 16 09:59:00 PDT 2012 armv6l GNU/Linux

Edit: changed 'heavy netload' to 'high network load' :).

@Ferroin
Copy link
Contributor

Ferroin commented Mar 9, 2013

This isn't specific to the P, or even to Linux. In theory, any system will react like that when the I/O load is 'high', it's just that 'high' is a relatively small number in this case.

@ghost ghost assigned ghollingworth Mar 9, 2013
@pl31
Copy link
Author

pl31 commented Jan 4, 2014

Repeated test today using tinycorelinux (piCore-5.1.rc4):

tc@box:~$ uname -r
3.12.1-piCore+

When arping'ing the PI, it still reacted normally.

@P33M
Copy link
Contributor

P33M commented Jan 8, 2014

If the issue has been resolved, ok to close?

@pl31
Copy link
Author

pl31 commented Jan 8, 2014

Closed. Thanks a lot.

@pl31 pl31 closed this as completed Jan 8, 2014
anholt pushed a commit to anholt/linux that referenced this issue Oct 12, 2015
If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ raspberrypi#244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
popcornmix pushed a commit that referenced this issue Oct 23, 2015
commit 3ec0c97 upstream.

If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ #244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: William Dauchy <william@gandi.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
popcornmix pushed a commit that referenced this issue Oct 23, 2015
commit 3ec0c97 upstream.

If filelayout_decode_layout fail, _filelayout_free_lseg will causes
a double freeing of fh_array.

[ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.281010] PGD 0
[ 1179.281443] Oops: 0000 [#1]
[ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
[ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ #244
[ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
[ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
[ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
[ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
[ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
[ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
[ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
[ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
[ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
[ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
[ 1179.292731] Stack:
[ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
[ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
[ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
[ 1179.294623] Call Trace:
[ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
[ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
[ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
[ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
[ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
[ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
[ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
[ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
[ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
[ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
[ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
[ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
[ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
[ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
[ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
[ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
[ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
[ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
[ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
[ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
[ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
[ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
[ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
[ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
[ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
[ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
[ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
[ 1179.309177]  RSP <ffff88003e3c77f8>
[ 1179.309582] CR2: 0000000000000000

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: William Dauchy <william@gandi.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
davet321 pushed a commit to davet321/rpi-linux that referenced this issue Jan 11, 2016
I managed to tickle this warning:

  [ 2338.884942] ------------[ cut here ]------------
  [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
  [ 2338.900504] Modules linked in:
  [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty raspberrypi#244
  [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
  [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
  [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
  [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
  [ 2338.947987] Call Trace:
  [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
  [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
  [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
  [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
  [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
  [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
  [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
  [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
  [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
  [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
  [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
  [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
  [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
  [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---

Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
what we expected it to be.

This is because context switches can swap the task_struct::perf_event_ctxp[]
pointer around. Therefore you have to either disable preemption when looking
at current, or hold ctx->lock.

Fix perf_event_enable_on_exec(), it loads current->perf_event_ctxp[]
before disabling interrupts, therefore a preemption in the right place
can swap contexts around and we're using the wrong one.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Potapenko <glider@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: syzkaller <syzkaller@googlegroups.com>
Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
popcornmix pushed a commit that referenced this issue Jun 4, 2019
[ Upstream commit 09f11b6 ]

switch_lock was introduced because it allowed serialization of device
authorization requests from userspace without need to take the big
domain lock (tb->lock). This was fine because device authorization with
ICM is just one command that is sent to the firmware. Now that we start
to handle all tunneling in the driver switch_lock is not enough because
we need to walk over the topology to establish paths.

For this reason drop switch_lock from the driver completely in favour of
big domain lock.

There is one complication, though. If userspace is waiting for the lock
in tb_switch_set_authorized(), it keeps the device_del() from removing
the sysfs attribute because it waits for active users to release the
attribute first which leads into following splat:

    INFO: task kworker/u8:3:73 blocked for more than 61 seconds.
          Tainted: G        W         5.1.0-rc1+ #244
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u8:3    D12976    73      2 0x80000000
    Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
    Call Trace:
     ? __schedule+0x2e5/0x740
     ? _raw_spin_lock_irqsave+0x12/0x40
     ? prepare_to_wait_event+0xc5/0x160
     schedule+0x2d/0x80
     __kernfs_remove.part.17+0x183/0x1f0
     ? finish_wait+0x80/0x80
     kernfs_remove_by_name_ns+0x4a/0x90
     remove_files.isra.1+0x2b/0x60
     sysfs_remove_group+0x38/0x80
     sysfs_remove_groups+0x24/0x40
     device_remove_attrs+0x3d/0x70
     device_del+0x14c/0x360
     device_unregister+0x15/0x50
     tb_switch_remove+0x9e/0x1d0 [thunderbolt]
     tb_handle_hotplug+0x119/0x5a0 [thunderbolt]
     ? process_one_work+0x1b7/0x420
     process_one_work+0x1b7/0x420
     worker_thread+0x37/0x380
     ? _raw_spin_unlock_irqrestore+0xf/0x30
     ? process_one_work+0x420/0x420
     kthread+0x118/0x130
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x35/0x40

We deal this by following what network stack did for some of their
attributes and use mutex_trylock() with restart_syscall(). This makes
userspace release the attribute allowing sysfs attribute removal to
progress before the write is restarted and eventually fail when the
attribute is removed.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Jun 4, 2019
[ Upstream commit 09f11b6 ]

switch_lock was introduced because it allowed serialization of device
authorization requests from userspace without need to take the big
domain lock (tb->lock). This was fine because device authorization with
ICM is just one command that is sent to the firmware. Now that we start
to handle all tunneling in the driver switch_lock is not enough because
we need to walk over the topology to establish paths.

For this reason drop switch_lock from the driver completely in favour of
big domain lock.

There is one complication, though. If userspace is waiting for the lock
in tb_switch_set_authorized(), it keeps the device_del() from removing
the sysfs attribute because it waits for active users to release the
attribute first which leads into following splat:

    INFO: task kworker/u8:3:73 blocked for more than 61 seconds.
          Tainted: G        W         5.1.0-rc1+ #244
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u8:3    D12976    73      2 0x80000000
    Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
    Call Trace:
     ? __schedule+0x2e5/0x740
     ? _raw_spin_lock_irqsave+0x12/0x40
     ? prepare_to_wait_event+0xc5/0x160
     schedule+0x2d/0x80
     __kernfs_remove.part.17+0x183/0x1f0
     ? finish_wait+0x80/0x80
     kernfs_remove_by_name_ns+0x4a/0x90
     remove_files.isra.1+0x2b/0x60
     sysfs_remove_group+0x38/0x80
     sysfs_remove_groups+0x24/0x40
     device_remove_attrs+0x3d/0x70
     device_del+0x14c/0x360
     device_unregister+0x15/0x50
     tb_switch_remove+0x9e/0x1d0 [thunderbolt]
     tb_handle_hotplug+0x119/0x5a0 [thunderbolt]
     ? process_one_work+0x1b7/0x420
     process_one_work+0x1b7/0x420
     worker_thread+0x37/0x380
     ? _raw_spin_unlock_irqrestore+0xf/0x30
     ? process_one_work+0x420/0x420
     kthread+0x118/0x130
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x35/0x40

We deal this by following what network stack did for some of their
attributes and use mutex_trylock() with restart_syscall(). This makes
userspace release the attribute allowing sysfs attribute removal to
progress before the write is restarted and eventually fail when the
attribute is removed.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
popcornmix pushed a commit that referenced this issue Jun 10, 2019
[ Upstream commit 09f11b6 ]

switch_lock was introduced because it allowed serialization of device
authorization requests from userspace without need to take the big
domain lock (tb->lock). This was fine because device authorization with
ICM is just one command that is sent to the firmware. Now that we start
to handle all tunneling in the driver switch_lock is not enough because
we need to walk over the topology to establish paths.

For this reason drop switch_lock from the driver completely in favour of
big domain lock.

There is one complication, though. If userspace is waiting for the lock
in tb_switch_set_authorized(), it keeps the device_del() from removing
the sysfs attribute because it waits for active users to release the
attribute first which leads into following splat:

    INFO: task kworker/u8:3:73 blocked for more than 61 seconds.
          Tainted: G        W         5.1.0-rc1+ #244
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    kworker/u8:3    D12976    73      2 0x80000000
    Workqueue: thunderbolt0 tb_handle_hotplug [thunderbolt]
    Call Trace:
     ? __schedule+0x2e5/0x740
     ? _raw_spin_lock_irqsave+0x12/0x40
     ? prepare_to_wait_event+0xc5/0x160
     schedule+0x2d/0x80
     __kernfs_remove.part.17+0x183/0x1f0
     ? finish_wait+0x80/0x80
     kernfs_remove_by_name_ns+0x4a/0x90
     remove_files.isra.1+0x2b/0x60
     sysfs_remove_group+0x38/0x80
     sysfs_remove_groups+0x24/0x40
     device_remove_attrs+0x3d/0x70
     device_del+0x14c/0x360
     device_unregister+0x15/0x50
     tb_switch_remove+0x9e/0x1d0 [thunderbolt]
     tb_handle_hotplug+0x119/0x5a0 [thunderbolt]
     ? process_one_work+0x1b7/0x420
     process_one_work+0x1b7/0x420
     worker_thread+0x37/0x380
     ? _raw_spin_unlock_irqrestore+0xf/0x30
     ? process_one_work+0x420/0x420
     kthread+0x118/0x130
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x35/0x40

We deal this by following what network stack did for some of their
attributes and use mutex_trylock() with restart_syscall(). This makes
userspace release the attribute allowing sysfs attribute removal to
progress before the write is restarted and eventually fail when the
attribute is removed.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants