generic/166: writing to a full zone #63

Open
naota opened this issue Jul 1, 2022 · 0 comments

Comments

@naota
Owner

naota commented Jul 1, 2022

Running generic/166 sometimes writes into a full zone.

Jun 30 15:17:10 kernel: nvme1n2: Write(0x1) @ LBA 4855543, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:10 kernel: I/O error, dev nvme1n2, sector 38844344 op 0x1:(WRITE) flags 0x5800 phys_seg 26 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4855863, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38846904 op 0x1:(WRITE) flags 0x5800 phys_seg 13 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4856183, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38849464 op 0x1:(WRITE) flags 0x5800 phys_seg 9 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4856503, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38852024 op 0x1:(WRITE) flags 0x5800 phys_seg 6 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4856823, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38854584 op 0x1:(WRITE) flags 0x5800 phys_seg 6 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4857143, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38857144 op 0x1:(WRITE) flags 0x5800 phys_seg 7 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4857463, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38859704 op 0x1:(WRITE) flags 0x5800 phys_seg 3 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4857783, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38862264 op 0x1:(WRITE) flags 0x5800 phys_seg 2 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4858103, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38864824 op 0x1:(WRITE) flags 0x5800 phys_seg 3 prio class 0
Jun 30 15:17:11 kernel: nvme1n2: Write(0x1) @ LBA 4858423, 320 blocks, Zone Is Full (sct 0x1 / sc 0xb9) MORE DNR 
Jun 30 15:17:11 kernel: I/O error, dev nvme1n2, sector 38867384 op 0x1:(WRITE) flags 0x5800 phys_seg 2 prio class 0
Jun 30 15:17:11 kernel: BTRFS error (device nvme1n2): bdev /dev/nvme1n2 errs: wr 1, rd 0, flush 0, corrupt 0, gen 0
Jun 30 15:17:11 kernel: BTRFS: error (device nvme1n2) in btrfs_commit_transaction:2422: errno=-5 IO failure (Error while writing out transaction)
Jun 30 15:17:11 kernel: BTRFS info (device nvme1n2: state E): forced readonly
Jun 30 15:17:11 kernel: BTRFS warning (device nvme1n2: state E): Skipping commit of aborted transaction.
Jun 30 15:17:11 kernel: BTRFS: error (device nvme1n2: state EA) in cleanup_transaction:1969: errno=-5 IO failure
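The NVMe lines and the block-layer "I/O error" lines report the same rejected writes in different units. The quick check below assumes the namespace uses 4096-byte logical blocks. That block size is not printed in the log; it is inferred because every sector number is exactly 8 times its LBA.

Under that assumption, the ten failures form one back-to-back sequential run of 320-block (1280 KiB) writes. Every write in the run was aimed at a zone the device already considered Full.

```python
SECTOR_SIZE = 512   # the block layer counts 512-byte sectors
LBA_SIZE = 4096     # assumption: 4 KiB NVMe logical blocks (inferred, not logged)
SECTORS_PER_LBA = LBA_SIZE // SECTOR_SIZE

# (NVMe LBA, block-layer sector) pairs copied from the log above
FAILED_WRITES = [
    (4855543, 38844344), (4855863, 38846904), (4856183, 38849464),
    (4856503, 38852024), (4856823, 38854584), (4857143, 38857144),
    (4857463, 38859704), (4857783, 38862264), (4858103, 38864824),
    (4858423, 38867384),
]
BLOCKS_PER_WRITE = 320


def lba_to_sector(lba: int) -> int:
    """Convert an NVMe LBA to a 512-byte block-layer sector number."""
    return lba * SECTORS_PER_LBA


def back_to_back(lbas) -> bool:
    """True if each write starts exactly where the previous one ended."""
    return all(b - a == BLOCKS_PER_WRITE for a, b in zip(lbas, lbas[1:]))


lbas = [lba for lba, _ in FAILED_WRITES]
assert all(lba_to_sector(lba) == sector for lba, sector in FAILED_WRITES)
print(back_to_back(lbas))                          # True
print(BLOCKS_PER_WRITE * LBA_SIZE // 1024, "KiB")  # 1280 KiB
```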

I suspect the metadata block group (BG) is finished before the writes of its extent buffers have completed.
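The suspected race can be made concrete with a toy model. This is not btrfs code: `Zone`, `MetadataBlockGroup`, and `run()` are hypothetical names, and the 4096-block zone capacity is an arbitrary assumption.

The model relies on one device behavior. Finishing a zone moves its write pointer to the end and marks it Full, so any write still outstanding afterwards is rejected. In the model, finishing first makes all ten pending writes fail, while draining them first makes none fail. Whether btrfs should wait at this point, or the real fix lies elsewhere, is not established by this issue.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


class ZoneFullError(Exception):
    """Stands in for the NVMe 'Zone Is Full' status (sct 0x1 / sc 0xb9)."""


@dataclass
class Zone:
    capacity: int            # writable blocks in the zone
    write_pointer: int = 0   # next writable block, relative to zone start
    full: bool = False

    def write(self, offset: int, nblocks: int) -> None:
        if self.full:
            raise ZoneFullError(f"write @ +{offset}, {nblocks} blocks")
        if offset != self.write_pointer:
            raise ValueError("zoned writes must land at the write pointer")
        self.write_pointer += nblocks
        self.full = self.write_pointer == self.capacity

    def finish(self) -> None:
        # Finishing the zone: the write pointer jumps to the end, zone is Full.
        self.write_pointer = self.capacity
        self.full = True


@dataclass
class MetadataBlockGroup:
    zone: Zone
    alloc_offset: int = 0
    # extent buffers allocated in the zone whose writes have not completed yet
    in_flight: List[Tuple[int, int]] = field(default_factory=list)

    def alloc_extent_buffer(self, nblocks: int) -> None:
        self.in_flight.append((self.alloc_offset, nblocks))
        self.alloc_offset += nblocks

    def complete_writes(self) -> int:
        """Issue every in-flight write in order; return how many failed."""
        failed = 0
        for offset, nblocks in self.in_flight:
            try:
                self.zone.write(offset, nblocks)
            except ZoneFullError:
                failed += 1
        self.in_flight.clear()
        return failed


def run(wait_for_writeback: bool) -> int:
    bg = MetadataBlockGroup(Zone(capacity=4096))
    for _ in range(10):                  # ten 320-block writes, as in the log
        bg.alloc_extent_buffer(320)
    if wait_for_writeback:
        failed = bg.complete_writes()    # drain outstanding writes first...
        bg.zone.finish()                 # ...then finish the zone
    else:
        bg.zone.finish()                 # suspected bug: zone finished too early
        failed = bg.complete_writes()
    return failed


print(run(wait_for_writeback=False))  # 10
print(run(wait_for_writeback=True))   # 0
```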

naota pushed a commit that referenced this issue Dec 7, 2022
The coreboot_table driver registers a coreboot bus while probing a
"coreboot_table" device representing the coreboot table memory region.
Probing this device (i.e., registering the bus) is a dependency for the
module_init() functions of any driver for this bus (e.g.,
memconsole-coreboot.c / memconsole_driver_init()).

With synchronous probe, this dependency works OK, as the link order in
the Makefile ensures coreboot_table_driver_init() (and thus,
coreboot_table_probe()) completes before a coreboot device driver tries
to add itself to the bus.

With asynchronous probe, however, coreboot_table_probe() may race with
memconsole_driver_init(), and so we're liable to hit one of these two:

1. coreboot_driver_register() eventually hits "[...] the bus was not
   initialized.", and the memconsole driver fails to register; or
2. coreboot_driver_register() gets past #1, but still races with
   bus_register() and hits some other undefined/crashing behavior (e.g.,
   in driver_find() [1])

We can resolve this by registering the bus in our initcall, and only
deferring "device" work (scanning the coreboot memory region and
creating sub-devices) to probe().
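The ordering problem and the fix can be illustrated with a small Python model. `CorebootBus` and `boot()` are hypothetical, and the nested functions only mirror the names in the commit message; none of this is kernel code.

The model covers only failure mode 1, the "bus was not initialized" error. The concurrent `bus_register()` crash is not modeled. When the bus is registered at probe time, async probe breaks the memconsole registration. When it is registered in the initcall, both probe modes work.

```python
class CorebootBus:
    """Toy stand-in for the coreboot bus; not the kernel's struct bus_type."""

    def __init__(self) -> None:
        self.registered = False
        self.drivers = []

    def register(self) -> None:
        self.registered = True

    def driver_register(self, name: str) -> None:
        if not self.registered:
            raise RuntimeError("the bus was not initialized")
        self.drivers.append(name)


def boot(bus_in_initcall: bool, async_probe: bool) -> bool:
    """Return True if the memconsole driver manages to register."""
    bus = CorebootBus()

    def table_driver_init() -> None:       # models coreboot_table_driver_init()
        if bus_in_initcall:
            bus.register()                 # fixed scheme: bus exists before any probe

    def table_probe() -> None:             # models coreboot_table_probe()
        if not bus_in_initcall:
            bus.register()                 # old scheme: bus appears only at probe time
        # device work (scanning the table, adding sub-devices) is omitted

    def memconsole_driver_init() -> None:  # models memconsole_driver_init()
        bus.driver_register("memconsole")

    if async_probe:
        # probe is deferred, so the later initcall may run before it
        order = [table_driver_init, memconsole_driver_init, table_probe]
    else:
        # synchronous probe: link order runs the table probe before memconsole
        order = [table_driver_init, table_probe, memconsole_driver_init]

    try:
        for step in order:
            step()
    except RuntimeError:
        return False
    return True


for in_initcall in (False, True):
    for async_probe in (False, True):
        print(f"bus_in_initcall={in_initcall} async={async_probe}: "
              f"{boot(in_initcall, async_probe)}")
```

With the bus registered in probe, only the synchronous case succeeds. Moving the registration into the initcall makes both cases succeed.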

[1] Example failure, using 'driver_async_probe=*' kernel command line:

[    0.114217] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
...
[    0.114307] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 6.1.0-rc1 #63
[    0.114316] Hardware name: Google Scarlet (DT)
...
[    0.114488] Call trace:
[    0.114494]  _raw_spin_lock+0x34/0x60
[    0.114502]  kset_find_obj+0x28/0x84
[    0.114511]  driver_find+0x30/0x50
[    0.114520]  driver_register+0x64/0x10c
[    0.114528]  coreboot_driver_register+0x30/0x3c
[    0.114540]  memconsole_driver_init+0x24/0x30
[    0.114550]  do_one_initcall+0x154/0x2e0
[    0.114560]  do_initcall_level+0x134/0x160
[    0.114571]  do_initcalls+0x60/0xa0
[    0.114579]  do_basic_setup+0x28/0x34
[    0.114588]  kernel_init_freeable+0xf8/0x150
[    0.114596]  kernel_init+0x2c/0x12c
[    0.114607]  ret_from_fork+0x10/0x20
[    0.114624] Code: 5280002b 1100054a b900092a f9800011 (885ffc01)
[    0.114631] ---[ end trace 0000000000000000 ]---

Fixes: b81e314 ("firmware: coreboot: Make bus registration symmetric")
Cc: <stable@vger.kernel.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20221019180934.1.If29e167d8a4771b0bf4a39c89c6946ed764817b9@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
naota pushed a commit that referenced this issue Dec 7, 2022
If a socket bound to a wildcard address fails to connect(), we
only reset saddr and keep the port.  Then, we have to fix up the
bhash2 bucket; otherwise, the bucket has an inconsistent address
in the list.

Also, listen() for such a socket will fire the WARN_ON() in
inet_csk_get_port(). [0]

Note that when a system runs out of memory, we give up fixing the
bucket and unlink sk from bhash and bhash2 by inet_put_port().
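The bucket fix-up described above can be sketched as a loose Python model. `Tables`, `rehash()`, `failed_connect()`, `consistent()`, and `demo()` are all hypothetical, and the port and addresses are arbitrary examples. `put_port()` only imitates the role the commit message gives `inet_put_port()`; it is not the kernel function.

`bhash` is keyed by port and `bhash2` by (port, address). `consistent()` roughly stands for the invariant whose violation trips the `WARN_ON()` on `listen()`. The model shows three outcomes:

- Without a fix-up, the socket stays linked under the source address that `connect()` picked.
- With the fix-up, the socket moves back to the wildcard bucket.
- If allocating that bucket fails, the socket is unlinked from both tables.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

WILDCARD = "0.0.0.0"
Key = Tuple[int, str]            # (port, address), the bhash2 key


@dataclass(eq=False)             # identity hashing so sockets can live in sets
class Sock:
    port: int = 0
    saddr: str = WILDCARD
    bhash2_key: Optional[Key] = None   # bucket the socket is currently linked into


class Tables:
    def __init__(self) -> None:
        self.bhash: Dict[int, Set[Sock]] = {}    # keyed by port
        self.bhash2: Dict[Key, Set[Sock]] = {}   # keyed by (port, address)
        self.oom = False                          # fail new bhash2 bucket allocations


def _unlink_bhash2(t: Tables, sk: Sock) -> None:
    if sk.bhash2_key is not None:
        bucket = t.bhash2[sk.bhash2_key]
        bucket.discard(sk)
        if not bucket:
            del t.bhash2[sk.bhash2_key]          # empty buckets are freed
        sk.bhash2_key = None


def rehash(t: Tables, sk: Sock, addr: str) -> bool:
    """Move sk into the (port, addr) bucket; False if it can't be allocated."""
    key = (sk.port, addr)
    if key != sk.bhash2_key:
        if key not in t.bhash2:
            if t.oom:
                return False
            t.bhash2[key] = set()
        _unlink_bhash2(t, sk)
        t.bhash2[key].add(sk)
        sk.bhash2_key = key
    sk.saddr = addr
    return True


def bind(t: Tables, sk: Sock, port: int) -> None:
    sk.port = port
    t.bhash.setdefault(port, set()).add(sk)
    rehash(t, sk, WILDCARD)


def put_port(t: Tables, sk: Sock) -> None:
    """Unlink sk from both tables (the role the text gives inet_put_port())."""
    t.bhash[sk.port].discard(sk)
    if not t.bhash[sk.port]:
        del t.bhash[sk.port]
    _unlink_bhash2(t, sk)


def failed_connect(t: Tables, sk: Sock, src: str, fix_up: bool, oom: bool = False) -> None:
    rehash(t, sk, src)             # connect() picked a source address
    # connect() then fails: saddr is reset to the wildcard, the port is kept
    if not fix_up:
        sk.saddr = WILDCARD        # bug: still linked under (port, src)
        return
    t.oom = oom                    # optionally simulate OOM during the fix-up
    if not rehash(t, sk, WILDCARD):
        put_port(t, sk)            # give up on the bucket and unhash the socket
        sk.saddr = WILDCARD


def consistent(sk: Sock) -> bool:
    """Unhashed, or linked under the bucket matching (port, saddr)."""
    return sk.bhash2_key is None or sk.bhash2_key == (sk.port, sk.saddr)


def demo(fix_up: bool, oom: bool = False):
    t, sk = Tables(), Sock()
    bind(t, sk, 20000)
    failed_connect(t, sk, "10.0.0.1", fix_up, oom)
    return consistent(sk), sk.bhash2_key


print(demo(fix_up=False))            # (False, (20000, '10.0.0.1'))
print(demo(fix_up=True))             # (True, (20000, '0.0.0.0'))
print(demo(fix_up=True, oom=True))   # (True, None)
```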

[0]:
WARNING: CPU: 0 PID: 207 at net/ipv4/inet_connection_sock.c:548 inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
Modules linked in:
CPU: 0 PID: 207 Comm: bhash2_prev_rep Not tainted 6.1.0-rc3-00799-gc8421681c845 #63
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.amzn2022.0.1 04/01/2014
RIP: 0010:inet_csk_get_port (net/ipv4/inet_connection_sock.c:548 (discriminator 1))
Code: 74 a7 eb 93 48 8b 54 24 18 0f b7 cb 4c 89 e6 4c 89 ff e8 48 b2 ff ff 49 8b 87 18 04 00 00 e9 32 ff ff ff 0f 0b e9 34 ff ff ff <0f> 0b e9 42 ff ff ff 41 8b 7f 50 41 8b 4f 54 89 fe 81 f6 00 00 ff
RSP: 0018:ffffc900003d7e50 EFLAGS: 00010202
RAX: ffff8881047fb500 RBX: 0000000000004e20 RCX: 0000000000000000
RDX: 000000000000000a RSI: 00000000fffffe00 RDI: 00000000ffffffff
RBP: ffffffff8324dc00 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000004e20 R15: ffff8881054e1280
FS:  00007f8ac04dc740(0000) GS:ffff88842fc00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020001540 CR3: 00000001055fa003 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 inet_csk_listen_start (net/ipv4/inet_connection_sock.c:1205)
 inet_listen (net/ipv4/af_inet.c:228)
 __sys_listen (net/socket.c:1810)
 __x64_sys_listen (net/socket.c:1819 net/socket.c:1817 net/socket.c:1817)
 do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
 entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7f8ac051de5d
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 93 af 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007ffc1c177248 EFLAGS: 00000206 ORIG_RAX: 0000000000000032
RAX: ffffffffffffffda RBX: 0000000020001550 RCX: 00007f8ac051de5d
RDX: ffffffffffffff80 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00007ffc1c177270 R08: 0000000000000018 R09: 0000000000000007
R10: 0000000020001540 R11: 0000000000000206 R12: 00007ffc1c177388
R13: 0000000000401169 R14: 0000000000403e18 R15: 00007f8ac0723000
 </TASK>

Fixes: 28044fc ("net: Add a bhash2 table hashed by port and address")
Reported-by: syzbot <syzkaller@googlegroups.com>
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Acked-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>