Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors when committing superblock #54

Open
kdave opened this issue Jul 2, 2021 · 0 comments
Open

Errors when committing superblock #54

kdave opened this issue Jul 2, 2021 · 0 comments

Comments

@kdave
Copy link

kdave commented Jul 2, 2021

I've never seen this error in practice so it's likely related to zoned mode or it's been cause by frequent bg relocation. The device is emulated zone with 1MiB zone size (10243 in total), workload is git checkout in a loop. I have the raw block device image and the backing zoned file saved.

[11647.268646] BTRFS info (device sdi): reclaiming chunk 1769996288 with 4% used 95% unusable
[11647.277391] BTRFS info (device sdi): relocating block group 1769996288 flags metadata
[11648.244679] BTRFS info (device sdi): found 3 extents, stage: move data extents
[11649.140962] BTRFS info (device sdi): reclaiming chunk 1772093440 with 25% used 75% unusable
[11649.149842] BTRFS info (device sdi): relocating block group 1772093440 flags metadata
[11650.611832] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11651.472763] BTRFS info (device sdi): reclaiming chunk 1774190592 with 23% used 76% unusable
[11651.481587] BTRFS info (device sdi): relocating block group 1774190592 flags metadata
[11652.402776] BTRFS info (device sdi): found 13 extents, stage: move data extents
[11653.280440] BTRFS info (device sdi): reclaiming chunk 1775239168 with 23% used 76% unusable
[11653.289266] BTRFS info (device sdi): relocating block group 1775239168 flags metadata
[11654.251984] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11655.170664] BTRFS info (device sdi): reclaiming chunk 1776287744 with 23% used 76% unusable
[11655.179520] BTRFS info (device sdi): relocating block group 1776287744 flags metadata
[11656.167540] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11657.061457] BTRFS info (device sdi): reclaiming chunk 1777336320 with 23% used 76% unusable
[11657.070311] BTRFS info (device sdi): relocating block group 1777336320 flags metadata
[11657.983090] BTRFS info (device sdi): found 13 extents, stage: move data extents
[11658.868494] BTRFS info (device sdi): reclaiming chunk 1778384896 with 23% used 76% unusable
[11658.877311] BTRFS info (device sdi): relocating block group 1778384896 flags metadata
[11659.772862] BTRFS info (device sdi): found 12 extents, stage: move data extents
[11692.739276] BTRFS info (device sdi): reclaiming chunk 1758461952 with 15% used 84% unusable
[11692.748449] BTRFS info (device sdi): relocating block group 1758461952 flags metadata
[11692.888585] BTRFS error (device sdi): system chunk array too small 0 < 97
[11692.895613] BTRFS error (device sdi): super block corruption detected before writing it to disk
[11692.904481] BTRFS: error (device sdi) in write_all_supers:4163: errno=-117 Filesystem corrupted (unexpected superblock corruption detected)
[11692.917160] BTRFS info (device sdi): forced readonly
[11692.917168] BTRFS warning (device sdi): Skipping commit of aborted transaction.
[11692.917173] ------------[ cut here ]------------
[11692.917175] BTRFS: Transaction aborted (error -117)
[11692.917255] WARNING: CPU: 5 PID: 10532 at fs/btrfs/transaction.c:1946 cleanup_transaction+0x6b/0xc0 [btrfs]
[11692.949374] Modules linked in: iscsi_target_mod tcm_loop target_core_pscsi target_core_file target_core_iblock nfs lockd grace sunrpc fscache target_core_user uio target_core_mod netfs af_packet i2c_algo_bit drm_ttm_helper ttm br_netfilter bridge drm_kms_helper btrfs stp llc syscopyarea sysfillrect sysimgblt fb_sys_fops drm iscsi_ibft blake2b_generic libcrc32c iscsi_boot_sysfs xor zstd_decompress zstd_compress tg3 tpm_infineon tpm_tis tpm_tis_core tpm libphy xxhash raid6_pq acpi_cpufreq mptctl button i2c_piix4 k10temp ext4 mbcache jbd2 ohci_pci ehci_pci ohci_hcd mptsas ata_generic ehci_hcd scsi_transport_sas mptscsih serio_raw sata_svw mptbase usbcore pata_serverworks sg scsi_dh_rdac scsi_dh_emc scsi_dh_alua
[11693.012551] CPU: 5 PID: 10532 Comm: kworker/u16:4 Not tainted 5.13.0-git+ #784
[11693.012559] Hardware name: empty empty/S3993, BIOS PAQEX0-3 02/24/2008
[11693.012564] Workqueue: events_unbound btrfs_reclaim_bgs_work [btrfs]
[11693.033318] RIP: 0010:cleanup_transaction+0x6b/0xc0 [btrfs]
[11693.033430] Code: 00 1c 00 00 03 72 25 41 83 fc fb 0f 84 d8 f9 0b 00 41 83 fc e2 0f 84 ce f9 0b 00 44 89 e6 48 c7 c7 38 b2 fa c0 e8 89 db a3 e9 <0f> 0b 44 89 e1 ba 9a 07 00 00 49 8d 5e 28 48 89 ef 48 c7 c6 b0 9e
[11693.033435] RSP: 0018:ffff9d524b2d3c20 EFLAGS: 00010282
[11693.033440] RAX: 0000000000000000 RBX: 00000000ffffff8b RCX: ffff8b8b679d9978
[11693.033443] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff8b8b679d9970
[11693.033445] RBP: ffff8b8a8022ba78 R08: 0000000000000000 R09: 0000000000000000
[11693.033448] R10: ffffffffaa1106c8 R11: 00000000312d2072 R12: 00000000ffffff8b
[11693.092366] R13: ffff8b8a88054000 R14: ffff8b8a6bd0d000 R15: ffff8b8a88054000
[11693.092370] FS:  0000000000000000(0000) GS:ffff8b8b67800000(0000) knlGS:0000000000000000
[11693.092373] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11693.092375] CR2: 00007ff2ad38e4c0 CR3: 0000000191cec000 CR4: 00000000000006e0
[11693.092379] Call Trace:
[11693.092388]  btrfs_commit_transaction.cold+0x3e/0xe2 [btrfs]
[11693.129246]  relocate_block_group+0x55/0x580 [btrfs]
[11693.129356]  ? btrfs_wait_nocow_writers+0x33/0xc0 [btrfs]
[11693.129470]  btrfs_relocate_block_group+0x177/0x340 [btrfs]
[11693.145810]  btrfs_relocate_chunk+0x38/0x120 [btrfs]
[11693.145915]  btrfs_reclaim_bgs_work.cold+0x73/0x167 [btrfs]
[11693.156746]  process_one_work+0x262/0x600
[11693.160871]  worker_thread+0x4c/0x320
[11693.160875]  ? process_one_work+0x600/0x600
[11693.160880]  kthread+0x135/0x160
[11693.172307]  ? set_kthread_struct+0x40/0x40
[11693.172312]  ret_from_fork+0x1f/0x30
[11693.172323] irq event stamp: 707214
[11693.183906] hardirqs last  enabled at (707213): [<ffffffffaa996974>] _raw_spin_unlock_irq+0x24/0x40
[11693.183915] hardirqs last disabled at (707214): [<ffffffffaa98e7f4>] __schedule+0x3d4/0x860
[11693.201644] softirqs last  enabled at (707146): [<ffffffffaac00307>] __do_softirq+0x307/0x502
[11693.201651] softirqs last disabled at (707139): [<ffffffffaa08ab27>] __irq_exit_rcu+0xa7/0xd0
[11693.201657] ---[ end trace 7ec09b227c3735c2 ]---
[11693.201662] BTRFS: error (device sdi) in cleanup_transaction:1946: errno=-117 Filesystem corrupted
[11693.202230] BTRFS error (device sdi): error relocating chunk 1758461952
[11693.239552] BTRFS info (device sdi): reclaiming chunk 1779433472 with 17% used 82% unusable
[11693.239580] BTRFS error (device sdi): error relocating chunk 1779433472

btrfs fi df after it got stuck:

Data, single: total=238.00MiB, used=192.06MiB, zone_unusable=31.75MiB
System, single: total=1.00MiB, used=48.00KiB, zone_unusable=432.00KiB
Metadata, single: total=13.00MiB, used=2.06MiB, zone_unusable=8.94MiB
GlobalReserve, single: total=3.25MiB, used=16.00KiB

system chunk array too small 0 < 97 the zero probably means the page is all zeros, I don't have other data as it's all in memory.

# btrfs check --force /dev/sdi
Opening filesystem to check...
WARNING: filesystem mounted, continuing because of --force
No mapping for 1773486080-1773502464
Couldn't map the block 1773486080
Couldn't map the block 1773486080
bad tree block 1773486080, bytenr mismatch, want=1773486080, have=0
ERROR: cannot read chunk root
ERROR: cannot open file system

Note where the mapping error is: the chunk got reclaimed close to the point it crashed (look for in 1772093440 in the dump).

# btrfs inspect dump-super /dev/sdi
superblock: bytenr=65536, device=/dev/sdi
---------------------------------------------------------
csum_type               0 (crc32c)
csum_size               4
csum                    0x7afc6e5c [match]
bytenr                  65536
flags                   0x1
                        ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    59447fd2-f548-443d-abe8-79cdf178321f
metadata_uuid           59447fd2-f548-443d-abe8-79cdf178321f
label                   TEST-ZONED
generation              3507
root                    1781022720
sys_array_size          97
chunk_root_generation   3507
root_level              0
chunk_root              1773486080
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             10740056064
bytes_used              203599872
sectorsize              4096
nodesize                16384
leafsize (deprecated)   16384
stripesize              4096
root_dir                6
num_devices             1
compat_flags            0x0
compat_ro_flags         0x3
                        ( FREE_SPACE_TREE |
                          FREE_SPACE_TREE_VALID )
incompat_flags          0x1361
                        ( MIXED_BACKREF |
                          BIG_METADATA |
                          EXTENDED_IREF |
                          SKINNY_METADATA |
                          NO_HOLES |
                          ZONED )
cache_generation        0
uuid_tree_generation    3507
dev_item.uuid           434d4002-e9cf-4063-9045-082fea25ee85
dev_item.fsid           59447fd2-f548-443d-abe8-79cdf178321f [match]
dev_item.type           0
dev_item.total_bytes    10740056064
dev_item.bytes_used     577765376
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
naota pushed a commit that referenced this issue Mar 9, 2022
…_transaction()

We are seeing crashes similar to the following trace:

[38.969182] WARNING: CPU: 20 PID: 2105 at fs/btrfs/relocation.c:4070 btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.973556] CPU: 20 PID: 2105 Comm: btrfs Not tainted 5.17.0-rc4 #54
[38.974580] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[38.976539] RIP: 0010:btrfs_relocate_block_group+0x2dc/0x340 [btrfs]
[38.980336] RSP: 0000:ffffb0dd42e03c20 EFLAGS: 00010206
[38.981218] RAX: ffff96cfc4ede800 RBX: ffff96cfc3ce0000 RCX: 000000000002ca14
[38.982560] RDX: 0000000000000000 RSI: 4cfd109a0bcb5d7f RDI: ffff96cfc3ce0360
[38.983619] RBP: ffff96cfc309c000 R08: 0000000000000000 R09: 0000000000000000
[38.984678] R10: ffff96cec0000001 R11: ffffe84c80000000 R12: ffff96cfc4ede800
[38.985735] R13: 0000000000000000 R14: 0000000000000000 R15: ffff96cfc3ce0360
[38.987146] FS:  00007f11c15218c0(0000) GS:ffff96d6dfb00000(0000) knlGS:0000000000000000
[38.988662] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[38.989398] CR2: 00007ffc922c8e60 CR3: 00000001147a6001 CR4: 0000000000370ee0
[38.990279] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[38.991219] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[38.992528] Call Trace:
[38.992854]  <TASK>
[38.993148]  btrfs_relocate_chunk+0x27/0xe0 [btrfs]
[38.993941]  btrfs_balance+0x78e/0xea0 [btrfs]
[38.994801]  ? vsnprintf+0x33c/0x520
[38.995368]  ? __kmalloc_track_caller+0x351/0x440
[38.996198]  btrfs_ioctl_balance+0x2b9/0x3a0 [btrfs]
[38.997084]  btrfs_ioctl+0x11b0/0x2da0 [btrfs]
[38.997867]  ? mod_objcg_state+0xee/0x340
[38.998552]  ? seq_release+0x24/0x30
[38.999184]  ? proc_nr_files+0x30/0x30
[38.999654]  ? call_rcu+0xc8/0x2f0
[39.000228]  ? __x64_sys_ioctl+0x84/0xc0
[39.000872]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[39.001973]  __x64_sys_ioctl+0x84/0xc0
[39.002566]  do_syscall_64+0x3a/0x80
[39.003011]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[39.003735] RIP: 0033:0x7f11c166959b
[39.007324] RSP: 002b:00007fff2543e998 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[39.008521] RAX: ffffffffffffffda RBX: 00007f11c1521698 RCX: 00007f11c166959b
[39.009833] RDX: 00007fff2543ea40 RSI: 00000000c4009420 RDI: 0000000000000003
[39.011270] RBP: 0000000000000003 R08: 0000000000000013 R09: 00007f11c16f94e0
[39.012581] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff25440df3
[39.014046] R13: 0000000000000000 R14: 00007fff2543ea40 R15: 0000000000000001
[39.015040]  </TASK>
[39.015418] ---[ end trace 0000000000000000 ]---
[43.131559] ------------[ cut here ]------------
[43.132234] kernel BUG at fs/btrfs/extent-tree.c:2717!
[43.133031] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[43.133702] CPU: 1 PID: 1839 Comm: btrfs Tainted: G        W         5.17.0-rc4 #54
[43.134863] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[43.136426] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.139913] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.140629] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.141604] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.142645] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.143669] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.144657] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.145686] FS:  00007f7657dd68c0(0000) GS:ffff96d6df640000(0000) knlGS:0000000000000000
[43.146808] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.147584] CR2: 00007f7fe81bf5b0 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.148589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.149581] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[43.150559] Call Trace:
[43.150904]  <TASK>
[43.151253]  btrfs_finish_extent_commit+0x88/0x290 [btrfs]
[43.152127]  btrfs_commit_transaction+0x74f/0xaa0 [btrfs]
[43.152932]  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
[43.153786]  btrfs_ioctl+0x1edc/0x2da0 [btrfs]
[43.154475]  ? __check_object_size+0x150/0x170
[43.155170]  ? preempt_count_add+0x49/0xa0
[43.155753]  ? __x64_sys_ioctl+0x84/0xc0
[43.156437]  ? btrfs_ioctl_get_supported_features+0x30/0x30 [btrfs]
[43.157456]  __x64_sys_ioctl+0x84/0xc0
[43.157980]  do_syscall_64+0x3a/0x80
[43.158543]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[43.159231] RIP: 0033:0x7f7657f1e59b
[43.161819] RSP: 002b:00007ffda5cd1658 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[43.162702] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f7657f1e59b
[43.163526] RDX: 0000000000000000 RSI: 0000000000009408 RDI: 0000000000000003
[43.164358] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
[43.165208] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[43.166029] R13: 00005621b91c3232 R14: 00005621b91ba580 R15: 00007ffda5cd1800
[43.166863]  </TASK>
[43.167125] Modules linked in: btrfs blake2b_generic xor pata_acpi ata_piix libata raid6_pq scsi_mod libcrc32c virtio_net virtio_rng net_failover rng_core failover scsi_common
[43.169552] ---[ end trace 0000000000000000 ]---
[43.171226] RIP: 0010:unpin_extent_range+0x37a/0x4f0 [btrfs]
[43.174767] RSP: 0000:ffffb0dd4216bc70 EFLAGS: 00010246
[43.175600] RAX: 0000000000000000 RBX: ffff96cfc34490f8 RCX: 0000000000000001
[43.176468] RDX: 0000000080000001 RSI: 0000000051d00000 RDI: 00000000ffffffff
[43.177357] RBP: 0000000000000000 R08: 0000000000000000 R09: ffff96cfd07dca50
[43.178271] R10: ffff96cfc46e8a00 R11: fffffffffffec000 R12: 0000000041d00000
[43.179178] R13: ffff96cfc3ce0000 R14: ffffb0dd4216bd08 R15: 0000000000000000
[43.180071] FS:  00007f7657dd68c0(0000) GS:ffff96d6df800000(0000) knlGS:0000000000000000
[43.181073] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43.181808] CR2: 00007fe09905f010 CR3: 00000001093ee004 CR4: 0000000000370ee0
[43.182706] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[43.183591] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400

We first hit the WARN_ON(rc->block_group->pinned > 0) in
btrfs_relocate_block_group() and then the BUG_ON(!cache) in
unpin_extent_range(). This tells us that we are exiting relocation and
removing the block group with bytes still pinned for that block group.
This is supposed to be impossible: the last thing relocate_block_group()
does is commit the transaction to get rid of pinned extents.

Commit d0c2f4f ("btrfs: make concurrent fsyncs wait less when
waiting for a transaction commit") introduced an optimization so that
commits from fsync don't have to wait for the previous commit to unpin
extents. This was only intended to affect fsync, but it inadvertently
made it possible for any commit to skip waiting for the previous commit
to unpin. This is because if a call to btrfs_commit_transaction() finds
that another thread is already committing the transaction, it waits for
the other thread to complete the commit and then returns. If that other
thread was in fsync, then it completes the commit without completing the
previous commit. This makes the following sequence of events possible:

Thread 1____________________|Thread 2 (fsync)_____________________|Thread 3 (balance)___________________
btrfs_commit_transaction(N) |                                     |
  btrfs_run_delayed_refs    |                                     |
    pin extents             |                                     |
  ...                       |                                     |
  state = UNBLOCKED         |btrfs_sync_file                      |
                            |  btrfs_start_transaction(N + 1)     |relocate_block_group
                            |                                     |  btrfs_join_transaction(N + 1)
                            |  btrfs_commit_transaction(N + 1)    |
  ...                       |  trans->state = COMMIT_START        |
                            |                                     |  btrfs_commit_transaction(N + 1)
                            |                                     |    wait_for_commit(N + 1, COMPLETED)
                            |  wait_for_commit(N, SUPER_COMMITTED)|
  state = SUPER_COMMITTED   |  ...                                |
  btrfs_finish_extent_commit|                                     |
    unpin_extent_range()    |  trans->state = COMPLETED           |
                            |                                     |    return
                            |                                     |
    ...                     |                                     |Thread 1 isn't done, so pinned > 0
                            |                                     |and we WARN
                            |                                     |
                            |                                     |btrfs_remove_block_group
    unpin_extent_range()    |                                     |
      Thread 3 removed the  |                                     |
      block group, so we BUG|                                     |

There are other sequences involving SUPER_COMMITTED transactions that
can cause a similar outcome.

We could fix this by making relocation explicitly wait for unpinning,
but there may be other cases that need it. Josef mentioned ENOSPC
flushing and the free space cache inode as other potential victims.
Rather than playing whack-a-mole, this fix is conservative and makes all
commits not in fsync wait for all previous transactions, which is what
the optimization intended.

Fixes: d0c2f4f ("btrfs: make concurrent fsyncs wait less when waiting for a transaction commit")
CC: stable@vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant