Rebalancing and Filesystem hangs / stops writing data #677

Open
EvilDragon opened this issue May 12, 2024 · 1 comment
EvilDragon commented May 12, 2024

Might be related to issue #673, but could be something different.

Some general information about my system:

Kernel: 6.8.9-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 May 2024 17:49:46 +0000 x86_64 GNU/Linux

bcachefs fs usage:
[root@shiranui ~]# bcachefs fs usage /data/Server/
Filesystem: a82b31db-a070-4836-adb9-8cbc9da9d7de
Size: 46001864919040
Used: 33063644488704
Online reserved: 2174943232

Data type   Required/total  Durability  Devices
btree:      1/2             2           [sdc sdd]   97009532928
btree:      1/2             2           [sdd sda]   46555201536
btree:      1/2             2           [sdc sda]   35396780032
btree:      1/2             2           [sda sdb]   4547674112
user:       1/2             2           [sdc sdd]   21990185719296
user:       1/2             2           [sdd sdb]   5945180940800
user:       1/2             2           [sdc sdb]   1050840915456
user:       1/2             2           [sdc sda]   444664465408
user:       1/2             2           [sdd sda]   3414381998080
user:       1/2             2           [sda sdb]   6886801920
cached:     1/1             1           [sdb]       172820480

hdd.12tb (device 0): sdc rw
                data              buckets     fragmented
free:           181705113600      346575
sb:             3149824           7           520192
journal:        4294967296        8192
btree:          66203156480       135945      5071175680
user:           11742861336576    22397729    167936
cached:         0                 0
parity:         0                 0
stripe:         0                 0
need_gc_gens:   0                 0
need_discard:   0                 0
capacity:       12000138625024    22888448

hdd.16tb (device 1): sdd rw
                data              buckets     fragmented
free:           244386365440      466130
sb:             3149824           7           520192
journal:        4294967296        8192
btree:          71782367232       147326      5458886656
user:           15674973954560    29897641    512000
cached:         0                 0
parity:         0                 0
stripe:         0                 0
need_gc_gens:   0                 0
need_discard:   0                 0
capacity:       16000900661248    30519296

hdd.20tb (device 3): sdb rw
                data              buckets     fragmented
free:           16487426293760    15723635
sb:             3149824           4           1044480
journal:        8589934592        8192
btree:          2273837056        2421        264765440
user:           3501317553664     3339117     512000
cached:         172820480         679
parity:         0                 0
stripe:         0                 0
need_gc_gens:   0                 0
need_discard:   0                 0
capacity:       20000588955648    19074048

ssd.2tb (device 2): sda rw
                data              buckets     fragmented
free:           916455424         874
sb:             3149824           4           1044480
journal:        8589934592        8192
btree:          43249827840       55218       14650441728
user:           1932987996160     1843441
cached:         0                 0
parity:         0                 0
stripe:         0                 0
need_gc_gens:   0                 0
need_discard:   0                 0
capacity:       2000398843904     1907729
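
To make the imbalance between the devices easier to see, here is a minimal Python sketch that turns the "free" and "capacity" byte counts from the fs usage output above into fill percentages. The names `DEVICES`, `fill_percent` and `FILL` are made up for this illustration, and "in use" here simply means "not reported as free" (so it also counts superblock, journal and btree space), which is an assumption of the sketch and not how bcachefs itself reports usage.

```python
# "free" and "capacity" byte counts copied from the fs usage output above.
DEVICES = {
    "hdd.12tb": {"free": 181705113600, "capacity": 12000138625024},
    "hdd.16tb": {"free": 244386365440, "capacity": 16000900661248},
    "hdd.20tb": {"free": 16487426293760, "capacity": 20000588955648},
    "ssd.2tb": {"free": 916455424, "capacity": 2000398843904},
}


def fill_percent(free: int, capacity: int) -> float:
    """Share of the device that is not reported as free, in percent."""
    return 100.0 * (capacity - free) / capacity


# Hypothetical helper table: label -> percent of capacity not free.
FILL = {label: fill_percent(**sizes) for label, sizes in DEVICES.items()}

if __name__ == "__main__":
    for label, pct in FILL.items():
        print(f"{label}: {pct:.1f}% in use")
```

With the numbers above, the 12TB and 16TB HDDs and the SSD all come out above 98% in use, while the 20TB HDD stays below 20%.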

What did I do? (not sure it's necessary, but maybe it helps debugging)

Until now, my server was running with one 20TB HDD and one 16TB HDD, both on btrfs. The 16TB drive was just a backup of the most important stuff (nightly rsync).

My plan is to switch over to bcachefs with multiple drives, using replicas=2 and snapshots so that I can recover a file in case I accidentally delete it.

I had another 12tb and 8tb hdd at hand which I wanted to use to increase the storage size and copy stuff over.

My final planned setup was:
bcachefs: Use the 20TB, 16TB and 12TB hdd with replicas (I later had the idea to use a 2TB ssd as cache as well).

My steps to migrate to bcachefs were:

  1. Create a bcachefs with the 12TB and 16TB HD
    bcachefs format \
        --label=ssd.16tb /dev/sda \
        --label=hdd.12tb /dev/sdb \
        --replicas=2 \
        --foreground_target=ssd \
        --promote_target=ssd \
        --background_target=hdd

    (I created an SSD target because I planned to add a 2TB SSD within the next few days and wasn't sure whether setting the targets later would work well.) Note: don't mind the device names; they got mixed up later when I swapped and added devices, so they don't match what fs usage returned.

  2. Copy 12TB of data to the new bcachefs filesystem (yeah, that's pretty close to being full).

  3. Copy the remaining data to the 8TB HDD (temporary)

  4. Add the newly bought SSD and change the group of the existing hdd:
    bcachefs device add --label=ssd.2tb /data/Server /dev/sdd
    echo hdd.16tb > /sys/fs/bcachefs/a82b31db-a070-4836-adb9-8cbc9da9d7de/dev-1/label

  5. After that, I added the 20tb HD as well:
    bcachefs device add --label=hdd.20tb /data/Server /dev/sdc
    This finalized my new bcachefs setup.

  6. Finally, I started to copy everything from the 8TB HDD over to the bcachefs array as well. This worked well for the first 5TB - after that, bcachefs started to act up - and since then, I couldn't get it to work properly.

What's the problem now?

I've got multiple problems, but maybe they are related. I can mount the bcachefs filesystem without any problems and access it as well.
However, the rebalancing task crashes early on:

[ 245.507447] INFO: task bch-rebalance/a:1676 blocked for more than 122 seconds.
[ 245.507459] Not tainted 6.8.9-arch1-1 #1
[ 245.507464] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.507468] task:bch-rebalance/a state:D stack:0 pid:1676 tgid:1676 ppid:2 flags:0x00004000
[ 245.507479] Call Trace:
[ 245.507484]
[ 245.507494] __schedule+0x3e6/0x1520
[ 245.507520] schedule+0x32/0xd0
[ 245.507530] __closure_sync+0x82/0x160
[ 245.507545] __bch2_write+0x1154/0x13b0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.507691] ? psi_group_change+0x213/0x3c0
[ 245.507704] ? srso_return_thunk+0x5/0x5f
[ 245.507711] ? psi_task_switch+0x122/0x230
[ 245.507719] ? srso_return_thunk+0x5/0x5f
[ 245.507725] ? local_clock_noinstr+0xd/0xb0
[ 245.507732] ? srso_return_thunk+0x5/0x5f
[ 245.507737] ? srso_return_thunk+0x5/0x5f
[ 245.507747] ? bch2_moving_ctxt_do_pending_writes+0x11c/0x230 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.507871] bch2_moving_ctxt_do_pending_writes+0x11c/0x230 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508002] bch2_move_ratelimit+0x1d0/0x480 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508127] ? __pfx_autoremove_wake_function+0x10/0x10
[ 245.508139] do_rebalance+0x1a1/0x8c0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508261] ? srso_return_thunk+0x5/0x5f
[ 245.508267] ? __lruvec_stat_mod_folio+0x81/0xa0
[ 245.508291] ? srso_return_thunk+0x5/0x5f
[ 245.508297] ? local_clock_noinstr+0xd/0xb0
[ 245.508303] ? srso_return_thunk+0x5/0x5f
[ 245.508308] ? srso_return_thunk+0x5/0x5f
[ 245.508314] ? __bch2_trans_get+0x177/0x260 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508414] ? srso_return_thunk+0x5/0x5f
[ 245.508422] ? __pfx_bch2_rebalance_thread+0x10/0x10 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508542] bch2_rebalance_thread+0x66/0xb0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508664] ? bch2_rebalance_thread+0x5c/0xb0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508794] kthread+0xe8/0x120
[ 245.508803] ? __pfx_kthread+0x10/0x10
[ 245.508812] ret_from_fork+0x34/0x50
[ 245.508821] ? __pfx_kthread+0x10/0x10
[ 245.508829] ret_from_fork_asm+0x1b/0x30
[ 245.508844]

It does that regularly. I take it the rebalancer should move some files from the 12TB and 16TB HDDs over to the 20TB HDD but crashes while doing that?
It would explain why both of them are nearly full whereas the 20TB is only filled with a couple of TB (as you can see in the fs usage above).
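
When reading a trace like the one above, it helps to know that the kernel prefixes a frame with "?" when it found the address on the stack but cannot confirm it as part of the actual call chain. Below is a minimal Python sketch that keeps only the frames without that marker. The names `SAMPLE`, `FRAME` and `reliable_frames` are hypothetical, and the regular expression only assumes the `[ time] symbol+0xoff/0xsize` layout visible in this log.

```python
import re

# A few lines copied from the bch-rebalance trace above.
SAMPLE = """\
[ 245.507494] __schedule+0x3e6/0x1520
[ 245.507520] schedule+0x32/0xd0
[ 245.507530] __closure_sync+0x82/0x160
[ 245.507545] __bch2_write+0x1154/0x13b0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.507691] ? psi_group_change+0x213/0x3c0
[ 245.507704] ? srso_return_thunk+0x5/0x5f
[ 245.507871] bch2_moving_ctxt_do_pending_writes+0x11c/0x230 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508002] bch2_move_ratelimit+0x1d0/0x480 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 245.508139] do_rebalance+0x1a1/0x8c0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
"""

# Timestamp, optional "? " marker, then symbol+0xoffset/0xsize.
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(log: str) -> list[str]:
    """Return symbol names of frames that are not marked with '?'."""
    frames = []
    for line in log.splitlines():
        match = FRAME.match(line)
        if match and not match.group("guess"):
            frames.append(match.group("sym"))
    return frames


if __name__ == "__main__":
    for symbol in reliable_frames(SAMPLE):
        print(symbol)
```

For the sample lines this leaves the chain from `do_rebalance` through `bch2_move_ratelimit`, `bch2_moving_ctxt_do_pending_writes` and `__bch2_write` down to `__closure_sync` and `schedule`, i.e. the rebalance thread is waiting inside a write.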

The second problem is that the filesystem itself also crashes after a while when I try to write to it, especially with files of 1GB or more. I was not able to finish copying the remaining 2-3 TB of data from the temporary 8TB disk.

This is the dmesg:

[ 480.540505] ------------[ cut here ]------------
[ 480.540515] btree trans held srcu lock (delaying memory reclaim) for 15 seconds
[ 480.540545] WARNING: CPU: 3 PID: 1670 at fs/bcachefs/btree_iter.c:2825 bch2_trans_srcu_unlock+0x120/0x130 [bcachefs]
[ 480.540663] Modules linked in: xt_multiport ip6table_filter ip6_tables xt_conntrack xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 x>
[ 480.540831] crypto_simd drm_exec snd_acp_config cryptd mdio_devres snd gpu_sched snd_soc_acpi sp5100_tco i2c_amd_mp2_pci drm_suballoc_helper rapl snd_pci_acp3x pcspkr acpi_cpufreq libphy soundcore k10t>
[ 480.540943] CPU: 3 PID: 1670 Comm: bch-reclaim/a82 Not tainted 6.8.9-arch1-1 #1 b1154d98cdfe0792477fb31da8bf552e17e27d3c
[ 480.540951] Hardware name: Default string Default string/Default string, BIOS 5.0.1.3 08/08/2019
[ 480.540955] RIP: 0010:bch2_trans_srcu_unlock+0x120/0x130 [bcachefs]
[ 480.541074] Code: 48 8b 15 33 f6 6c cf 48 c7 c7 e8 84 0c c2 48 b8 07 3a 6d a0 d3 06 3a 6d 48 29 ca 48 f7 e2 48 89 d6 48 c1 ee 07 e8 70 1a 98 cd <0f> 0b e9 5c ff ff ff 0f 0b e9 6b ff ff ff 66 90 90 90 90>
[ 480.541081] RSP: 0018:ffffaf73cd54bbd8 EFLAGS: 00010282
[ 480.541088] RAX: 0000000000000000 RBX: ffff9d37a1aa4000 RCX: 0000000000000027
[ 480.541094] RDX: ffff9d3950ee19c8 RSI: 0000000000000001 RDI: ffff9d3950ee19c0
[ 480.541098] RBP: ffff9d3659f00000 R08: 0000000000000000 R09: ffffaf73cd54ba68
[ 480.541103] R10: ffffaf73cd54ba60 R11: 0000000000000003 R12: ffffaf73cd54bcd0
[ 480.541108] R13: ffff9d37a1aa4000 R14: ffff9d3659f036d0 R15: ffff9d3659f26c40
[ 480.541113] FS: 0000000000000000(0000) GS:ffff9d3950ec0000(0000) knlGS:0000000000000000
[ 480.541120] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 480.541125] CR2: 000075b2fcd98000 CR3: 000000010b012000 CR4: 00000000003506f0
[ 480.541131] Call Trace:
[ 480.541138]
[ 480.541143] ? bch2_trans_srcu_unlock+0x120/0x130 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541277] ? __warn+0x81/0x130
[ 480.541289] ? bch2_trans_srcu_unlock+0x120/0x130 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541391] ? report_bug+0x171/0x1a0
[ 480.541401] ? prb_read_valid+0x1b/0x30
[ 480.541409] ? srso_return_thunk+0x5/0x5f
[ 480.541420] ? handle_bug+0x3c/0x80
[ 480.541428] ? exc_invalid_op+0x17/0x70
[ 480.541435] ? asm_exc_invalid_op+0x1a/0x20
[ 480.541450] ? bch2_trans_srcu_unlock+0x120/0x130 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541562] ? bch2_trans_srcu_unlock+0x120/0x130 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541679] bch2_trans_begin+0x63b/0x690 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541793] ? bch2_trans_begin+0xe5/0x690 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.541901] ? srso_return_thunk+0x5/0x5f
[ 480.541908] ? __schedule+0x3ee/0x1520
[ 480.541917] ? sysvec_apic_timer_interrupt+0xe/0x90
[ 480.541930] bch2_btree_write_buffer_flush_locked+0x6b/0x980 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542056] ? __pfx_bch2_btree_write_buffer_journal_flush+0x10/0x10 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542172] btree_write_buffer_flush_seq+0x258/0x2a0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542286] ? srso_return_thunk+0x5/0x5f
[ 480.542292] ? local_clock_noinstr+0xd/0xb0
[ 480.542301] ? __pfx_bch2_btree_write_buffer_journal_flush+0x10/0x10 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542415] bch2_btree_write_buffer_journal_flush+0x35/0x60 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542530] journal_flush_pins.constprop.0+0x1ad/0x2d0 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542669] __bch2_journal_reclaim+0x1d1/0x360 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542809] bch2_journal_reclaim_thread+0x7f/0x170 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.542946] ? __pfx_bch2_journal_reclaim_thread+0x10/0x10 [bcachefs 267e5ec8bb3d305ae65d058b667703fa6d81777c]
[ 480.543079] kthread+0xe8/0x120
[ 480.543089] ? __pfx_kthread+0x10/0x10
[ 480.543098] ret_from_fork+0x34/0x50
[ 480.543106] ? __pfx_kthread+0x10/0x10
[ 480.543114] ret_from_fork_asm+0x1b/0x30
[ 480.543131]
[ 480.543134] ---[ end trace 0000000000000000 ]---
[ 2468.010057] ------------[ cut here ]------------

After that happens, I can still read the filesystem without any problem, but trying to write anything to it simply hangs.
Additionally, the cache doesn't empty anymore (also see fs usage above) so the copied data is not written to the disk (the files have a size of 0 after a reboot).

This also makes it impossible to unmount or sync the filesystem, so I can only do a forced shutdown.

After a reboot, the filesystem mounts again without any problems, the rebalancer crashes again and copying more files to the bcachefs filesystem also crashes again after a few seconds.

One thing I noticed is that the filesystem doesn't crash if I copy a directory FROM the bcachefs array TO the bcachefs array (the same one). It reliably creates copies, regardless of the size.

Not sure what the difference here is.

Let me know if I should post any more output / logs.

Any ideas how to fix this?

@EvilDragon EvilDragon changed the title Filesystem hangs / stops writing data Rebalancing and Filesystem hangs / stops writing data May 12, 2024
@EvilDragon (Author)

Seems to be related to, or a duplicate of, #680.
