Bcachefs hangs when doing lots of writes to a nocow folder #680
Comments
I'm seeing what seems to be the same issue, even though I'm not using nocow. There's a lot of both reads and writes happening to the drive, but interestingly minio is barely doing anything.
EDIT(2024-05-19): I tried a couple things - did
EDIT(2024-05-19): I did an experiment and deleted a big file - the processes that were stuck got unstuck. I suspect they will get stuck again, but hopefully this provides some fuel for investigation.
6.9 seems to be causing issues with stability; wait until koverstreet/bcachefs#680 is resolved before returning to newer kernels.
It just happened again. I am running:
My rootfs thankfully is btrfs so the machine booted fine. I will have to pause my heavy writes until this is fixed. :( @ramonacat which kernel version did you downgrade to? Did you encounter any more hangs? I will try to remove the nocow attribute and see if that helps a bit.
I downgraded to 6.8.10 (from 6.9.0). But it does not seem to have changed the situation. The issue in my case seems to be that the filesystem gets stuck instead of moving buckets around to use the free space.
This seems to be very similar to, or the same as, the problem I have in #677; so far I had only posted that on IRC. Regardless of whether the filesystem or a folder is cow or nocow, and regardless of whether I'm using compression or not: as soon as I write something to the bcachefs drives, it gets slower and slower and eventually stalls. In my case, whenever this happens, all writes to other filesystems are slow as well (around 70 - 300kB/s), so whatever bcachefs does, it's affecting the whole system. It happened on both kernel 6.8 and 6.9. I've now let it run for 5 days without writing to it, which works fine, so it definitely has something to do with writing. As can be seen in #677, it also happened to me after adding a new drive, and rebalancing has the same hangs, so this could be a pointer in the right direction.
I managed to temporarily work around this by adding a couple of drives I had lying around to the array.
Could you give some more details? I can add some temporary external drives, but would like to remove them again after the issue is gone, that's why I'm asking :)
One SSD and one HDD, tho I don't think it matters. I don't think I can remove them, I think there's just some problem with allocations.
Quick update: I added a second 2TB SSD to my setup and it also resolved the issue.
I don't have any nocow data, but I did try enabling the compression attribute on a folder with replicas=1 and then upping replicas to 2. I don't know if that plays into it. With bcachefs-for-upstream v6.10-rc2-4-ga9cf489be39f I get this frequently:
[ 1330.210841] INFO: task bch-rebalance/3:1481 blocked for more than 1208 seconds.
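For anyone who wants to collect these hung-task warnings for a report, here is a minimal Python sketch that pulls them out of saved dmesg output. It assumes the message format matches the line quoted above; the function name `parse_hung_tasks` and the returned field names are made up for illustration and are not part of any bcachefs tooling.

```python
import re

# Pattern for the kernel hung-task warning, modelled on the single
# log line quoted in this thread (an assumption, not a full grammar).
HUNG_TASK_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+INFO: task (?P<comm>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<seconds>\d+) seconds\."
)


def parse_hung_tasks(dmesg_text):
    """Return one dict per hung-task warning found in dmesg_text."""
    reports = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.search(line)
        if match:
            reports.append({
                "uptime": float(match.group("uptime")),   # seconds since boot
                "comm": match.group("comm"),              # task name
                "pid": int(match.group("pid")),
                "seconds": int(match.group("seconds")),   # reported blocked time
            })
    return reports


sample = "[ 1330.210841] INFO: task bch-rebalance/3:1481 blocked for more than 1208 seconds."
for report in parse_hung_tasks(sample):
    print(report["comm"], report["pid"], report["seconds"])
```

Feeding it the text of a saved dmesg log gives a list that can be filtered for task names starting with `bch-`, which makes it easier to see which bcachefs threads are blocked and for how long.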
I had a borg cache directory on the bcachefs filesystem and a backup consistently triggered kernel thread timeouts (hung threads). Nocow did not seem to make a difference. I moved the cache to a different drive for now. PS: Is there a way to remove nocow with
I noticed the whole fs was stuck and checked dmesg:
Bcachefs show-super:
Bcachefs fs usage:
It recovered fine after a forced reboot.
Sadly the kernel is tainted due to nvidia.