Starting with 6.7, multi-device bcachefs turns r/o on first write access #638

Closed
daduke opened this issue Jan 22, 2024 · 5 comments

@daduke

daduke commented Jan 22, 2024

hey there,

we've been playing around with bcachefs for over a year as a possible future candidate for our multi-PB storage setup. We regularly compile upstream kernels and test tiered (HDD hardware RAID + SSD cache) file system configurations. Starting right around the 6.7 release, we noticed that the first write to such a file system causes an error and the FS goes r/o:

Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (sdb): error writing journal entry 25: operation not supported
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (sdc): error writing journal entry 25: operation not supported
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (7699265a-0282-4890-b460-1ffdfec996ad): unable to write journal to sufficient devices
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (7699265a-0282-4890-b460-1ffdfec996ad): fatal error - emergency read only

this is on a Debian Bookworm system with upstream kernel and latest bcachefs-tools. I stripped down the mkfs and the minimal failing configuration is

bcachefs format --label=ssd /dev/sdb /dev/sdc

while

bcachefs format --label=ssd /dev/sdb

works fine. I haven't found any similar bug description (neither GH issues nor mailing list), so it might well be something particular to our machine.

Any help would be greatly appreciated.

thanks,
-Christian
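For anyone who wants to retry the report above on a scratch machine, the steps can be sketched roughly as follows. Only the `bcachefs format` line comes from the report; the colon-joined multi-device mount, the mount point, and the `touch` used as the "first write" are assumptions added for illustration. The script only prints the commands unless `dry_run=False` is passed, because formatting destroys whatever is on the devices.

```python
import shlex
import subprocess


def repro_commands(devices, mountpoint="/mnt/bcachefs-test"):
    """Build argv lists for format, mount, and a first write.

    The format command mirrors the one in the report. The mount syntax,
    the mount point, and the touch are assumptions, not taken from it.
    """
    if not devices:
        raise ValueError("need at least one device")
    fmt = ["bcachefs", "format", "--label=ssd", *devices]
    # Assumption: multi-device file systems are mounted by joining the
    # member devices with ':'.
    mount = ["mount", "-t", "bcachefs", ":".join(devices), mountpoint]
    # Assumption: creating a file is enough to trigger the first write.
    write = ["touch", f"{mountpoint}/first-write"]
    return [fmt, mount, write]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Dry run only: prints the three commands for the failing
    # two-device configuration from the report.
    run(repro_commands(["/dev/sdb", "/dev/sdc"]))
```

Calling `repro_commands(["/dev/sdb"])` gives the single-device variant that the report says works fine, which makes it easy to compare the two cases on the same kernel.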

@boomshroom

boomshroom commented Jan 28, 2024

I only started encountering this in 6.8; 6.7 worked just fine for me. Although a different version triggered it, the error message I got is otherwise identical to yours (apart from the exact devices and journal entry involved).

I managed to track down the error to journal_io.c:journal_write_endio, with the "operation not supported" error code coming from the bio. The strangest part to me is that it appears for all three of my drives despite them being very different from each other (one NVMe, one SATA SSD, and one HDD), so if it were a hardware capability problem, I'd expect at least one of those drives to be able to handle it. With that in mind, it's more likely something in the bio itself.

Edit: After doing some digging, it seems a likely candidate for the cause was already addressed in this commit. I'll try compiling a kernel from a version past that commit to see if it fixes the issue. If not, I guess we wait for 6.8-rc2.

@kode54
Contributor

kode54 commented Jan 28, 2024

Cool, I was about to post this issue, I didn't realize it had already cropped up. Adding my log:

Jan 27 22:25:55 mrgency kernel: bcachefs (sda1): error writing journal entry 1090234: operation not supported
Jan 27 22:25:55 mrgency kernel: bcachefs (sdb1): error writing journal entry 1090234: operation not supported
Jan 27 22:25:55 mrgency kernel: bcachefs (b546dee3-ba04-4def-b057-b12f7c9e2e82): unable to write journal to sufficient devices
Jan 27 22:25:55 mrgency kernel: bcachefs (b546dee3-ba04-4def-b057-b12f7c9e2e82): fatal error - emergency read only

This has never happened for me on 6.7 so far, nor on 6.7 with patches from master applied before 6.8 was merged into it. It only started when I tried a mostly vanilla 6.8-rc1 kernel with no out-of-tree bcachefs patches or commits applied.
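Both log excerpts posted so far have the same shape: one "error writing journal entry" line per member device, followed by the file system going emergency read-only. A minimal sketch for spotting that pattern in a saved kernel log could look like this. The regular expressions are derived only from the lines quoted in this thread, so other kernel versions may word the messages differently, and `summarize` is a hypothetical helper name.

```python
import re

# Patterns based solely on the log lines quoted in this thread.
JOURNAL_ERR = re.compile(
    r"bcachefs \((?P<dev>[^)]+)\): error writing journal entry "
    r"(?P<seq>\d+): (?P<reason>.+)$"
)
EMERGENCY_RO = re.compile(
    r"bcachefs \((?P<fs>[^)]+)\): fatal error - emergency read only"
)


def summarize(lines):
    """Group journal write failures by entry and list read-only file systems."""
    failures = {}   # journal entry number -> list of (device, reason)
    read_only = []  # identifiers of file systems that went read-only
    for line in lines:
        m = JOURNAL_ERR.search(line)
        if m:
            failures.setdefault(int(m["seq"]), []).append((m["dev"], m["reason"]))
            continue
        m = EMERGENCY_RO.search(line)
        if m:
            read_only.append(m["fs"])
    return failures, read_only


# Sample input: the four lines from the original report.
SAMPLE = """\
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (sdb): error writing journal entry 25: operation not supported
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (sdc): error writing journal entry 25: operation not supported
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (7699265a-0282-4890-b460-1ffdfec996ad): unable to write journal to sufficient devices
Jan 22 11:13:46 phd-test-bcache kernel: bcachefs (7699265a-0282-4890-b460-1ffdfec996ad): fatal error - emergency read only
""".splitlines()

if __name__ == "__main__":
    failures, read_only = summarize(SAMPLE)
    for seq, devs in failures.items():
        print(f"journal entry {seq}: failed on {[d for d, _ in devs]}")
    print("emergency read-only:", read_only)
```

Grouping by journal entry number makes it easy to see whether every member device rejected the same write, which is what the reports here show.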

@koverstreet
Owner

This should be fixed now in Linus's tree

@kode54
Contributor

kode54 commented Jan 29, 2024

It should be in 6.8-rc2, which was tagged about 35 minutes ago but will take a while to show up on the kernel.org main page. Does it need to be backported to 6.7 as well?

@daduke
Author

daduke commented Jan 29, 2024

I can confirm that the issue has been resolved. Thanks!
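To check whether a running kernel is at or past the 6.8-rc2 tag mentioned above, a rough sketch follows. It only compares version strings: it assumes release strings of the form `6.8.0-rc1` or `6.7.2`, it cannot tell whether a distribution backported the fix, and the backport question for 6.7 was left unanswered in this thread. The helper names are made up for illustration.

```python
import platform
import re

_VERSION = re.compile(r"^(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?")


def parse_release(release):
    """Turn a string such as '6.8.0-rc1' into a comparable tuple."""
    m = _VERSION.match(release)
    if not m:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor = int(m[1]), int(m[2])
    patch = int(m[3] or 0)
    # A final release sorts after all of its release candidates.
    rc = int(m[4]) if m[4] else float("inf")
    return (major, minor, patch, rc)


# 6.8-rc2 is the tag named in this thread as carrying the fix.
FIRST_TAG_WITH_FIX = parse_release("6.8-rc2")


def at_least_6_8_rc2(release):
    """True if the release string is 6.8-rc2 or newer (string check only)."""
    return parse_release(release) >= FIRST_TAG_WITH_FIX


if __name__ == "__main__":
    running = platform.release()
    print(running, "->", at_least_6_8_rc2(running))
```

A `False` result for a 6.7.x kernel only means the version string predates 6.8-rc2, not that the kernel is necessarily affected.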
