Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bcachefs journal update failure: infinite loop when moving to v2 structure #189

Closed
jpsollie opened this issue Feb 4, 2021 · 13 comments
Closed

Comments

@jpsollie
Copy link
Contributor

jpsollie commented Feb 4, 2021

Fixing ticket #188 introduces another issue:

[52085.550911]` bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): alloc_v2 feature bit not set, fsck required
[52095.208082] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): journal read done, 0 keys in 1 entries, seq 1483271
[52096.114931] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): starting mark and sweep
[52147.784274] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 4 has wrong buckets_unavailable: got 0, should be 5205, fixing
[52147.784277] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[52147.784278] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[52147.784280] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[52147.784281] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal buckets: got 0, should be 942, fixing
[52147.784282] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal sectors: got 0, should be 482304, fixing
[52147.784283] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 3 has wrong btree buckets: got 0, should be 4262, fixing
[52147.784284] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 3 has wrong btree sectors: got 0, should be 2182144, fixing
[52147.784285] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 4 has wrong user fragmented: got 0, should be 865, fixing
[52147.784286] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 5 has wrong cached buckets: got 0, should be 228715, fixing
[52147.784287] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 5 has wrong cached sectors: got 0, should be 117101215, fixing
[52147.786287] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 7 has wrong buckets_unavailable: got 0, should be 7496, fixing
[52147.786290] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[52147.786291] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[52147.786292] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[52147.786293] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal buckets: got 0, should be 1024, fixing
[52147.786293] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal sectors: got 0, should be 524288, fixing
[52147.786294] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 3 has wrong btree buckets: got 0, should be 6471, fixing
[52147.786295] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 3 has wrong btree sectors: got 0, should be 3313152, fixing
[52147.786296] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 4 has wrong user fragmented: got 0, should be 1814, fixing
[52147.786297] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 5 has wrong cached buckets: got 0, should be 343134, fixing
[52147.786297] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 5 has wrong cached sectors: got 0, should be 175682794, fixing
[52147.788330] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 7 has wrong buckets_unavailable: got 0, should be 7617, fixing
[52147.788332] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[52147.788332] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[52147.788333] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[52147.788334] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal buckets: got 0, should be 1024, fixing
[52147.788335] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Ratelimiting new instances of previous error
[52147.788335] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 2 has wrong journal sectors: got 0, should be 524288, fixing
[52147.788336] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Ratelimiting new instances of previous error
[52147.788337] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 4 has wrong user fragmented: got 0, should be 4514, fixing
[52147.789181] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 7 has wrong buckets_unavailable: got 0, should be 3378, fixing
[52147.789184] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[52147.789186] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 4 has wrong user fragmented: got 0, should be 1090, fixing
[52148.243930] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 7 has wrong buckets_unavailable: got 0, should be 6295400, fixing
[52148.243933] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 1 has wrong sb fragmented: got 0, should be 760, fixing
[52148.243935] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): dev 3 has wrong btree fragmented: got 0, should be 512, fixing
[52148.243936] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Ratelimiting new instances of previous error
[52148.243944] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): fs has wrong hidden: got 0, should be 2887680, fixing
[52164.894861] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Unable to allocate journal write
[52164.894865] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): emergency read only
[52164.894866] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): unable to write journal to sufficient devices
[52164.894868] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): error writing alloc info
[52164.894872] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Error in recovery: error writing out alloc info (-30)
[52172.576909] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): journal read done, 4685929 keys in 514 entries, seq 1483800
[52182.577440] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Unable to allocate journal write
[52182.577445] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): emergency read only
[52182.577447] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): unable to write journal to sufficient devices
[52182.578984] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Unable to allocate journal write
[52182.578986] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): unable to write journal to sufficient devices
[52189.471712] bcachefs (e6245b31-be22-42bc-a2d6-b9fe28b7533b): Error in recovery: journal replay failed (-5)

additionally, there's a kworker/u64 thread named btree_update with 100% cpu useage.

bcachefs superblock layout:

External UUID:                  e6245b31-be22-42bc-a2d6-b9fe28b7533b
Internal UUID:                  baef68dc-f6de-4d95-a410-6c56b41eb044
Device index:                   4
Label:                          bigdata
Version:                        11
Created:                        Thu Sep 10 15:08:28 2020
Squence number:                 301
Block_size:                     512
Btree node size:                256.0K
Error action:                   remount-ro
Clean:                          0
Features:                       atomic_nlink,journal_seq_blacklist_v3,reflink,new_siphash,inline_data,new_extent_overwrite,btree_ptr_v2,extents_above_btree_updates,btree_updates_journalled,alloc_v2
Metadata replicas:              1
Data replicas:                  1
Metadata checksum type:         crc32c (1)
Data checksum type:             crc32c (1)
Compression type:               none (0)
Foreground write target:        Group 0 (SSD)
Background write target:        Group 1 (HDD)
Promote target:                 Group 0 (SSD)
String hash type:               siphash (2)
32 bit inodes:                  0
GC reserve percentage:          8%
Root reserve percentage:        0%
Devices:                        5 live, 5 total
Sections:                       journal,members,replicas_v0,disk_groups,clean,journal_seq_blacklist
Superblock size:                11376

Members (size 288):
  Device 0:
    UUID:                       54e8d69b-6e8a-4872-8779-089791a1fc06
    Size:                       58.9G
    Bucket size:                256.0K
    First bucket:               0
    Buckets:                    241308
    Last mount:                 Thu Feb  4 06:56:56 2021
    State:                      readwrite
    Group:                      SSD (0)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    1
  Device 1:
    UUID:                       839f2f3c-fa00-4229-97be-0229b0dbcc00
    Size:                       88.5G
    Bucket size:                256.0K
    First bucket:               0
    Buckets:                    362388
    Last mount:                 Thu Feb  4 06:56:56 2021
    State:                      readwrite
    Group:                      SSD (0)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    1
  Device 2:
    UUID:                       eba710ca-efc2-47ad-b023-50204710be24
    Size:                       89.4G
    Bucket size:                256.0K
    First bucket:               0
    Buckets:                    366296
    Last mount:                 Thu Feb  4 06:56:56 2021
    State:                      readwrite
    Group:                      SSD (0)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    1
  Device 3:
    UUID:                       d2c18ef7-94f5-4790-858f-7c43efce6fe6
    Size:                       37.3G
    Bucket size:                256.0K
    First bucket:               0
    Buckets:                    152588
    Last mount:                 Thu Feb  4 06:56:56 2021
    State:                      readwrite
    Group:                      SSD (0)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    1
  Device 4:
    UUID:                       4a34b67d-d454-45e3-938c-1cf8fc1360fb
    Size:                       40.0T
    Bucket size:                512.0K
    First bucket:               0
    Buckets:                    83916780
    Last mount:                 Thu Feb  4 06:56:56 2021
    State:                      readwrite
    Group:                      HDD (1)
    Data allowed:               journal,btree,user
    Has data:                   (none)
    Replacement policy:         lru
    Discard:                    0
linuxserver /home/janpieter/bcachetools/bcachefs-tools #
@jpsollie
Copy link
Contributor Author

jpsollie commented Feb 4, 2021

running fsck manually did not give any result either:

bcachefs: bch2_fs_open()
bcachefs: bch2_read_super()
bcachefs: bch2_read_super() ret 0
bcachefs: bch2_read_super()
bcachefs: bch2_read_super() ret 0
bcachefs: bch2_read_super()
bcachefs: bch2_read_super() ret 0
bcachefs: bch2_read_super()
bcachefs: bch2_read_super() ret 0
bcachefs: bch2_read_super()
bcachefs: bch2_read_super() ret 0
bcachefs: bch2_fs_alloc()
bcachefs: bch2_fs_journal_init()
bcachefs: bch2_fs_journal_init() ret 0
bcachefs: bch2_fs_btree_cache_init()
bcachefs: bch2_fs_btree_cache_init() ret 0
bcachefs: bch2_fs_encryption_init()
bcachefs: bch2_fs_encryption_init() ret 0
bcachefs: __bch2_fs_compress_init()
bcachefs: __bch2_fs_compress_init() ret 0
bcachefs: bch2_dev_alloc()
bcachefs: bch2_dev_alloc() ret 0
bcachefs: bch2_dev_alloc()
bcachefs: bch2_dev_alloc() ret 0
bcachefs: bch2_dev_alloc()
bcachefs: bch2_dev_alloc() ret 0
bcachefs: bch2_dev_alloc()
bcachefs: bch2_dev_alloc() ret 0
bcachefs: bch2_dev_alloc()
bcachefs: bch2_dev_alloc() ret 0
bcachefs: bch2_fs_alloc() ret 0
journal read done, 4687295 keys in 532 entries, seq 1483845
starting alloc read
alloc read done
starting stripes_read
stripes_read done
starting mark and sweep
mark and sweep done
starting journal replay
Unable to allocate journal write
emergency read only
unable to write journal to sufficient devices
Error in recovery: journal replay failed (-5)
shutting down
flushing journal and stopping allocators

.... and bcachefs hangs (at 100% cpu usage) and eats 8GB of ram ...

@derlaft
Copy link

derlaft commented Feb 5, 2021

It seems it's the same for me:

# fsck.bcachefs /dev/mapper/DataNVMe /dev/mapper/DataSSD /dev/mapper/DataHDD /dev/mapper/DataHDD2
journal read done, 3815259 keys in 520 entries, seq 852630
dropped unflushed entries 852595-852629
starting mark and sweep
Unable to allocate journal write
emergency read only
unable to write journal to sufficient devices
Error in recovery: journal replay failed (-5)

dmesg also has similar errors.

If I can provide any useful info for debugging, please let me know.

@derlaft
Copy link

derlaft commented Feb 5, 2021

dmesg.txt

@jpsollie
Copy link
Contributor Author

jpsollie commented Feb 8, 2021

looks like temporaly reverting didn't work any longer:

bcachefs fsck -n /dev/disk/by-partlabel/nvmSecStorCache /dev/disk/by-partlabel/sashomecache /dev/disk/by-partlabel/satasec /dev/disk/by-id/scsi-2326581a600d00000
recovering from clean shutdown, journal seq 150175
journal read done, 0 keys in 1 entries, seq 150176
starting mark and sweep
starting fsck
mounted with opts: data_checksum=none,foreground_target=SSD,background_target=HDD,promote_target=SSD,degraded,fsck,fix_errors,nochanges
linuxserver /home/janpieter/bcachetools/bcachefs-tools # mount /data/securestorage
linuxserver /home/janpieter/bcachetools/bcachefs-tools # bcachefs fsck -n /dev/disk/by-partlabel/nvmBigDataCache /dev/disk/by-partlabel/sasbigdatacache /dev/disk/by-partlabel/sas2bigdatacache /dev/disk/by-partlabel/satabigdata /dev/disk/by-id/scsi-2b3e68cf400d00000
bcachefs: bch2_fs_open() bch_fs_open err opening /dev/disk/by-partlabel/nvmBigDataCache: Filesystem has incompatible features
error opening /dev/disk/by-partlabel/nvmBigDataCache: Invalid argument

... so I hope it could be fixed soon, as there is no way to read the data stored on it :s

@derlaft
Copy link

derlaft commented Feb 8, 2021

@jpsollie there is: I am able to mount fs with nochanges,norecovery,ro mount options. Data seems to be fine and I'm using the system (with temporary mounting overlayfs-tmpfs on top of ro-partition).

@derlaft
Copy link

derlaft commented Feb 10, 2021

The error slightly changes with the latest revision (mount -o fix_errors):

[  453.205036] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): journal read done, 3818720 keys in 557 entries, seq 852916
[  453.205038] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): dropped unflushed entries 852912-852915
[  460.197045] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): Unable to allocate journal write
[  460.197049] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): emergency read only
[  460.197051] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): unable to write journal to sufficient devices
[  464.536963] bcachefs (e0b92041-c725-48f6-9678-cbc8d9201b0d): Error in recovery: journal replay failed (-5)
[  613.548713] INFO: task kworker/3:0:2370 blocked for more than 122 seconds.
[  613.548715]       Not tainted 5.10.0bcachefs+ #22
[  613.548716] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  613.548717] task:kworker/3:0     state:D stack:    0 pid: 2370 ppid:     2 flags:0x00004000
[  613.548724] Workqueue: events_long bch2_fs_read_only_work
[  613.548725] Call Trace:
[  613.548731]  __schedule+0x295/0x810
[  613.548734]  schedule+0x5b/0xc0
[  613.548737]  rwsem_down_write_slowpath+0x265/0x4e0
[  613.548740]  bch2_fs_read_only_work+0x1b/0x30
[  613.548743]  process_one_work+0x1d6/0x3a0
[  613.548745]  worker_thread+0x4d/0x3d0
[  613.548747]  ? rescuer_thread+0x410/0x410
[  613.548749]  kthread+0x133/0x150
[  613.548751]  ? __kthread_bind_mask+0x60/0x60
[  613.548754]  ret_from_fork+0x1f/0x30
[  613.548757] INFO: task mount:2372 blocked for more than 122 seconds.
[  613.548758]       Not tainted 5.10.0bcachefs+ #22
[  613.548758] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  613.548759] task:mount           state:D stack:    0 pid: 2372 ppid:  2341 flags:0x00004000
[  613.548761] Call Trace:
[  613.548763]  __schedule+0x295/0x810
[  613.548765]  schedule+0x5b/0xc0
[  613.548767]  __closure_sync+0x9f/0x240
[  613.548769]  __bch2_fs_read_only+0x33a/0x410
[  613.548772]  ? irq_cpu_rmap_notify+0x20/0x20
[  613.548774]  bch2_fs_read_only+0x104/0x230
[  613.548776]  __bch2_fs_stop+0x46/0x2b0
[  613.548778]  bch2_fs_open+0x364/0x410
[  613.548781]  bch2_mount+0x477/0x600
[  613.548785]  legacy_get_tree+0x27/0x40
[  613.548788]  vfs_get_tree+0x25/0xb0
[  613.548790]  path_mount+0x469/0xb20
[  613.548792]  __x64_sys_mount+0x114/0x150
[  613.548794]  do_syscall_64+0x33/0x40
[  613.548796]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  613.548799] RIP: 0033:0x7fd7467143da
[  613.548800] RSP: 002b:00007fff14a84798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  613.548802] RAX: ffffffffffffffda RBX: 00007fd746835f84 RCX: 00007fd7467143da
[  613.548803] RDX: 000055bf20c0f670 RSI: 000055bf20c11450 RDI: 000055bf20c0f6d0
[  613.548803] RBP: 000055bf20c0f440 R08: 000055bf20c0f690 R09: 0000000000000001
[  613.548804] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[  613.548805] R13: 000055bf20c0f6d0 R14: 000055bf20c0f670 R15: 000055bf20c0f440
[  736.405004] INFO: task kworker/3:0:2370 blocked for more than 245 seconds.
[  736.405006]       Not tainted 5.10.0bcachefs+ #22
[  736.405006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  736.405007] task:kworker/3:0     state:D stack:    0 pid: 2370 ppid:     2 flags:0x00004000
[  736.405010] Workqueue: events_long bch2_fs_read_only_work
[  736.405011] Call Trace:
[  736.405013]  __schedule+0x295/0x810
[  736.405015]  schedule+0x5b/0xc0
[  736.405016]  rwsem_down_write_slowpath+0x265/0x4e0
[  736.405018]  bch2_fs_read_only_work+0x1b/0x30
[  736.405020]  process_one_work+0x1d6/0x3a0
[  736.405021]  worker_thread+0x4d/0x3d0
[  736.405022]  ? rescuer_thread+0x410/0x410
[  736.405023]  kthread+0x133/0x150
[  736.405024]  ? __kthread_bind_mask+0x60/0x60
[  736.405025]  ret_from_fork+0x1f/0x30
[  736.405027] INFO: task mount:2372 blocked for more than 245 seconds.
[  736.405028]       Not tainted 5.10.0bcachefs+ #22
[  736.405028] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  736.405028] task:mount           state:D stack:    0 pid: 2372 ppid:  2341 flags:0x00004000
[  736.405030] Call Trace:
[  736.405031]  __schedule+0x295/0x810
[  736.405032]  schedule+0x5b/0xc0
[  736.405033]  __closure_sync+0x9f/0x240
[  736.405034]  __bch2_fs_read_only+0x33a/0x410
[  736.405035]  ? irq_cpu_rmap_notify+0x20/0x20
[  736.405036]  bch2_fs_read_only+0x104/0x230
[  736.405038]  __bch2_fs_stop+0x46/0x2b0
[  736.405039]  bch2_fs_open+0x364/0x410
[  736.405040]  bch2_mount+0x477/0x600
[  736.405042]  legacy_get_tree+0x27/0x40
[  736.405043]  vfs_get_tree+0x25/0xb0
[  736.405044]  path_mount+0x469/0xb20
[  736.405046]  __x64_sys_mount+0x114/0x150
[  736.405047]  do_syscall_64+0x33/0x40
[  736.405048]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  736.405049] RIP: 0033:0x7fd7467143da
[  736.405050] RSP: 002b:00007fff14a84798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  736.405051] RAX: ffffffffffffffda RBX: 00007fd746835f84 RCX: 00007fd7467143da
[  736.405051] RDX: 000055bf20c0f670 RSI: 000055bf20c11450 RDI: 000055bf20c0f6d0
[  736.405052] RBP: 000055bf20c0f440 R08: 000055bf20c0f690 R09: 0000000000000001
[  736.405052] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[  736.405053] R13: 000055bf20c0f6d0 R14: 000055bf20c0f670 R15: 000055bf20c0f440
[  859.261323] INFO: task kworker/3:0:2370 blocked for more than 368 seconds.
[  859.261324]       Not tainted 5.10.0bcachefs+ #22
[  859.261325] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  859.261326] task:kworker/3:0     state:D stack:    0 pid: 2370 ppid:     2 flags:0x00004000
[  859.261328] Workqueue: events_long bch2_fs_read_only_work
[  859.261329] Call Trace:
[  859.261332]  __schedule+0x295/0x810
[  859.261333]  schedule+0x5b/0xc0
[  859.261334]  rwsem_down_write_slowpath+0x265/0x4e0
[  859.261336]  bch2_fs_read_only_work+0x1b/0x30
[  859.261337]  process_one_work+0x1d6/0x3a0
[  859.261339]  worker_thread+0x4d/0x3d0
[  859.261340]  ? rescuer_thread+0x410/0x410
[  859.261341]  kthread+0x133/0x150
[  859.261342]  ? __kthread_bind_mask+0x60/0x60
[  859.261343]  ret_from_fork+0x1f/0x30
[  859.261345] INFO: task mount:2372 blocked for more than 368 seconds.
[  859.261345]       Not tainted 5.10.0bcachefs+ #22
[  859.261346] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  859.261346] task:mount           state:D stack:    0 pid: 2372 ppid:  2341 flags:0x00004000
[  859.261347] Call Trace:
[  859.261349]  __schedule+0x295/0x810
[  859.261350]  schedule+0x5b/0xc0
[  859.261351]  __closure_sync+0x9f/0x240
[  859.261352]  __bch2_fs_read_only+0x33a/0x410
[  859.261353]  ? irq_cpu_rmap_notify+0x20/0x20
[  859.261354]  bch2_fs_read_only+0x104/0x230
[  859.261355]  __bch2_fs_stop+0x46/0x2b0
[  859.261356]  bch2_fs_open+0x364/0x410
[  859.261358]  bch2_mount+0x477/0x600
[  859.261360]  legacy_get_tree+0x27/0x40
[  859.261361]  vfs_get_tree+0x25/0xb0
[  859.261362]  path_mount+0x469/0xb20
[  859.261363]  __x64_sys_mount+0x114/0x150
[  859.261364]  do_syscall_64+0x33/0x40
[  859.261366]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  859.261367] RIP: 0033:0x7fd7467143da
[  859.261367] RSP: 002b:00007fff14a84798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  859.261368] RAX: ffffffffffffffda RBX: 00007fd746835f84 RCX: 00007fd7467143da
[  859.261369] RDX: 000055bf20c0f670 RSI: 000055bf20c11450 RDI: 000055bf20c0f6d0
[  859.261369] RBP: 000055bf20c0f440 R08: 000055bf20c0f690 R09: 0000000000000001
[  859.261370] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[  859.261371] R13: 000055bf20c0f6d0 R14: 000055bf20c0f670 R15: 000055bf20c0f440
[  982.117316] INFO: task kworker/3:0:2370 blocked for more than 491 seconds.
[  982.117318]       Not tainted 5.10.0bcachefs+ #22
[  982.117319] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.117320] task:kworker/3:0     state:D stack:    0 pid: 2370 ppid:     2 flags:0x00004000
[  982.117324] Workqueue: events_long bch2_fs_read_only_work
[  982.117325] Call Trace:
[  982.117329]  __schedule+0x295/0x810
[  982.117331]  schedule+0x5b/0xc0
[  982.117333]  rwsem_down_write_slowpath+0x265/0x4e0
[  982.117334]  bch2_fs_read_only_work+0x1b/0x30
[  982.117336]  process_one_work+0x1d6/0x3a0
[  982.117338]  worker_thread+0x4d/0x3d0
[  982.117339]  ? rescuer_thread+0x410/0x410
[  982.117341]  kthread+0x133/0x150
[  982.117342]  ? __kthread_bind_mask+0x60/0x60
[  982.117344]  ret_from_fork+0x1f/0x30
[  982.117346] INFO: task mount:2372 blocked for more than 491 seconds.
[  982.117346]       Not tainted 5.10.0bcachefs+ #22
[  982.117347] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  982.117347] task:mount           state:D stack:    0 pid: 2372 ppid:  2341 flags:0x00004000
[  982.117349] Call Trace:
[  982.117350]  __schedule+0x295/0x810
[  982.117351]  schedule+0x5b/0xc0
[  982.117353]  __closure_sync+0x9f/0x240
[  982.117354]  __bch2_fs_read_only+0x33a/0x410
[  982.117356]  ? irq_cpu_rmap_notify+0x20/0x20
[  982.117357]  bch2_fs_read_only+0x104/0x230
[  982.117358]  __bch2_fs_stop+0x46/0x2b0
[  982.117359]  bch2_fs_open+0x364/0x410
[  982.117362]  bch2_mount+0x477/0x600
[  982.117364]  legacy_get_tree+0x27/0x40
[  982.117366]  vfs_get_tree+0x25/0xb0
[  982.117367]  path_mount+0x469/0xb20
[  982.117369]  __x64_sys_mount+0x114/0x150
[  982.117370]  do_syscall_64+0x33/0x40
[  982.117372]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  982.117373] RIP: 0033:0x7fd7467143da
[  982.117374] RSP: 002b:00007fff14a84798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[  982.117375] RAX: ffffffffffffffda RBX: 00007fd746835f84 RCX: 00007fd7467143da
[  982.117376] RDX: 000055bf20c0f670 RSI: 000055bf20c11450 RDI: 000055bf20c0f6d0
[  982.117376] RBP: 000055bf20c0f440 R08: 000055bf20c0f690 R09: 0000000000000001
[  982.117377] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[  982.117377] R13: 000055bf20c0f6d0 R14: 000055bf20c0f670 R15: 000055bf20c0f440
[ 1104.997397] INFO: task kworker/3:0:2370 blocked for more than 614 seconds.
[ 1104.997399]       Not tainted 5.10.0bcachefs+ #22
[ 1104.997399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1104.997400] task:kworker/3:0     state:D stack:    0 pid: 2370 ppid:     2 flags:0x00004000
[ 1104.997402] Workqueue: events_long bch2_fs_read_only_work
[ 1104.997403] Call Trace:
[ 1104.997405]  __schedule+0x295/0x810
[ 1104.997407]  schedule+0x5b/0xc0
[ 1104.997408]  rwsem_down_write_slowpath+0x265/0x4e0
[ 1104.997410]  bch2_fs_read_only_work+0x1b/0x30
[ 1104.997412]  process_one_work+0x1d6/0x3a0
[ 1104.997413]  worker_thread+0x4d/0x3d0
[ 1104.997414]  ? rescuer_thread+0x410/0x410
[ 1104.997415]  kthread+0x133/0x150
[ 1104.997416]  ? __kthread_bind_mask+0x60/0x60
[ 1104.997418]  ret_from_fork+0x1f/0x30
[ 1104.997419] INFO: task mount:2372 blocked for more than 614 seconds.
[ 1104.997420]       Not tainted 5.10.0bcachefs+ #22
[ 1104.997420] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1104.997421] task:mount           state:D stack:    0 pid: 2372 ppid:  2341 flags:0x00004000
[ 1104.997422] Call Trace:
[ 1104.997423]  __schedule+0x295/0x810
[ 1104.997424]  schedule+0x5b/0xc0
[ 1104.997425]  __closure_sync+0x9f/0x240
[ 1104.997426]  __bch2_fs_read_only+0x33a/0x410
[ 1104.997427]  ? irq_cpu_rmap_notify+0x20/0x20
[ 1104.997428]  bch2_fs_read_only+0x104/0x230
[ 1104.997430]  __bch2_fs_stop+0x46/0x2b0
[ 1104.997431]  bch2_fs_open+0x364/0x410
[ 1104.997432]  bch2_mount+0x477/0x600
[ 1104.997434]  legacy_get_tree+0x27/0x40
[ 1104.997435]  vfs_get_tree+0x25/0xb0
[ 1104.997436]  path_mount+0x469/0xb20
[ 1104.997438]  __x64_sys_mount+0x114/0x150
[ 1104.997439]  do_syscall_64+0x33/0x40
[ 1104.997440]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1104.997441] RIP: 0033:0x7fd7467143da
[ 1104.997442] RSP: 002b:00007fff14a84798 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5
[ 1104.997443] RAX: ffffffffffffffda RBX: 00007fd746835f84 RCX: 00007fd7467143da
[ 1104.997443] RDX: 000055bf20c0f670 RSI: 000055bf20c11450 RDI: 000055bf20c0f6d0
[ 1104.997444] RBP: 000055bf20c0f440 R08: 000055bf20c0f690 R09: 0000000000000001
[ 1104.997444] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[ 1104.997445] R13: 000055bf20c0f6d0 R14: 000055bf20c0f670 R15: 000055bf20c0f440

@jpsollie
Copy link
Contributor Author

Thanks to the the latest updates I managed to push a non-damaged bcachefs to the latest version:
First I mounted using nochanges,norecovery, and I got this:

[17641.079597] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): alloc_v2 feature bit not set, fsck required
[17649.721046] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): journal read done, 0 keys in 1 entries, seq 17036
[17649.840187] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): starting mark and sweep
[17650.675164] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong buckets_unavailable: got 0, should be 908, fixing
[17650.675167] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[17650.675168] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[17650.675170] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[17650.675171] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 774, fixing
[17650.675172] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 396288, fixing
[17650.675173] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree buckets: got 0, should be 133, fixing
[17650.675174] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree sectors: got 0, should be 68096, fixing
[17650.675175] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached buckets: got 0, should be 46664, fixing
[17650.675176] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached sectors: got 0, should be 23891968, fixing
[17650.675589] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 339, fixing
[17650.675590] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[17650.675591] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[17650.675592] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[17650.675593] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 298, fixing
[17650.675593] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 152576, fixing
[17650.675594] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree buckets: got 0, should be 40, fixing
[17650.675595] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree sectors: got 0, should be 20480, fixing
[17650.675596] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached buckets: got 0, should be 64669, fixing
[17650.675597] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached sectors: got 0, should be 33110528, fixing
[17650.676009] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 347, fixing
[17650.676010] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[17650.676010] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[17650.676011] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[17650.676012] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 298, fixing
[17650.676013] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): Ratelimiting new instances of previous error
[17650.676013] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 152576, fixing
[17650.676014] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): Ratelimiting new instances of previous error
[17650.676015] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong user fragmented: got 0, should be 298, fixing
[17650.697001] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 75703, fixing
[17650.697004] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 760, fixing
[17650.697006] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong user fragmented: got 0, should be 298, fixing
[17650.697011] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): fs has wrong hidden: got 0, should be 1752576, fixing
[17650.698490] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): mounted with opts: foreground_target=SSD,background_target=HDD,promote_target=SSD,fsck,fix_errors,nochanges,norecovery

And then I mounted without nochanges,norecovery (and let fsck do its job), and I saw this:

[19838.445677] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): alloc_v2 feature bit not set, fsck required
[19847.134171] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): journal read done, 0 keys in 1 entries, seq 17036
[19847.215688] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): starting mark and sweep
[19848.040722] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong buckets_unavailable: got 0, should be 908, fixing
[19848.040726] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[19848.040727] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[19848.040729] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[19848.040730] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 774, fixing
[19848.040731] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 396288, fixing
[19848.040732] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree buckets: got 0, should be 133, fixing
[19848.040733] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree sectors: got 0, should be 68096, fixing
[19848.040734] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached buckets: got 0, should be 46664, fixing
[19848.040734] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached sectors: got 0, should be 23891968, fixing
[19848.041155] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 339, fixing
[19848.041156] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[19848.041156] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[19848.041157] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[19848.041158] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 298, fixing
[19848.041159] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 152576, fixing
[19848.041160] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree buckets: got 0, should be 40, fixing
[19848.041161] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 3 has wrong btree sectors: got 0, should be 20480, fixing
[19848.041162] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached buckets: got 0, should be 64669, fixing
[19848.041162] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 5 has wrong cached sectors: got 0, should be 33110528, fixing
[19848.041578] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 347, fixing
[19848.041579] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[19848.041580] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[19848.041581] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[19848.041582] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal buckets: got 0, should be 298, fixing
[19848.041582] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): Ratelimiting new instances of previous error
[19848.041583] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 2 has wrong journal sectors: got 0, should be 152576, fixing
[19848.041583] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): Ratelimiting new instances of previous error
[19848.041585] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong user fragmented: got 0, should be 298, fixing
[19848.063488] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 7 has wrong buckets_unavailable: got 0, should be 75703, fixing
[19848.063492] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 1 has wrong sb fragmented: got 0, should be 760, fixing
[19848.063494] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): dev 4 has wrong user fragmented: got 0, should be 298, fixing
[19848.063500] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): fs has wrong hidden: got 0, should be 1752576, fixing
[19851.058748] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): starting fsck
[19851.155913] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): mounted with opts: foreground_target=SSD,background_target=HDD,promote_target=SSD,fsck,fix_errors

so now the 3rd time:

[20099.646876] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): recovering from clean shutdown, journal seq 17092
[20099.682977] bcachefs (b7019c0a-694a-4fe0-8397-29b91d16a3ee): mounted with opts: foreground_target=SSD,background_target=HDD,promote_target=SSD,fix_errors

...and I guess it's working now!
The only sad thing is that users who migrated before the latest patches will probably have to reformat their drives.

@derlaft
Copy link

derlaft commented Feb 10, 2021

-o fsck,fix_errors did not fix anything for me

@derlaft
Copy link

derlaft commented Feb 10, 2021

Thanks to the the latest updates I managed to push a non-damaged bcachefs to the latest version:

Ah, I now get it: another undamaged disk.

Anyway, I've recreated my filesystem. Let the fun continue :)

What is curious is that it was pretty badly damaged: it refused to mount in degraded without any single disk.

@jpsollie
Copy link
Contributor Author

Guess the first versions actually destroyed your journal and fsck is not capable of recovering it.
Maybe we should ask koverstreet to make a warning when he tries to add some new feature that could screw up the FS ^^

@derlaft
Copy link

derlaft commented Feb 11, 2021

I think that's something an automated CI can do:

  • Just store last N filesystem dumps, generated and mounted by different bcachefs revisions
  • With each new commit, try to mount, upgrade and verify data on each of them

@Omar007
Copy link

Omar007 commented Feb 12, 2021

I'm not sure it's that screwed up/destroyed tbh. The journal read seems to be fine and I can still mount and access the data by just adding norecovery. Only when it tries to replay the journal (when I leave norecovery out) does it end up hanging somewhere.

Feb 12 16:06:53 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): journal read done, 22129308 keys in 2761 entries, seq 5196301
Feb 12 16:07:20 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): starting alloc read
Feb 12 16:07:34 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): alloc read done
Feb 12 16:07:34 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): starting stripes_read
Feb 12 16:07:34 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): stripes_read done
Feb 12 16:07:34 abydos kernel: bcachefs (93435cc4-d25e-4dcd-8803-f63350255a39): starting journal replay
Feb 12 16:11:36 abydos kernel: INFO: task kworker/u16:17:5226 blocked for more than 122 seconds.
Feb 12 16:11:36 abydos kernel:       Not tainted 5.10.15-arch1-1-bcachefs-git #1
Feb 12 16:11:36 abydos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 12 16:11:36 abydos kernel: task:kworker/u16:17  state:D stack:    0 pid: 5226 ppid:     2 flags:0x00004080
Feb 12 16:11:36 abydos kernel: Workqueue: btree_update btree_interior_update_work [bcachefs]
Feb 12 16:11:36 abydos kernel: Call Trace:
Feb 12 16:11:36 abydos kernel:  __schedule+0x295/0x810
Feb 12 16:11:36 abydos kernel:  schedule+0x5b/0xc0
Feb 12 16:11:36 abydos kernel:  schedule_preempt_disabled+0x11/0x20
Feb 12 16:11:36 abydos kernel:  __mutex_lock.constprop.0+0x17d/0x4f0
Feb 12 16:11:36 abydos kernel:  ? bch2_trans_mark_extent+0x186/0x440 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_trans_commit_error+0x52/0x2c0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  __bch2_trans_commit+0x34d/0x1d10 [bcachefs]
Feb 12 16:11:36 abydos kernel:  btree_update_nodes_written+0x1b1/0x710 [bcachefs]
Feb 12 16:11:36 abydos kernel:  ? btree_update_nodes_written+0x8e/0x710 [bcachefs]
Feb 12 16:11:36 abydos kernel:  btree_interior_update_work+0x51/0x60 [bcachefs]
Feb 12 16:11:36 abydos kernel:  process_one_work+0x1d6/0x3a0
Feb 12 16:11:36 abydos kernel:  worker_thread+0x4d/0x3d0
Feb 12 16:11:36 abydos kernel:  ? rescuer_thread+0x3c0/0x3c0
Feb 12 16:11:36 abydos kernel:  kthread+0x133/0x150
Feb 12 16:11:36 abydos kernel:  ? __kthread_bind_mask+0x60/0x60
Feb 12 16:11:36 abydos kernel:  ret_from_fork+0x22/0x30
Feb 12 16:11:36 abydos kernel: INFO: task mount:5303 blocked for more than 122 seconds.
Feb 12 16:11:36 abydos kernel:       Not tainted 5.10.15-arch1-1-bcachefs-git #1
Feb 12 16:11:36 abydos kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Feb 12 16:11:36 abydos kernel: task:mount           state:D stack:    0 pid: 5303 ppid:  4526 flags:0x00004080
Feb 12 16:11:36 abydos kernel: Call Trace:
Feb 12 16:11:36 abydos kernel:  __schedule+0x295/0x810
Feb 12 16:11:36 abydos kernel:  schedule+0x5b/0xc0
Feb 12 16:11:36 abydos kernel:  __closure_sync+0x9f/0x240
Feb 12 16:11:36 abydos kernel:  bch2_btree_split_leaf+0x175/0x310 [bcachefs]
Feb 12 16:11:36 abydos kernel:  ? irq_cpu_rmap_notify+0x20/0x20
Feb 12 16:11:36 abydos kernel:  bch2_trans_commit_error+0xec/0x2c0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  __bch2_trans_commit+0x34d/0x1d10 [bcachefs]
Feb 12 16:11:36 abydos kernel:  ? bch2_trans_update+0x1da/0x3e0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  btree_key_cache_flush_pos+0x1c9/0x340 [bcachefs]
Feb 12 16:11:36 abydos kernel:  btree_key_cache_journal_flush+0x107/0x130 [bcachefs]
Feb 12 16:11:36 abydos kernel:  ? btree_key_cache_journal_flush+0xee/0x130 [bcachefs]
Feb 12 16:11:36 abydos kernel:  journal_flush_pins+0x13d/0x190 [bcachefs]
Feb 12 16:11:36 abydos kernel:  journal_flush_done+0x4b/0xe0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_journal_flush_pins+0x71/0x110 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_fs_recovery+0xba2/0xfb0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_fs_start+0x211/0x4e0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_fs_open+0x360/0x420 [bcachefs]
Feb 12 16:11:36 abydos kernel:  bch2_mount+0x453/0x5e0 [bcachefs]
Feb 12 16:11:36 abydos kernel:  legacy_get_tree+0x27/0x40
Feb 12 16:11:36 abydos kernel:  vfs_get_tree+0x25/0xb0
Feb 12 16:11:36 abydos kernel:  path_mount+0x469/0xb20
Feb 12 16:11:36 abydos kernel:  __x64_sys_mount+0x114/0x150
Feb 12 16:11:36 abydos kernel:  do_syscall_64+0x33/0x40
Feb 12 16:11:36 abydos kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Feb 12 16:11:36 abydos kernel: RIP: 0033:0x7fbe7122b01e
Feb 12 16:11:36 abydos kernel: RSP: 002b:00007fff5c30b718 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
Feb 12 16:11:36 abydos kernel: RAX: ffffffffffffffda RBX: 00007fbe71354264 RCX: 00007fbe7122b01e
Feb 12 16:11:36 abydos kernel: RDX: 000055f7d67d7700 RSI: 000055f7d67d77c0 RDI: 000055f7d67d7720
Feb 12 16:11:36 abydos kernel: RBP: 000055f7d67d7440 R08: 000055f7d67d76a0 R09: 00007fbe712eca60
Feb 12 16:11:36 abydos kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Feb 12 16:11:36 abydos kernel: R13: 000055f7d67d7720 R14: 000055f7d67d7700 R15: 000055f7d67d7440

This output was from a run with the most recent commit; 641f517

@littlewu2508
Copy link

littlewu2508 commented Mar 13, 2021

On my machine 79847e4 works fine but 8a8844c have trouble adding/removing decvices.

when normal mounting (no option specified), I have these kernel messages:

[   14.804366] bcachefs (4a497448-3617-495a-9e48-a0e070698308): recovering from clean shutdown, journal seq 23
[   14.804367] bcachefs (4a497448-3617-495a-9e48-a0e070698308): alloc_v2 feature bit not set, fsck required
[   21.327108] bcachefs (4a497448-3617-495a-9e48-a0e070698308): journal read done, 0 keys in 1 entries, seq 24
[   21.370134] bcachefs (4a497448-3617-495a-9e48-a0e070698308): starting mark and sweep
[   21.567515] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 4 has wrong buckets_unavailable: got 0, should be 1036, fixing
[   21.567517] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[   21.567518] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[   21.567519] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb fragmented: got 0, should be 248, fixing
[   21.567519] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal buckets: got 0, should be 1024, fixing
[   21.567520] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal sectors: got 0, should be 524288, fixing
[   21.567520] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 3 has wrong btree buckets: got 0, should be 11, fixing
[   21.567520] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 3 has wrong btree sectors: got 0, should be 5632, fixing
[   21.611517] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 7 has wrong buckets_unavailable: got 0, should be 1036, fixing
[   21.611518] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[   21.611519] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[   21.611520] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb fragmented: got 0, should be 760, fixing
[   21.611520] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal buckets: got 0, should be 1024, fixing
[   21.611521] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal sectors: got 0, should be 1048576, fixing
[   21.611521] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 3 has wrong btree buckets: got 0, should be 11, fixing
[   21.611521] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 3 has wrong btree sectors: got 0, should be 5632, fixing
[   21.611522] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 3 has wrong btree fragmented: got 0, should be 5632, fixing
[   21.658989] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 7 has wrong buckets_unavailable: got 0, should be 1025, fixing
[   21.658992] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb buckets: got 0, should be 1, fixing
[   21.658992] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb sectors: got 0, should be 264, fixing
[   21.658993] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 1 has wrong sb fragmented: got 0, should be 760, fixing
[   21.658994] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal buckets: got 0, should be 1024, fixing
[   21.658994] bcachefs (4a497448-3617-495a-9e48-a0e070698308): dev 2 has wrong journal sectors: got 0, should be 1048576, fixing
[   21.658997] bcachefs (4a497448-3617-495a-9e48-a0e070698308): fs has wrong hidden: got 0, should be 2624000, fixing
[   25.265379] bcachefs (4a497448-3617-495a-9e48-a0e070698308): starting fsck
[   25.266771] bcachefs (4a497448-3617-495a-9e48-a0e070698308): mounted with opts: metadata_replicas=2,foreground_target=ssd,background_target=hdd,promote_target=ssd,fsck,fix_errors

when trying to add another device, BCH_IOCTL_DISK_ADD ioctl error: Invalid argument occurs (same as #197 with zombie disks left) with the following messages :

[  359.500599] bucket 3:0 gen 0 different types of data in same bucket: sb, journal
               while marking journal
[  359.500602] bcachefs (4a497448-3617-495a-9e48-a0e070698308): emergency read only
[  359.500603] bcachefs (4a497448-3617-495a-9e48-a0e070698308): Error going rw after adding device: error marking superblock

This situation still exsits NOW (e49b2fc, the current HEAD)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants