"Remove failed" error when removing failed device #322

Open
Hamled opened this issue Oct 11, 2021 · 1 comment
Labels
new-version-testing (Need to test with newer version.), v18 (versions 18 and below.)

Comments

@Hamled

Hamled commented Oct 11, 2021

uname -a
Linux malu 5.13.19-1-bcachefs-git-354636-gc85e27c45512 #1 SMP PREEMPT Thu, 07 Oct 2021 22:20:07 +0000 x86_64 GNU/Linux
bcachefs version
bcachefs tool version v0.1-366-g3785043

As explained in #320, I have a bcachefs filesystem with multiple devices, one of which has failed. I attempted to remove it with bcachefs device remove and previously hit a kernel bug, which has been fixed as of c85e27c.

However, when I now attempt to remove the device, I hit other errors. At one point this included a kernel bug, triggered when I unmounted the filesystem after an earlier remove command had failed.

In particular, I did the following:

  • bcachefs unlock <device path>
  • mount -t bcachefs -o rw,very_degraded <devices> <mount point>
    This seems to work as expected:
    [  164.658939] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): journal read done, 0 keys in 1 entries, seq 47303529
    [  184.191998] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): going read-write
    [  184.352568] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): mounted with opts: metadata_replicas=2,data_replicas=2,foreground_target=nvme,background_target=hdd,promote_target=nvme,noinodes_32bit,noshard_inode_numbers,noinodes_use_key_cache,very_degraded
    
  • bcachefs device remove -f 2 <mount point> (device 2 is the one that is missing, according to bcachefs fs usage)
    This results in the following errors in the kernel log:
    [  227.772439] bcachefs (dev-2): btree write error: device removed
    [  227.783574] bcachefs (dev-2): btree write error: device removed
    [  228.603379] bcachefs (dev-2): btree write error: device removed
    [  228.764879] bcachefs (dev-2): btree write error: device removed
    [  228.788595] bcachefs (dev-2): btree write error: device removed
    [  228.794426] bcachefs (dev-2): btree write error: device removed
    [  228.838876] bcachefs (dev-2): btree write error: device removed
    [  228.853879] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): invalid bkey u64s 5 type deleted 4346:8:U32_MAX len 8 ver 73580037 on insert from __bch2_dev_usrdata_drop [bcachefs] -> __bch2_dev_usrdata_drop [bcachefs]: nonzero size field
    [  228.859658] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): emergency read only
    [  228.861161] bcachefs (dev-2): Remove failed: error -22 dropping data
    [  228.861166] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): fatal error writing btree node
    
  • bcachefs device remove -f /dev/sda1 (This is another device, which has not failed, but which I wanted to remove so that I could use it to transfer data off the bcachefs filesystem. I think the error is simply because there is still some btree data on it.)
    [  253.441159] bcachefs (ef23d749-eb29-4010-a34a-175a9cce6969): Error updating btree node key: -30
    [  253.443456] bcachefs (sda1): Remove failed: error -30 dropping data
    
  • umount <mount point>
    This caused a kernel bug:
    [  273.522498] BUG: kernel NULL pointer dereference, address: 0000000000000018
    [  273.525821] #PF: supervisor read access in kernel mode
    [  273.528624] #PF: error_code(0x0000) - not-present page
    [  273.531352] PGD 0 P4D 0
    [  273.533987] Oops: 0000 [#1] PREEMPT SMP NOPTI
    [  273.536609] CPU: 0 PID: 1171 Comm: umount Tainted: G           OE     5.13.19-1-bcachefs-git-354636-gc85e27c45512 #1
    [  273.539291] Hardware name: System manufacturer System Product Name/PRIME X470-PRO, BIOS 5406 11/13/2019
    [  273.541989] RIP: 0010:bch2_journal_space_available+0x184/0x480 [bcachefs]
    [  273.544715] Code: f9 74 2d 4c 8b 96 e8 0b 00 00 eb 15 8d 47 01 31 d2 41 f7 f0 89 d7 89 96 fc 0b 00 00 44 39 ca 74 0f 89 f8 49 8b 95 58 13 00 00 <49> 39 14 c2 72 dc 8b 96 f8 0b 00 00 39 fa 74 2c 4c 8b 8e e8 0b 00
    [  273.550273] RSP: 0018:ffffb7e08ffa7d48 EFLAGS: 00010202
    [  273.553032] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000007
    [  273.555788] RDX: 0000000002d1cb81 RSI: ffff9ee7dfcf1000 RDI: 0000000000000003
    [  273.558523] RBP: 0000000000000000 R08: 0000000000002000 R09: 0000000000000013
    [  273.561245] R10: 0000000000000000 R11: ffff9ee89ec0b100 R12: 0000000000000001
    [  273.563944] R13: ffff9ee7de414468 R14: ffff9ee7de403980 R15: ffff9ee928134800
    [  273.566644] FS:  00007f465414d740(0000) GS:ffff9ef6bea00000(0000) knlGS:0000000000000000
    [  273.569338] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  273.571999] CR2: 0000000000000018 CR3: 000000047735c000 CR4: 0000000000350ef0
    [  273.574669] Call Trace:
    [  273.577284]  bch2_journal_pin_drop+0x124/0x130 [bcachefs]
    [  273.579914]  bch2_fs_btree_cache_exit+0x2df/0x340 [bcachefs]
    [  273.582515]  bch2_fs_release+0x8e/0x2a0 [bcachefs]
    [  273.585083]  kobject_put+0x86/0x1d0
    [  273.587582]  deactivate_locked_super+0x36/0xa0
    [  273.590048]  cleanup_mnt+0x131/0x190
    [  273.592487]  task_work_run+0x5c/0x90
    [  273.594853]  exit_to_user_mode_prepare+0x16b/0x170
    [  273.597163]  syscall_exit_to_user_mode+0x23/0x50
    [  273.599453]  do_syscall_64+0x6e/0x80
    [  273.601713]  ? exc_page_fault+0x78/0x180
    [  273.603953]  entry_SYSCALL_64_after_hwframe+0x44/0xae
    [  273.606174] RIP: 0033:0x7f46542d261b
    [  273.608403] Code: 18 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 90 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8 a6 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 21 18 0c 00 f7 d8
    [  273.613046] RSP: 002b:00007fff46086748 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
    [  273.615369] RAX: 0000000000000000 RBX: 00007f46543ff264 RCX: 00007f46542d261b
    [  273.617666] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055aa9bd71e10
    [  273.619915] RBP: 000055aa9bd6d580 R08: 0000000000000000 R09: 00007fff460854c0
    [  273.622147] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [  273.624307] R13: 000055aa9bd71e10 R14: 000055aa9bd6d690 R15: 000055aa9bd71dd0
    [  273.626404] Modules linked in: poly1305_generic libpoly1305 poly1305_x86_64 chacha_generic chacha_x86_64 libchacha xt_MASQUERADE xt_conntrack xt_CHECKSUM ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bridge stp llc 88XXau(OE) snd_usb_audio snd_usbmidi_lib snd_rawmidi snd_seq_device cfg80211 mc intel_rapl_msr joydev mousedev intel_rapl_common edac_mce_amd snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel amdgpu nouveau kvm snd_intel_dspcfg gpu_sched snd_intel_sdw_acpi drm_ttm_helper snd_hda_codec ttm crct10dif_pclmul snd_hda_core crc32_pclmul drm_kms_helper ghash_clmulni_intel nls_iso8859_1 snd_hwdep snd_pcm aesni_intel snd_timer vfat cec fat igb crypto_simd sp5100_tco syscopyarea cryptd usbhid rapl pcspkr k10temp ccp snd i2c_algo_bit sysfillrect i2c_piix4 rng_core i2c_nvidia_gpu sysimgblt fb_sys_fops
    [  273.626443]  soundcore dca gpio_amdpt gpio_generic pinctrl_amd mac_hid acpi_cpufreq eeepc_wmi asus_wmi sparse_keymap rfkill video wmi_bmof mxm_wmi vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) sg drm asus_wmi_sensors(OE) wmi fuse agpgart bpf_preload ip_tables x_tables ext4 crc16 mbcache jbd2 uas usb_storage xhci_pci xhci_pci_renesas bcachefs libcrc32c crc32c_generic crc32c_intel xor crc64 raid6_pq vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio
    [  273.649848] CR2: 0000000000000018
    [  273.652394] ---[ end trace b0767c5729e86981 ]---
    [  273.654889] RIP: 0010:bch2_journal_space_available+0x184/0x480 [bcachefs]
    [  273.657367] Code: f9 74 2d 4c 8b 96 e8 0b 00 00 eb 15 8d 47 01 31 d2 41 f7 f0 89 d7 89 96 fc 0b 00 00 44 39 ca 74 0f 89 f8 49 8b 95 58 13 00 00 <49> 39 14 c2 72 dc 8b 96 f8 0b 00 00 39 fa 74 2c 4c 8b 8e e8 0b 00
    [  273.662492] RSP: 0018:ffffb7e08ffa7d48 EFLAGS: 00010202
    [  273.665059] RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000007
    [  273.667656] RDX: 0000000002d1cb81 RSI: ffff9ee7dfcf1000 RDI: 0000000000000003
    [  273.670259] RBP: 0000000000000000 R08: 0000000000002000 R09: 0000000000000013
    [  273.672851] R10: 0000000000000000 R11: ffff9ee89ec0b100 R12: 0000000000000001
    [  273.675407] R13: ffff9ee7de414468 R14: ffff9ee7de403980 R15: ffff9ee928134800
    [  273.677956] FS:  00007f465414d740(0000) GS:ffff9ef6bea00000(0000) knlGS:0000000000000000
    [  273.680526] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  273.683075] CR2: 0000000000000018 CR3: 000000047735c000 CR4: 0000000000350ef0
    [  273.685644] note: umount[1171] exited with preempt_count 1
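
As a rough aid to reading the oops above (a sketch, not a diagnosis): the faulting instruction bytes marked `<49> 39 14 c2` on the `Code:` line can be decoded by hand as a `cmp` with a base + index*scale memory operand. The snippet below decodes only the REX, ModRM and SIB fields needed for that one instruction and plugs in the R10 and RAX values from the register dump. The variable names are illustrative, and the decoder is deliberately minimal rather than a general x86-64 disassembler.

```python
# Faulting instruction bytes: the <49> marker on the "Code:" line plus the next three bytes.
insn = bytes.fromhex("493914c2")
rex, opcode, modrm, sib = insn

REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
        "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# REX prefix layout is 0100WRXB: R extends ModRM.reg, X the SIB index, B the SIB base.
rex_r = (rex >> 2) & 1
rex_x = (rex >> 1) & 1
rex_b = rex & 1

# Opcode 0x39 is CMP r/m, r. ModRM with mod=00 and rm=100 means the address comes from the SIB byte.
assert opcode == 0x39 and (modrm >> 6) == 0 and (modrm & 7) == 4

src_reg = REGS[((modrm >> 3) & 7) | (rex_r << 3)]
scale = 1 << (sib >> 6)
index_reg = REGS[((sib >> 3) & 7) | (rex_x << 3)]
base_reg = REGS[(sib & 7) | (rex_b << 3)]

# Register values copied from the oops register dump in this report.
regs = {"rax": 0x3, "r10": 0x0}
cr2 = 0x18  # faulting address reported by the kernel

fault_addr = regs[base_reg] + regs[index_reg] * scale
print(f"cmp %{src_reg},(%{base_reg},%{index_reg},{scale}) -> {fault_addr:#x}, matches CR2: {fault_addr == cr2}")
```

If this reading is right, the fault would be consistent with indexing element 3 of an array of 8-byte entries whose base pointer (R10) is NULL inside bch2_journal_space_available. Which field that pointer corresponds to is not something the log alone confirms.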
    
@koverstreet
Owner

I pushed some fixes - can you retest and see if that did it?
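
For anyone retesting, one quick way to check whether the same failure recurs is to scan the kernel log for the "Remove failed: error -N" lines from the original report and translate the error numbers. The sketch below is a hypothetical helper, not part of bcachefs-tools: the function name and regular expression are made up for illustration, and the sample text is copied from the log lines in this issue. It assumes Linux errno numbering.

```python
import errno
import os
import re

# Matches lines such as:
#   bcachefs (dev-2): Remove failed: error -22 dropping data
REMOVE_FAILED = re.compile(
    r"bcachefs \((?P<dev>[^)]+)\): Remove failed: error (?P<code>-?\d+)"
)

def find_remove_failures(log_text):
    """Return (device, errno name, errno message) for each failed remove in the log."""
    results = []
    for match in REMOVE_FAILED.finditer(log_text):
        code = abs(int(match.group("code")))  # kernel logs report negative errno values
        name = errno.errorcode.get(code, "unknown")
        results.append((match.group("dev"), name, os.strerror(code)))
    return results

# Sample input: the two "Remove failed" lines from the report above.
sample = """\
[  228.861161] bcachefs (dev-2): Remove failed: error -22 dropping data
[  253.443456] bcachefs (sda1): Remove failed: error -30 dropping data
"""

failures = find_remove_failures(sample)
for dev, name, message in failures:
    print(f"{dev}: {name} ({message})")
```

On Linux, -22 is EINVAL and -30 is EROFS. The EROFS on sda1 would fit with the filesystem having already gone "emergency read only" at 228.859658, so the second remove may have failed for that reason rather than because of btree data left on the device. That is an inference from the log, not something confirmed in this thread.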

YellowOnion added the new-version-testing (Need to test with newer version.) and v18 (versions 18 and below.) labels Mar 15, 2022