keys must be loaded to remove top-level vdev #10939

Closed
geudrik opened this issue Sep 16, 2020 · 6 comments
Labels
Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@geudrik
geudrik commented Sep 16, 2020

System information

Type: Linux
Distribution Name: Ubuntu
Distribution Version: 20.04
Linux Kernel: Linux valkyrie 5.4.0-47-generic #51-Ubuntu SMP Fri Sep 4 19:50:52 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86-64
ZFS Version: 0.8.3-1ubuntu12.2
SPL Version: 0.8.3-1ubuntu12.2

Describe the problem you're observing

I have a pool of mirrored drives. I am unable to remove any of the mirrors due to zpool reporting "permission denied". Yes, I am running the remove as root.

root@valkyrie:/tmp# zpool remove -n tank mirror-2
Memory that will be used after removing mirror-2: 9.27M
root@valkyrie:/tmp# 
root@valkyrie:/tmp# zpool remove tank mirror-2
cannot remove mirror-2: permission denied
root@valkyrie:/tmp# 

Describe how to reproduce the problem

Honestly, I have no idea. I created a pool with mirrors and removed them successfully prior to creating this pool. All ashifts are 12 (and it would puke a different error anyway)

Each vdev has matched disks (same make and model per vdev), but the vdevs are 3, 4, and 8 TB in size. All ashifts were set to 12 when the mirrors were added to the pool. They're all WD Reds. It's a dual-socket SuperMicro X10DRi-T4+ with 2x E5-2640 v3 and 128 GB of RAM (DDR4 @ 2133).

Include any warning/errors/backtraces from the system logs

Nothing from the kernel, just zfs refusing to listen to the bigger hammer 😢

What can I do to help further debug this issue?

@geudrik geudrik added Status: Triage Needed New issue which needs to be triaged Type: Defect Incorrect behavior (e.g. crash, hang) labels Sep 16, 2020
@ahrens
Member

ahrens commented Sep 17, 2020

You could start by checking what error code is being returned, with strace zpool remove .... I'm guessing EPERM. Then you could see where zfs is doing SET_ERROR() with that code, by either using bpftrace on __set_error(), or doing echo 512 >/sys/module/zfs/parameters/zfs_flags, then trying the removal and then checking /proc/spl/kstat/zfs/dbgmsg.
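
The steps above can be sketched as a small shell helper plus the commands it wraps. This is only a minimal sketch, not something posted in the thread: the helper name `failed_ioctls` and the trace path `/tmp/zpool-remove.trace` are hypothetical, and the pool and vdev names are the ones from this issue. The commented commands need root and a loaded zfs module.

```shell
# Print only the ioctl calls that failed, from an strace log read on stdin.
# (failed_ioctls is a hypothetical helper name, not part of strace or zfs.)
failed_ioctls() {
    grep -E '^ioctl\(.*= -1 E[A-Z]+'
}

# Usage, run as root (paths and names are illustrative):
#   strace -o /tmp/zpool-remove.trace zpool remove tank mirror-2
#   failed_ioctls < /tmp/zpool-remove.trace
#
# To see where the kernel module set the error:
#   echo 512 > /sys/module/zfs/parameters/zfs_flags
#   zpool remove tank mirror-2
#   cat /proc/spl/kstat/zfs/dbgmsg
```

Writing the trace to a file with `-o` keeps strace's lines from being interleaved with zpool's own stderr output.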

@geudrik
Author

geudrik commented Sep 17, 2020

stat("/sys/module/zfs/features.pool/org.illumos:edonr", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/com.delphix:device_removal", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/com.delphix:obsolete_counts", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/org.zfsonlinux:userobj_accounting", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/com.datto:bookmark_v2", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/com.datto:encryption", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/org.zfsonlinux:project_quota", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/org.zfsonlinux:allocation_classes", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
stat("/sys/module/zfs/features.pool/com.datto:resilver_defer", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
ioctl(3, _IOC(_IOC_NONE, 0x5a, 0x5, 0), 0x7ffe4e4b4630) = -1 ENOMEM (Cannot allocate memory)
mmap(NULL, 319488, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fe74ef4a000
ioctl(3, _IOC(_IOC_NONE, 0x5a, 0x5, 0), 0x7ffe4e4b4630) = 0
brk(0x563e01ffc000)                     = 0x563e01ffc000
brk(0x563e0201d000)                     = 0x563e0201d000
brk(0x563e0203e000)                     = 0x563e0203e000
munmap(0x7fe74ef4a000, 319488)          = 0
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 6
fstat(6, {st_mode=S_IFREG|0644, st_size=2996, ...}) = 0
read(6, "# Locale name alias data base.\n#"..., 4096) = 2996
read(6, "", 4096)                       = 0
close(6)                                = 0
openat(AT_FDCWD, "/usr/share/locale/en_US.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en.UTF-8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en.utf8/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/zfs-linux-user.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
brk(0x563e02085000)                     = 0x563e02085000
ioctl(3, _IOC(_IOC_NONE, 0x5a, 0x27, 0), 0x7ffe4e4b0c40) = 0
ioctl(3, _IOC(_IOC_NONE, 0x5a, 0xc, 0), 0x7ffe4e4b4280) = -1 EACCES (Permission denied)
write(2, "cannot remove mirror-2: permissi"..., 42cannot remove mirror-2: permission denied
) = 42
close(3)                                = 0
close(4)                                = 0
close(5)                                = 0
exit_group(1)                           = ?
+++ exited with 1 +++

Looks like EACCES, but there's also a confusing (because I truthfully am not at all experienced with strace output) cannot allocate memory line (included above).

Echoing 512 and looking at kstat gives me the following

1600349987   spa_history.c:305:spa_history_log_sync(): command: zfs destroy tank/backups/icehouse@autosnap_2020-09-16_16:00:05_hourly
1600350154   spa.c:7592:spa_async_request(): spa=tank async request task=256
1600350154   spa.c:7592:spa_async_request(): spa=tank async request task=512
1600350154   spa.c:7592:spa_async_request(): spa=tank async request task=1024
1600350409   zap.c:770:fzap_checksize(): error 22
1600350410   zap.c:770:fzap_checksize(): error 22
1600350414   zap.c:770:fzap_checksize(): error 22
1600350415   zap.c:770:fzap_checksize(): error 22
1600350417   vdev_removal.c:2247:spa_removal_get_stats(): error 2
1600350417   spa_checkpoint.c:167:spa_checkpoint_get_stats(): error 1026
1600350417   zfs_ioctl.c:1418:put_nvlist(): error 12
1600350417   vdev_removal.c:2247:spa_removal_get_stats(): error 2
1600350417   spa_checkpoint.c:167:spa_checkpoint_get_stats(): error 1026
1600350417   zap_micro.c:1611:zap_cursor_retrieve(): error 2
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap.c:770:fzap_checksize(): error 22
1600350417   zap_micro.c:988:zap_lookup_impl(): error 75
1600350417   zap_leaf.c:510:zap_entry_read_name(): error 75
1600350417   zap_micro.c:1611:zap_cursor_retrieve(): error 2
1600350417   zap_micro.c:988:zap_lookup_impl(): error 75
1600350417   zap_leaf.c:510:zap_entry_read_name(): error 75
1600350417   zap_leaf.c:426:zap_leaf_lookup(): error 2
1600350417   zap_leaf.c:426:zap_leaf_lookup(): error 2
1600350417   zap_leaf.c:426:zap_leaf_lookup(): error 2
1600350417   zap_leaf.c:426:zap_leaf_lookup(): error 2
1600350417   zap_leaf.c:426:zap_leaf_lookup(): error 2
... truncated, lots more of similar to above
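
Since the log is this noisy, one way to read it is to tally the "error N" lines by source location and errno. The following is a minimal sketch under that assumption; `tally_dbgmsg_errors` is a hypothetical helper name, and it only assumes the line layout shown above (timestamp, location, "error", number). On Linux, EACCES is errno 13, so that is the value to look for here.

```shell
# Tally "error N" lines from a dbgmsg dump on stdin, most frequent first.
# Output columns: count, file:line:function, errno.
# (tally_dbgmsg_errors is a hypothetical helper name.)
tally_dbgmsg_errors() {
    awk 'NF >= 2 && $(NF-1) == "error" { seen[$2 " " $NF]++ }
         END { for (k in seen) print seen[k], k }' | sort -rn
}

# Usage, run as root:
#   tally_dbgmsg_errors < /proc/spl/kstat/zfs/dbgmsg
```

Lines that do not end in "error N", such as the spa_async_request() messages, are skipped.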

@geudrik
Author

geudrik commented Sep 17, 2020

Am I just stupid and need to have all datasets mounted (keys loaded) to be able to remove a vdev? That doesn't make a lot of sense to me, given what I think I know about how ZFS works, but .. idfk 😭

@geudrik
Author

geudrik commented Sep 17, 2020

WELP. Turns out, that is indeed the issue. I mounted every dataset and re-ran the vdev remove, and it works fine.

@geudrik geudrik closed this as completed Sep 17, 2020
@ahrens
Member

ahrens commented Sep 17, 2020

Ah, right. The keys have to be loaded so that we can reset the ZIL logs, so that we don't write to the device that's being removed. If you didn't want to mount the filesystems, you could use zfs load-key. The error message could definitely be improved!
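
The load-key route described above might look like the sketch below: list the datasets whose keys are not loaded, load them, then retry the removal. This is a minimal sketch, not a tested procedure: `unloaded_keys` is a hypothetical helper name, and the pool and vdev names are the ones from this issue.

```shell
# Read "name<TAB>keystatus" lines on stdin and print the names of
# datasets whose encryption key is not loaded.
# (unloaded_keys is a hypothetical helper name.)
unloaded_keys() {
    awk -F '\t' '$2 == "unavailable" { print $1 }'
}

# Usage, run as root:
#   zfs get -r -H -o name,value keystatus tank | unloaded_keys
#   zfs load-key -r tank        # load keys without mounting anything
#   zpool remove tank mirror-2
```

Depending on each dataset's keylocation, `zfs load-key` may prompt for a passphrase. Unencrypted datasets report a keystatus of `-` and are ignored by the filter.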

@ahrens ahrens changed the title from Removing a mirror from a pool of mirrors returns "permission denied" to keys must be loaded to remove top-level vdev Sep 17, 2020
@geudrik
Author

geudrik commented Sep 17, 2020

I was going to look at updating the docs to include that requirement for a top level removal, but ultimately wasn't sure about whether it was a key loaded req or a mounted one.

Appreciate the help Matthew! I'll open an mr this weekend to include that requirement in the man pages. Agree with you, too that the error message could deff be improved

geudrik added a commit to geudrik/zfs that referenced this issue Sep 18, 2020
This change is relation to openzfs#10939
behlendorf pushed a commit that referenced this issue Sep 28, 2020
The error returned by `zpool remove` when the encryption keys aren't
loaded isn't very helpful.  Furthermore, the man pages make no
mention that the keys need to be loaded. This change doesn't resolve
the error message but it does update the man page to mention this
requirement.

Authored-by: grodik <pat@litke.dev>
Reviewed-by: Matthew Ahrens <mahrens@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #10939
Closes #10948
behlendorf pushed a commit that referenced this issue Oct 1, 2020
jsai20 pushed a commit to jsai20/zfs that referenced this issue Mar 30, 2021
sempervictus pushed a commit to sempervictus/zfs that referenced this issue May 31, 2021