Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

No more "can only attach to mirrors and top-level disks" protection, crashing zdb #15536

Closed
ocochard opened this issue Nov 16, 2023 · 6 comments
Assignees
Labels
Type: Defect Incorrect behavior (e.g. crash, hang) Type: Regression Indicates a functional regression

Comments

@ocochard
Copy link

System information

Type Version/Name
Distribution Name FreeBSD
Distribution Version FreeBSD 15.0-CURRENT #0 main-n266452-070d9e3540e6: Thu Nov 16 17:53:15 CET 2023
Architecture amd64
OpenZFS Version zfs-2.2.99-202-FreeBSD_g887a3c533 (887a3c533)

Describe the problem you're observing

Regression since about 1-2 months:
Previously we could not attach a disk to a non-top-level disks, and even a FreeBSD regression tests was checking for this limitation.
https://cgit.freebsd.org/src/tree/tests/sys/cddl/zfs/tests/replacement/replacement_002_pos.ksh

But since recently on FreeBSD current, we could create some mirror. But later zdb is panicing trying to check those zpool.

Describe how to reproduce the problem

Here is a FreeBSD shell script to reproduce it:

#!/bin/sh                                                                                                                           [6/4439]
# Reproduce zdb core dump on:
# FreeBSD 15.0-CURRENT #0 main-n266452-070d9e3540e6: Thu Nov 16 17:53:15 CET 2023
# This include  "zfs: merge openzfs/zfs@887a3c533"
set -eu

rm -f zdb.core
# Create 4 virtual disks (Need 4GB available in /tmp)
for i in $(jot 4); do
  truncate -s 1G /tmp/disk$i.img
  mdconfig -a -t vnode -u $i -f /tmp/disk$i.img
done

# Reproduce a simpler version of cddl/zfs/tests/replacement/replacement_002_pos.ksh
zpool create -f testpool raidz1 /dev/md1 /dev/md2 /dev/md3
zfs create testpool/testfs
zfs set mountpoint=/testdir testpool/testfs
echo "Convert /dev/md1 into a mirror by attaching a new disk to it...This should be prevented"
# Previous version "cannot attach /dev/md4 to /dev/md1: can only attach to mirrors and top-level disks"
# works on: 15.0-CURRENT FreeBSD 15.0-CURRENT #11 main-n265732-166a655fcf1: Thu Oct  5 03:22:24 PDT 2023
zpool attach testpool /dev/md1 /dev/md4 && echo "First bug triggered: attach worked" || echo "First bug not triggered!"
zpool export testpool
zpool import testpool
zfs umount testpool/testfs
echo "zdb verifying checksum... should not panic"
zdb -cdui testpool/testfs && echo "Second bug not triggered (zdb didn't fails)" || echo "Second bug triggered (zdb fails)"
if [ -f zdb.core ]; then
        echo "Second bug triggered: Core file found"
else
        echo "Second bug not triggered: No core file found"
fi
# Cleanup
zpool export testpool
for i in $(jot 4); do
  mdconfig -d -u $i
  rm /tmp/disk$i.img
done

Include any warning/errors/backtraces from the system logs

zdb panic error message:

zio->io_vd->vdev_ops == &vdev_replacing_ops || zio->io_vd->vdev_ops == &vdev_spare_ops
ASSERT at /usr/src/sys/contrib/openzfs/module/zfs/vdev_mirror.c:796:vdev_mirror_io_done()zio->io_vd->vdev_ops == &vdev_replacing_ops || zio->io_vd->vdev_ops == &vdev_spare_ops
ASSERT at /usr/src/sys/contrib/openzfs/module/zfs/vdev_mirror.c:796:vdev_mirror_io_done()zio->io_vd->vdev_ops == &vdev_replacing_ops || zio->io_vd->vdev_ops == &vdev_spare_ops
ASSERT at /usr/src/sys/contrib/openzfs/module/zfs/vdev_mirror.c:796:vdev_mirror_io_done()Abort trap (core dumped)


Core was generated by `zdb -cdui testpool/testfs'.
Program terminated with signal SIGABRT, Aborted.
Sent by thr_kill() from pid 46583 and user 0.

warning: Section `.reg-xstate/203090' in core file too small.
#0  thr_kill () at thr_kill.S:4
4       RSYSCALL(thr_kill)
[Current thread is 1 (LWP 203090)]

(gdb) bt
#0  thr_kill () at thr_kill.S:4
#1  0x0000224b1ffd65c4 in __raise (s=s@entry=6) at /usr/src/lib/libc/gen/raise.c:50
#2  0x0000224b20088989 in abort () at /usr/src/lib/libc/stdlib/abort.c:64
#3  0x0000224b1c66812f in libspl_assertf (file=0x224b1ea49975 "/usr/src/sys/contrib/openzfs/module/zfs/vdev_mirror.c",
    func=0x224b1ea4f60a "vdev_mirror_io_done", line=line@entry=796, format=<optimized out>)
    at /usr/src/sys/contrib/openzfs/lib/libspl/assert.c:54
#4  0x0000224b1ec2179b in libspl_assert (buf=0x0, file=0x31952 <error: Cannot access memory at address 0x31952>,
    func=0x6 <error: Cannot access memory at address 0x6>, line=796) at /usr/src/sys/contrib/openzfs/lib/libspl/include/assert.h:59
#5  vdev_mirror_io_done (zio=0x37ea4c22d400) at /usr/src/sys/contrib/openzfs/module/zfs/vdev_mirror.c:795
#6  0x0000224b1eca0407 in zio_vdev_io_done (zio=0x37ea4c22d400) at /usr/src/sys/contrib/openzfs/module/zfs/zio.c:4064
#7  0x0000224b1ec98b80 in __zio_execute (zio=0x37ea4c22d400) at /usr/src/sys/contrib/openzfs/module/zfs/zio.c:2316
#8  zio_execute (zio=<optimized out>) at /usr/src/sys/contrib/openzfs/module/zfs/zio.c:2227
#9  0x0000224b1eafbed1 in taskq_thread (arg=arg@entry=0x37ea40c1d900) at /usr/src/sys/contrib/openzfs/lib/libzpool/taskq.c:240
#10 0x0000224b1eaf9249 in zk_thread_wrapper (arg=<optimized out>) at /usr/src/sys/contrib/openzfs/lib/libzpool/kernel.c:90
#11 0x0000224b258338eb in thread_start (curthread=0x37ea40da3200) at /usr/src/lib/libthr/thread/thr_create.c:290
#12 0x0000000000000000 in ?? ()
Backtrace stopped: Cannot access memory at address 0x224b7fa21000
@ocochard ocochard added the Type: Defect Incorrect behavior (e.g. crash, hang) label Nov 16, 2023
@rincebrain rincebrain added Component: Test Suite Indicates an issue with the test framework or a test case and removed Component: Test Suite Indicates an issue with the test framework or a test case labels Nov 17, 2023
@ocochard
Copy link
Author

After some bisection, here is some commit range.

A known FreeBSD good version:

commit 14c2e0a0c57e48a41433fdca668fac8882fb04df
Merge: f25b0d6dd75 9198de8f107
Author: Martin Matuska <mm@FreeBSD.org>
Date:   Wed Nov 8 09:16:01 2023 +0100

    zfs: merge openzfs/zfs@9198de8f1

    Notable upstream pull request merges:
     #15197 3bd4df384 Improve ZFS objset sync parallelism
     #15455 020f6fd09 FreeBSD: Implement taskq_init_ent()
     #15476 3d86999c7 sa_lookup() ignores buffer size
     #15478 2a154b848 Fix accounting error for pending sync IO ops in
                      zpool iostat
     #15484 dc45a00ea Add kern.features.zfs
     #15486 e36ff84c3 Update the kstat dataset_name when renaming a zvol
     #15491 f4cd1bac7 Make abd_raidz_gen_iterate() pass an initialized
                      pointer to the callback
     #15495 58398cbd0 FreeBSD: Optimize large kstat outputs

    Obtained from:  OpenZFS
    OpenZFS commit: 9198de8f1079a8bbb837de3e3f8e236777b1375d

First known FreeBSD bad version:

commit e716630d4cf89e69ec3f675ebfceee09f1a85e05 (HEAD)
Merge: f5b3e686292 887a3c533b9
Author: Martin Matuska <mm@FreeBSD.org>
Date:   Thu Nov 9 11:42:33 2023 +0100

    zfs: merge openzfs/zfs@887a3c533

    Notable upstream pull request merges:
     #15022 5caeef02f RAID-Z expansion feature
     #15457 887a3c533 Increase L2ARC write rate and headroom
     #15504 1c1be60fa Unbreak FreeBSD world build after 3bd4df384

    Obtained from:  OpenZFS
    OpenZFS commit: 887a3c533b94a4b70075e310f15c45b9dee19410

@amotin
Copy link
Member

amotin commented Nov 17, 2023

I guess it may be a fallout from #15022 , since AFAIK it used zpool attach to add new disks to existing RAIDZ. @don-brady ^^^

@don-brady
Copy link
Contributor

don-brady commented Nov 22, 2023

Will take a look on Linux using the above script...

I can reproduce on Linux with a modified version of the script (to account for device management differences).

I have an idea what's going on and should be able to fix.

@don-brady
Copy link
Contributor

The test, replacement_002_pos.ksh, is not in the OpenZFS repo.

@don-brady
Copy link
Contributor

I isolated the issue and have a fix. I'll open a PR after some additional testing. We don't seem to have a test case covering attaching to a child of a raidz vdev.

Here's the proposed fix:

diff --git a/module/zfs/spa.c b/module/zfs/spa.c
index 20225640f..62a3a59bd 100644
--- a/module/zfs/spa.c
+++ b/module/zfs/spa.c
@@ -7063,12 +7063,13 @@ spa_vdev_attach(spa_t *spa, uint64_t guid, nvlist_t *nvroot, int replacing,

        if (!replacing) {
                /*
-                * For attach, the only allowable parent is a mirror or the root
-                * vdev.
+                * For attach, the only allowable parent is a mirror
+                * or the root vdev. A raidz vdev can be attached to,
+                * but you cannot attach to a raidz child.
                 */
                if (pvd->vdev_ops != &vdev_mirror_ops &&
-                   pvd->vdev_ops != &vdev_raidz_ops &&
-                   pvd->vdev_ops != &vdev_root_ops)
+                   pvd->vdev_ops != &vdev_root_ops &&
+                   !raidz)
                        return (spa_vdev_exit(spa, newrootvd, txg, ENOTSUP));

                pvops = &vdev_mirror_ops;

@don-brady don-brady added the Type: Regression Indicates a functional regression label Nov 23, 2023
@don-brady don-brady self-assigned this Nov 23, 2023
@don-brady
Copy link
Contributor

attach-issue-linux.txt
Here's a version of the reproducer for Linux.

lundman pushed a commit to openzfsonwindows/openzfs that referenced this issue Dec 12, 2023
Reviewed-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Closes openzfs#15536
Closes openzfs#15564
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang) Type: Regression Indicates a functional regression
Projects
None yet
Development

No branches or pull requests

4 participants