zpool clear strangeness #12090

Open · ghoshdipak opened this issue May 20, 2021 · 11 comments
Labels: Type: Defect (incorrect behavior, e.g. crash, hang)

ghoshdipak commented May 20, 2021

CentOS 8.2
4.18.0-193.28.1.x5.0.22.x86_64
modinfo zfs | grep -iw version
version: 2.1.0-x.5.0_520_ge8c59ac5
modinfo spl | grep -iw version
version: 2.1.0-x.5.0_520_ge8c59ac5

Describe the problem you're observing

Intentional damage was done to a single child vdev of a draid2 pool with 2 distributed spares (the entire backing device was zeroed out). After running IO to the file system mounted on this pool, the pool became DEGRADED, as expected; the distributed spare was engaged and a resilver started. A subsequent scrub finished with zero bytes repaired. Issuing 'zpool clear' brings the DEGRADED pool and vdev back to the ONLINE state, but as soon as IO is run or the pool is scrubbed, it becomes DEGRADED again.
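The strangeness is the loop at the end of the session below: after the resilver and a scrub that repairs nothing, a clear makes everything look healthy, and any subsequent IO or scrub degrades the pool again. A minimal sketch of that loop, assuming the pool created in the reproduction steps (the explicit scrub command is not shown in the transcript, but is implied by the scrub progress in the status output):

zpool clear pool-oss0        # DEGRADED pool and child vdev now report ONLINE
zpool status -v pool-oss0    # ONLINE, scrub repaired 0B with 0 errors
zpool scrub pool-oss0        # or any read IO against /mnt/ost0
zpool status -v pool-oss0    # DEGRADED again, dspare draid2-0-0 resilvering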

Describe how to reproduce the problem

echo 1 > /sys/module/zfs/parameters/zfs_prefetch_disable
cat /sys/module/zfs/parameters/zfs_prefetch_disable
1
truncate -s 1G file1 file2 file3 file4 file5 file6 file7 file8 file9 file10 file11 file12 file13 file14 file15
pwd
/root/test/files
zpool create -f -o cachefile=none -o failmode=panic -O canmount=off pool-oss0 draid2:11d:15c:2s /root/test/files/file1 /root/test/files/file2 /root/test/files/file3 /root/test/files/file4 /root/test/files/file5 /root/test/files/file6 /root/test/files/file7 /root/test/files/file8 /root/test/files/file9 /root/test/files/file10 /root/test/files/file11 /root/test/files/file12 /root/test/files/file13 /root/test/files/file14 /root/test/files/file15
zfs create -o mountpoint=/mnt/ost0 pool-oss0/ost0
[root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=write --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=0 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
test: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
fio-3.7
Starting 1 process
test: Laying out IO file (1 file / 8192MiB)
Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=66.6MiB/s][r=0,w=66 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=10718: Wed May 19 22:26:15 2021
write: IOPS=89, BW=89.0MiB/s (93.3MB/s)(8192MiB/92043msec)
slat (usec): min=206, max=2843.9k, avg=11218.74, stdev=128346.34
clat (usec): min=3, max=2848.3k, avg=33712.67, stdev=221020.14
lat (usec): min=326, max=2852.2k, avg=44932.45, stdev=254463.05
clat percentiles (usec):
| 1.00th=[ 701], 5.00th=[ 750], 10.00th=[ 791],
| 20.00th=[ 865], 30.00th=[ 955], 40.00th=[ 1029],
| 50.00th=[ 1106], 60.00th=[ 1221], 70.00th=[ 1418],
| 80.00th=[ 2212], 90.00th=[ 8979], 95.00th=[ 16909],
| 99.00th=[1434452], 99.50th=[1602225], 99.90th=[2499806],
| 99.95th=[2566915], 99.99th=[2835350]
bw ( KiB/s): min=14307, max=411212, per=100.00%, avg=230380.26, stdev=94722.67, samples=72
iops : min= 13, max= 401, avg=224.57, stdev=92.53, samples=72
lat (usec) : 4=0.01%, 500=0.01%, 750=5.02%, 1000=31.23%
lat (msec) : 2=42.64%, 4=4.21%, 10=8.28%, 20=4.36%, 50=1.94%
lat (msec) : 100=0.22%, 750=0.04%, 1000=0.29%
cpu : usr=0.07%, sys=3.70%, ctx=2300, majf=0, minf=13
IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,8192,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
WRITE: bw=89.0MiB/s (93.3MB/s), 89.0MiB/s-89.0MiB/s (93.3MB/s-93.3MB/s), io=8192MiB (8590MB), run=92043-92043msec
[root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost files]# sync
[root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=read --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=1 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [V(1)][100.0%][r=503MiB/s,w=0KiB/s][r=503,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=27966: Wed May 19 22:26:59 2021
read: IOPS=353, BW=354MiB/s (371MB/s)(8192MiB/23161msec)
slat (usec): min=650, max=502000, avg=2625.01, stdev=13294.92
clat (usec): min=121, max=533083, avg=8483.66, stdev=23148.47
lat (usec): min=1190, max=534382, avg=11111.37, stdev=26851.01
clat percentiles (msec):
| 1.00th=[ 4], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 5],
| 30.00th=[ 5], 40.00th=[ 6], 50.00th=[ 6], 60.00th=[ 7],
| 70.00th=[ 8], 80.00th=[ 9], 90.00th=[ 12], 95.00th=[ 16],
| 99.00th=[ 44], 99.50th=[ 117], 99.90th=[ 502], 99.95th=[ 506],
| 99.99th=[ 535]
bw ( KiB/s): min=39844, max=610304, per=99.63%, avg=360838.15, stdev=151393.14, samples=46
iops : min= 38, max= 596, avg=352.26, stdev=147.95, samples=46
lat (usec) : 250=0.01%
lat (msec) : 2=0.01%, 4=19.31%, 10=66.26%, 20=11.69%, 50=1.78%
lat (msec) : 100=0.37%, 250=0.34%, 500=0.10%, 750=0.12%
cpu : usr=5.66%, sys=39.62%, ctx=33035, majf=0, minf=525
IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
READ: bw=354MiB/s (371MB/s), 354MiB/s-354MiB/s (371MB/s-371MB/s), io=8192MiB (8590MB), run=23161-23161msec
[root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost files]# dd if=/dev/zero of=file1 bs=1M count=1024 oflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.812825 s, 1.3 GB/s
[root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost files]# echo 3 > /proc/sys/vm/drop_caches
[root@localhost files]# fio --name=test --ioengine=libaio --fallocate=none --iodepth=4 --rw=read --bs=1M --size=8G --numjobs=1 --allow_mounted_write=1 --do_verify=1 --verify=pattern --verify_pattern=0x0A0B0C0D --filename=/mnt/ost0/testfile --group_reporting
test: (g=0): rw=read, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=4
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [V(1)][100.0%][r=501MiB/s,w=0KiB/s][r=501,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=72740: Wed May 19 22:28:32 2021
read: IOPS=184, BW=185MiB/s (194MB/s)(8192MiB/44367msec)
slat (usec): min=548, max=501887, avg=5117.17, stdev=16143.65
clat (usec): min=131, max=1124.1k, avg=16247.34, stdev=36912.74
lat (usec): min=1040, max=1404.9k, avg=21368.78, stdev=45991.99
clat percentiles (msec):
| 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 5],
| 30.00th=[ 6], 40.00th=[ 7], 50.00th=[ 8], 60.00th=[ 11],
| 70.00th=[ 14], 80.00th=[ 19], 90.00th=[ 33], 95.00th=[ 49],
| 99.00th=[ 138], 99.50th=[ 218], 99.90th=[ 506], 99.95th=[ 726],
| 99.99th=[ 1116]
bw ( KiB/s): min= 2043, max=913848, per=99.76%, avg=188614.28, stdev=158911.91, samples=88
iops : min= 1, max= 892, avg=183.91, stdev=155.26, samples=88
lat (usec) : 250=0.01%
lat (msec) : 2=0.01%, 4=15.56%, 10=43.88%, 20=22.18%, 50=13.67%
lat (msec) : 100=3.20%, 250=1.15%, 500=0.18%, 750=0.10%, 1000=0.02%
cpu : usr=3.78%, sys=21.27%, ctx=30665, majf=0, minf=524
IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=8192,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=4

Run status group 0 (all jobs):
READ: bw=185MiB/s (194MB/s), 185MiB/s-185MiB/s (194MB/s-194MB/s), io=8192MiB (8590MB), run=44367-44367msec

[root@localhost ~]# zpool status -v 2
pool: pool-oss0
state: ONLINE
config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    ONLINE       0     0     0
      draid2:11d:15c:2s-0        ONLINE       0     0     0
        /root/test/files/file1   ONLINE       0     0     0
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

When there is no IO, the same status as above is repeated because the pool has not recognized the failure.

While running IO from another terminal, the following status was observed:

pool: pool-oss0
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    DEGRADED     0     0     0
      draid2:11d:15c:2s-0        DEGRADED     0     0     0
        /root/test/files/file1   DEGRADED     0     0 1.88K  too many errors
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

..............................

pool: pool-oss0
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver (draid2:11d:15c:2s-0) in progress since Wed May 19 22:27:49 2021
9.75G scanned at 714M/s, 8.31G issued 608M/s, 9.75G total
655M resilvered, 100.00% done, 00:00:00 to go
config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 4.72K
          /root/test/files/file1  DEGRADED     0     0 2.23K  too many errors
          draid2-0-0              ONLINE       0     0     0  (resilvering)
        /root/test/files/file2    ONLINE       0     0     0  (resilvering)
        /root/test/files/file3    ONLINE       0     0     0  (resilvering)
        /root/test/files/file4    ONLINE       0     0     0  (resilvering)
        /root/test/files/file5    ONLINE       0     0     0  (resilvering)
        /root/test/files/file6    ONLINE       0     0     0  (resilvering)
        /root/test/files/file7    ONLINE       0     0     0  (resilvering)
        /root/test/files/file8    ONLINE       0     0     0  (resilvering)
        /root/test/files/file9    ONLINE       0     0     0  (resilvering)
        /root/test/files/file10   ONLINE       0     0     0  (resilvering)
        /root/test/files/file11   ONLINE       0     0     0  (resilvering)
        /root/test/files/file12   ONLINE       0     0     0  (resilvering)
        /root/test/files/file13   ONLINE       0     0     0  (resilvering)
        /root/test/files/file14   ONLINE       0     0     0  (resilvering)
        /root/test/files/file15   ONLINE       0     0     0  (resilvering)
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

pool: pool-oss0
state: DEGRADED
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub in progress since Wed May 19 22:28:03 2021
9.75G scanned at 4.88G/s, 855M issued at 428M/s, 9.75G total
0B repaired, 8.56% done, 00:00:21 to go
scan: resilvered (draid2:11d:15c:2s-0) 655M in 00:00:14 with 0 errors on Wed May 19 22:28:03 2021
config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 4.74K
          /root/test/files/file1  DEGRADED     0     0 2.23K  too many errors
          draid2-0-0              ONLINE       0     0     0
        /root/test/files/file2    ONLINE       0     0     0
        /root/test/files/file3    ONLINE       0     0     0
        /root/test/files/file4    ONLINE       0     0     0
        /root/test/files/file5    ONLINE       0     0     0
        /root/test/files/file6    ONLINE       0     0     0
        /root/test/files/file7    ONLINE       0     0     0
        /root/test/files/file8    ONLINE       0     0     0
        /root/test/files/file9    ONLINE       0     0     0
        /root/test/files/file10   ONLINE       0     0     0
        /root/test/files/file11   ONLINE       0     0     0
        /root/test/files/file12   ONLINE       0     0     0
        /root/test/files/file13   ONLINE       0     0     0
        /root/test/files/file14   ONLINE       0     0     0
        /root/test/files/file15   ONLINE       0     0     0
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

[root@localhost ~]# zpool clear pool-oss0
[root@localhost ~]# zpool status -v
pool: pool-oss0
state: ONLINE
scan: scrub repaired 0B in 00:00:20 with 0 errors on Wed May 19 22:28:23 2021
scan: resilvered (draid2:11d:15c:2s-0) 655M in 00:00:14 with 0 errors on Wed May 19 22:28:03 2021
config:

    NAME                         STATE     READ WRITE CKSUM
    pool-oss0                    ONLINE       0     0     0
      draid2:11d:15c:2s-0        ONLINE       0     0     0
        /root/test/files/file1   ONLINE       0     0     0
        /root/test/files/file2   ONLINE       0     0     0
        /root/test/files/file3   ONLINE       0     0     0
        /root/test/files/file4   ONLINE       0     0     0
        /root/test/files/file5   ONLINE       0     0     0
        /root/test/files/file6   ONLINE       0     0     0
        /root/test/files/file7   ONLINE       0     0     0
        /root/test/files/file8   ONLINE       0     0     0
        /root/test/files/file9   ONLINE       0     0     0
        /root/test/files/file10  ONLINE       0     0     0
        /root/test/files/file11  ONLINE       0     0     0
        /root/test/files/file12  ONLINE       0     0     0
        /root/test/files/file13  ONLINE       0     0     0
        /root/test/files/file14  ONLINE       0     0     0
        /root/test/files/file15  ONLINE       0     0     0
    spares
      draid2-0-0                 AVAIL
      draid2-0-1                 AVAIL

errors: No known data errors

When IO was run from another terminal, the following status was observed:

[root@localhost ~]# zpool status -v
pool: pool-oss0
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: scrub repaired 0B in 00:00:20 with 0 errors on Wed May 19 22:28:23 2021
scan: resilver (draid2:11d:15c:2s-0) in progress since Wed May 19 22:29:00 2021
2.21G scanned at 693M/s, 1.83G issued 575M/s, 9.75G total
100M resilvered, 22.65% done, 00:00:11 to go
config:

    NAME                          STATE     READ WRITE CKSUM
    pool-oss0                     DEGRADED     0     0     0
      draid2:11d:15c:2s-0         DEGRADED     0     0     0
        spare-0                   DEGRADED     0     0 1.20K
          /root/test/files/file1  DEGRADED     0     0 1.04K  too many errors
          draid2-0-0              ONLINE       0     0     0  (resilvering)
        /root/test/files/file2    ONLINE       0     0     0
        /root/test/files/file3    ONLINE       0     0     0  (resilvering)
        /root/test/files/file4    ONLINE       0     0     0
        /root/test/files/file5    ONLINE       0     0     0
        /root/test/files/file6    ONLINE       0     0     0  (resilvering)
        /root/test/files/file7    ONLINE       0     0     0  (resilvering)
        /root/test/files/file8    ONLINE       0     0     0  (resilvering)
        /root/test/files/file9    ONLINE       0     0     0  (resilvering)
        /root/test/files/file10   ONLINE       0     0     0  (resilvering)
        /root/test/files/file11   ONLINE       0     0     0  (resilvering)
        /root/test/files/file12   ONLINE       0     0     0
        /root/test/files/file13   ONLINE       0     0     0  (resilvering)
        /root/test/files/file14   ONLINE       0     0     0  (resilvering)
        /root/test/files/file15   ONLINE       0     0     0  (resilvering)
    spares
      draid2-0-0                  INUSE     currently in use
      draid2-0-1                  AVAIL

errors: No known data errors

ghoshdipak added the "Status: Triage Needed" and "Type: Defect" labels on May 20, 2021
behlendorf removed the "Status: Triage Needed" label on May 20, 2021

stale bot commented May 31, 2022

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

The stale bot added the "Status: Stale" label on May 31, 2022
ghoshdipak (Author) commented:

We are still experiencing the same issue with OpenZFS 2.1.4

The stale bot removed the "Status: Stale" label on Jun 1, 2022
behlendorf (Contributor) commented:

Thanks for retesting this with OpenZFS 2.1.4; I'll look into exactly what's going on here. Somewhat related to this, you may want to look at PR #13499, which slightly changes the ZED behavior. That fix was queued up for 2.1.5.

behlendorf self-assigned this on Jun 1, 2022
akashb-22 (Contributor) commented:

The same issue is seen when tested with PR #13499:
zpool_clear.txt

behlendorf (Contributor) commented:

@akashb-22 would you mind verifying that the fix in #13555 resolves this issue?

akashb-22 (Contributor) commented:

  1. Case 1: Damage to a single child vdev works fine with the fix.
  2. Case 2: When the same test is run with damage done to 2 child vdevs of the draid2, permanent errors are reported by zpool status -v, but the actual data is correct (verified using md5sum).
  3. Case 3: When the same test is run with damage done to 2 child vdevs of the draid2, the system panics and goes into a weird state.

Either case 2 or case 3 is observed depending on which 2 child vdevs are chosen; a sketch of the two-vdev damage step follows the attachment below. The same is not observed on raidz2.
zpool_clear1.txt
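
For reference, a minimal sketch of the two-vdev damage step, assuming the same file-backed draid2:11d:15c:2s layout and the same dd-based damage as in the original report (the specific pair of files shown here is only illustrative):

# zero out two child vdevs; per the cases above, which pair is chosen
# determines whether case 2 or case 3 is hit
dd if=/dev/zero of=/root/test/files/file1 bs=1M count=1024 oflag=direct
dd if=/dev/zero of=/root/test/files/file2 bs=1M count=1024 oflag=direct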

behlendorf (Contributor) commented:

Thanks for the quick turnaround. I see what's going on with cases 2 and 3 and will work on updating the PR to take a slightly different approach.

behlendorf (Contributor) commented:

I've updated the PR. Would you mind testing the updated patch?

akashb-22 (Contributor) commented:

Case 1: Damage to a single vdev works fine with the fix.
Case 2: Damage to 2 child vdevs on draid2, repeated multiple times along with zpool clear, works fine with the fix.
Case 3: Damage to 2 child vdevs, one after the other, on draid2 + 2 dspares results in all the child vdevs going to the DEGRADED state and what looks like data loss, but a zpool clear followed by a scrub brings the pool back to the ONLINE state and all the data files are present in the filesystem with no loss. This case works fine with raidz2.
Also, this error is momentarily seen from zpool status during the tests (case 3):

munmap_chunk(): invalid pointer
Aborted (core dumped)

zpool_clear2.txt

behlendorf (Contributor) commented Jun 17, 2022:

That third test case really is a pretty nasty, absolute worst-case scenario. I was able to reproduce it locally, but it'll take me a while to get to a root cause. In the meantime, let's move forward with the current PR, which at least sorts out those first two test cases.

akashb-22 (Contributor) commented Feb 23, 2024:

@behlendorf
Another case is observed: after the rebuild, once a scrub is in progress, issuing a zpool clear panics DEBUG builds at the point shown below. However, I do not see any issue in normal (non-debug) builds. I have a test case where it is 100% reproducible (a condensed sketch of the sequence is included after the findings below).
This is seen in zfs-2.1-release, zfs-2.2-release, and master.
stacktrace:

[Tue Feb 13 00:55:35 2024] VERIFY(spa_config_held(os->os_spa, SCL_ALL, RW_WRITER) == 0 || (spa_is_root(os->os_spa) && spa_config_held(os->os_spa, SCL_STATE, RW_WRITER))) failed
[Tue Feb 13 00:55:35 2024] PANIC at dnode.c:1422:dnode_hold_impl()
[Tue Feb 13 00:55:35 2024] Showing stack for process 186367
[Tue Feb 13 00:55:35 2024] CPU: 4 PID: 186367 Comm: zpool Kdump: loaded Tainted: P           OE    --------- -  - 4.18.0-372.9.1.el8.x86_64 #1
[Tue Feb 13 00:55:35 2024] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.10.2-1ubuntu1 04/01/2014
[Tue Feb 13 00:55:35 2024] Call Trace:
[Tue Feb 13 00:55:35 2024]  dump_stack+0x41/0x60
[Tue Feb 13 00:55:35 2024]  spl_panic+0xd0/0xe8 [spl]
[Tue Feb 13 00:55:35 2024]  ? virtqueue_add+0xace/0xb40
[Tue Feb 13 00:55:35 2024]  dnode_hold_impl+0x755/0x11c0 [zfs]
[Tue Feb 13 00:55:35 2024]  ? cpumask_next_and+0x1a/0x20
[Tue Feb 13 00:55:35 2024]  ? update_sd_lb_stats.constprop.121+0xca/0x830
[Tue Feb 13 00:55:35 2024]  dmu_buf_hold_noread+0x37/0x140 [zfs]
[Tue Feb 13 00:55:35 2024]  dmu_buf_hold+0x37/0x80 [zfs]
[Tue Feb 13 00:55:35 2024]  zap_lockdir+0x54/0x110 [zfs]
[Tue Feb 13 00:55:35 2024]  ? find_busiest_group+0x41/0x360
[Tue Feb 13 00:55:35 2024]  zap_lookup_norm+0x5d/0xd0 [zfs]
[Tue Feb 13 00:55:35 2024]  zap_lookup+0x12/0x20 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_dirent_lock+0x5b5/0x7c0 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_dirlook+0x8a/0x300 [zfs]
[Tue Feb 13 00:55:35 2024]  ? zfs_zaccess+0x21e/0x440 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_lookup+0x24b/0x400 [zfs]
[Tue Feb 13 00:55:35 2024]  zpl_lookup+0xc2/0x290 [zfs]
[Tue Feb 13 00:55:35 2024]  __lookup_slow+0x97/0x150
[Tue Feb 13 00:55:35 2024]  lookup_slow+0x35/0x50
[Tue Feb 13 00:55:35 2024]  walk_component+0x1bf/0x2f0
[Tue Feb 13 00:55:35 2024]  ? legitimize_path.isra.42+0x2d/0x60
[Tue Feb 13 00:55:35 2024]  path_lookupat.isra.47+0x75/0x200
[Tue Feb 13 00:55:35 2024]  ? filename_lookup.part.61+0xe0/0x170
[Tue Feb 13 00:55:35 2024]  ? kmem_cache_free+0x116/0x300
[Tue Feb 13 00:55:35 2024]  filename_lookup.part.61+0xa0/0x170
[Tue Feb 13 00:55:35 2024]  ? __switch_to_asm+0x35/0x70
[Tue Feb 13 00:55:35 2024]  ? getname_kernel+0x2c/0x100
[Tue Feb 13 00:55:35 2024]  ? kmem_cache_alloc+0x13f/0x280
[Tue Feb 13 00:55:35 2024]  ? vdev_draid_open_spares+0x30/0x30 [zfs]
[Tue Feb 13 00:55:35 2024]  ? vdev_draid_open_spares+0x30/0x30 [zfs]
[Tue Feb 13 00:55:35 2024]  lookup_bdev.part.46+0x28/0xa0
[Tue Feb 13 00:55:35 2024]  zvol_is_zvol_impl+0xb/0x40 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x18/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x3d/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_uses_zvols+0x3d/0x70 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_open_children_impl+0xaa/0x120 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_draid_open+0x63/0x1f0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_open+0xe4/0xa10 [zfs]
[Tue Feb 13 00:55:35 2024]  ? mutex_lock+0xe/0x30
[Tue Feb 13 00:55:35 2024]  vdev_reopen+0x4e/0x1f0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x188/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  vdev_clear+0x87/0x2b0 [zfs]
[Tue Feb 13 00:55:35 2024]  zfs_ioc_clear+0x18b/0x380 [zfs]
[Tue Feb 13 00:55:35 2024]  ? strlcpy+0x2d/0x40
[Tue Feb 13 00:55:35 2024]  zfsdev_ioctl_common+0x656/0x980 [zfs]
[Tue Feb 13 00:55:35 2024]  ? __kmalloc_node+0x26e/0x2a0
[Tue Feb 13 00:55:35 2024]  ? __handle_mm_fault+0x7a6/0x7e0
[Tue Feb 13 00:55:35 2024]  zfsdev_ioctl+0x4f/0xe0 [zfs]
[Tue Feb 13 00:55:35 2024]  do_vfs_ioctl+0xa4/0x680
[Tue Feb 13 00:55:35 2024]  ? handle_mm_fault+0xc1/0x1e0
[Tue Feb 13 00:55:35 2024]  ? syscall_trace_enter+0x1fb/0x2c0
[Tue Feb 13 00:55:35 2024]  ksys_ioctl+0x60/0x90
[Tue Feb 13 00:55:35 2024]  __x64_sys_ioctl+0x16/0x20
[Tue Feb 13 00:55:35 2024]  do_syscall_64+0x5b/0x1a0
[Tue Feb 13 00:55:35 2024]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[Tue Feb 13 00:55:35 2024] RIP: 0033:0x7fea9f6836db
[Tue Feb 13 00:55:35 2024] Code: 73 01 c3 48 8b 0d ad 57 38 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 7d 57 38 00 f7 d8 64 89 01 48
[Tue Feb 13 00:55:35 2024] RSP: 002b:00007ffd5047ce08 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[Tue Feb 13 00:55:35 2024] RAX: ffffffffffffffda RBX: 00007ffd5047d250 RCX: 00007fea9f6836db
[Tue Feb 13 00:55:35 2024] RDX: 00007ffd5047d250 RSI: 0000000000005a21 RDI: 0000000000000003
[Tue Feb 13 00:55:35 2024] RBP: 00007ffd50480830 R08: 0000000000000040 R09: 0000000001b9721c
[Tue Feb 13 00:55:35 2024] R10: 000000000000001c R11: 0000000000000246 R12: 0000000001b5db70
[Tue Feb 13 00:55:35 2024] R13: 0000000000000000 R14: 0000000001b61bc0 R15: 00007ffd5047ce50

Further testing shows this issue is:

  1. not seen in raidz2 + 2 spares
  2. not seen in draid without dspares
  3. not seen in draid with hot spares

The issue is always hit (100%) with the above test case in draid2 with 2 dspares.
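
A condensed sketch of the sequence described above, assuming the same file-backed draid2:11d:15c:2s pool and paths as in the original report (this paraphrases the description, not the exact attached test case; the key point is that the clear is issued while the post-rebuild scrub is still running):

dd if=/dev/zero of=/root/test/files/file1 bs=1M count=1024 oflag=direct   # damage a child vdev
# run IO or a scrub so the errors are noticed and the distributed spare kicks in,
# then wait for the sequential rebuild onto draid2-0-0 to complete
zpool scrub pool-oss0     # scrub after the rebuild
zpool clear pool-oss0     # issued while that scrub is in progress: DEBUG builds hit the VERIFY panic above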
