### System information
| Type | Version/Name |
|---|---|
| Distribution Name | Ubuntu |
| Distribution Version | 18.04 |
| Linux Kernel | 5.4.0-42 |
| Architecture | x86 |
| ZFS Version | 2.0++ |
| Commit | 8f158ae |
### Describe the problem you're observing
A zloop run failed without producing a core file. ztest.out shows that the failure comes from zdb, which returned an error (EINVAL) while attempting to verify the pool:
```
Executing zdb -bccsv -G -d -Y -e -y -p /var/tmp/os-ztest/zloop-run ztest
zdb: can't open 'ztest': Invalid argument
```
```
% grep -i einval /usr/include/asm-generic/errno-base.h
#define EINVAL 22 /* Invalid argument */
```
The error actually originates in a call to vdev_open(), which occurs prior to the spa_load_failed() call that generated the message above; vdev_open() is what returns the reported EINVAL. Within vdev_open(), the error is set because asize < vdev_min_asize for a top-level vdev:
```c
	/*
	 * Make sure the allocatable size hasn't shrunk too much.
	 */
	if (asize < vd->vdev_min_asize) {
		vdev_set_state(vd, B_TRUE, VDEV_STATE_CANT_OPEN,
		    VDEV_AUX_BAD_LABEL);
		return (SET_ERROR(EINVAL));
	}
```
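For context, a top-level vdev's vdev_min_asize is derived from the asize it last recorded, rounded down to a metaslab boundary, which is what makes this check an effective "did the allocatable space shrink" guard. A rough paraphrase of that logic (see vdev_get_min_asize() in vdev.c; this sketch is not the verbatim source):

```c
/*
 * Rough paraphrase of the top-level case in vdev_get_min_asize()
 * (vdev.c), not the verbatim source: the minimum allocatable size is
 * the recorded asize aligned down to the metaslab size, so a real
 * shrink in allocatable space trips the asize < vdev_min_asize check
 * above.
 */
static uint64_t
min_asize_sketch_toplevel(vdev_t *vd)
{
	ASSERT(vd == vd->vdev_top);
	return (P2ALIGN(vd->vdev_asize, 1ULL << vd->vdev_ms_shift));
}
```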
Here are the current values being compared (note that asize comes from osize in this function):
```
(gdb) print osize
$21 = 4798283776
(gdb) print vd->vdev_min_asize
$22 = 4831838208
```
Here osize falls short of vdev_min_asize by 33554432 bytes (32 MiB). The top-level vdev in this case is of type draid. The asize of a draid vdev is computed by summing the asizes of its children, minus the space reserved for distributed spares (see the sketch after the tree below). Note that there is a spare device currently deployed in this top-level vdev as child vdev 4:
```
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 0: root, guid: 7397569113689487881, path: N/A, can't open
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 0: draid, guid: 4525148408276004100, path: N/A, can't open
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 0: file, guid: 16492265217803933395, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.0a, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 1: file, guid: 13189481552791461187, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.1a, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 2: file, guid: 1960318212727225725, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.2b, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 3: file, guid: 795303241160842783, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.3a, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 4: spare, guid: 17473580923192177435, path: N/A, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 0: file, guid: 5222868485229018091, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.4a, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 1: dspare, guid: 2629688768943859102, path: draid2-0-1, healthy
vdev.c:195:vdev_dbgmsg_print_tree(): vdev 5: file, guid: 17532226906533716578, path: /net/pharos/export/bugs/DLPX-73135/vdev/ztest.5a, healthy
...
```
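Following the description above, a sketch of that computation (a hypothetical helper for illustration, not the actual vdev_draid.c code; spare_reserved stands in for whatever space is set aside for the distributed spares):

```c
/*
 * Illustrative sketch of the draid asize computation described above,
 * not the actual vdev_draid.c implementation: sum the children's
 * allocatable sizes and subtract the space reserved for distributed
 * spares.  An undersized child (the spare at index 4 above) directly
 * shrinks the top-level asize.
 */
static uint64_t
draid_asize_sketch(vdev_t *vd, uint64_t spare_reserved)
{
	uint64_t asize = 0;

	for (uint64_t c = 0; c < vd->vdev_children; c++)
		asize += vd->vdev_child[c]->vdev_asize;

	return (asize - spare_reserved);
}
```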
Looking at the sizes for this spare and its children, we see:
```
(gdb) print vd->vdev_child[4]->vdev_asize
$17 = 483131392
(gdb) print vd->vdev_child[4]->vdev_children
$18 = 2
(gdb) print vd->vdev_child[4]->vdev_child[0]->vdev_asize
$19 = 532152320
(gdb) print vd->vdev_child[4]->vdev_child[1]->vdev_asize
$20 = 483131392
```
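A spare vdev opens like a mirror over its children, so it reports the minimum of their asizes, which is why vdev_child[4] above shows the dspare's 483131392 rather than 532152320. Roughly (paraphrasing the mirror-style open logic, not the verbatim vdev_mirror.c source):

```c
/*
 * Paraphrase of how a mirror-style vdev (including a deployed spare)
 * derives its asize: the minimum of its children's asizes.  Not the
 * verbatim vdev_mirror.c open code.
 */
static uint64_t
mirror_asize_sketch(vdev_t *vd)
{
	uint64_t asize = UINT64_MAX;

	for (uint64_t c = 0; c < vd->vdev_children; c++)
		asize = MIN(asize, vd->vdev_child[c]->vdev_asize);

	return (asize);
}
```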
The asize of the dspare is 49020928 bytes (~46.75 MiB) smaller than the asize of the device it is sparing! The spare parent vdev accordingly reports the smaller of the two sizes. Most of the other child vdevs in this draid report the larger size:
```
(gdb) set $i = 0
(gdb) p vd->vdev_child[$i++]->vdev_asize
$36 = 532152320
(gdb)
$37 = 532152320
(gdb)
$38 = 483131392
(gdb)
$39 = 532152320
(gdb)
$40 = 483131392
(gdb)
$41 = 532152320
(gdb)
$42 = 532152320
(gdb)
$43 = 532152320
(gdb)
$44 = 532152320
(gdb)
$45 = 532152320
(gdb)
$46 = 532152320
(gdb)
$47 = 532152320
(gdb)
$48 = 532152320
```
This difference in asize likely explains the unexpectedly small asize of the top-level vdev that generated this error. However, more investigation is needed to determine why the dspare has a smaller-than-expected size here.
### Describe how to reproduce the problem
This is reproducible with zloop.