This repository has been archived by the owner on Nov 7, 2019. It is now read-only.

9075 Improve ZFS pool import/load process and corrupted pool recovery
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>

Some work has been done lately to improve the debuggability of the ZFS pool
load (and import) process. This includes:

	7638 Refactor spa_load_impl into several functions
	8961 SPA load/import should tell us why it failed
	7277 zdb should be able to print zfs_dbgmsg's

To iterate on top of that, a few changes were made to make the import process
more resilient and crash-free. One of the first tasks during the
pool load process is to parse a config provided from userland that describes
what devices the pool is composed of. A vdev tree is generated from that config,
and then all the vdevs are opened.

The Meta Object Set (MOS) of the pool is accessed, and several metadata objects
that are necessary to load the pool are read. The exact configuration of the
pool is also stored inside the MOS. Since the configuration provided from
userland is external and might not accurately describe the vdev tree
of the pool at the txg that is being loaded, it cannot be relied upon to safely
operate the pool. For that reason, the configuration in the MOS is read early
on. In the past, the two configurations were compared and, if there was a
mismatch, the load process was aborted and an error was returned.

That check was a good way to ensure a pool does not get corrupted; however, it
made the pool load process needlessly fragile in cases where the vdev
configuration changed or the userland configuration was outdated. Since the MOS
is stored in 3 copies, the configuration provided by userland doesn't have to be
perfect in order to read its contents. Hence, a new approach has been adopted:
The pool is first opened with the untrusted userland configuration just so that
the real configuration can be read from the MOS. The trusted MOS configuration
is then used to generate a new vdev tree and the pool is re-opened.
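
As a rough illustration of the two-step load described above, here is a toy
model in C. Every name in it (toy_config_t, toy_pool_t, toy_open_untrusted(),
toy_reopen_trusted()) is hypothetical and is not part of the SPA code; the
sketch only shows the ordering of the steps and when writes are allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real spa_t or vdev tree. */
typedef struct toy_config {
	int	tc_ntop;	/* number of top-level vdevs described */
	bool	tc_trusted;	/* true only for a config read from the MOS */
} toy_config_t;

typedef struct toy_pool {
	toy_config_t	tp_mos_config;	/* config stored inside the pool */
	toy_config_t	tp_open_config;	/* config the pool is opened with */
	bool		tp_writable;
} toy_pool_t;

/* Step 1: open with the userland config; it is untrusted, so no writes. */
void
toy_open_untrusted(toy_pool_t *tp, const toy_config_t *userland)
{
	assert(tp != NULL && userland != NULL);
	tp->tp_open_config = *userland;
	tp->tp_open_config.tc_trusted = false;
	tp->tp_writable = false;
}

/* Step 2: re-open with the config read from the MOS, which is trusted. */
void
toy_reopen_trusted(toy_pool_t *tp, bool readonly)
{
	assert(tp != NULL);
	tp->tp_open_config = tp->tp_mos_config;
	tp->tp_open_config.tc_trusted = true;
	tp->tp_writable = !readonly;
}
```

In this model a stale userland config that describes two top-level vdevs is
only used long enough to reach the MOS copy, which may describe three.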

When the pool is opened with an untrusted configuration, writes are disabled
to avoid accidentally damaging it. During reads, some sanity checks are
performed on block pointers to see if each DVA points to a known vdev;
when the configuration is untrusted, instead of panicking the system if those
checks fail we simply avoid issuing reads to the invalid DVAs.
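
The difference in behavior between a trusted and an untrusted configuration
can be sketched as follows. This is a simplified model, not the code from this
change: toy_bp_t, toy_dva_valid() and toy_select_dvas() are made-up names, a
DVA is reduced to its vdev id, and abort() stands in for a system panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define	TOY_DVAS_PER_BP	3

/* Hypothetical, simplified block pointer: only the vdev id of each DVA. */
typedef struct toy_bp {
	uint64_t	tb_vdev[TOY_DVAS_PER_BP];
} toy_bp_t;

/* A DVA is considered valid here if its vdev id is within the known tree. */
bool
toy_dva_valid(uint64_t vdev_id, uint64_t nvdevs)
{
	return (vdev_id < nvdevs);
}

/*
 * Mark which DVAs may be read and return how many there are. With a
 * trusted config an unknown vdev is fatal (abort() stands in for a panic);
 * with an untrusted config the DVA is skipped instead.
 */
int
toy_select_dvas(const toy_bp_t *bp, uint64_t nvdevs, bool trusted,
    bool readable[TOY_DVAS_PER_BP])
{
	int count = 0;

	assert(bp != NULL && readable != NULL);
	for (int i = 0; i < TOY_DVAS_PER_BP; i++) {
		readable[i] = toy_dva_valid(bp->tb_vdev[i], nvdevs);
		if (!readable[i] && trusted)
			abort();
		if (readable[i])
			count++;
	}
	return (count);
}
```

Because the MOS is stored in several copies, a read can still succeed in this
model as long as at least one DVA remains readable.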

This new two-step pool load process now allows rewinding pools across
vdev tree changes such as device replacement, addition, etc. Loading a pool
from an external config file in a clustering environment also becomes much
safer now since the pool will import even if the config is outdated and didn't,
for instance, register a recent device addition.

With this code in place, it became relatively easy to implement a
long-sought-after feature: the ability to import a pool with missing top-level
(i.e. non-redundant) devices. Note that since this almost guarantees some loss
of data, this feature is for now restricted to a read-only import.
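
The read-only restriction amounts to a small policy check, sketched below
under stated assumptions. The function name toy_check_missing_tvds(), its
allow_missing opt-in parameter, and the choice of ENXIO as the error value are
illustrative only and are not taken from this change.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical policy check: decide whether a load may proceed when
 * 'missing' top-level vdevs could not be opened. Returns 0 to proceed or
 * an errno-style value to abort the load.
 */
int
toy_check_missing_tvds(int missing, bool allow_missing, bool readonly)
{
	assert(missing >= 0);

	if (missing == 0)
		return (0);		/* nothing missing, nothing to decide */
	if (!allow_missing || !readonly)
		return (ENXIO);		/* data loss is likely; refuse */
	return (0);			/* opted in and read-only: proceed */
}
```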

Closes #539
pzakha authored and Prakash Surya committed Feb 13, 2018
1 parent d544209 commit 619c012
Showing 34 changed files with 2,792 additions and 542 deletions.
3 changes: 3 additions & 0 deletions usr/src/cmd/mdb/common/modules/zfs/zfs.c
@@ -1569,6 +1569,9 @@ do_print_vdev(uintptr_t addr, int flags, int depth, boolean_t recursive,
case VDEV_AUX_SPLIT_POOL:
aux = "SPLIT_POOL";
break;
case VDEV_AUX_CHILDREN_OFFLINE:
aux = "CHILDREN_OFFLINE";
break;
default:
aux = "UNKNOWN";
break;
9 changes: 9 additions & 0 deletions usr/src/cmd/zpool/zpool_main.c
@@ -1562,6 +1562,10 @@ print_status_config(zpool_handle_t *zhp, const char *name, nvlist_t *nv,
(void) printf(gettext("split into new pool"));
break;

case VDEV_AUX_CHILDREN_OFFLINE:
(void) printf(gettext("all children offline"));
break;

default:
(void) printf(gettext("corrupted data"));
break;
@@ -1649,6 +1653,10 @@ print_import_config(const char *name, nvlist_t *nv, int namewidth, int depth)
(void) printf(gettext("too many errors"));
break;

case VDEV_AUX_CHILDREN_OFFLINE:
(void) printf(gettext("all children offline"));
break;

default:
(void) printf(gettext("corrupted data"));
break;
@@ -2296,6 +2304,7 @@ zpool_do_import(int argc, char **argv)
idata.poolname = searchname;
idata.guid = searchguid;
idata.cachefile = cachefile;
idata.policy = policy;

pools = zpool_search_import(g_zfs, &idata);

1 change: 1 addition & 0 deletions usr/src/lib/libzfs/common/libzfs.h
@@ -388,6 +388,7 @@ typedef struct importargs {
int can_be_active : 1; /* can the pool be active? */
int unique : 1; /* does 'poolname' already exist? */
int exists : 1; /* set on return if pool already exists */
nvlist_t *policy; /* rewind policy (rewind txg, etc.) */
} importargs_t;

extern nvlist_t *zpool_search_import(libzfs_handle_t *, importargs_t *);
19 changes: 17 additions & 2 deletions usr/src/lib/libzfs/common/libzfs_import.c
@@ -412,7 +412,8 @@ vdev_is_hole(uint64_t *hole_array, uint_t holes, uint_t id)
* return to the user.
*/
static nvlist_t *
get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boolean_t active_ok)
get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boolean_t active_ok,
nvlist_t *policy)
{
pool_entry_t *pe;
vdev_entry_t *ve;
@@ -746,6 +747,12 @@ get_configs(libzfs_handle_t *hdl, pool_list_t *pl, boolean_t active_ok)
continue;
}

if (policy != NULL) {
if (nvlist_add_nvlist(config, ZPOOL_REWIND_POLICY,
policy) != 0)
goto nomem;
}

if ((nvl = refresh_config(hdl, config)) == NULL) {
nvlist_free(config);
config = NULL;
@@ -1251,7 +1258,7 @@ zpool_find_import_impl(libzfs_handle_t *hdl, importargs_t *iarg)
goto error;
}

ret = get_configs(hdl, &pools, iarg->can_be_active);
ret = get_configs(hdl, &pools, iarg->can_be_active, iarg->policy);

error:
for (pe = pools.pools; pe != NULL; pe = penext) {
@@ -1381,6 +1388,14 @@ zpool_find_import_cached(libzfs_handle_t *hdl, const char *cachefile,
if (active)
continue;

if (nvlist_add_string(src, ZPOOL_CONFIG_CACHEFILE,
cachefile) != 0) {
(void) no_memory(hdl);
nvlist_free(raw);
nvlist_free(pools);
return (NULL);
}

if ((dst = refresh_config(hdl, src)) == NULL) {
nvlist_free(raw);
nvlist_free(pools);
5 changes: 3 additions & 2 deletions usr/src/lib/libzfs/common/libzfs_pool.c
@@ -1808,8 +1808,9 @@ zpool_import_props(libzfs_handle_t *hdl, nvlist_t *config, const char *newname,
nvlist_lookup_nvlist(nvinfo,
ZPOOL_CONFIG_MISSING_DEVICES, &missing) == 0) {
(void) printf(dgettext(TEXT_DOMAIN,
"The devices below are missing, use "
"'-m' to import the pool anyway:\n"));
"The devices below are missing or "
"corrupted, use '-m' to import the pool "
"anyway:\n"));
print_vdev_tree(hdl, NULL, missing, 2);
(void) printf("\n");
}
10 changes: 10 additions & 0 deletions usr/src/lib/libzpool/common/kernel.c
@@ -461,6 +461,16 @@ kernel_fini(void)
system_taskq_fini();
}

/* ARGSUSED */
uint32_t
zone_get_hostid(void *zonep)
{
/*
* We're emulating the system's hostid in userland.
*/
return (strtoul(hw_serial, NULL, 10));
}

int
z_uncompress(void *dst, size_t *dstlen, const void *src, size_t srclen)
{
1 change: 1 addition & 0 deletions usr/src/lib/libzpool/common/sys/zfs_context.h
@@ -317,6 +317,7 @@ typedef struct callb_cpr {

#define zone_dataset_visible(x, y) (1)
#define INGLOBALZONE(z) (1)
extern uint32_t zone_get_hostid(void *zonep);

extern int zfs_secpolicy_snapshot_perms(const char *name, cred_t *cr);
extern int zfs_secpolicy_rename_perms(const char *from, const char *to,
33 changes: 33 additions & 0 deletions usr/src/pkg/manifests/system-test-zfstest.mf
@@ -1465,10 +1465,43 @@ file \
mode=0444
file path=opt/zfs-tests/tests/functional/cli_root/zpool_import/cleanup \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_added \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_removed \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_device_replaced \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_attached \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_mirror_detached \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_cachefile_shared_device \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_devices_missing \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_paths_changed \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_rewind_config_changed \
mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/import_rewind_device_replaced \
mode=0555
file path=opt/zfs-tests/tests/functional/cli_root/zpool_import/setup mode=0555
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.cfg \
mode=0444
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import.kshlib \
mode=0444
file \
path=opt/zfs-tests/tests/functional/cli_root/zpool_import/zpool_import_001_pos \
mode=0555
12 changes: 11 additions & 1 deletion usr/src/test/zfs-tests/runfiles/delphix.run
@@ -279,7 +279,17 @@ tests = ['zpool_import_001_pos', 'zpool_import_002_pos',
'zpool_import_features_001_pos', 'zpool_import_features_002_neg',
'zpool_import_features_003_pos', 'zpool_import_missing_001_pos',
'zpool_import_missing_002_pos', 'zpool_import_missing_003_pos',
'zpool_import_rename_001_pos']
'zpool_import_rename_001_pos',
'import_cachefile_device_added',
'import_cachefile_device_removed',
'import_cachefile_mirror_attached',
'import_cachefile_mirror_detached',
'import_cachefile_device_replaced',
'import_rewind_config_changed',
'import_rewind_device_replaced',
'import_cachefile_shared_device',
'import_paths_changed',
'import_devices_missing']

[/opt/zfs-tests/tests/functional/cli_root/zpool_labelclear]
tests = ['zpool_labelclear_active', 'zpool_labelclear_exported']
@@ -0,0 +1,76 @@
#!/usr/bin/ksh -p

#
# This file and its contents are supplied under the terms of the
# Common Development and Distribution License ("CDDL"), version 1.0.
# You may only use this file in accordance with the terms of version
# 1.0 of the CDDL.
#
# A full copy of the text of the CDDL should have accompanied this
# source. A copy of the CDDL is also available via the Internet at
# http://www.illumos.org/license/CDDL.
#

#
# Copyright (c) 2016 by Delphix. All rights reserved.
#

. $STF_SUITE/tests/functional/cli_root/zpool_import/zpool_import.kshlib

#
# DESCRIPTION:
# A pool should be importable using an outdated cachefile that is unaware
# that one or two top-level vdevs were added.
#
# STRATEGY:
# 1. Create a pool with some devices and an alternate cachefile.
# 2. Back up the cachefile.
# 3. Add a device/mirror/raid to the pool.
# 4. Export the pool.
# 5. Verify that we can import the pool using the backed-up cachefile.
#

verify_runnable "global"

log_onexit cleanup

function test_add_vdevs
{
typeset poolcreate="$1"
typeset addvdevs="$2"
typeset poolcheck="$3"

log_note "$0: pool '$poolcreate', add $addvdevs."

log_must zpool create -o cachefile=$CPATH $TESTPOOL1 $poolcreate

log_must cp $CPATH $CPATHBKP

log_must zpool add -f $TESTPOOL1 $addvdevs

log_must zpool export $TESTPOOL1

log_must zpool import -c $CPATHBKP $TESTPOOL1
log_must check_pool_config $TESTPOOL1 "$poolcheck"

# Cleanup
log_must zpool destroy $TESTPOOL1
log_must rm -f $CPATH $CPATHBKP

log_note ""
}

test_add_vdevs "$VDEV0" "$VDEV1" "$VDEV0 $VDEV1"
test_add_vdevs "$VDEV0 $VDEV1" "$VDEV2" "$VDEV0 $VDEV1 $VDEV2"
test_add_vdevs "$VDEV0" "$VDEV1 $VDEV2" "$VDEV0 $VDEV1 $VDEV2"
test_add_vdevs "$VDEV0" "mirror $VDEV1 $VDEV2" \
"$VDEV0 mirror $VDEV1 $VDEV2"
test_add_vdevs "mirror $VDEV0 $VDEV1" "mirror $VDEV2 $VDEV3" \
"mirror $VDEV0 $VDEV1 mirror $VDEV2 $VDEV3"
test_add_vdevs "$VDEV0" "raidz $VDEV1 $VDEV2 $VDEV3" \
"$VDEV0 raidz $VDEV1 $VDEV2 $VDEV3"
test_add_vdevs "$VDEV0" "log $VDEV1" "$VDEV0 log $VDEV1"
test_add_vdevs "$VDEV0 log $VDEV1" "$VDEV2" "$VDEV0 $VDEV2 log $VDEV1"
test_add_vdevs "$VDEV0" "$VDEV1 log $VDEV2" "$VDEV0 $VDEV1 log $VDEV2"

log_pass "zpool import -c cachefile_unaware_of_add passed."