hostid deprecation causes zpool.cache mismatch and zpool import failure #2794

Closed
dajhorn opened this issue Oct 13, 2014 · 1 comment

dajhorn commented Oct 13, 2014

Commit openzfs/spl@acf0ade deprecates the /etc/hostid file, relaxes its handler, and sets a default of zero. The new default breaks userland imports by causing the /etc/zfs/zpool.cache file to be updated without a ZPOOL_CONFIG_HOSTID nvpair.

When the pool cache is in this state, explicitly invoked pool imports always fail. For example:

# zpool create tank ...
# zfs umount -a
# rmmod zfs
# modprobe zfs zfs_autoimport_disable=1
# zpool import tank
cannot import 'tank': pool may be in use from other system
use '-f' to import anyway

This happens because:

  1. The zpool program assumes that ZPOOL_CONFIG_HOSTID exists in the configuration at https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_main.c#L1919
  2. The zpool program still calls gethostid() from the system library, which causes an SPA mismatch if a generated value is returned at https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_main.c#L1921
  3. Unlike zpool, the libzfs library uses zero to skip its mismatch check at https://github.com/zfsonlinux/zfs/blob/master/lib/libzfs/libzfs_status.c#L222
  4. The zfs module uses zero to skip a VERIFY at https://github.com/zfsonlinux/zfs/blob/master/module/zfs/spa_config.c#L407
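
The difference between the strict check in points 1 and 2 and the lenient check in point 3 can be sketched as follows. This is a simplified model, not the code in zpool_main.c or libzfs_status.c: plain dicts stand in for the nvlist configuration, and the names `HOSTID_KEY`, `zpool_import_allowed`, `libzfs_mismatch`, and the sample hostid value are all hypothetical.

```python
# Simplified model of the two userland hostid checks described above.
# Plain dicts stand in for the nvlist pool configuration; this is not the
# actual zpool or libzfs source.

HOSTID_KEY = "hostid"  # stand-in for the ZPOOL_CONFIG_HOSTID nvpair name


def zpool_import_allowed(config, system_hostid):
    """Strict model: the nvpair must exist and must equal the system hostid."""
    if HOSTID_KEY not in config:
        # A missing nvpair is treated like a pool owned by another system.
        return False
    return config[HOSTID_KEY] == system_hostid


def libzfs_mismatch(config, system_hostid):
    """Lenient model: a stored hostid of zero (or none at all) skips the check."""
    stored = config.get(HOSTID_KEY, 0)
    return stored != 0 and stored != system_hostid


cache_without_hostid = {"name": "tank"}
generated_hostid = 0x007F0101  # hypothetical value returned by gethostid()

print(zpool_import_allowed(cache_without_hostid, generated_hostid))
print(libzfs_mismatch(cache_without_hostid, generated_hostid))
```

With the nvpair absent, the strict model refuses the import while the lenient model reports no mismatch, which is the inconsistency this issue describes.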

Automatic pool import is unaffected because it runs in the kernel on a different code path. This corner case is easier to notice when the proposal in #2779 is enabled (i.e., when zfs_autoimport_disable=1 is the system default).

Fuzzing this behavior on a test bench sometimes causes the additional disappearance of ZPOOL_CONFIG_HOSTNAME, which causes assertion failures later.

Solaris 11 updates its /etc/zfs/zpool.cache file identically when its /etc/hostid file is forced to zero using the "_I________" string, but imports are not broken when the hostid is missing or the hostname is empty.

For code consistency, zpool could be patched to skip its mismatch check on zero too.

Alternatively, given that zero is a valid hostid on Linux and seems to be something special on Solaris, the solution could be one or more of:

  • Disable the hostid checks entirely.
  • Never write out the affected nvpair unless the /etc/hostid file actually exists.
  • Wrap the system gethostid() so that it behaves like zone_get_hostid() in the SPL.
  • Use any number except zero as the dummy value, preferably with upper bits that fit into the nvpair but are masked out by glibc.
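
As a rough illustration of the wrapper idea in the third bullet, the sketch below reads a hostid file and falls back to zero when the file is missing or malformed, instead of returning a generated value. It assumes that the file holds a raw 32-bit value in native byte order and that zero is the desired default; `read_hostid` is a hypothetical name, and this is not the SPL's zone_get_hostid() implementation.

```python
import os
import struct
import tempfile


def read_hostid(path="/etc/hostid"):
    """Return the 32-bit hostid stored in `path`, or 0 if the file is unusable.

    Hypothetical userland wrapper: it never falls back to a generated value,
    so a missing file always yields the zero default.
    """
    try:
        with open(path, "rb") as f:
            raw = f.read(4)
    except OSError:
        return 0
    if len(raw) != 4:
        return 0
    # "=I" is an unsigned 32-bit integer in native byte order, standard size.
    return struct.unpack("=I", raw)[0]


# Demonstrate against a temporary directory instead of the real /etc/hostid.
with tempfile.TemporaryDirectory() as tmp:
    hostid_path = os.path.join(tmp, "hostid")
    print(hex(read_hostid(hostid_path)))  # file does not exist yet

    with open(hostid_path, "wb") as f:
        f.write(struct.pack("=I", 0xDEADBEEF))
    print(hex(read_hostid(hostid_path)))
```

If userland and the kernel both resolved the hostid this way, the stored and compared values would come from the same source, which is the point of the proposal.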
dajhorn added a commit to dajhorn/zfs that referenced this issue Oct 13, 2014
Change the zpool program to skip its hostid mismatch check in the same way that
libzfs already does.

Invoked imports fail if the ZPOOL_CONFIG_HOSTID nvpair is missing in the
/etc/zfs/zpool.cache file, which can happen as of the /etc/hostid deprecation
in commit openzfs/spl@acf0ade.

Closes: openzfs#2794
@behlendorf behlendorf added this to the 0.6.4 milestone Oct 13, 2014
@behlendorf

@dajhorn Thanks for getting to the bottom of this. It clearly explains why this caused issues on some systems and not others.

> For code consistency, zpool could be patched to skip its mismatch check on zero too.

I think we should definitely do this. For better or worse, the upstream code is designed to use a hostid of 0 to disable these checks, and it would be desirable to remain compatible with that logic.

> Disable the hostid checks entirely.

Sadly, we can't disable these checks entirely. For sites that use ZFS in a legitimate failover configuration, it's the only multi-mount protection they have, at least until a robust system like the one described in #745 is implemented.

> Never write out the affected nvpair unless the /etc/hostid file actually exists.
> Wrap the system gethostid() so that it behaves like zone_get_hostid() in the SPL.

I like this idea a lot. Unifying the behavior between user space and kernel space will simplify things. For example, we'd be able to remove the conditional logic here and here if we provided a zone_get_hostid() wrapper. It looks like we could retire the hw_serial global entirely and replace it with a global hostid variable, just as was done in the SPL.

Can you propose a patch for this?

ryao pushed a commit to ryao/zfs that referenced this issue Nov 29, 2014
Change the zpool program to skip its hostid mismatch check in the
same way that libzfs already does.

Invoked imports fail if the ZPOOL_CONFIG_HOSTID nvpair is missing in
the /etc/zfs/zpool.cache file, which can happen as of the /etc/hostid
deprecation in commit openzfs/spl@acf0ade.

Signed-off-by: Darik Horn <dajhorn@vanadac.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#2794
wmertens referenced this issue in NixOS/nixpkgs Dec 16, 2014
The old boot.spl.hostid option was not working correctly due to an
upstream bug.

Instead, now we will create the /etc/hostid file so that all applications
(including the ZFS kernel modules, ZFS user-space applications and other
unrelated programs) pick up the same system-wide host id. Note that glibc
(and by extension, the `hostid` program) also respects the host id configured in
/etc/hostid, if it exists.

The hostid option is now mandatory when using ZFS because otherwise, ZFS will
require you to force-import your ZFS pools if you want to use them, which is
undesirable because it disables some of the checks that ZFS does to make sure it
is safe to import a ZFS pool.

The /etc/hostid file must also exist when booting the initrd, before the SPL
kernel module is loaded, so that ZFS picks up the hostid correctly.

The complexity in creating the /etc/hostid file is due to having to
write the host ID as a 32-bit binary value, taking into account the
endianness of the machine, while using only shell commands and/or simple
utilities (to avoid exploding the size of the initrd).
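
The byte-order concern in the last paragraph can be illustrated with a short sketch. The commit itself used shell commands; the Python below is only an illustration of the file format it describes (a raw 32-bit value in the machine's native byte order), and `write_hostid` and the sample value are hypothetical.

```python
import os
import struct
import sys
import tempfile


def write_hostid(hostid, path):
    """Write `hostid` as a raw 32-bit value in the machine's native byte order."""
    if not 0 <= hostid <= 0xFFFFFFFF:
        raise ValueError("hostid must fit in 32 bits")
    with open(path, "wb") as f:
        # "=I": unsigned 32-bit integer, native byte order, no padding.
        f.write(struct.pack("=I", hostid))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "hostid")  # stand-in for /etc/hostid
    write_hostid(0x01020304, path)
    with open(path, "rb") as f:
        raw = f.read()
    # On a little-endian machine the least significant byte is stored first.
    if sys.byteorder == "little":
        expected = bytes([4, 3, 2, 1])
    else:
        expected = bytes([1, 2, 3, 4])
    print(raw == expected)
```

Writing the value as text (for example "01020304") would produce a different file, which is why a plain echo of the hex string is not enough.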
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Jun 9, 2015
demizer added a commit to archzfs/archzfs that referenced this issue May 29, 2016
According to

openzfs/spl@acf0ade
openzfs/zfs#2794

the hostid handling is not needed anymore. If /etc/hostid does not
exist, then spl treats it as 0 and continues operation.

Closes #60
Closes #31
demizer added a commit to archzfs/archzfs that referenced this issue Jun 11, 2016
demizer added a commit to archzfs/archzfs that referenced this issue Jun 13, 2016
demizer added a commit to archzfs/archzfs that referenced this issue Jul 2, 2016
demizer added a commit to archzfs/archzfs that referenced this issue Sep 4, 2016
demizer added a commit to archzfs/archzfs that referenced this issue Sep 10, 2016