Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unable to import pool by 0.6.2 after resilver by 2014-01-16 git version #2094

Closed
AndCycle opened this issue Feb 1, 2014 · 6 comments
Closed
Milestone

Comments

@AndCycle
Copy link

AndCycle commented Feb 1, 2014

tl;dr

after resilver by git version,
the pool is no longer be able to be imported by zol 0.6.2/freebsd/solaris,
report as pool metadata is corrupted

longer version

here is raw dd dump for analysis and some related info,
you can use the raw dd dump for detail diagnosis,
http://www.andcycle.idv.tw/~andcycle/tmp/tmp/zol/20140201/

I got similar issue as this thread discussed, I don't have label issue,
http://comments.gmane.org/gmane.linux.file-systems.zfs.user/12105

Gentoo
sys-kernel/gentoo-sources-3.12.7
sys-kernel/gentoo-sources-3.12.8

timeline

longtime ago

zfs-0.6.2
create zroot
create zmess

2013/12/19

create ztmp

2013/12/21

zfs-0.6.2-r1

2014/01/16

zfs-0.6.2-r2

2014/01/16

zfs-9999

2014/01/23

zroot resilvered

2014/01/27

ztmp resilvered

2014/01/30

create zstor-past

2014/02/01

zfs-0.6.2-r3
zroot state FAULTED, The pool metadata is corrupted.
ztmp state FAULTED, The pool metadata is corrupted.
zmess state ONLINE
zstor-past state ONLINE

2014/02/01

zfs-9999
zroot state ONLINE
ztmp state ONLINE
zmess state ONLINE
zstor-past state ONLINE

that's all, I think,
I just messed up my production server,
now I am trapped with git version,
tell me if there is anymore information needed

@dweeezil
Copy link
Contributor

dweeezil commented Feb 1, 2014

@AndCycle I'm going to try to recreate this on my own while I'm downloading your sda2.xz image but I'd like to clarify the steps that caused the corruption. Am I correct in saying that your "ztmp" pool was created to demonstrate the problem and that the problem was caused by running a "zpool scrub" on it? Did the pool import properly under 0.6.2/FreeBSD/Solaris prior to running the scrub? Also, could you please post the exact version of the code you're running with cat /sys/module/zfs/version.

What did you do with the "ztmp" pool other than creating the 2 filesystems on it before running the scrub? Did you copy any files on to it?

@dweeezil
Copy link
Contributor

dweeezil commented Feb 1, 2014

@AndCycle I was able to reproduce the problem rather easily on my test system. No need for your image or anything else. I'm going to look into this issue now that I can reproduce it.

@dweeezil
Copy link
Contributor

dweeezil commented Feb 1, 2014

The problem is that in 1421c89 the size of a zbookmark_t was expanded. Unfortunately, the size of a zbookmark_t is, effectively, part of the expected on-disk format by the current ZFS code base. I've worked up a patch in dweeezil/zfs@c5223f5 which should only be considered a work-in-progress at this point. I have, however, tested it and it seems just fine. Read the commit message for details.

@AndCycle
Copy link
Author

AndCycle commented Feb 1, 2014

@dweeezil thanks for looking into this problem, probably will tryout your patch when I get free time :)

@ryao
Copy link
Contributor

ryao commented Feb 2, 2014

@dweeezil Nice work. This was on the short list of things that I wanted to debug before tagging a new Gentoo patch set. I planned to debug it before filing an issue, but @AndCycle beat me to reporting it and you appear to have beaten me to debugging it. :)

@AndCycle Nice description of the issue. Your report contains information that is far more useful than the information that I had to use. Last Monday, I discovered that HEAD broke backward compatibility. That put this on my short list of things to debug, but I had no idea what the trigger was and would not have spent much time on it until next week.

That being said, I am in a position to test and review this, but I might not have time until Monday.

dweeezil added a commit to dweeezil/zfs that referenced this issue Feb 2, 2014
Commit 1421c89 expanded the size of a zbookmark_t from 24 to 25 64-bit
values which similarly expands the size of the "scan" entry in the pool's
object directory and causes the pool to become un-importable by other
OpenZFS implementations.

This commit replaces the zbookmark_t member of dsl_scan_phys_t with 4
new members representing the contents of the zbookmark_t which insulates
the physical layout from changes to that structure.

NOTE: The same could be done with the ddt_bookmark_t in dsl-scan_phys_t.

Add a comptibility shim to dsl_scan_init() to support importing broken
pools that have the larger-than-normal "scan" entry.

Fixes openzfs#2094
dweeezil added a commit to dweeezil/zfs that referenced this issue Feb 3, 2014
…ctory.

Commit 1421c89 expanded the size of a zbookmark_t from 24 to 25 64-bit
values which similarly expands the size of the "scan" entry in the pool's
object directory and causes the pool to become un-importable by other
OpenZFS implementations.

This commit renames "struct zbookmark" to "struct zbookmark_phys" since
it is related to an on-disk format and adds a new "struct zbookmark" that
contains the former as its first member.  The effect is that the "struct
zbookmark" items written to the object directory in both the "scan" and
"bptree_obj" entries contain only the correct subset of the bookmark.

Fixes openzfs#2094
ryao added a commit to ryao/zfs that referenced this issue Feb 5, 2014
ZoL commit 1421c89 unintentionally
changed the disk format in a forward-compatible, but not backward
compatible way. This was accomplished by adding an entry to zbookmark_t,
which is included in a couple of on-disk structures. That lead to the
creation of pools with incorrect dsl_scan_phys_t objects that could only
be imported by versions of ZFSOnLinux containing that commit. Such pools
cannot be imported by other versions of ZFS. That including past and
future versions of ZoL. This corrects that problem by removing the field
that was added to zbookmark_t and also implementing a workaround that can
detect that error in a way that permits pool import to continue. A
kernel alert describing this will be printed to dmesg asking the system
administrator to run a scrub. A subsequent scrub will replace the
damaged dsl_scan_phys_t object, repairing the pool.

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

The function the field removed from zbookmark_t served is left
unimplemented until a later patch that will reimplement the field in a
way that does not affect the disk format. Lastly, it is expected that no
user data has been lost as a result of this erratum.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Original-patch-by: Tim Chase <tim@chase2k.com>
ryao added a commit to ryao/zfs that referenced this issue Feb 5, 2014
ZoL commit 1421c89 unintentionally
changed the disk format in a forward-compatible, but not backward
compatible way. This was accomplished by adding an entry to zbookmark_t,
which is included in a couple of on-disk structures. That lead to the
creation of pools with incorrect dsl_scan_phys_t objects that could only
be imported by versions of ZFSOnLinux containing that commit. Such pools
cannot be imported by other versions of ZFS. That including past and
future versions of ZoL. The additional field has been removed by a prior
patch. This patch permits affected pools to be imported and repaired by
implementing a workaround for the erratum. When this workaround is
triggered, a kernel alert describing this will be printed to dmesg
asking the system administrator to run a scrub. A subsequent scrub will
replace the damaged dsl_scan_phys_t object, repairing the pool.

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Closes openzfs#2094

Signed-off-by: Richard Yao <ryao@gentoo.org>
Original-patch-by: Tim Chase <tim@chase2k.com>
ryao added a commit to ryao/zfs that referenced this issue Feb 5, 2014
ZoL commit 1421c89 unintentionally
changed the disk format in a forward-compatible, but not backward
compatible way. This was accomplished by adding an entry to zbookmark_t,
which is included in a couple of on-disk structures. That lead to the
creation of pools with incorrect dsl_scan_phys_t objects that could only
be imported by versions of ZFSOnLinux containing that commit. Such pools
cannot be imported by other versions of ZFS. That includes past and
future versions of ZoL. The additional field has been removed by a prior
patch. This patch permits affected pools to be imported and repaired by
implementing a workaround for the erratum. When this workaround is
triggered, a kernel alert describing this will be printed to dmesg
asking the system administrator to run a scrub. A subsequent scrub will
replace the damaged dsl_scan_phys_t object, repairing the pool.

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Closes openzfs#2094

Original-patch-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 7, 2014
Allow the caller of the zpool-create.sh script to override
the default path where file vdevs are created.  This allows
for greated flexibilty when scripting.

Additionally, update the default path from /tmp/ to /var/tmp/
because these days /tmp/ is likely a ramdisk.  Even though
these files are sparse they may grow large in which case they
should be backed by a physical device.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 7, 2014
NOTES: This script still needs some polish and additional testing but
I wanted to make it available for review.  I'm also concerned about the
size of the reference pools in this patch.  I've attempted to make
them small and interesting but they double the size of the zfs.x.y.z
tarball (from 2MB to 4MB).  This will only get worse as more images
are added.  We may not want them to be part of the core git tree.

---

Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.2 master installed" \
     -p "zevo-1.1.1 zol-0.6.2 zol-0.6.2-173 master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.2       master          0.6.2-175_g36eb554
-----------------------------------------------------------------
Clone SPL       Local         Local           Skip
Clone ZFS       Local         Local           Skip
Build SPL       Pass          Pass            Skip
Build ZFS       Pass          Pass            Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass          Pass            Pass
zol-0.6.2       Pass          Pass            Pass
zol-0.6.2-173   Fail          Pass            Pass
master          Pass          Pass            Pass
installed       Pass          Pass            Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 8, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.2 master installed" \
     -p "zevo-1.1.1 zol-0.6.2 zol-0.6.2-173 master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.2       master          0.6.2-175_g36eb554
-----------------------------------------------------------------
Clone SPL       Local         Local           Skip
Clone ZFS       Local         Local           Skip
Build SPL       Pass          Pass            Skip
Build ZFS       Pass          Pass            Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass          Pass            Pass
zol-0.6.2       Pass          Pass            Pass
zol-0.6.2-173   Fail          Pass            Pass
master          Pass          Pass            Pass
installed       Pass          Pass            Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
Allow the caller of the zpool-create.sh script to override
the default path where file vdevs are created.  This allows
for greated flexibilty when scripting.

Additionally, update the default path from /tmp/ to /var/tmp/
because these days /tmp/ is likely a ramdisk.  Even though
these files are sparse they may grow large in which case they
should be backed by a physical device.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.2 master installed" \
     -p "zevo-1.1.1 zol-0.6.2 zol-0.6.2-173 master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.2       master          0.6.2-175_g36eb554
-----------------------------------------------------------------
Clone SPL       Local         Local           Skip
Clone ZFS       Local         Local           Skip
Build SPL       Pass          Pass            Skip
Build ZFS       Pass          Pass            Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass          Pass            Pass
zol-0.6.2       Pass          Pass            Pass
zol-0.6.2-173   Fail          Pass            Pass
master          Pass          Pass            Pass
installed       Pass          Pass            Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool status' command can be used to determine if a pool is impacted.
A message similar to the following means your pool must be scrubbed to
restore compatibility by replacing the damaged dsl_scan_phys_t object.

  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:

	NAME                              STATE     READ WRITE CKSUM
	zol-0.6.2-173                     ONLINE       0     0     0
	  raidz1-0                        ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev0  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev1  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev2  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev3  ONLINE       0     0     0

errors: No known data errors

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.2 master installed" \
     -p "zevo-1.1.1 zol-0.6.2 zol-0.6.2-173 master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.2       master          0.6.2-175_g36eb554
-----------------------------------------------------------------
Clone SPL       Local         Local           Skip
Clone ZFS       Local         Local           Skip
Build SPL       Pass          Pass            Skip
Build ZFS       Pass          Pass            Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass          Pass            Pass
zol-0.6.2       Pass          Pass            Pass
zol-0.6.2-173   Fail          Pass            Pass
master          Pass          Pass            Pass
installed       Pass          Pass            Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 11, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool status' command can be used to determine if a pool is impacted.
A message similar to the following means your pool must be scrubbed to
restore compatibility by replacing the damaged dsl_scan_phys_t object.

  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:

	NAME                              STATE     READ WRITE CKSUM
	zol-0.6.2-173                     ONLINE       0     0     0
	  raidz1-0                        ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev0  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev1  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev2  ONLINE       0     0     0
	    /var/tmp/zol-0.6.2-173/vdev3  ONLINE       0     0     0

errors: No known data errors

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
dweeezil added a commit to dweeezil/zfs that referenced this issue Feb 12, 2014
Add a new pool status, ZPOOL_STATUS_SCAN_ERRATA which indicates
the openzfs#2094 erratum has been detected.  This allows for a "zpool
import" or a "zpool status" to report the problem.
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 13, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.2 master installed" \
     -p "zevo-1.1.1 zol-0.6.2 zol-0.6.2-173 master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.2       master          0.6.2-175_g36eb554
-----------------------------------------------------------------
Clone SPL       Local         Local           Skip
Clone ZFS       Local         Local           Skip
Build SPL       Pass          Pass            Skip
Build ZFS       Pass          Pass            Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass          Pass            Pass
zol-0.6.2       Pass          Pass            Pass
zol-0.6.2-173   Fail          Pass            Pass
master          Pass          Pass            Pass
installed       Pass          Pass            Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 13, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 13, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can be imported using its name or numeric identifier,
	however there is a compatibility issue which should be corrected
	by running 'zpool scrub'
   see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can not be imported with this version of ZFS due to an
	active asynchronous destroy.  Revert to an earlier version and
	allow the destroy to complete before updating.
   see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 14, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.1 zfs-0.6.2 master installed" \
     -p "all master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.1       zfs-0.6.2       master          0.6.2-180
-----------------------------------------------------------------
Clone SPL       Skip		Skip		Skip		Skip
Clone ZFS       Skip		Skip		Skip		Skip
Build SPL       Skip		Skip		Skip		Skip
Build ZFS       Skip		Skip		Skip		Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass		Pass		Pass		Pass
zol-0.6.1       Pass		Pass		Pass		Pass
zol-0.6.2-173   Fail		Fail		Pass		Pass
zol-0.6.2       Pass		Pass		Pass		Pass
master          Fail		Fail		Pass		Pass
installed       Pass		Pass		Pass		Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 14, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 14, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can be imported using its name or numeric identifier,
	however there is a compatibility issue which should be corrected
	by running 'zpool scrub'
   see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can not be imported with this version of ZFS due to an
	active asynchronous destroy.  Revert to an earlier version and
	allow the destroy to complete before updating.
   see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release or any Gentoo backports. Only those using HEAD were
affected. However, this patch has a few limitations. There is no way to
detect a damaged dsl_scan_phys_t object when it has occurred on 32-bit
systems due to integer rounding that wrote incorrect information, but
avoided the overflow on them. Correcting such issues requires triggering
a scrub.

In addition, bptree_entry_phys_t objects are also affected. These
objects only exist during an asynchronous destroy and automating repair
of damaged bptree_entry_phys_t objects is non-trivial. Any pools that
have been imported by an affected version of ZoL must have all
asynchronous destroy operations finish before export and subsequent
import by a version containing this commit.  Failure to do that will
prevent pool import. The presence of any background destroys on any
imported pools can be checked by running `zpool get freeing` as root.
This will display a non-zero value for any pool with an active
asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 21, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.1 zfs-0.6.2 master installed" \
     -p "all master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.1       zfs-0.6.2       master          0.6.2-180
-----------------------------------------------------------------
Clone SPL       Skip		Skip		Skip		Skip
Clone ZFS       Skip		Skip		Skip		Skip
Build SPL       Skip		Skip		Skip		Skip
Build ZFS       Skip		Skip		Skip		Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass		Pass		Pass		Pass
zol-0.6.1       Pass		Pass		Pass		Pass
zol-0.6.2-173   Fail		Fail		Pass		Pass
zol-0.6.2       Pass		Pass		Pass		Pass
master          Fail		Fail		Pass		Pass
installed       Pass		Pass		Pass		Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 21, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 21, 2014
Both the zpool_import_status() and zpool_get_status() functions
return the zpool_status_t enum.  This explicit type should be used
rather than the more generic int type.

This patch makes no functional change and should only be considered
code cleanup.  It happens to have been done in the context of openzfs#2094
because that's when I noticed this issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue openzfs#2094
behlendorf added a commit to behlendorf/zfs that referenced this issue Feb 21, 2014
From time to time it may be nessisary to inform the pool administator
about an errata which impacts their pool.  These errata will by shown
to the administrator through the 'zpool status' and 'zpool import'
output as appropriate.  The errata must clearly describe the issue
detected, how the pool is impacted, and what action should be taken
to resolve the situation.  Additional information for each errata will
be provided at http://zfsonlinux.org/msg/ZFS-8000-ER.

To accomplish the above this patch adds the required infrastructure to
allow the kernel modules to notify the utilities that an errata has
been detected.  This is done through the ZPOOL_CONFIG_ERRATA uint64_t
which has been added to the pool configuration nvlist.

To add a new errata the following changes must be made:

* A new errata identifier must be assigned by adding a new enum value
  to the zpool_errata_t type.  New enums must be added to the end to
  preserve the existing ordering.

* Code must be added to detect the issue.  This does not strictly
  need to be done at pool import time but doing so will make the
  errata visible in 'zpool import' as well as 'zpool status'.  Once
  detected the spa->spa_errata member should be set to the new enum.

* If possible code should be added to clear the spa->spa_errata member
  once the errata has been resolved.

* The show_import() and status_callback() functions must be updated
  to include an informational message describing the errata.  This
  should include an action message decribing what an administrator
  should do to address the errata.

* The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be
  updated to describe the errata.  This space can be used to provide
  as much additional information as needed to fully describe the errata.
  A link to this documentation will be automatically generated in the
  output of 'zpool import' and 'zpool status'.

Original-idea-by: Time Chaos <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue openzfs#2094
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Feb 21, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can be imported using its name or numeric identifier,
         however there is a compatibility issue which should be corrected
         by running 'zpool scrub'
    see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs#2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #2 detected.
 action: The pool can not be imported with this version of ZFS due to an
         active asynchronous destroy.  Revert to an earlier version and
         allow the destroy to complete before updating.
         see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release, Gentoo backports, Ubuntu, Debian, Fedora, or EPEL
stable respositorys.  Only those using the HEAD version directly from
Github after the 0.6.2 but before the 0.6.3 tag are affected.

This patch does have one limitation that should be mentioned.  It will not
detect errata #2 on a pool unless errata #1 is also present.  It expected
this will not be a significant problem because pools impacted by errata #2
have a high probably of being impacted by errata #1.

End users can ensure they do no hit this unlikely case by waiting for all
asynchronous destroy operations to complete before updating ZoL.  The
presence of any background destroys on any imported pools can be checked
by running `zpool get freeing` as root.  This will display a non-zero
value for any pool with an active asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Reworked-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#2094
behlendorf added a commit that referenced this issue Feb 21, 2014
Verify that an assortment of known good reference pools can be imported
using different versions of the ZoL code.

By default references pools for the major ZFS implementation will be
checked against the most recent ZoL tags and the master development branch.
Alternate tags or branches may be verified with the '-s <src-tag> option.
Passing the keyword "installed" will instruct the script to test whatever
version is installed.

Preferentially a reference pool is used for all tests.  However, if one
does not exist and the pool-tag matches one of the src-tags then a new
reference pool will be created using binaries from that source build.
This is particularly useful when you need to test your changes before
opening a pull request.

New reference pools may be added by placing a bzip2 compressed tarball
of the pool in the scripts/zpool-example directory and then passing
the -p <pool-tag> option.  To increase the test coverage reference pools
should be collected for all the major ZFS implementations.  Having these
pools easily available is also helpful to the developers.

Care should be taken to run these tests with a kernel supported by all
the listed tags.  Otherwise build failure will cause false positives.

EXAMPLES:

The following example will verify the zfs-0.6.2 tag, the master branch,
and the installed zfs version can correctly import the listed pools.
Note there is no reference pool available for master and installed but
because binaries are available one is automatically constructed.  The
working directory is also preserved between runs (-k) preventing the
need to rebuild from source for multiple runs.

 zimport.sh -k -f /var/tmp/zimport \
     -s "zfs-0.6.1 zfs-0.6.2 master installed" \
     -p "all master installed"

--------------------- ZFS on Linux Source Versions --------------
                zfs-0.6.1       zfs-0.6.2       master          0.6.2-180
-----------------------------------------------------------------
Clone SPL       Skip		Skip		Skip		Skip
Clone ZFS       Skip		Skip		Skip		Skip
Build SPL       Skip		Skip		Skip		Skip
Build ZFS       Skip		Skip		Skip		Skip
-----------------------------------------------------------------
zevo-1.1.1      Pass		Pass		Pass		Pass
zol-0.6.1       Pass		Pass		Pass		Pass
zol-0.6.2-173   Fail		Fail		Pass		Pass
zol-0.6.2       Pass		Pass		Pass		Pass
master          Fail		Fail		Pass		Pass
installed       Pass		Pass		Pass		Pass

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Issue #2094
behlendorf pushed a commit that referenced this issue Feb 21, 2014
Commit 1421c89 added a field to
zbookmark_t that unintentinoally caused a disk format change. This
negatively affected backward compatibility and platform portability.
Therefore, this field is being removed.

The function that field permitted is left unimplemented until a later
patch that will reimplement the field in a way that does not affect the
disk format.

Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2094
behlendorf added a commit that referenced this issue Feb 21, 2014
Both the zpool_import_status() and zpool_get_status() functions
return the zpool_status_t enum.  This explicit type should be used
rather than the more generic int type.

This patch makes no functional change and should only be considered
code cleanup.  It happens to have been done in the context of #2094
because that's when I noticed this issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue #2094
behlendorf added a commit that referenced this issue Feb 21, 2014
From time to time it may be necessary to inform the pool administrator
about an errata which impacts their pool.  These errata will by shown
to the administrator through the 'zpool status' and 'zpool import'
output as appropriate.  The errata must clearly describe the issue
detected, how the pool is impacted, and what action should be taken
to resolve the situation.  Additional information for each errata will
be provided at http://zfsonlinux.org/msg/ZFS-8000-ER.

To accomplish the above this patch adds the required infrastructure to
allow the kernel modules to notify the utilities that an errata has
been detected.  This is done through the ZPOOL_CONFIG_ERRATA uint64_t
which has been added to the pool configuration nvlist.

To add a new errata the following changes must be made:

* A new errata identifier must be assigned by adding a new enum value
  to the zpool_errata_t type.  New enums must be added to the end to
  preserve the existing ordering.

* Code must be added to detect the issue.  This does not strictly
  need to be done at pool import time but doing so will make the
  errata visible in 'zpool import' as well as 'zpool status'.  Once
  detected the spa->spa_errata member should be set to the new enum.

* If possible code should be added to clear the spa->spa_errata member
  once the errata has been resolved.

* The show_import() and status_callback() functions must be updated
  to include an informational message describing the errata.  This
  should include an action message describing what an administrator
  should do to address the errata.

* The documentation at http://zfsonlinux.org/msg/ZFS-8000-ER must be
  updated to describe the errata.  This space can be used to provide
  as much additional information as needed to fully describe the errata.
  A link to this documentation will be automatically generated in the
  output of 'zpool import' and 'zpool status'.

Original-idea-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue #2094
behlendorf pushed a commit that referenced this issue Feb 21, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #1 detected.
 action: The pool can be imported using its name or numeric identifier,
         however there is a compatibility issue which should be corrected
         by running 'zpool scrub'
    see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: #2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata #2 detected.
 action: The pool can not be imported with this version of ZFS due to an
         active asynchronous destroy.  Revert to an earlier version and
         allow the destroy to complete before updating.
         see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release, Gentoo backports, Ubuntu, Debian, Fedora, or EPEL
stable respositorys.  Only those using the HEAD version directly from
Github after the 0.6.2 but before the 0.6.3 tag are affected.

This patch does have one limitation that should be mentioned.  It will not
detect errata #2 on a pool unless errata #1 is also present.  It expected
this will not be a significant problem because pools impacted by errata #2
have a high probably of being impacted by errata #1.

End users can ensure they do no hit this unlikely case by waiting for all
asynchronous destroy operations to complete before updating ZoL.  The
presence of any background destroys on any imported pools can be checked
by running `zpool get freeing` as root.  This will display a non-zero
value for any pool with an active asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Reworked-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2094
@behlendorf
Copy link
Contributor

Merged as:

3965d2e Merge branch 'issue-2094'
4f2dcb3 Add erratum for issue #2094
ffe9d38 Add generic errata infrastructure
731782e Use expected zpool_status_t type
ed9e836 Revert changes to zbookmark_t
a16bc6b Add zimport.sh compatibility test script

ryao pushed a commit to ryao/zfs that referenced this issue Apr 9, 2014
Both the zpool_import_status() and zpool_get_status() functions
return the zpool_status_t enum.  This explicit type should be used
rather than the more generic int type.

This patch makes no functional change and should only be considered
code cleanup.  It happens to have been done in the context of openzfs#2094
because that's when I noticed this issue.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.or
Issue openzfs#2094
siboulet pushed a commit to siboulet/pkg-zfs that referenced this issue Jul 19, 2014
ZoL commit 1421c89 unintentionally changed the disk format in a forward-
compatible, but not backward compatible way. This was accomplished by
adding an entry to zbookmark_t, which is included in a couple of
on-disk structures. That lead to the creation of pools with incorrect
dsl_scan_phys_t objects that could only be imported by versions of ZoL
containing that commit.  Such pools cannot be imported by other versions
of ZFS or past versions of ZoL.

The additional field has been removed by the previous commit.  However,
affected pools must be imported and scrubbed using a version of ZoL with
this commit applied.  This will return the pools to a state in which they
may be imported by other implementations.

The 'zpool import' or 'zpool status' command can be used to determine if
a pool is impacted.  A message similar to one of the following means your
pool must be scrubbed to restore compatibility.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata zfsonlinux#1 detected.
 action: The pool can be imported using its name or numeric identifier,
         however there is a compatibility issue which should be corrected
         by running 'zpool scrub'
    see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

$ zpool status
  pool: zol-0.6.2-173
 state: ONLINE
  scan: pool compatibility issue detected.
   see: openzfs/zfs#2094
action: To correct the issue run 'zpool scrub'.
config:
...

If there was an async destroy in progress 'zpool import' will prevent
the pool from being imported.  Further advice on how to proceed will be
provided by the error message as follows.

$ zpool import
   pool: zol-0.6.2-173
     id: 1165955789558693437
  state: ONLINE
 status: Errata zfsonlinux#2 detected.
 action: The pool can not be imported with this version of ZFS due to an
         active asynchronous destroy.  Revert to an earlier version and
         allow the destroy to complete before updating.
         see: http://zfsonlinux.org/msg/ZFS-8000-ER
 config:
 ...

Pools affected by the damaged dsl_scan_phys_t can be detected prior to
an upgrade by running the following command as root:

zdb -dddd poolname 1 | grep -P '^\t\tscan = ' | sed -e 's;scan = ;;' | wc -w

Note that `poolname` must be replaced with the name of the pool you wish
to check. A value of 25 indicates the dsl_scan_phys_t has been damaged.
A value of 24 indicates that the dsl_scan_phys_t is normal. A value of 0
indicates that there has never been a scrub run on the pool.

The regression caused by the change to zbookmark_t never made it into a
tagged release, Gentoo backports, Ubuntu, Debian, Fedora, or EPEL
stable respositorys.  Only those using the HEAD version directly from
Github after the 0.6.2 but before the 0.6.3 tag are affected.

This patch does have one limitation that should be mentioned.  It will not
detect errata zfsonlinux#2 on a pool unless errata zfsonlinux#1 is also present.  It expected
this will not be a significant problem because pools impacted by errata zfsonlinux#2
have a high probably of being impacted by errata zfsonlinux#1.

End users can ensure they do no hit this unlikely case by waiting for all
asynchronous destroy operations to complete before updating ZoL.  The
presence of any background destroys on any imported pools can be checked
by running `zpool get freeing` as root.  This will display a non-zero
value for any pool with an active asynchronous destroy.

Lastly, it is expected that no user data has been lost as a result of
this erratum.

Original-patch-by: Tim Chase <tim@chase2k.com>
Reworked-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tim Chase <tim@chase2k.com>
Signed-off-by: Richard Yao <ryao@gentoo.org>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue #2094
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jul 30, 2014
4914 zfs on-disk bookmark structure should be named *_phys_t
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

Porting notes:

    There were a number of zfsonlinux-specific uses of zbookmark_t which
    needed to be updated.  This should reduce the likelihood of further
    problems like issue openzfs#2094 from occurring.

Ported by: Tim Chase <tim@chase2k.com>
dweeezil pushed a commit to dweeezil/zfs that referenced this issue Jul 31, 2014
4914 zfs on-disk bookmark structure should be named *_phys_t
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

Porting notes:

    There were a number of zfsonlinux-specific uses of zbookmark_t which
    needed to be updated.  This should reduce the likelihood of further
    problems like issue openzfs#2094 from occurring.

Ported by: Tim Chase <tim@chase2k.com>
behlendorf pushed a commit to behlendorf/zfs that referenced this issue Aug 6, 2014
4914 zfs on-disk bookmark structure should be named *_phys_t

Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>

References:
  https://www.illumos.org/issues/4914
  illumos/illumos-gate@7802d7b

Porting notes:

There were a number of zfsonlinux-specific uses of zbookmark_t which
needed to be updated.  This should reduce the likelihood of further
problems like issue openzfs#2094 from occurring.

Ported by: Tim Chase <tim@chase2k.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#2558
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants