ztest: uncorrectable I/O failure #939

Closed
dechamps opened this issue Sep 5, 2012 · 8 comments
Labels
Component: Test Suite Indicates an issue with the test framework or a test case

@dechamps
Contributor

dechamps commented Sep 5, 2012

As initially reported in #934, when running ztest, the following often happens after a while (output at maximum verbosity):

starting main threads...
ztest/ds_0 replay 0 blocks, 0 records, seq 0
ztest/ds_1 replay 0 blocks, 0 records, seq 0
  0.00 sec in (null)
ztest/ds_2 replay 0 blocks, 0 records, seq 0
ztest/ds_3 replay 0 blocks, 0 records, seq 0
  0.01 sec in (null)
  0.00 sec in (null)
ztest/ds_4 replay 0 blocks, 0 records, seq 0
  0.00 sec in (null)
  0.02 sec in (null)
  0.00 sec in (null)
  0.03 sec in (null)
  0.02 sec in (null)
  0.01 sec in (null)
  0.00 sec in (null)
Expanding LUN /tmp/ztest.0a from 75497472 to 84934656
/tmp/ztest.0a grew from 75497472 to 84934656 bytes
/tmp/ztest.1a grew from 75497472 to 84934656 bytes
/tmp/ztest.2a grew from 75497472 to 84934656 bytes
/tmp/ztest.3a grew from 75497472 to 84934656 bytes
/tmp/ztest.4a grew from 75497472 to 84934656 bytes
/tmp/zb grew from 75497472 to 84934656 bytes
/tmp/ztest.6a grew from 75497472 to 84934656 bytes
/tmp/ztest.7a grew from 75497472 to 84934656 bytes
error: Pool 'ztest' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.
child died with signal 6

Note that the error is the same each time, but the context (previous lines) is not constant.

This happens with 0.6.0-rc10, but not with 0.6.0-rc9.
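Because the lines before the failure vary from run to run, a log check can key on the two stable lines instead. Below is a minimal sketch in Python, assuming the ztest output has been captured as text. The name `find_failure` and the returned dictionary are hypothetical; only the two matched messages come from the output quoted above.

```python
import re
from typing import Optional

# Both patterns are copied from the ztest output quoted in this issue.
POOL_ERROR = re.compile(
    r"^error: Pool '(?P<pool>[^']+)' has encountered an uncorrectable I/O failure"
)
CHILD_DIED = re.compile(r"^child died with signal (?P<sig>\d+)$")


def find_failure(log_text: str) -> Optional[dict]:
    """Return the pool name and signal number if the log shows this failure.

    Hypothetical helper: it ignores every other line, since the context
    before the error is not constant between runs.
    """
    pool = None
    for raw_line in log_text.splitlines():
        line = raw_line.strip()
        match = POOL_ERROR.match(line)
        if match:
            pool = match.group("pool")
            continue
        match = CHILD_DIED.match(line)
        if match and pool is not None:
            return {"pool": pool, "signal": int(match.group("sig"))}
    return None
```

A run is reported as failed only when the pool error is followed by the child's death; on Linux, signal 6 is SIGABRT.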

@dechamps
Contributor Author

dechamps commented Sep 6, 2012

Bisection results:

ba9b542 is good
3541dc6 is bad
c7f2d69 is bad

This seems to be caused by 3541dc6.
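The search behind these results can be sketched in Python. This is only an illustration: the function name `first_bad` and the oldest-to-newest ordering of the three commits are assumptions, not something stated in this thread, and in practice `git bisect` does this bookkeeping.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad commit by binary search.

    Assumes commits are ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


# Verdicts as reported above; the oldest-to-newest order is assumed.
history = ["ba9b542", "3541dc6", "c7f2d69"]
verdicts = {"ba9b542": "good", "3541dc6": "bad", "c7f2d69": "bad"}
print(first_bad(history, lambda c: verdicts[c] == "bad"))  # prints 3541dc6
```

Under those assumptions the search lands on 3541dc6, which matches the conclusion above.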

@dechamps
Contributor Author

dechamps commented Sep 6, 2012

Confirmed: in latest master, if I comment out the contents of ztest_reguid(), ztest succeeds. There's definitely an issue with the reguid code introduced in 3541dc6.

@dechamps
Contributor Author

I just posted to the Illumos ZFS mailing list about this, to make sure the issue really is limited to ZoL.

@dechamps
Contributor Author

According to George Wilson from Illumos, the fix is in illumos/illumos-gate@dfbb943.

@mmatuska: you're the original porter. Can you please port the fix?

@dechamps
Contributor Author

dechamps added a commit to dechamps/zfs that referenced this issue Sep 28, 2012
Currently, ztest fails with the following error:

    error: Pool 'ztest' has encountered an uncorrectable I/O failure and the failure mode property for this pool is set to panic.

We know how to fix it (see issue openzfs#939), but it may take some time
before we get around to merging the fix, which has some heavy
dependencies.

In the mean time, it is not ideal to be unable to use ztest just
because of a small isolated issue, so this patch works around the
problem by disabling the reguid test. This is just a temporary hack to
keep ztest usable.

The reguid test will be enabled again when the proper fix is merged.

@dechamps
Contributor Author

#997 fixes the issue by disabling the test while we are waiting for the proper fix to be merged. This way we can still use ztest.

behlendorf pushed a commit that referenced this issue Oct 3, 2012
Currently, ztest fails with the following error:

    error: Pool 'ztest' has encountered an uncorrectable I/O failure
    and the failure mode property for this pool is set to panic.

We know how to fix it (see issue #939), but it may take some time
before we get around to merging the fix, which has some heavy
dependencies.

In the mean time, it is not ideal to be unable to use ztest just
because of a small isolated issue, so this patch works around the
problem by disabling the reguid test. This is just a temporary hack to
keep ztest usable.

The reguid test will be enabled again when the proper fix is merged.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #997

@dechamps
Contributor Author

dechamps commented Oct 4, 2012

While the issue is technically fixed in d135245, let's keep this open so that we don't forget to merge illumos/illumos-gate@dfbb943.

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 18, 2012
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#939

@behlendorf
Contributor

We can close this once the feature-flags branch gets merged.

behlendorf added a commit to behlendorf/zfs that referenced this issue Dec 20, 2012
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#939

unya pushed a commit to unya/zfs that referenced this issue Dec 13, 2013
3090 vdev_reopen() during reguid causes vdev to be treated as corrupt
3102 vdev_uberblock_load() and vdev_validate() may read the wrong label

Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <chris.siden@delphix.com>
Reviewed by: Garrett D'Amore <garrett@damore.org>
Approved by: Eric Schrock <Eric.Schrock@delphix.com>

References:
  illumos/illumos-gate@dfbb943
  illumos changeset: 13777:b1e53580146d
  https://www.illumos.org/issues/3090
  https://www.illumos.org/issues/3102

Ported-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#939