vdev_mirror: don't scrub/resilver devices that can't be read #11930

nwf · 2021-04-22T19:25:26Z

Motivation and Context

This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't needlessly create
DTL entries for offline devices that are already up-to-date.

Consider a 3-way mirror, with disk A always online (and so always with an empty
DTL) and B and C only occasionally online. When A & B resilver with C offline,
B's DTL will effectively be appended to C's, due to spurious ZIOs issued to the
offline C, even as the resilver empties B's DTL:

- These ZIOs eventually land in vdev_mirror_scrub_done() and flag an error.
- That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so
  it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD
  because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS.
- That ZIO fails, too, and eventually zio_done() gets its hands on it and
  calls vdev_stat_update().
- vdev_stat_update() sees the error and this zio:
  - is not speculative,
  - is not due to EIO (but rather ENXIO, since the device is closed),
  - has an ->io_vd != NULL (specifically, the offline leaf device),
  - is a write,
  - is for a txg != 0 (specifically, the read block's physical birth txg), and
  - has ZIO_FLAG_SCAN_THREAD asserted.
- So vdev_stat_update() calls vdev_dtl_dirty() on the offline device.

Then, when A & C resilver with B offline, that story gets replayed and C's DTL
will be appended to B's.

In fact, one does not need this permanently-broken-mirror scenario to induce
badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for
all offline devices. These DTLs will persist until the entire mirror is
reassembled and resilvered, and that resilver, incidentally, will not consider
the devices that actually hold good data to be sources of good data in the case
of a read failure.

Description

To fix the above, just don't issue child zios to devices that are not considered readable.
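A minimal sketch of the change, assuming a simplified model of the scrub read fan-out: scrub_fan_out(), fanout_child_t, and its readable flag (standing in for whatever readability check the mirror code uses) are hypothetical names, not the actual vdev_mirror.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-child state for the model. */
typedef struct {
	bool	readable;	/* stand-in for the mirror's readability check */
	int	reads;		/* child read ZIOs issued to this device */
} fanout_child_t;

/*
 * Scrub read fan-out. With skip_unreadable false (the old behavior),
 * every child gets a read, including offline ones whose reads can only
 * fail. With it true (the fix), unreadable children get no ZIO at all,
 * so, per the analysis above, no repair write is generated for them.
 * Returns the number of child reads issued.
 */
static int
scrub_fan_out(fanout_child_t *ch, size_t n, bool skip_unreadable)
{
	int issued = 0;

	assert(ch != NULL || n == 0);
	for (size_t c = 0; c < n; c++) {
		if (skip_unreadable && !ch[c].readable)
			continue;
		ch[c].reads++;
		issued++;
	}
	return (issued);
}
```

For a 3-way mirror with one offline child, the old behavior issues three reads, one of which can only fail with ENXIO; with skip_unreadable set, only the two readable children are read.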

How Has This Been Tested?

Manual inspection of DTLs with zdb after 2-online-of-3 mirror scrubs.

Types of changes

Bug fix (non-breaking change which fixes an issue)
Performance enhancement (non-breaking change which improves efficiency)

Checklist:

My code follows the OpenZFS code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
I have run the ZFS Test Suite with this change applied.
All commit messages are properly formatted and contain Signed-off-by.


nwf · 2021-04-24T11:06:51Z

Whoops; checkstyle has flagged the commit message as overly wide.

In thinking about it a little bit more, it's probably better to behave like vdev_mirror_child_select and mark the missing children as mc_tried = mc_skipped = 1 so that, if another error does occur, we won't end up trying to repair these offline children then, either.
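The refinement can be illustrated with a toy model of the scrub issue and repair phases. All names here (mirror_child_model_t, scrub_issue(), scrub_repair()) are hypothetical; the tried and skipped fields only echo the roles of mc_tried and mc_skipped, and the repair rule (repair any child that failed or was never tried, but only when some non-skipped child reported an error) is an assumed simplification of vdev_mirror_io_done(), not its actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, simplified mirror child; tried/skipped echo the roles of
 * mc_tried/mc_skipped but this is not the real mirror_child_t.
 */
typedef struct {
	bool	readable;	/* can a read be issued to this child at all? */
	bool	error;		/* a read was issued and returned bad data */
	bool	tried;
	bool	skipped;
	bool	repaired;	/* a repair write was issued to this child */
} mirror_child_model_t;

/*
 * Scrub issue phase: readable children are read (tried). Unreadable ones
 * get no read; with mark_skipped set they are also flagged tried+skipped.
 */
static void
scrub_issue(mirror_child_model_t *ch, size_t n, bool mark_skipped)
{
	assert(ch != NULL || n == 0);
	for (size_t c = 0; c < n; c++) {
		if (ch[c].readable)
			ch[c].tried = true;
		else if (mark_skipped)
			ch[c].tried = ch[c].skipped = true;
	}
}

/*
 * Done phase (assumed rule): if any non-skipped child failed, repair every
 * child that failed or was never tried, since in a scrub every healthy
 * child was tried. Returns the number of repair writes issued.
 */
static int
scrub_repair(mirror_child_model_t *ch, size_t n)
{
	int unexpected = 0, repairs = 0;

	assert(ch != NULL || n == 0);
	for (size_t c = 0; c < n; c++)
		if (ch[c].error && !ch[c].skipped)
			unexpected++;
	if (unexpected == 0)
		return (0);

	for (size_t c = 0; c < n; c++) {
		if (ch[c].tried && !ch[c].error)
			continue;	/* never rewrite a known-good copy */
		ch[c].repaired = true;
		repairs++;
	}
	return (repairs);
}
```

In a 3-way mirror where one readable child returns bad data and one child is offline, skipping the offline child without marking it tried still sends it a repair write in this model, because it looks untried; marking it tried and skipped limits the repair to the child that actually failed.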

nwf · 2021-04-25T11:09:35Z

Test failures appear unrelated, I think? Neither checkpoint_discard_busy nor initialize_import_export appears to use vdev_mirror.

behlendorf

Good find; I agree this is the right way to handle this. My only suggestion would be adding a small test case which verifies the DTLs are being updated correctly. Though, as long as it's been manually tested, I don't think that's critical.

Reviewed-by: Mark Maybee <mark.maybee@delphix.com>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
Closes openzfs#11930

behlendorf added the "Status: Code Review Needed" label Apr 22, 2021

nwf force-pushed the 202104-mirror-no-zio-offline branch from c7b5ef0 to 5911599 on April 24, 2021 11:06

behlendorf approved these changes Apr 26, 2021


behlendorf requested a review from don-brady April 26, 2021 18:53

behlendorf added the "Status: Accepted" label and removed the "Status: Code Review Needed" label Apr 27, 2021

behlendorf requested a review from mmaybee April 27, 2021 19:00

mmaybee approved these changes Apr 27, 2021


behlendorf merged commit 056a658 into openzfs:master Apr 28, 2021

nwf deleted the 202104-mirror-no-zio-offline branch April 28, 2021 00:55

nwf mentioned this pull request Jun 5, 2021

zhack scrub subcommand for offline scrubs in userland #6209 (Open)

nwf mentioned this pull request Jul 4, 2021

vdev_mirror: when resilvering, try reading first #12327 (Open)

nwf mentioned this pull request Jul 17, 2021

Scrub of 3-way mirror attributes errors to OFFLINE device, fails to self-heal? #11629 (Closed)

rincebrain mentioned this pull request Aug 6, 2021

chksum errors increasing on unavail device #12454 (Closed)

nwf mentioned this pull request May 20, 2022

allow disk rotation OFFLINE w/o marking pool DEGRADED #13475 (Open)

rincebrain mentioned this pull request May 2, 2023

checksum errors on offline devices #14815 (Open)
