New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vdev_mirror: don't scrub/resilver devices that can't be read #11930
Conversation
This ensures that we don't accumulate checksum errors against offline or unavailable devices but, more importantly, means that we don't needlessly create DTL entries for offline devices that are already up-to-date. Consider a 3-way mirror, with disk A always online (and so always with an empty DTL) and B and C only occasionally online. When A & B resilver with C offline, B's DTL will effectively be appended to C's due to these spurious ZIOs even as the resilver empties B's DTL: * These ZIOs land in vdev_mirror_scrub_done() and flag an error * That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS. * That ZIO fails, too, and eventually zio_done() gets its hands on it and calls vdev_stat_update(). * vdev_stat_update() sees the error and this zio... * is not speculative, * is not due to EIO (but rather ENXIO, since the device is closed) * has an ->io_vd != NULL (specifically, the offline leaf device) * is a write * is for a txg != 0 (but rather the read block's physical birth txg) * has ZIO_FLAG_SCAN_THREAD asserted * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev. Then, when A & C resilver with B offline, that story gets replayed and C's DTL will be appended to B's. In fact, one does not need this permanently-broken-mirror scenario to induce badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for all offline devices. These DTLs will persist until the entire mirror is reassembled for the duration of the *resilver*, which, incidentally, will not consider the devices with good data to be sources of good data in the case of a read failure. Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com>
c7b5ef0
to
5911599
Compare
Whoops; checkstyle has flagged the commit message as overly wide. In thinking about it a little bit more, it's probably better to behave like |
Test failures appear unrelated, I think? Neither |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good find, I agree this is the right way to handle this. My only suggestion would be adding a small test case which verifies the DTLs are being updated correctly. Though, as long as it's been manually tested I don't think that's critical.
This ensures that we don't accumulate checksum errors against offline or unavailable devices but, more importantly, means that we don't needlessly create DTL entries for offline devices that are already up-to-date. Consider a 3-way mirror, with disk A always online (and so always with an empty DTL) and B and C only occasionally online. When A & B resilver with C offline, B's DTL will effectively be appended to C's due to these spurious ZIOs even as the resilver empties B's DTL: * These ZIOs land in vdev_mirror_scrub_done() and flag an error * That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS. * That ZIO fails, too, and eventually zio_done() gets its hands on it and calls vdev_stat_update(). * vdev_stat_update() sees the error and this zio... * is not speculative, * is not due to EIO (but rather ENXIO, since the device is closed) * has an ->io_vd != NULL (specifically, the offline leaf device) * is a write * is for a txg != 0 (but rather the read block's physical birth txg) * has ZIO_FLAG_SCAN_THREAD asserted * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev. Then, when A & C resilver with B offline, that story gets replayed and C's DTL will be appended to B's. In fact, one does not need this permanently-broken-mirror scenario to induce badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for all offline devices. These DTLs will persist until the entire mirror is reassembled for the duration of the *resilver*, which, incidentally, will not consider the devices with good data to be sources of good data in the case of a read failure. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com> Closes openzfs#11930
This ensures that we don't accumulate checksum errors against offline or unavailable devices but, more importantly, means that we don't needlessly create DTL entries for offline devices that are already up-to-date. Consider a 3-way mirror, with disk A always online (and so always with an empty DTL) and B and C only occasionally online. When A & B resilver with C offline, B's DTL will effectively be appended to C's due to these spurious ZIOs even as the resilver empties B's DTL: * These ZIOs land in vdev_mirror_scrub_done() and flag an error * That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS. * That ZIO fails, too, and eventually zio_done() gets its hands on it and calls vdev_stat_update(). * vdev_stat_update() sees the error and this zio... * is not speculative, * is not due to EIO (but rather ENXIO, since the device is closed) * has an ->io_vd != NULL (specifically, the offline leaf device) * is a write * is for a txg != 0 (but rather the read block's physical birth txg) * has ZIO_FLAG_SCAN_THREAD asserted * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev. Then, when A & C resilver with B offline, that story gets replayed and C's DTL will be appended to B's. In fact, one does not need this permanently-broken-mirror scenario to induce badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for all offline devices. These DTLs will persist until the entire mirror is reassembled for the duration of the *resilver*, which, incidentally, will not consider the devices with good data to be sources of good data in the case of a read failure. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com> Closes openzfs#11930
This ensures that we don't accumulate checksum errors against offline or unavailable devices but, more importantly, means that we don't needlessly create DTL entries for offline devices that are already up-to-date. Consider a 3-way mirror, with disk A always online (and so always with an empty DTL) and B and C only occasionally online. When A & B resilver with C offline, B's DTL will effectively be appended to C's due to these spurious ZIOs even as the resilver empties B's DTL: * These ZIOs land in vdev_mirror_scrub_done() and flag an error * That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS. * That ZIO fails, too, and eventually zio_done() gets its hands on it and calls vdev_stat_update(). * vdev_stat_update() sees the error and this zio... * is not speculative, * is not due to EIO (but rather ENXIO, since the device is closed) * has an ->io_vd != NULL (specifically, the offline leaf device) * is a write * is for a txg != 0 (but rather the read block's physical birth txg) * has ZIO_FLAG_SCAN_THREAD asserted * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev. Then, when A & C resilver with B offline, that story gets replayed and C's DTL will be appended to B's. In fact, one does not need this permanently-broken-mirror scenario to induce badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for all offline devices. These DTLs will persist until the entire mirror is reassembled for the duration of the *resilver*, which, incidentally, will not consider the devices with good data to be sources of good data in the case of a read failure. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com> Closes openzfs#11930
This ensures that we don't accumulate checksum errors against offline or unavailable devices but, more importantly, means that we don't needlessly create DTL entries for offline devices that are already up-to-date. Consider a 3-way mirror, with disk A always online (and so always with an empty DTL) and B and C only occasionally online. When A & B resilver with C offline, B's DTL will effectively be appended to C's due to these spurious ZIOs even as the resilver empties B's DTL: * These ZIOs land in vdev_mirror_scrub_done() and flag an error * That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS. * That ZIO fails, too, and eventually zio_done() gets its hands on it and calls vdev_stat_update(). * vdev_stat_update() sees the error and this zio... * is not speculative, * is not due to EIO (but rather ENXIO, since the device is closed) * has an ->io_vd != NULL (specifically, the offline leaf device) * is a write * is for a txg != 0 (but rather the read block's physical birth txg) * has ZIO_FLAG_SCAN_THREAD asserted * So: vdev_stat_update() calls vdev_dtl_dirty() on the offline vdev. Then, when A & C resilver with B offline, that story gets replayed and C's DTL will be appended to B's. In fact, one does not need this permanently-broken-mirror scenario to induce badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for all offline devices. These DTLs will persist until the entire mirror is reassembled for the duration of the *resilver*, which, incidentally, will not consider the devices with good data to be sources of good data in the case of a read failure. Reviewed-by: Mark Maybee <mark.maybee@delphix.com> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Nathaniel Wesley Filardo <nwfilardo@gmail.com> Closes openzfs#11930
Motivation and Context
This ensures that we don't accumulate checksum errors against offline or
unavailable devices but, more importantly, means that we don't needlessly create
DTL entries for offline devices that are already up-to-date.
Consider a 3-way mirror, with disk A always online (and so always with an empty
DTL) and B and C only occasionally online. When A & B resilver with C offline,
B's DTL will effectively be appended to C's due to these spurious ZIOs even as
the resilver empties B's DTL:
These ZIOs eventually land in vdev_mirror_scrub_done() and flag an error
That flagged error causes vdev_mirror_io_done() to see unexpected_errors, so
it issues a ZIO_TYPE_WRITE repair ZIO, which inherits ZIO_FLAG_SCAN_THREAD
because zio_vdev_child_io() includes that flag in ZIO_VDEV_CHILD_FLAGS.
That ZIO fails, too, and eventually zio_done() gets its hands on it and
calls vdev_stat_update().
vdev_stat_update() sees the error and this zio...
and so, vdev_stat_update() calls vdev_dtl_dirty() on the offline device.
Then, when A & C resilver with B offline, that story gets replayed and C's DTL
will be appended to B's.
In fact, one does not need this permanently-broken-mirror scenario to induce
badness: breaking a mirror with no DTLs and then scrubbing will create DTLs for
all offline devices. These DTLs will persist until the entire mirror is
reassembled for the duration of the resilver, which, incidentally, will not
consider the devices with good data to be sources of good data in the case of a
read failure.
Description
To fix the above, just don't issue child zios to devices that are not considered readable.
How Has This Been Tested?
Manual inspection of DTLs with
zdb
after 2-online-of-3 mirror scrubs.Types of changes
Checklist:
Signed-off-by
.