zinject: inject device errors into ioctls #16061

Merged: 1 commit into openzfs:master from zinject-ioctl-device-errors, Apr 8, 2024

Conversation

robn
Contributor

@robn robn commented Apr 3, 2024

Motivation and Context

I'm working on flush error responses. Being able to inject flush errors is very useful!

Description

Adds 'ioctl' as a valid IO type for device error injection, so we can simulate a flush error (which OpenZFS currently ignores, but that's by the by).

To support this, the change adds ZIO_STAGE_VDEV_IO_DONE to ZIO_IOCTL_PIPELINE, since that's where device error injection happens. This needs a small exclusion to avoid the vdev_queue, since flushes are not queued. I'm assuming that the various failure responses (probes, media change, etc.) are still reasonable for flush failures. This seems reasonable to me, as a flush failure is not unlike a write failure in this regard; however, this may be too aggressive or subtle to assume in just this change.
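As a rough illustration of the pipeline change described above, here is a minimal, self-contained C sketch. Every name in it (`sk_zio`, `SK_STAGE_*`, `sk_vdev_io_done`, `sk_run`) is a hypothetical stand-in, not the real OpenZFS definition, and the stage list is heavily simplified. It only shows the idea: once the "done" stage is part of the ioctl pipeline, an armed injected error can reach an ioctl (flush) I/O, while the queue bookkeeping is skipped for that I/O type.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real OpenZFS definitions. */
enum sk_io_type { SK_IO_READ, SK_IO_WRITE, SK_IO_IOCTL };

#define SK_STAGE_OPEN           (1u << 0)
#define SK_STAGE_VDEV_IO_START  (1u << 1)
#define SK_STAGE_VDEV_IO_DONE   (1u << 2)
#define SK_STAGE_VDEV_IO_ASSESS (1u << 3)

/* Old shape: the ioctl pipeline never visits the "done" stage. */
#define SK_IOCTL_PIPELINE_OLD \
	(SK_STAGE_OPEN | SK_STAGE_VDEV_IO_START | SK_STAGE_VDEV_IO_ASSESS)

/* New shape: "done" is included, so injection can see ioctl I/O. */
#define SK_IOCTL_PIPELINE_NEW \
	(SK_IOCTL_PIPELINE_OLD | SK_STAGE_VDEV_IO_DONE)

struct sk_zio {
	enum sk_io_type type;
	uint32_t pipeline;        /* bitmask of stages this I/O visits */
	int error;                /* resulting error, 0 on success */
	int injected_error;       /* armed fault, 0 if none */
	bool queue_done_called;   /* records queue bookkeeping */
};

/* Simplified "done" stage: queue bookkeeping, then error injection. */
static void
sk_vdev_io_done(struct sk_zio *zio)
{
	/* Flushes are not queued, so skip the queue for ioctl I/O. */
	if (zio->type != SK_IO_IOCTL)
		zio->queue_done_called = true;

	/* Device error injection happens in this stage. */
	if (zio->error == 0 && zio->injected_error != 0)
		zio->error = zio->injected_error;
}

/* Run only the stage of interest and return the I/O's error. */
static int
sk_run(struct sk_zio *zio)
{
	assert(zio != NULL);
	if (zio->pipeline & SK_STAGE_VDEV_IO_DONE)
		sk_vdev_io_done(zio);
	return (zio->error);
}
```

With the old pipeline mask, an armed fault on an ioctl I/O is never seen because the "done" stage is not visited. With the new mask the fault surfaces as the I/O's error, and the queue is still left alone.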

How Has This Been Tested?

OpenZFS currently ignores flush failures, so I added some logging just to show non-zero returns from ioctl IOs internally. It fired when a fault was injected, and was silent otherwise.

Light sanity checking suggests that when there's no fault injected, everything else is working fine.

Test suite will have to get the rest.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Apr 4, 2024
Member

@amotin amotin left a comment

I've never liked type IOCTL, since it is meaningless. I only hope that potential users of zinject know that in ZFS it means flush.

include/sys/zio_impl.h (review thread resolved)
@robn
Contributor Author

robn commented Apr 4, 2024

> I've never liked type IOCTL, since it is meaningless. I only hope that potential users of zinject know that in ZFS it means flush.

Agreed. I wasn't too bothered because it's "only" zinject, which you're already not supposed to use without a lot of knowledge and confidence.

I do have designs to change it to be just "flush" (much like "trim"), but it never really mattered too much to me and I was going to tackle it after this unit of work (this is an early commit in a much larger series).

I could take a swing at renaming it first, if you'd prefer? I know you weren't asking for that, but you're not wrong, and maybe it'd be better to clean it up now before making it more visible?

@amotin
Member

amotin commented Apr 4, 2024

My only worry is that renaming may be pretty invasive, possibly complicating some merges, so I am torn.

Adds 'ioctl' as a valid IO type for device error injection, so we can
simulate a flush error (which OpenZFS currently ignores, but that's by
the by).

To support this, adding ZIO_STAGE_VDEV_IO_DONE to ZIO_IOCTL_PIPELINE,
since that's where device error injection happens. This needs a small
exclusion to avoid the vdev_queue, since flushes are not queued, and I'm
assuming that the various failure responses are still reasonable for
flush failures (probes, media change, etc). This seems reasonable to me,
as a flush failure is not unlike a write failure in this regard, however
this may be too aggressive or subtle to assume in just this change.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
@robn robn force-pushed the zinject-ioctl-device-errors branch from ee21245 to a572999 Compare April 4, 2024 03:11
@robn
Contributor Author

robn commented Apr 4, 2024

> My only worry is that renaming may be pretty invasive, possibly complicating some merges, so I am torn.

Rough version here: https://github.com/robn/zfs/commits/zio-ioctl-flush-rename/ (+158/-201)

@amotin
Member

amotin commented Apr 4, 2024

> Rough version here: https://github.com/robn/zfs/commits/zio-ioctl-flush-rename/ (+158/-201)

LGTM. Just the concept "TRIM ops and bytes are reported to user space as ZIO_TYPE_FLUSH." is weird. Also we could scrap io_cmd field and zio_ioctl() function after that.

@robn
Contributor Author

robn commented Apr 5, 2024

> LGTM. Just the concept "TRIM ops and bytes are reported to user space as ZIO_TYPE_FLUSH." is weird. Also we could scrap io_cmd field and zio_ioctl() function after that.

Thanks for that. Opened #16064.

Since it's not really related to this PR, I'm happy to continue with this one as a separate thing. Whichever one lands first, I'll make the appropriate adjustment to the other.

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Apr 8, 2024
@behlendorf
Contributor

> Whichever one lands first, I'll make the appropriate adjustment to the other.

This looks good. Let's land this first and you can fix up the names in #16064.

@behlendorf behlendorf merged commit 76d1dde into openzfs:master Apr 8, 2024
24 of 26 checks passed
amotin added a commit to amotin/zfs that referenced this pull request Apr 18, 2024
Before openzfs#16061 zio_vdev_io_done() was not used for FLUSH requests.
Addition of it triggers reprobe each TXG for vdevs not supporting
them.  Since those errors are often expected, they are normally
handled by individual vdev drivers and should be ignored here.

Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
@amotin amotin mentioned this pull request Apr 18, 2024
13 tasks
behlendorf pushed a commit that referenced this pull request Apr 19, 2024
Before #16061 zio_vdev_io_done() was not used for FLUSH requests.
Addition of it triggers reprobe each TXG for vdevs not supporting
them.  Since those errors are often expected, they are normally
handled by individual vdev drivers and should be ignored here.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.
Closes #16110
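
The follow-up commit message above describes ignoring flush errors when deciding whether to reprobe a device. A minimal sketch of that decision, under stated assumptions, might look like the following. The names (`probe_io_type`, `probe_should_reprobe`) are hypothetical and the logic is simplified. It is not the code from that commit, whose actual condition may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical I/O types for illustration only. */
enum probe_io_type { PROBE_IO_READ, PROBE_IO_WRITE, PROBE_IO_FLUSH };

/*
 * Decide whether an I/O error should trigger a device reprobe.
 * Assumption: flush errors are often expected (for example, the device
 * does not support flush) and are handled by the vdev driver, so they
 * are ignored here instead of causing a reprobe on every TXG.
 */
static bool
probe_should_reprobe(enum probe_io_type type, int error)
{
	assert(error >= 0);

	if (error == 0)
		return (false);         /* success: nothing to probe */
	if (type == PROBE_IO_FLUSH)
		return (false);         /* expected failure: ignore here */
	return (true);                  /* read/write error: reprobe */
}
```

Under this sketch, a flush that fails with ENOTSUP does not schedule a probe, while a failed write still does.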
Labels
Status: Accepted Ready to integrate (reviewed, tested)
3 participants