zed: Add deadman-slot_off.sh zedlet #16226

Merged 1 commit on May 29, 2024

Commits on May 25, 2024

  1. zed: Add deadman-slot_off.sh zedlet

    Optionally turn off a disk's enclosure slot if an I/O hangs and
    triggers the deadman.
    
    It's possible for outstanding I/O to a misbehaving SCSI disk to
    neither promptly complete nor return an error.  This can occur due
    to retry and recovery actions taken by the SCSI layer, driver, or
    disk.  When it occurs, the pool will be unresponsive even though
    there may be sufficient redundancy configured to proceed without
    this single disk.
    
    When a hung I/O is detected by the kmods, it will be posted as a
    deadman event.  By default, an I/O is considered to be hung after
    5 minutes.  This value can be changed with the zfs_deadman_ziotime_ms
    module parameter.  If ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is set,
    the disk's enclosure slot will be powered off, causing the outstanding
    I/O to fail.  The ZED will then handle this like a normal disk failure.
    By default, ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN is not set.
    
    As part of this change `zfs_deadman_events_per_second` is added
    to control the rate limiting of deadman events independently of
    delay events.  In practice, a single deadman event is sufficient
    and more aren't particularly useful.
    
    Alphabetize the zfs_deadman_* entries in zfs.4.
    
    Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
    behlendorf committed May 25, 2024
    c9cafb0
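
The settings named in the commit message can be sketched as a dry run that only prints values and changes nothing on the system. Several things here are assumptions rather than facts from the commit: the `/sys/module/zfs/parameters` path and the `/etc/zfs/zed.d/zed.rc` location (the usual Linux locations, but verify them locally), and the `=1` value for `ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN` (the commit message only says the variable must be set). `minutes_to_ms` is a hypothetical helper introduced for illustration.

```shell
#!/bin/sh
# Dry-run sketch: prints the settings described in the commit message.
# Nothing is written to the system.

# Hypothetical helper: convert whole minutes to milliseconds.
minutes_to_ms() {
    echo $(( $1 * 60 * 1000 ))
}

# The commit message states the default hang threshold is 5 minutes.
default_ms=$(minutes_to_ms 5)
echo "zfs_deadman_ziotime_ms=${default_ms}"

# Assumed location of the module parameter on Linux (not applied here):
#   echo "${default_ms}" > /sys/module/zfs/parameters/zfs_deadman_ziotime_ms

# Assumed zed.rc line to opt in to powering off the slot (unset by default):
echo "ZED_POWER_OFF_ENCLOSURE_SLOT_ON_DEADMAN=1"
```

To apply the settings rather than print them, the value would be written to the module parameter as root, and the ZED would need to reload its configuration after zed.rc is edited.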