Add statechange event when drive is replaced #9523

Closed · wants to merge 1 commit

Conversation

@cvoltz (Contributor) commented Oct 28, 2019

Motivation and Context

Fix #9437
Fix LU-12836

Description

When the zpool online command is used to bring a faulted or offline drive back online, a resource.fs.zfs.statechange event is generated. When the zpool replace command is used to bring a faulted or offline drive back online, a statechange event is not generated. Add the missing statechange event after resilvering has finished. The new sequence of events looks like this:

    sysevent.fs.zfs.vdev_attach
    sysevent.fs.zfs.resilver_start
    sysevent.fs.zfs.history_event (scan setup)
    sysevent.fs.zfs.history_event (scan done)
    sysevent.fs.zfs.resilver_finish
    sysevent.fs.zfs.config_sync
  + resource.fs.zfs.statechange
    sysevent.fs.zfs.vdev_remove
    sysevent.fs.zfs.history_event (vdev attach)
    sysevent.fs.zfs.config_sync
    sysevent.fs.zfs.history_event (detach)

The statechange event looks like this:

Oct 16 2019 12:04:08.424390273 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "ost04"
        pool_guid = 0x8159dca79b3945a4
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0x8159dca79b3945a4
        vdev_state = "ONLINE" (0x7)
        vdev_laststate = "UNKNOWN" (0x0)
        time = 0x5da74d88 0x194bae81
        eid = 0x3c
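
For reference, events like the one above can be observed while reproducing the scenario with the stock zpool tooling (no ZED configuration required); a minimal sketch:

    # Follow the event log verbosely; new events print as they are posted.
    zpool events -fv

    # Or dump what has been posted so far and pick out the statechange records.
    zpool events -v | grep -A 12 'resource.fs.zfs.statechange'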

How Has This Been Tested?

Updated test events_001_pos.ksh to verify the statechange event is generated. Ran the full set of functional tests using the updated ZFS Test suite on CentOS 7.6 using the CentOS 7 v1905.1 Vagrant image. See cvoltz/zfs-test-suite for scripts which use Ansible and Vagrant to provision a Vagrant VM, using libvirt, to build and run the ZFS test suite on a specified branch of a specified repository.

Also tested the patch with Lustre 2.12.2-1 and ZFS 0.7.13-1 using the test-degraded-drive test script. See the LU-12836 ticket for details.
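
As a rough guide for repeating this locally, the updated test can also be run on its own with the in-tree test runner (exact paths and flags may differ between ZFS releases; this is a sketch, not the exact invocation used above):

    # From a built ZFS source tree: run just the updated events test.
    ./scripts/zfs-tests.sh -t tests/zfs-tests/tests/functional/events/events_001_pos.ksh

    # Or run the full functional suite, as was done for this change.
    ./scripts/zfs-tests.sh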

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Performance enhancement (non-breaking change which improves efficiency)

  • Code cleanup (non-breaking change which makes code smaller or more readable)

  • Breaking change (fix or feature that would cause existing functionality to change)

  • Documentation (a change to man pages or other documentation)

  • My code follows the ZFS on Linux code style requirements.

  • I have updated the documentation accordingly.

  • I have read the contributing document.

  • I have added tests to cover my changes.

  • All new and existing tests passed.

  • All commit messages are properly formatted and contain Signed-off-by.

When the "zpool online" command is used bring a faulted or
offline drive back online, a resource.fs.zfs.statechange event
is generated. When the "zpool replace" command is used to bring
a faulted or offline drive back online, a statechange event is
not generated. Add the missing statechange event after
resilvering has finished. The new sequence of events looks like
this:
    sysevent.fs.zfs.vdev_attach
    sysevent.fs.zfs.resilver_start
    sysevent.fs.zfs.history_event (scan setup)
    sysevent.fs.zfs.history_event (scan done)
    sysevent.fs.zfs.resilver_finish
    sysevent.fs.zfs.config_sync
  + resource.fs.zfs.statechange
    sysevent.fs.zfs.vdev_remove
    sysevent.fs.zfs.history_event (vdev attach)
    sysevent.fs.zfs.config_sync
    sysevent.fs.zfs.history_event (detach)

Signed-off-by: Christopher Voltz <christopher.voltz@hpe.com>
External-issue: LU-12836
Closes openzfs#9437
@ikozhukhov (Contributor) commented:

Scenario:
VDEV1 has VDEVID=WWN1.

We put VDEV1 OFFLINE in the zpool, then replace it with VDEV2 (WWN2) in the same slot. The zpool still contains VDEV1, and if you try to 'ONLINE' it, it will say the VDEV is missing; we have to use zpool replace to replace VDEV1 with VDEV2 in the zpool. What scenario are you trying to resolve with this update?

@behlendorf added the "Status: Code Review Needed (Ready for review and testing)" label on Oct 28, 2019
@cvoltz (Contributor, Author) commented Oct 28, 2019

I think the scenario is fairly well documented in the #9437 issue and the LU-12836 ticket. Please refer to them for the details. In short, a drive fails (or is brought offline) and zpool replace is used to replace the failed drive with a drive (a cold-spare) from another bay. For example:

zpool replace ost04 d8000_sep500C0FF03C1AC73E_bay101-0 d8000_sep500C0FF03C1AC73E_bay050-0
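
(End to end, the degraded-drive flow being exercised looks roughly like this; the pool and vdev names are just the ones from the example above:)

    # Drive in bay 101 has faulted or is taken offline for service.
    zpool offline ost04 d8000_sep500C0FF03C1AC73E_bay101-0

    # Restore redundancy using the cold spare in bay 50; resilvering starts
    # automatically and, with this patch, a statechange event is posted once
    # it finishes.
    zpool replace ost04 d8000_sep500C0FF03C1AC73E_bay101-0 d8000_sep500C0FF03C1AC73E_bay050-0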

The inline review comments below refer to this hunk from the patch:

    } else if (state == VDEV_STATE_ONLINE &&
        vd->vdev_prevstate == VDEV_STATE_UNKNOWN) {
            /* post an online state change */
            zfs_post_state_change(spa, vd, vd->vdev_prevstate);
    }
Contributor:

Good thought, but unfortunately this doesn't work quite as intended. In my manual testing this will result in "ONLINE" resource.fs.zfs.statechange events being generated for both the top-level mirror/raidz vdev and the old replaced leaf vdev guid (not the new vdev guid).

For this to behave as intended we'd need to add logic to the end of vdev_set_state(), and somehow not filter out the initial open event from a replaced vdev.

                /* filter out state change due to initial vdev_open */
                if (save_state > VDEV_STATE_CLOSED)
                        zfs_post_state_change(spa, vd, save_state);

Alternatively, we could perhaps add the event to the end of spa_vdev_attach(), which handles the replacement itself.

Contributor Author:

OK. I'll look into the second option since that seems simpler to me.

Can you please provide the steps you did to see the duplicate events? Since I didn't see that in my testing, I would like to update my test steps so I don't miss it again.

Should I close this PR and open another one when I have a different patch or just force push this one with the new patch when I have it ready?

Contributor:

@cvoltz @tonyhutter I think we should take a step back and reconsider the desired behavior.

Currently the code intentionally suppresses all statechange events when opening a device. This is why there are no such events posted when running zpool import, when using zpool add|attach, or when using zpool replace as in this case. Instead there are explicit add, attach, etc., events.

Would it be more convenient if we generated events every time we opened a device (offline -> online)? Or would that generate too much noise in the event logs?

Or should we only generate the events for new devices? In that case we'd want to handle zpool create, zpool add, zpool attach, and zpool split as well. This would in part be redundant with the config_sync event, which is posted every time there's a configuration change, including a zpool replace.

Can you please provide the steps you did to see the duplicate events?

I was able to reproduce this by running zpool replace with the device online. Two statechange events are posted, and you can see that one is for the top-level vdev by checking the reported guid in the event. You can use zpool status -g to print each vdev's guid.
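
(For concreteness, a sketch of that reproduction; the pool name and devices are placeholders:)

    # Note each vdev's guid before the replace.
    zpool status -g tank

    # Clear the event log, replace a device that is still online, and look
    # at what gets posted.
    zpool events -c
    zpool replace tank <old-dev> <new-dev>
    zpool events -v | grep -A 12 'resource.fs.zfs.statechange'

    # Compare the vdev_guid fields in the events against the guids above;
    # one of them belongs to the top-level mirror/raidz vdev rather than
    # the replaced leaf.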

Should I close this PR and open another one when I have a different patch or just force push this one with the new patch when I have it ready?

Let's leave this open at least until we hash out the best way to handle this. Then I'd suggest force updating the PR so we can keep the discussion in one place.

@codecov bot commented Oct 28, 2019

Codecov Report

Merging #9523 into master will increase coverage by <.01%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master    #9523      +/-   ##
==========================================
+ Coverage   79.11%   79.12%   +<.01%     
==========================================
  Files         416      416              
  Lines      123652   123655       +3     
==========================================
+ Hits        97831    97840       +9     
+ Misses      25821    25815       -6
Flag      Coverage Δ
#kernel   79.71% <100%> (+0.01%) ⬆️
#user     67.15% <100%> (+0.25%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e357046...af16615.

@tonyhutter (Contributor) commented:

My gut feeling would be to add an additional statechange event for all attach events, except for when the disk is being added at import. I don't know off the top of my head what you'd key off of to know whether you're importing or not, but there has to be something. You'd also not want to generate any statechange events for the "OLD" vdev ("OLD" is the actual vdev name), which is an internal vdev that gets temporarily created while you're doing an attach (in fact, you probably don't want to generate any events at all for OLD).

I don't know what a good vdev_laststate would be for the newly added disks. Maybe keep it as UNKNOWN like you have now?

@cvoltz (Contributor, Author) commented Oct 29, 2019

As we typically have hundreds of drives attached to a system, I would prefer not to see an event for every drive at import (or export). Having additional statechange events for the other attach events makes sense to me.

Symmetry is generally good, so this raises the question of whether there should be equivalent statechange events when a drive is detached. Do these already exist for all cases? From the POV of the statechange-lustre.sh script, I just need to make sure I can get notified when the pool changes back and forth between degraded and online.

To make sure I have all of the test cases, we are talking about having statechange events for zpool add|attach|create|replace|online|split and, possibly, for zpool detach|offline|remove|replace?

@tonyhutter (Contributor) commented:

@cvoltz sorry, this PR completely fell off my radar. To answer your question, I'd only be looking for a test case to verify the statechange after a replace. If you want to add test cases for statechange on all attach/remove events, all the better. See tests/zfs-tests/tests/functional/fault/{setup.ksh, zpool_status_-s.ksh, cleanup.ksh} for examples of how to start/stop ZED and look at events using the test suite.
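
(Roughly, such a check could look like the sketch below; this is illustrative only, uses the usual test-suite helpers rather than anything lifted from the fault tests, and $TESTPOOL, $DISK1, and $DISK2 are placeholders:)

    # Clear the event log, replace a disk, then check for the statechange event.
    log_must zpool events -c
    log_must zpool replace $TESTPOOL $DISK1 $DISK2

    # Wait for the resilver to complete before checking events.
    while ! is_pool_resilvered $TESTPOOL; do
            sleep 1
    done

    log_must eval "zpool events | grep -q 'resource.fs.zfs.statechange'"

(log_must and is_pool_resilvered are the stock libtest.shlib helpers; any equivalent wait-and-grep would do.)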

@tonyhutter (Contributor) commented:

I recently rebased this PR on master (still merges cleanly) and gave it a test. The good news is that I'm seeing a statechange event on replace. The bad news is that I'm seeing multiple statechange events.

$ sudo ./cmd/zpool/zpool events   
TIME                           CLASS
$ sudo ./cmd/zpool/zpool replace tank U2 U3
$ sudo ./cmd/zpool/zpool events 
TIME                           CLASS
Mar 10 2021 14:41:07.059645027 resource.fs.zfs.statechange
Mar 10 2021 14:41:07.059645027 resource.fs.zfs.statechange
Mar 10 2021 14:41:07.059645027 resource.fs.zfs.statechange
Mar 10 2021 14:41:07.059645027 sysevent.fs.zfs.vdev_attach
Mar 10 2021 14:41:07.475643431 sysevent.fs.zfs.resilver_start
Mar 10 2021 14:41:07.475643431 sysevent.fs.zfs.history_event
Mar 10 2021 14:41:07.758642342 sysevent.fs.zfs.config_sync
Mar 10 2021 14:41:08.625639015 sysevent.fs.zfs.history_event
Mar 10 2021 14:41:08.625639015 sysevent.fs.zfs.history_event
Mar 10 2021 14:41:08.625639015 sysevent.fs.zfs.resilver_finish
Mar 10 2021 14:41:08.904637933 resource.fs.zfs.statechange
Mar 10 2021 14:41:08.904637933 resource.fs.zfs.statechange
Mar 10 2021 14:41:08.904637933 sysevent.fs.zfs.vdev_remove
Mar 10 2021 14:41:09.232636672 sysevent.fs.zfs.config_sync

The other bad news is that the statechange events it generates don't contain the vdev_path or other info needed to tell us which vdev is being replaced. Here's what it gives us:

Mar 10 2021 15:29:03.900605785 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "tank"
        pool_guid = 0x7f96989abc2bd061
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0x7f96989abc2bd061
        vdev_state = "ONLINE" (0x7)
        vdev_laststate = "UNKNOWN" (0x0)
        time = 0x6049563f 0x35ae2759 
        eid = 0x1f

Now compare this to the statechange from a zpool offline -f <vdev>:

Mar 10 2021 15:26:20.405230670 resource.fs.zfs.statechange
        version = 0x0
        class = "resource.fs.zfs.statechange"
        pool = "tank"
        pool_guid = 0x7f96989abc2bd061
        pool_state = 0x0
        pool_context = 0x0
        vdev_guid = 0x7581c66b53ae29af
        vdev_state = "FAULTED" (0x5)
        vdev_path = "/dev/disk/by-vdev/U3"
        vdev_devid = "dm-uuid-mpath-35000c500a61ce91f"
        vdev_physpath = "U3"
        vdev_enc_sysfs_path = "/sys/class/enclosure/10:0:4:0/SLOT  4 14  "
        vdev_laststate = "ONLINE" (0x7)
        time = 0x6049559c 0x1827544e 
        eid = 0x15

@tonyhutter (Contributor) commented:

@cvoltz I just submitted this patch to Lustre:

LU-12836 osd-zfs: Catch all ZFS pool change events
https://review.whamcloud.com/#/c/43552/

It creates symlinks to statechange-lustre for all the other events that could cause a pool status change. If that's a sufficient fix for you, then we may be able to get away without having to make any changes to ZFS.
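
(For context, that approach amounts to ZED running the same zedlet for several event classes via symlinks. An illustrative sketch, assuming the zedlet is installed as /etc/zfs/zed.d/statechange-lustre.sh and using event classes seen earlier in this thread; the actual list is in the Gerrit change:)

    cd /etc/zfs/zed.d
    for ev in vdev_attach vdev_remove resilver_finish; do
            ln -s statechange-lustre.sh ${ev}-lustre.sh
    done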

@cvoltz closed this May 7, 2021
Labels: Status: Code Review Needed (Ready for review and testing)

Successfully merging this pull request may close these issues:
statechange event not always generated when VDEV state changes