Add statechange event when drive is replaced #9523
Conversation
When the "zpool online" command is used bring a faulted or
offline drive back online, a resource.fs.zfs.statechange event
is generated. When the "zpool replace" command is used to bring
a faulted or offline drive back online, a statechange event is
not generated. Add the missing statechange event after
resilvering has finished. The new sequence of events looks like
this:
sysevent.fs.zfs.vdev_attach
sysevent.fs.zfs.resilver_start
sysevent.fs.zfs.history_event (scan setup)
sysevent.fs.zfs.history_event (scan done)
sysevent.fs.zfs.resilver_finish
sysevent.fs.zfs.config_sync
+ resource.fs.zfs.statechange
sysevent.fs.zfs.vdev_remove
sysevent.fs.zfs.history_event (vdev attach)
sysevent.fs.zfs.config_sync
sysevent.fs.zfs.history_event (detach)
Signed-off-by: Christopher Voltz <christopher.voltz@hpe.com>
External-issue: LU-12836
Closes openzfs#9437
Scenario: we put VDEV1 OFFLINE in the pool, then replace it with VDEV2 (with WWN2) in the same slot.
I think the scenario is fairly well documented in the #9437 issue and the LU-12836 ticket. Please refer to them for the details. In short, a drive fails (or is brought offline) and is replaced, but no statechange event is generated for the replacement.
} else if (state == VDEV_STATE_ONLINE &&
    vd->vdev_prevstate == VDEV_STATE_UNKNOWN) {
	/* post an online state change */
	zfs_post_state_change(spa, vd, vd->vdev_prevstate);
}
Good thought, but unfortunately this doesn't work quite as intended. In my manual testing this will result in "ONLINE" resource.fs.zfs.statechange events being generated for both the top-level mirror/raidz vdev and the old replaced leaf vdev guid (not the new vdev guid).
For this to behave as intended we'd need to add logic to the end of vdev_set_state(), and somehow not filter out the initial open event from a replaced vdev.
/* filter out state change due to initial vdev_open */
if (save_state > VDEV_STATE_CLOSED)
	zfs_post_state_change(spa, vd, save_state);
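To make the first option concrete, here is a minimal sketch of what relaxing that filter might look like, assuming a replacement's first open can be recognized as a leaf vdev coming up healthy with no recorded previous state. The fields and constants are taken from the existing vdev code, but the condition itself is only an illustration and is not guaranteed to avoid the duplicate top-level event described above:
/*
 * Sketch only: keep filtering the ordinary initial vdev_open, but also
 * post an event when a leaf vdev opens healthy with no previous state,
 * which is what a freshly attached replacement looks like.
 */
if (save_state > VDEV_STATE_CLOSED ||
    (vd->vdev_ops->vdev_op_leaf &&
    vd->vdev_prevstate == VDEV_STATE_UNKNOWN &&
    state == VDEV_STATE_HEALTHY))
	zfs_post_state_change(spa, vd, save_state);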
Alternatively, we could perhaps add the event to the end of spa_vdev_attach(), which handles the replacement itself.
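For comparison, a hedged sketch of that alternative, keyed off the replacing argument and the newvd local that spa_vdev_attach() already has in scope. Passing VDEV_STATE_UNKNOWN as the previous state is an assumption (the new device has no prior state in this pool), and deferring the event until resilvering completes, as the commit message describes, is not handled here:
/*
 * Sketch only: after the attach has been committed, post a state change
 * for the new leaf when this attach was a replacement.
 */
if (replacing)
	zfs_post_state_change(spa, newvd, VDEV_STATE_UNKNOWN);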
OK. I'll look into the second option since that seems simpler to me.
Can you please provide the steps you took to see the duplicate events? Since I didn't see that in my testing, I would like to update my test steps so I don't miss it again.
Should I close this PR and open another one when I have a different patch, or just force-push this one with the new patch when I have it ready?
@cvoltz @tonyhutter I think we should take a step back and reconsider the desired behavior.
Currently the code intentionally suppresses all statechange events when opening a device. This is why no such events are posted when running zpool import, or when using zpool add|attach or zpool replace as in this case. Instead there are explicit add, attach, etc. events.
Would it be more convenient if we generated events every time we opened a device (offline -> online)? Or would that generate too much noise in the event logs?
Or should we only generate the events for new devices? In that case we'd want to handle zpool create, zpool add, zpool attach, and zpool split as well. This would in part be redundant with the config_sync event, which is posted every time there's a configuration change, including a zpool replace.
Can you please provide the steps you took to see the duplicate events?
I was able to reproduce this by running zpool replace with the device online. Two statechange events are posted, and you can see one is for the top-level vdev by checking the reported guid in the event. You can use zpool status -g to print each vdev's guid.
Should I close this PR and open another one when I have a different patch, or just force-push this one with the new patch when I have it ready?
Let's leave this open at least until we hash out the best way to handle this. Then I'd suggest force updating the PR so we can keep the discussion in one place.
Codecov Report
@@            Coverage Diff             @@
##           master    #9523      +/-   ##
==========================================
+ Coverage   79.11%   79.12%   +<.01%
==========================================
  Files         416      416
  Lines      123652   123655       +3
==========================================
+ Hits        97831    97840       +9
+ Misses      25821    25815       -6
Continue to review full report at Codecov.
My gut feeling would be to add an additional statechange event for all attach events, except for when the disk is being added at import. I don't know off the top of my head what you'd key off of to know whether you're importing or not, but there has to be something. You'd also not want to generate any statechange events for the "OLD" vdev ("OLD" is the actual vdev name), which is an internal vdev that gets temporarily created while you're doing an attach (in fact, you probably don't want to generate any events at all for OLD). I don't know what a good …
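As for what to key off of to know whether an import is in progress, one possibility is sketched below, under the assumption that the pool load state is a usable signal and that only leaf vdevs should post events; it does nothing to exclude the transient "OLD" vdev:
/*
 * Sketch only: skip the statechange event while the pool is still being
 * loaded or imported, and only consider leaf vdevs.  Excluding the
 * temporary "old" vdev created during an attach would need a separate
 * check that is not shown here.
 */
if (spa_load_state(spa) == SPA_LOAD_NONE &&
    vd->vdev_ops->vdev_op_leaf)
	zfs_post_state_change(spa, vd, vd->vdev_prevstate);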
As we typically have hundreds of drives attached to a system, I would prefer not to see an event for every drive at import (or export). Having additional statechange events for the other attach events makes sense to me. Generally, symmetry is good, so it begs the question of whether there should be equivalent statechange events when a drive is detached. Do these already exist for all cases? From the POV of the … To make sure I have all of the test cases, we are talking about having statechange events for …
@cvoltz sorry, this PR completely fell off my radar. To answer your question, I'd only be looking for a test case to verify the statechange after a replace. If you want to add test cases for statechange on all attach/remove events, all the better. See …
I recently rebased this PR on master (still merges cleanly) and gave it a test. The good news is that I'm seeing a statechange event on replace. The bad news is that I'm seeing multiple statechange events. The other bad news is that the statechange events it generates don't contain the … Now compare this to the statechange from a …
@cvoltz I just submitted this patch to Lustre: "LU-12836 osd-zfs: Catch all ZFS pool change events". It creates symlinks to statechange-lustre for all the other events that could cause a pool status change. If that's a sufficient fix for you, then we may be able to get away without having to make any changes to ZFS.
Motivation and Context
Fix #9437
Fix LU-12836
Description
When the zpool online command is used to bring a faulted or offline drive back online, a resource.fs.zfs.statechange event is generated. When the zpool replace command is used to bring a faulted or offline drive back online, a statechange event is not generated. Add the missing statechange event after resilvering has finished (the new sequence of events is shown in the commit message above).
The statechange event looks like this: …
How Has This Been Tested?
Updated test events_001_pos.ksh to verify the statechange event is generated. Ran the full set of functional tests using the updated ZFS Test Suite on CentOS 7.6 using the CentOS 7 v1905.1 Vagrant image. See cvoltz/zfs-test-suite for scripts which use Ansible and Vagrant to provision a Vagrant VM, using libvirt, to build and run the ZFS test suite on a specified branch of a specified repository.
Also tested the patch with Lustre 2.12.2-1 and ZFS 0.7.13-1 using the test-degraded-drive test script. See the LU-12836 ticket for details.
Types of changes
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Performance enhancement (non-breaking change which improves efficiency)
Code cleanup (non-breaking change which makes code smaller or more readable)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (a change to man pages or other documentation)
My code follows the ZFS on Linux code style requirements.
I have updated the documentation accordingly.
I have read the contributing document.
I have added tests to cover my changes.
All new and existing tests passed.
All commit messages are properly formatted and contain Signed-off-by.