-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
configure zed's diagnosis engine with vdev properties #13805
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for picking up this work!
cmd/zed/agents/zfs_diagnosis.c
Outdated
if (checksum_n == UINT64_MAX) | ||
checksum_n = 10; | ||
if (checksum_t == UINT64_MAX) | ||
checksum_t = 600; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add #defines for DEFAULT_CHECKSUM_N
and DEFAULT_CHECKSUM_T
here so this is very clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Should we do the same for the other things that use this idiom (filesystem_limits and snapshot_limits)?
module/zcommon/zpool_prop.c
Outdated
@@ -410,6 +410,12 @@ vdev_prop_init(void) | |||
sfeatures); | |||
|
|||
/* default numeric properties */ | |||
zprop_register_number(VDEV_PROP_CHECKSUM_N, "checksum_n", UINT64_MAX, | |||
PROP_DEFAULT, ZFS_TYPE_VDEV, "<count>", "CHECKSUMN", B_FALSE, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PROP_DEFAULT, ZFS_TYPE_VDEV, "<count>", "CHECKSUMN", B_FALSE, | |
PROP_DEFAULT, ZFS_TYPE_VDEV, "<count>", "CK_N", B_FALSE, |
...and CK_T
would be more consistent with CKERR
above. Just not intuitive at all, maybe something like CKSUM_T
and CKSUM_N
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I think CKSUM_T
is better
module/zfs/zfs_fm.c
Outdated
uint64_t objid, intval; | ||
int err = 0; | ||
|
||
objid = vd->vdev_top_zap != 0 ? vd->vdev_top_zap : vd->vdev_leaf_zap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
objid = vd->vdev_top_zap != 0 ? vd->vdev_top_zap : vd->vdev_leaf_zap; | |
objid = vd->vdev_top_zap != 0 ?: vd->vdev_leaf_zap; |
May be more readable.
So by 'none' do you mean "use the old default values"? /*
* UINT64_MAX is the default value of the
* checksum_* vdev properties to represent a
* 'none' value.
*/
if (checksum_n == UINT64_MAX)
checksum_n = 10;
if (checksum_t == UINT64_MAX)
checksum_t = 600;
|
Given that IO and checksum errors are reported independently, it seems natural to allow configuring these thresholds independently.
I agree that the common case would be for all vdevs to have the same thresholds..and more than likely it'll be the default that the zed provides. Given two vdevs, I wonder if there's any value in having the ability to influence which vdev degrades first..that is, setting different thresholds for each vdev. I imagine that'd be an uncommon scenario though.
Indirectly, yes - by 'none', I mean the vdev property has not been set. When zed looks up the property and sees that it hasn't been set (i.e., UINT64_MAX) the zed will use a default value.
Will do. |
Looks like there are failing tests related to this change, I'll dig into it and see what's going on. I'll update this pull request with review feedback when I get the testing bits sorted out. |
The difference here being: if the leaf vdev is set to none, we check its parent, and recurse all the way up to the root vdev. Basically faking inheritance. And if it is not set anywhere, fall back to the old ZED default |
e414060
to
5912702
Compare
Would that be an issue with the "Properties are NOT inherited from top-level vdevs" line in |
vdev properties are NOT inherited, so we wrote a little function to work around that for ZED's needs. |
@tonyhutter, as requested - I've added a few test cases:
|
The test I wrote is failing, it looks like either the 'ereport.fs.zfs.checksum' isn't making it to
These tests worked on my local machine, anyone spot an obvious error? In the meantime, I'll try to replicate the error by running the tests on a fresh linux box. In addition my failing test, the following tests failed for these specific instances: Debian 10 x86_64: fault/auto_spare_shared These tests didn't fail in the previous test run and it seems odd that the failing tests aren't consistent across the linux instances. Any insight on whether these are familiar failures or related to this PR? |
module/zfs/vdev.c
Outdated
} else if (vd->vdev_leaf_zap != 0) { | ||
objid = vd->vdev_leaf_zap; | ||
} else { | ||
return (EINVAL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if preferring the top over the leaf made sense or if there could be something that is neither top nor leaf, but this matches existing logic below in vdev_prop_get()
except for the lack of SET_ERROR()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good question, I think the root vdev is neither top or leaf...and no zap. This function was partially factored out of vdev_prop_get()
and I brought this sanity check over with it.
vdev_prop_get()
is a bit imperfect as it doesn't handle available spares or cache vdevs correctly since they don't have a top_zap or a leaf_zap (as I understand it).
cmd/zed/agents/zfs_diagnosis.c
Outdated
/* | ||
* UINT64_MAX is the default value of the | ||
* checksum_* vdev properties to represent a | ||
* 'none' value. | ||
*/ | ||
if (checksum_n == UINT64_MAX) | ||
checksum_n = DEFAULT_CHECKSUM_N; | ||
if (checksum_t == UINT64_MAX) | ||
checksum_t = DEFAULT_CHECKSUM_T; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit awkward. Would it be better to leave these fields out of the payload if they are unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, that is a bit weird....maybe better to leave out when unset, that way the semantics of these internal values don't bleed into userland.
it looks like sys/fs/zfs.h
is a common header between ZED, module/zcommon/zpool_prop.c
, module/zfs/vdev.c
, and module/zfs/zfs_fm.c
...although its not explicitly declared in zfs_fm.c
but brought in through other includes.
..eh, probably better to just drop these fields from the payload when unset and let ZED define the default - I'll update this PR to reflect your suggestion
Thanks! 👍
@allanjude A leaf vdev's parent is a root vdev, so there's only recursion up one level. This wouldn't recurse all the way to the pool level, if that's what you were thinking. That said, is there use case for wanting to set thresholds on a root vdev (like |
What do folks think about storing these properties in the vdev struct? This would allow reading the properties from memory. If I understand correctly, the zap_lookup() in the io pipeline path is problematic since it may try to doing sync io. |
@allanjude - Sorry, you were right - I got the terminology wrong. It should be:
I got tripped up looking at our documentation here:
https://openzfs.github.io/openzfs-docs/man/7/zpoolconcepts.7.html ... but this example should say "two top level vdevs". That said, I'm able to set the values on the vdev and top-level vdev, but not the root-vdev/pool:
Is there a different syntax I should be using for setting it on the root vdev? Side note - I notice the usage text for
|
Yes, we might need to do a pass to clarify the documentation
So, part of this is the way we hijacked the zpool set command. There are 2 syntaxes: set a property on a pool:zpool set property=value pool set a property on a vdev:zpool set propery=value pool vdev And yes, currently there is no way to set a property on the 'root' vdev. Because the way the zpool set command works, it is looking for names that are not pool names, to use as vdevs (so |
@allanjude thanks for the info. I just noticed that
Would it make sense to just add |
The issue with doing a |
Hmm... that really diminishes the benefit of making the thresholds per-vdev then. You can't just "set it and forget it" on the vdevs, as would be the natural user expectation. To make the thresholds persistent (the common use case), you would have to set them on top-level-vdevs. If you added a new top-level-vdev, you would have to set thresholds on the new top-level-vdev as well. And the user would have to know that they could set values on the top-level-vdevs, which is non-standard. With all that in mind, does it make sense to just have the thresholds be per-pool properities? |
This is why we were thinking of the root-vdev property, so you could set a default basically at the pool level, but then still be able to override it at the top-level vdev or individual leaf vdev level. |
So what would the command be to set thresholds on the root vdev? I tried this earlier without success:
|
The syntax does not current exist. It would need to be different than that for pool properties, like |
Here's how I would summarize what this PR is proposing: Checksum thresholds shouldn't be a per-pool property. They should instead be a per-vdev property, which would need to be re-set when a vdev is replaced, or, set on a top-level vdev, which is something we don't currently do, or set per root vdev, which we currently don't have the capability to do, but will be added at a later time, with a syntax different from setting pool properties, even though the effect is conceptually the same. I think that may be a little too complicated. Assuming Principal Of Least Surprise, I think you could do:
I would tend to favor 1, unless there's a real-world use case for 2. |
Right, allow the user to tune a vdev at a granular level.
This isn't entirely correct. It is possible to set this property on a top-level vdev. If one wants this property to persist when replacing a leaf vdev, it would be advised to set the property on the top-level vdev - not on the leaf vdev itself.
Conceptually the same, sort of. I'd view it as a vdev-wide property as opposed to a pool property.
In general, I'd steer away from making it a pool property; as it seems logical to maintain a clear line of separation between pool properties and vdev properties - and checksum errors are reported at the vdev level.
Syntactically, I like this. However, the namespacing is different between pool properties and vdev properties - which would require maintaining the property in two different namespaces.
I agree it'd be ideal to figure out an intuitive syntax for setting vdev-wide properties. Either using a special keyword To some degree, it'd be nice to abuse the pool name as an alias to the root vdev for setting certain properties at the vdev-wide level - but eh.. Thoughts? |
Based on conversations at the OpenZFS developer summit, we have settled on the syntax: We will also add an explicit alias, |
8994402
to
d234dc3
Compare
36cc58c
to
8e4922f
Compare
With the additional parameters added now, this is ready for review. |
Please update the man pages to document these new properties (and |
8e4922f
to
9b1a7de
Compare
Hey Tony, I documented the new properties in Would you find it acceptable for the |
Sorry, I was under the mistaken impression they were included in this PR. That's fine if you want to move it to a different PR. Note that if you guys don't plan to use the You will want to further update the docs to make it clear you can set these new values not only on vdevs, but on top level vdevs as well. You'll also want to update that man page to say that vdev props are not persistent across drive replacements, and maybe add a note to the |
9b1a7de
to
6f55221
Compare
bump I believe all feedback has been addressed. Are there any outstanding issues that would prevent this pull request from being merged? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating on this and incorporating the feedback. The updated version does look ready to go.
I still think the wording in
Maybe just re-word it a little? diff --git a/man/man7/vdevprops.7 b/man/man7/vdevprops.7
index 3244324..4468f4a 100644
--- a/man/man7/vdevprops.7
+++ b/man/man7/vdevprops.7
@@ -43,7 +43,7 @@ section, below.
.Ss Native Properties
Every vdev has a set of properties that export statistics about the vdev
as well as control various behaviors.
-Properties are NOT inherited from top-level vdevs.
+In most cases, properties are not inherited from top-level vdevs.
.Pp
The values of numeric properties can be specified using human-readable suffixes
.Po for example, |
6f55221
to
8cdce7b
Compare
Rebase and clarify that checksum_* and io_* properties can be inherited from a top-level vdev. Thanks to all for taking the time to review this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for sticking with this!
It appears the saga continues...the zed_io_config test is failing. I'll update this pull request when I understand what the problem is. |
When updating this please go ahead and squash the commits as well. |
Alright, will do - I'll be updating this shortly. I've identified the issue for the failing test which was introduced in Implement uncached prefetch #14243 The test was doing sequential reads which triggered the prefetching and subsequently exceeded the thresholds of the test causing the failure. I'll write the test to either not do sequential reads or simply turn prefetching off via |
I'd also go with disabling prefetch. We needed to do the same thing in the |
Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpeted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that, these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remains unchanged. Sponsored-by: Seagate Technology LLC Submitted-by: Klara, Inc. Signed-off-by: Rob Wing <rob.wing@klarasystems.com>
8cdce7b
to
c77155d
Compare
Introduce four new vdev properties: checksum_n checksum_t io_n io_t These properties can be used for configuring the thresholds of zed's diagnosis engine and are interpeted as <N> events in T <seconds>. When this property is set to a non-default value on a top-level vdev, those thresholds will also apply to its leaf vdevs. This behavior can be overridden by explicitly setting the property on the leaf vdev. Note that, these properties do not persist across vdev replacement. For this reason, it is advisable to set the property on the top-level vdev instead of the leaf vdev. The default values for zed's diagnosis engine (10 events, 600 seconds) remains unchanged. Reviewed-by: Tony Hutter <hutter2@llnl.gov> Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Reviewed-by: Ryan Moeller <ryan@iXsystems.com> Reviewed-by: Allan Jude <allan@klarasystems.com> Signed-off-by: Rob Wing <rob.wing@klarasystems.com> Sponsored-by: Seagate Technology LLC Closes openzfs#13805
Currently, zed uses hard-coded values for configuring the thresholds of its diagnosis engine . This patchset
introduces a mechanism to configure these thresholds using vdev properties.
This introduces two new vdev properties, checksum_n and checksum_t. These properties can be used
for tuning the values that are passed into the serd algorithm used by the diagnosis engine. The meaning
of these properties are N <events> in T <seconds>.
Motivation and Context
To allow a user to tune the thresholds of zed's diagnosis engine for testing and/or production purposes.
Description
Since vdev properties don't have inheritance, I've added a routine to workaround it. If the
vdev property, checksum_n and/or checksum_t, has not been set on the vdev for the error report
being generated, the parent vdev's will be checked recursively to see if either of the properties have been set.
If these properties are not set in the vdev or any parent vdev, zed will fallback to the original hard-coded default.
How Has This Been Tested?
Types of changes
Checklist:
Signed-off-by
.