Allow show hidden metrics in kube-apiserver #84292
/assign @logicalhan
There are no tests included in this PR; we should include a unit test at minimum. I'm assuming this is a follow-up to #81970. @logicalhan, we previously discussed that we wanted this to be a feature flag rather than a CLI option, because adding a CLI option removes the possibility of removing it or changing its schema in the future, and the KEP specifies that we should be able to list specific hidden metrics to enable/disable (rather than a blanket disable for all of them). I think for now it would make more sense to go with the feature flag approach that I previously looked at?
/sig instrumentation
@@ -153,6 +154,9 @@ func (s *ServerRunOptions) Flags() (fss cliflag.NamedFlagSets) {
	fs.BoolVar(&s.AllowPrivileged, "allow-privileged", s.AllowPrivileged,
		"If true, allow privileged containers. [default=false]")

	fs.BoolVar(&s.ShowHiddenMetrics, "show-hidden-metrics", s.ShowHiddenMetrics,
What is to prevent an admin from:
- in version X, turn this on to get back deprecated metric M
- in version Y, not turn this off, and therefore fail to notice metric N is being deprecated
- in version Z, metric N is removed with (effectively) no warning.
To prevent this scenario, I suggest forcing the user to specify the version for which they want to show metrics as the value of this flag rather than "true".
I like the suggestion, it seems reasonable to me.
I think this is a good idea (and I also wouldn't mind getting rid of the KEP specification to be able to turn on/off individual metrics and just make it binary-wide for hidden) so perhaps we should update the KEP to include this?
Personally I think we should be able to turn off individual metrics, since we have had memory leak issues in the past. These don't tend to manifest except under certain cardinality conditions, so it's possible that metrics like that will sneak by and get released at some point. I'm thinking of that as a beta/GA feature though. It's always good to have the safety net of being able to turn off a memory leak without having to patch and release Kubernetes.
I like the suggestion, it seems reasonable to me.
+1
It's a really good point!
I was thinking maybe we should just separate the functionality. A generic toggle for deprecated/hidden metrics but we should still eventually have the ability to disable metric[s] on demand. I want the safety net to be able to get rid of a metric memory leak without having to release a new version of kubernetes.
So from this discussion I'm hearing we should specify two things:
Sounds like we now need two CLI flags... can we have a quick design discussion somewhere, maybe on the KEP, before moving forward with this? /hold
If I understand correctly, here @logicalhan wants to
I doubt it is convincing enough to add a flag (hide metrics) just to prevent a memory leak.
From this discussion, I personally think we should provide a feature gate, as in #81970, which can provide stability guarantees for the flag(s), e.g.
This is basically not at all the conclusion I am coming to. How are you arriving at feature gates from this conversation?
We have already had to do this: #74636. We should not require a new release of kubernetes in order to turn off a bad metric.
I like @lavalamp's idea. I am a little worried we will lose the possibility of improving this flag when we get another good idea. Am I over-cautious?
@lavalamp’s idea (as I understand it from talking to him about it today) is that the flag takes in a string parameter, a version for which we will show what would otherwise be hidden metrics. This eliminates the danger of what happens when someone forgets to untoggle a flag after migrating a set of deprecated metrics. Being able to hide metrics is something which we can address orthogonally, using another flag.
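To make the version-valued flag idea concrete, here is a minimal sketch using only Go's standard `flag` package (the PR itself uses the project's own flag sets). The `options` type, `newFlagSet`, and `shouldShowHidden` are hypothetical names for illustration, not code from this PR. It only shows why a stale value stops matching after an upgrade, which a plain boolean would not do.

```go
package main

import (
	"flag"
	"fmt"
)

// options is a hypothetical stand-in for the server's run options.
type options struct {
	ShowHiddenMetricsForVersion string
}

// newFlagSet registers the version-valued flag on a fresh FlagSet.
func newFlagSet(o *options) *flag.FlagSet {
	fs := flag.NewFlagSet("metrics", flag.ContinueOnError)
	fs.StringVar(&o.ShowHiddenMetricsForVersion, "show-hidden-metrics-for-version", "",
		"The previous version (x.y) for which you want to show hidden metrics.")
	return fs
}

// shouldShowHidden reports whether the requested version names the
// previous minor version of the running binary (assumed semantics).
func shouldShowHidden(requested, previousMinor string) bool {
	return requested != "" && requested == previousMinor
}

func main() {
	var o options
	fs := newFlagSet(&o)
	if err := fs.Parse([]string{"--show-hidden-metrics-for-version=1.16"}); err != nil {
		fmt.Println("parse error:", err)
		return
	}
	// Binary at 1.17: the flag names the previous minor version.
	fmt.Println(shouldShowHidden(o.ShowHiddenMetricsForVersion, "1.16"))
	// Binary upgraded to 1.18 with the flag left untouched: no longer a match.
	fmt.Println(shouldShowHidden(o.ShowHiddenMetricsForVersion, "1.17"))
}
```

Because the value is tied to a specific release, the admin has to revisit it on every upgrade instead of leaving a toggle on indefinitely.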
OK. Copy that. as well as
I will push a change according to @lavalamp's comments.
Yeah, I know that. I intended to deal with them after this PR, because they will share the validation logic from the component base.
// TODO(RainbowMango): move it to genericoptions before next flag comes.
mfs := fss.FlagSet("metrics")
mfs.StringVar(&s.ShowHiddenMetricsForVersion, "show-hidden-metrics-for-version", s.ShowHiddenMetricsForVersion,
	"The previous version(x.y) for which you want to show hidden metrics. Only the previous minor version is allowed.")
What is this version? Which formats are accepted? What does the last sentence mean?
Why not --show-hidden-metrics without any argument?
By design, the value of this kind of flag becomes incompatible with the previous one on every version change.
Why not --show-hidden-metrics without any argument?
See the discussion above:
#84292 (comment)
Also updated to KEP:
why not --show-hidden-metrics
Ok, got that.
Another point: I have not read the KEP. I have no clue what this flag does. What does "show" mean? What does "hidden" mean? This must be super clear from the flag description.
That was the question I asked here on the KEP update: kubernetes/enhancements#1358 (comment)
Why didn't we stick with the original plan of allowing enabling a specific deprecated metric?
Because we want to reserve the ability to disable a metric (regardless of whether it is stable or not) at application boot, and that functionality is orthogonal to metrics being auto-disabled by the metrics stability framework.
The original plan dictates providing a mechanism for turning off individual metrics via the command line.
..all metrics should be able to be individually disabled by the cluster administrator, regardless of stability class. By default, all non-deprecated metrics will be automatically registered to the metrics endpoint unless explicitly blacklisted via a command line flag (i.e. '--disable-metrics=somebrokenmetric,anothermetric')
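As a rough illustration of the blacklist behaviour quoted from the KEP, the sketch below parses a comma-separated value like the one in the `--disable-metrics` example and skips those names at registration time. The helper names `parseDisabled` and `registerable` are hypothetical, and a real implementation would hook into the metrics registry rather than filter a slice of strings.

```go
package main

import (
	"fmt"
	"strings"
)

// parseDisabled turns a comma-separated flag value into a set of metric names.
// Empty entries and surrounding whitespace are ignored.
func parseDisabled(value string) map[string]bool {
	disabled := make(map[string]bool)
	for _, name := range strings.Split(value, ",") {
		if name = strings.TrimSpace(name); name != "" {
			disabled[name] = true
		}
	}
	return disabled
}

// registerable returns the metric names that were not explicitly disabled,
// preserving their original order.
func registerable(all []string, disabled map[string]bool) []string {
	var kept []string
	for _, name := range all {
		if !disabled[name] {
			kept = append(kept, name)
		}
	}
	return kept
}

func main() {
	disabled := parseDisabled("somebrokenmetric,anothermetric")
	all := []string{"somebrokenmetric", "healthymetric", "anothermetric"}
	fmt.Println(registerable(all, disabled))
}
```

This is independent of the hidden/deprecated logic: a metric can be switched off at boot regardless of its stability class.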
I suggest: The previous version for which you want to show hidden metrics. Only the previous minor version is meaningful, other values will be ignored. The format is <major>.<minor>, e.g.: 1.16
Given the discussion, I think it makes sense to add a comment here explaining the purpose of this (to avoid giving the admin a surprise at v N+2). This could also be said in the help text, e.g.: The purpose of this format is to make sure you have the opportunity to notice if the next release hides additional metrics, rather than being surprised when they are permanently removed in the release after that.
22dd973 to 1290698
New push with a minor change:
Does "not be allowed" mean apiserver won't start? I can't decide if that's good or not.
Silently ignoring an explicit option they asked for seems not good, especially for something like this, where that would mean alerting/monitoring configurations could be making incorrect decisions based on absent metrics.
targetVersion, err := semver.Parse(pathVersion)
if err != nil {
	return fmt.Errorf("specified --show-hidden-metrics-for-version '%v' not follows the version format x.y", targetVersionStr)
s/not follows/does not follow/
	return fmt.Errorf("specified --show-hidden-metrics-for-version '%v' not follows the version format x.y", targetVersionStr)
}

maxAllowedVersion, err := semver.Make(fmt.Sprintf("%d.%d.0", currentVersion.Major, currentVersion.Minor))
I think you don't need any of this semver code. Only the previous minor version is allowed, so you can print it to a string and compare directly.
I think that can handle the above error condition, too. The error message can say, --show-hidden-metrics-for-version must be omitted or have the value "%v". Only the previous minor version is allowed.
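The simplification suggested here might look roughly like the following sketch: format the single acceptable value and compare strings, so malformed input and wrong versions fall into the same error path. The function name and the plain `int` parameters are assumptions for illustration; the sketch also assumes the current minor version is at least 1.

```go
package main

import "fmt"

// validateShowHiddenMetricsVersion accepts an empty value (flag omitted) or
// exactly the previous minor version of the running binary, as "major.minor".
// Hypothetical helper; assumes currentMinor >= 1.
func validateShowHiddenMetricsVersion(currentMajor, currentMinor int, target string) error {
	if target == "" {
		return nil
	}
	valid := fmt.Sprintf("%d.%d", currentMajor, currentMinor-1)
	if target != valid {
		return fmt.Errorf("--show-hidden-metrics-for-version must be omitted or have the value %q; only the previous minor version is allowed", valid)
	}
	return nil
}

func main() {
	// For a binary at 1.17 the only accepted non-empty value is "1.16".
	for _, v := range []string{"", "1.16", "1.15", "not-a-version"} {
		fmt.Printf("%q -> %v\n", v, validateShowHiddenMetricsVersion(1, 17, v))
	}
}
```

With a single allowed value, no semver parsing is needed, and the error message can tell the admin exactly what to type.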
OK, if @liggitt agrees then I'm sold, but that means the error handling code can be drastically simplified.
- Update bazel by hack/update-bazel.sh
1290698 to b2fbdee
Thanks, much clearer now.
/lgtm
	return nil
}

validVersionStr := fmt.Sprintf("%d.%d", currentVersion.Major, currentVersion.Minor-1)
I don't think we're ever going to have a v2, but if we did, it would print 2.-1 which is not right :)
Feel free to fix in a follow-up if you want.
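One possible shape for that follow-up, as a sketch only: guard the subtraction so a minor version of 0 never yields a nonsensical string. The helper name `previousMinorVersion`, the `int` parameters, and the policy of returning an error when there is no previous minor within the same major version are all assumptions, not something decided in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoPreviousMinor signals that the current version is x.0, so there is no
// previous minor version within the same major version (assumed policy).
var errNoPreviousMinor = errors.New("no previous minor version within this major version")

// previousMinorVersion formats the previous minor version as "major.minor",
// refusing to compute minor-1 when minor is not positive.
func previousMinorVersion(major, minor int) (string, error) {
	if minor <= 0 {
		return "", errNoPreviousMinor
	}
	return fmt.Sprintf("%d.%d", major, minor-1), nil
}

func main() {
	fmt.Println(previousMinorVersion(1, 17))
	fmt.Println(previousMinorVersion(2, 0))
}
```

What the flag should accept at an x.0 release (for example, the last minor of the previous major) would still need a design decision.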
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: brancz, lavalamp, logicalhan, RainbowMango The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
What type of PR is this?
/kind feature
What this PR does / why we need it:
Add a flag to show hidden metrics.
Refer to stability framework kep:
Which issue(s) this PR fixes:
Special notes for your reviewer:
It's a lightweight implementation against the KEP.
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: