Add metrics for volume scheduling operations #59529
Conversation
/ok-to-test
pkg/scheduler/metrics/metrics.go (outdated):

			Buckets: prometheus.ExponentialBuckets(1000, 2, 15),
		},
	)
	VolumeBindingPredicateLatency = prometheus.NewHistogram(
@bsalamat, what do you think about having a metric on a specific predicate? This predicate has the potential to be slow because it has to iterate through all PVs.
Having metrics for a specific predicate is not consistent with the rest of our code base, but it's fine for now while we are finding a better solution for dynamic resource binding.
pkg/scheduler/metrics/metrics.go (outdated):

	@@ -87,6 +87,30 @@ var (
			Name: "total_preemption_attempts",
			Help: "Total preemption attempts in the cluster till now",
		})
	AssumeBindVolumesLatency = prometheus.NewHistogram(
I know the function name is assume and bind, but I think it makes more sense for the metric to be named just "AssumeVolumesLatency" because the function doesn't actually do the bind operation, it just queues it.
Done
pkg/scheduler/scheduler.go (outdated):

	@@ -350,6 +352,8 @@ func (sched *Scheduler) bindVolumesWorker() {
			Status: v1.ConditionFalse,
			Reason: reason,
		})

		metrics.BindPodVolumeLatency.Observe(metrics.SinceInMicroseconds(start))
It may also be good to add a metric to count VolumeBindingFailed. I don't expect it to happen often.
@msau42 Done
I do not know why.
@wackxu can you also add metrics to pkg/controller/volume/persistentvolume/scheduler_binder_cache.go? I want to track how many times a binding was added to the cache, and how many times it was removed from the cache. This could help detect potential memory leaks.
Regarding your import failures: because the change adds an import to the predicates pkg, it triggers the allowed-import check. You can modify the import restrictions file, like this.
(force-pushed 25ec499 to e4e0612)
@msau42 Done, PTAL
	@@ -59,6 +60,8 @@ func (c *podBindingCache) DeleteBindings(pod *v1.Pod) {
		podName := getPodName(pod)
		delete(c.bindings, podName)

		metrics.VolumeBindingDeleteFromSchedulerBinderCache.Inc()
I think it would be better to only record when it's actually removed (it could have already been removed previously)
	@@ -72,6 +75,8 @@ func (c *podBindingCache) UpdateBindings(pod *v1.Pod, node string, bindings []*b
		c.bindings[podName] = nodeBinding
	}
	nodeBinding[node] = bindings

	metrics.VolumeBindingAddToSchedulerBinderCache.Inc()
And here, only record when it's actually added.
@msau42 Done, PTAL
	@@ -71,6 +75,10 @@ func (c *podBindingCache) UpdateBindings(pod *v1.Pod, node string, bindings []*b
			nodeBinding = nodeBindings{}
			c.bindings[podName] = nodeBinding
		}
		if _, ok := nodeBinding[node]; !ok {
			metrics.VolumeBindingAddToSchedulerBinderCache.Inc()
I think this can just be updated in the block above, since the metric is based on the pod name, not the node.
Done
	@@ -1588,6 +1591,7 @@ func (c *VolumeBindingChecker) predicate(pod *v1.Pod, meta algorithm.PredicateMe
		return false, failReasons, nil
	}

	metrics.VolumeBindingPredicateLatency.Observe(metrics.SinceInMicroseconds(start))
I think we should also record latency when predicate returns failReasons because it still has to do lots of calculations in FindPodVolumes.
pkg/scheduler/metrics/metrics.go (outdated):

	VolumeBindingFailed = prometheus.NewCounter(
		prometheus.CounterOpts{
			Subsystem: schedulerSubsystem,
			Name:      "total_volume_binding_failed",
I think the convention is to have count or latency at the end of the name, so something like "volume_binding_error_count"
pkg/scheduler/scheduler.go (outdated):

	@@ -298,6 +299,7 @@ func (sched *Scheduler) assumeAndBindVolumes(assumed *v1.Pod, host string) error
		}
		return err
	}
	metrics.AssumeVolumesLatency.Observe(metrics.SinceInMicroseconds(start))
We should observe the latency even when there is an error.
	@@ -56,6 +73,8 @@ type PVCLister interface {
	func Register(pvLister PVLister, pvcLister PVCLister) {
		registerMetrics.Do(func() {
			prometheus.MustRegister(newPVAndPVCCountCollector(pvLister, pvcLister))
			prometheus.MustRegister(VolumeBindingAddToSchedulerBinderCache)
I think the registration here is not going to work. This registration is called in the PV controller process, but the volume binding cache is a library used by the scheduler process. So I think the library needs to export a register method that the scheduler will call.
These registration calls should be removed from this file, because this is called for PV controller process, not scheduler process.
@wackxu thanks for working on this. I just wanted to check if you will still have time to continue on this?
@msau42 Sorry for the delay, will update later
This is looking really good, thanks! Just one minor nit:

	// RegisterForScheduler is used for scheduler, because the volume binding cache is a library
	// used by scheduler process.
	func RegisterForScheduler() {
Can you call this "RegisterVolumeSchedulingMetrics" to be more clear?
/lgtm
/assign @bsalamat
	@@ -21,6 +21,7 @@ import (
		"time"

		"github.com/prometheus/client_golang/prometheus"
		"k8s.io/kubernetes/pkg/controller/volume/persistentvolume"
Any chance we could separate the metrics in a different package and import only the metrics package instead of the whole persistent volume controller?
@wackxu would you be able to look into this?
We already import k8s.io/kubernetes/pkg/controller/volume/persistentvolume in the scheduler, and the metric is used only by scheduler code such as the scheduler_binder files. I think we can extract a separate dir for those files in a follow-up PR.
That sounds reasonable.
/test all
Tests are more than 96 hours old. Re-running tests.
@wackxu a lot of refactoring has gone into 1.12, so this change needs to be rebased. Do you still have time to work on this?
@msau42 Sorry for the delay, will update today
/approve
	VolumeBindingFailed = prometheus.NewCounter(
		prometheus.CounterOpts{
			Subsystem: VolumeSchedulerSubsystem,
			Name:      "binding_error_total",
Can we also have error count for assume and predicates too? (similar to stage latency)
Sorry for the delay, updated
/lgtm Thanks for continuing to work on this!
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bsalamat, msau42, wackxu. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/retest

1 similar comment

/retest
What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #56162

Special notes for your reviewer:
/assign @msau42

Release note: