
Add scheduler cache size metrics #83508

Merged
Merged 2 commits into kubernetes:master from damemi:scheduler-cache-metric on Oct 30, 2019

Conversation

@damemi
Contributor

damemi commented Oct 4, 2019

What type of PR is this?
/kind feature

What this PR does / why we need it:
This adds metrics for the scheduler cache size (nodes, pods, and assumed pods) and updates them in the relevant cache functions.

Which issue(s) this PR fixes:
Fixes #83404

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

Scheduler now reports metrics on cache size including nodes, pods, and assumed pods
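For readers unfamiliar with the shape of such a metric: the PR adds a gauge, labeled by cache component (pods, assumed pods, nodes), that reports each map's size. Here is a minimal stdlib-only sketch, using `expvar` in place of the Prometheus-style client that Kubernetes actually uses; the metric name and helper are illustrative, not the PR's real API:

```go
package main

import (
	"expvar"
	"fmt"
)

// cacheSize stands in for the scheduler's cache-size gauge; each key is a
// cache component, mirroring the labels mentioned in the release note.
var cacheSize = expvar.NewMap("scheduler_cache_size")

// setGauge records the current size of one cache component.
func setGauge(component string, n int) {
	v := new(expvar.Int)
	v.Set(int64(n))
	cacheSize.Set(component, v)
}

func main() {
	pods := map[string]struct{}{"ns/p1": {}, "ns/p2": {}}
	nodes := map[string]struct{}{"node-1": {}}

	setGauge("pods", len(pods))
	setGauge("assumed_pods", 0)
	setGauge("nodes", len(nodes))

	fmt.Println(cacheSize.Get("pods"), cacheSize.Get("nodes")) // prints: 2 1
}
```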
@damemi

Contributor Author

damemi commented Oct 4, 2019

/cc @liu-cong
I'm not too experienced with metrics (learning, though); could you review whether this looks right?

@k8s-ci-robot requested a review from @liu-cong on Oct 4, 2019
@damemi

Contributor Author

damemi commented Oct 4, 2019

/sig instrumentation

@@ -29,6 +29,7 @@ import (
utilfeature "k8s.io/apiserver/pkg/util/feature"
"k8s.io/kubernetes/pkg/features"
"k8s.io/kubernetes/pkg/scheduler/algorithm"
"k8s.io/kubernetes/pkg/scheduler/metrics"

@liu-cong

liu-cong Oct 4, 2019

Contributor

It looks like not all cache size change events are captured, e.g., line 400 in the original file, where delete(cache.assumedPods, key) is called.

I think the best way to capture everything is to go through all references of the 3 maps (assumedPods, podStates, and nodes) and Inc/Dec the metric when an add/delete happens. There might be an opportunity to refactor the code as well, e.g., if we make all add/delete operations go through a single set of helper functions, then we only need to Inc/Dec in those helper functions.

@damemi

damemi Oct 7, 2019

Author Contributor

That's a good point; I also think there is a chance to refactor some of the cache code here. I'll work more on it.

@ahg-g

ahg-g Oct 29, 2019

Member

It seems error-prone to Inc/Dec because I am pretty sure we will forget a place, the counter will diverge from the actual cache size, and it can never recover. How about we have a function that Sets the value of each metric to the len(map) of its map, instead of doing Inc/Dec?
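The divergence risk can be illustrated with a minimal, self-contained sketch (plain Go; a map named toyGauge stands in for the Prometheus gauge, and all function names here are invented for illustration): one forgotten Dec leaves an Inc/Dec counter permanently wrong, while setting the gauge from len(map) is correct on every call.

```go
package main

import "fmt"

// toyGauge is a stand-in for a Prometheus gauge (illustrative only).
var toyGauge = map[string]float64{}

// Inc/Dec style: every mutation site must remember to adjust the gauge.
func addPodIncStyle(pods map[string]bool, key string) {
	pods[key] = true
	toyGauge["pods"]++ // forget this once elsewhere and the gauge diverges
}

func deletePodForgetful(pods map[string]bool, key string) {
	delete(pods, key)
	// Oops: no toyGauge["pods"]-- here; the counter can never recover.
}

// Set style: recompute from the source of truth, self-healing on every call.
func updateMetrics(pods map[string]bool) {
	toyGauge["pods"] = float64(len(pods))
}

func main() {
	pods := map[string]bool{}
	addPodIncStyle(pods, "a")
	addPodIncStyle(pods, "b")
	deletePodForgetful(pods, "a")
	fmt.Println("inc/dec gauge:", toyGauge["pods"], "actual:", len(pods)) // diverged: 2 vs 1

	updateMetrics(pods)
	fmt.Println("set gauge:", toyGauge["pods"], "actual:", len(pods)) // back in sync: 1 vs 1
}
```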

@liu-cong

Contributor

liu-cong commented Oct 23, 2019

Hi, are you still actively working on this?

@damemi force-pushed the damemi:scheduler-cache-metric branch from 3c501ff to 159ca20 on Oct 23, 2019
@damemi

Contributor Author

damemi commented Oct 23, 2019

I have updated this. While I like the idea of adding helper functions to centralize the updates to the cache and metrics, that's complicated by the fact that these three caches hold different types, meaning they would probably each need unique helpers.

I've tried to move metric updates to only follow direct calls to update the cache in:

  • AssumePod - update pods, assumed_pods metrics
  • ForgetPod - update pods, assumed_pods metrics
  • AddPod - update assumed_pods, pods metrics
  • RemovePod - update pods metric
  • AddNode - update nodes metric
  • removeNodeInfoFromList - update nodes metric

Edge cases:

  • addPod - update nodes metric (if a new node is detected)
  • expirePod - update assumed_pods, pods metrics (cleanup helper)
  • UpdateNode - update nodes metric if the updated node is not found

With these in place it seems effectively the same as adding new helper functions for each update, imo. Could you please review and double-check that these cover all the cases, without any duplicate metric reporting?

@damemi

Contributor Author

damemi commented Oct 23, 2019

/retest

@damemi changed the title from "Scheduler cache metric" to "Add scheduler cache size metrics" on Oct 23, 2019
Contributor

liu-cong left a comment

/lgtm

Some comments that don't necessarily need to be addressed in this PR:

  1. I'd love to see some unit tests.
  2. While I believe we covered all cases to inc/dec the cache counters, I still prefer a more centralized way to manage the counters. This not only makes the code cleaner, but also reduces the chance of another person changing the cache and forgetting to update the counter accordingly.
@@ -292,7 +294,9 @@ func (cache *schedulerCache) AssumePod(pod *v1.Pod) error {
pod: pod,
}
cache.podStates[key] = ps
metrics.CacheSize.WithLabelValues("pods").Inc()

@liu-cong

liu-cong Oct 25, 2019

Contributor

nit: the label values can be constants

@liu-cong

Contributor

liu-cong commented Oct 28, 2019

Can you rebase and ping @ahg-g to approve?

@damemi force-pushed the damemi:scheduler-cache-metric branch from 159ca20 to 3662e1e on Oct 28, 2019
@k8s-ci-robot removed the lgtm label on Oct 28, 2019
@damemi

Contributor Author

damemi commented Oct 28, 2019

@liu-cong updated, thanks

@@ -537,6 +548,7 @@ func (cache *schedulerCache) UpdateNode(oldNode, newNode *v1.Node) error {
if !ok {
n = newNodeInfoListItem(schedulernodeinfo.NewNodeInfo())
cache.nodes[newNode.Name] = n
metrics.CacheSize.WithLabelValues("nodes").Inc()

@ahg-g

ahg-g Oct 29, 2019

Member

Why are we incrementing the metric when a node is updated? This is an example of where we could get things wrong with Inc/Dec; using Set would just be redundant, not invalid.

@damemi

damemi Oct 29, 2019

Author Contributor

@ahg-g you're right; I will update this to just use Set. That makes much more sense.


@damemi damemi force-pushed the damemi:scheduler-cache-metric branch 2 times, most recently from 76148bc to 0f50d20 Oct 29, 2019
@damemi

Contributor Author

damemi commented Oct 29, 2019

@ahg-g updated with your feedback (as well as @liu-cong's), does this look good now?

@ahg-g

Member

ahg-g commented Oct 29, 2019

> @ahg-g updated with your feedback (as well as @liu-cong's), does this look good now?

It is scattered. I think it is better if you create a function updateMetrics that sets the value for each of the three metrics, and defer calling it in each public function of the Cache that updates a pod or node (e.g., AddPod, AssumePod, UpdatePod, ForgetPod, AddNode, RemoveNode, etc.).

@damemi

Contributor Author

damemi commented Oct 30, 2019

>> @ahg-g updated with your feedback (as well as @liu-cong's), does this look good now?

> it is scattered, I think it is better if you create a function updateMetrics that sets the value for each of the three metrics and defer calling it in each public function of the Cache that updates a pod or node (e.g., AddPod, AssumePod, UpdatePod, ForgetPod, AddNode, RemoveNode, etc.)

A single function that updates all of the metrics would be nicer, but as I listed above there are some edge cases in non-public functions that will need to call the metrics update function as well. If that's okay, then I'll go ahead with it.

(tbh, a lot of this file is scattered in terms of where it's updating the caches)

@ahg-g

Member

ahg-g commented Oct 30, 2019

>>> @ahg-g updated with your feedback (as well as @liu-cong's), does this look good now?

>> it is scattered, I think it is better if you create a function updateMetrics that sets the value for each of the three metrics and defer calling it in each public function of the Cache that updates a pod or node (e.g., AddPod, AssumePod, UpdatePod, ForgetPod, AddNode, RemoveNode, etc.)

> One function that updates all of the metrics would work nicer, but as I listed above there are some edge cases in non-public functions that will need to call the metrics update function as well. If that's okay, then I'll go ahead with it.
>
> (tbh, a lot of this file is scattered in terms of where it's updating the caches)

But those non-public functions can only be called by a public function, right? So if you defer updateMetrics at the beginning of each public function, you should capture any update that could happen in a non-public function.
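The deferred-update pattern described above can be sketched as follows. This is a simplified illustration, not the PR's actual code: the struct, field, and helper names are invented, and a plain map stands in for the real metric.

```go
package main

import "fmt"

// gauges stands in for the scheduler's cache-size metric (illustrative only).
var gauges = map[string]float64{}

type schedCache struct {
	podStates   map[string]bool
	assumedPods map[string]bool
	nodes       map[string]bool
}

func newSchedCache() *schedCache {
	return &schedCache{
		podStates:   map[string]bool{},
		assumedPods: map[string]bool{},
		nodes:       map[string]bool{},
	}
}

// updateMetrics sets every gauge from the maps' current sizes, so it is
// always correct no matter which code path mutated the cache.
func (c *schedCache) updateMetrics() {
	gauges["pods"] = float64(len(c.podStates))
	gauges["assumed_pods"] = float64(len(c.assumedPods))
	gauges["nodes"] = float64(len(c.nodes))
}

// AssumePod is a public mutator: defer updateMetrics once at the top, and
// every mutation below it, including private helpers, is captured.
func (c *schedCache) AssumePod(key string) {
	defer c.updateMetrics()
	c.addPodInternal(key) // private helper; needs no metrics code of its own
	c.assumedPods[key] = true
}

func (c *schedCache) addPodInternal(key string) { c.podStates[key] = true }

func main() {
	c := newSchedCache()
	c.AssumePod("ns/pod-1")
	fmt.Println(gauges["pods"], gauges["assumed_pods"]) // prints: 1 1
}
```

Because the deferred call runs after the public function body (and any private helpers it invokes) completes, no mutation path can be missed, which is exactly why it composes well with the Set-based gauge.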

@damemi damemi force-pushed the damemi:scheduler-cache-metric branch from 0f50d20 to 75e6c69 Oct 30, 2019
@damemi

Contributor Author

damemi commented Oct 30, 2019

@ahg-g ah, you're right; especially now that we use Set, there is less need to be precise with each edge case (the one exception is in the cleanup function, I think). I pushed an update, and also removed the constants for these labels since they're now only used once.

@damemi force-pushed the damemi:scheduler-cache-metric branch from 75e6c69 to 862f1dc on Oct 30, 2019
@liu-cong

Contributor

liu-cong commented Oct 30, 2019

The cache has a goroutine which runs cleanup every second. I actually think it's good enough to update the cache size metrics there, with no need to update them anywhere else. It won't be 100% accurate, but it's good enough.

Thoughts? @ahg-g, @damemi

go wait.Until(cache.cleanupExpiredAssumedPods, cache.period, cache.stop)
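The piggyback-on-the-cleanup-loop idea can be sketched with the standard library alone. Note the hedges: `time.Ticker` stands in here for `wait.Until` from k8s.io/apimachinery, and the type and function names are illustrative, not the scheduler's real ones.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// gauge is a mutex-guarded stand-in for the cache-size metric.
var gauge = struct {
	sync.Mutex
	pods float64
}{}

type cache struct {
	mu        sync.Mutex
	podStates map[string]bool
}

// cleanupAndUpdateMetrics mimics the periodic cleanup function: do the
// cleanup work, then refresh the gauge from the map's true size.
func (c *cache) cleanupAndUpdateMetrics() {
	c.mu.Lock()
	defer c.mu.Unlock()
	// ... expire assumed pods here ...
	gauge.Lock()
	gauge.pods = float64(len(c.podStates))
	gauge.Unlock()
}

// run is a stdlib analog of:
//   go wait.Until(cache.cleanupExpiredAssumedPods, cache.period, cache.stop)
func (c *cache) run(period time.Duration, stop <-chan struct{}) {
	t := time.NewTicker(period)
	defer t.Stop()
	for {
		c.cleanupAndUpdateMetrics()
		select {
		case <-t.C:
		case <-stop:
			return
		}
	}
}

func main() {
	c := &cache{podStates: map[string]bool{"a": true}}
	stop := make(chan struct{})
	go c.run(10*time.Millisecond, stop)
	time.Sleep(30 * time.Millisecond)
	close(stop)
	gauge.Lock()
	fmt.Println("pods gauge:", gauge.pods) // prints: pods gauge: 1
	gauge.Unlock()
}
```

As noted below in the thread, the metric then lags the true cache size by at most one period, which is acceptable given typical scrape intervals.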

@damemi

Contributor Author

damemi commented Oct 30, 2019

@liu-cong that makes sense to me, if we need to run it in that loop anyway (which I think we do: it calls expirePod, which modifies the cache and isn't called by any other public function).

@ahg-g

Member

ahg-g commented Oct 30, 2019

> @liu-cong that makes sense to me, if we need to run it in that loop anyway (which I think we do, it calls expirePod which modifies the cache and isn't called by any other public functions)

Sounds good to me; systems typically scrape metrics every second anyway.

@damemi force-pushed the damemi:scheduler-cache-metric branch from 862f1dc to 78a961c on Oct 30, 2019
@damemi

Contributor Author

damemi commented Oct 30, 2019

Sounds good! Updated; this PR ended up much simpler. I kept the helper function separate to avoid clogging up the cleanupAssumedPods goroutine, and to provide a clear extension point for future cache metrics.

@@ -638,6 +639,7 @@ func (cache *schedulerCache) cleanupExpiredAssumedPods() {
func (cache *schedulerCache) cleanupAssumedPods(now time.Time) {
cache.mu.Lock()
defer cache.mu.Unlock()
defer cache.updateMetrics()

@liu-cong

liu-cong Oct 30, 2019

Contributor

nit: To make it more explicit, I would rename (or comment) cleanupExpiredAssumedPods to cleanupExpiredAssumedPodsAndUpdateMetrics.

@damemi

damemi Oct 30, 2019

Author Contributor

I don't think it's necessary to make it explicit in the function name that it updates the metrics; there are many other functions that report metrics as a side effect without it in the name. But I'll update the comment so it's documented.

damemi added 2 commits Oct 4, 2019
@damemi force-pushed the damemi:scheduler-cache-metric branch from 78a961c to 828d662 on Oct 30, 2019
@liu-cong

Contributor

liu-cong commented Oct 30, 2019

/lgtm

@k8s-ci-robot added the lgtm label on Oct 30, 2019
@ahg-g

Member

ahg-g commented Oct 30, 2019

/approve

@k8s-ci-robot

Contributor

k8s-ci-robot commented Oct 30, 2019

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, damemi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot merged commit a381f7c into kubernetes:master on Oct 30, 2019
15 checks passed
cla/linuxfoundation damemi authorized
pull-kubernetes-bazel-build Job succeeded.
pull-kubernetes-bazel-test Job succeeded.
pull-kubernetes-dependencies Job succeeded.
pull-kubernetes-e2e-gce Job succeeded.
pull-kubernetes-e2e-gce-100-performance Job succeeded.
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
pull-kubernetes-e2e-kind Job succeeded.
pull-kubernetes-integration Job succeeded.
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
pull-kubernetes-node-e2e Job succeeded.
pull-kubernetes-node-e2e-containerd Job succeeded.
pull-kubernetes-typecheck Job succeeded.
pull-kubernetes-verify Job succeeded.
tide In merge pool.
@k8s-ci-robot added this to the v1.17 milestone on Oct 30, 2019
@damemi deleted the damemi:scheduler-cache-metric branch on Nov 6, 2019