Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add dynamic config metrics #57527

Merged
merged 1 commit into from May 24, 2018

Conversation

@mtaufen
Copy link
Contributor

mtaufen commented Dec 21, 2017

This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.

kubernetes/enhancements#281

The Kubelet now exports metrics that report the assigned (node_config_assigned), last-known-good (node_config_last_known_good), and active (node_config_active) config sources, and a metric indicating whether the node is experiencing a config-related error (node_config_error). The config source metrics always report the value 1, and carry the node_config_name, node_config_uid, node_config_resource_version, and node_config_kubelet_key labels, which identify the config version. The error metric reports 1 if there is an error, 0 otherwise.

@mtaufen mtaufen self-assigned this Dec 21, 2017

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch 3 times, most recently from e4fec46 to 1c6b5e2 Dec 21, 2017

@k8s-ci-robot k8s-ci-robot added size/L and removed size/XL labels Dec 24, 2017

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from 42f8595 to a9dad8f Dec 27, 2017

@mtaufen mtaufen changed the title (WIP) node config source metrics (WIP) PoC node config source metrics Jan 19, 2018

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from a9dad8f to 33569f6 Jan 19, 2018

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from 33569f6 to 23a9a26 Jan 19, 2018

@mtaufen mtaufen changed the title (WIP) PoC node config source metrics PoC node config source metrics Jan 19, 2018

@crassirostris
Copy link
Member

crassirostris left a comment

Overall approach SGTM


func SetAssignedNodeConfigSource(val string) {
// TODO(mtaufen): will this eliminate previous metric, or generate a parallel one?
AssignedNodeConfigSource.WithLabelValues(val).Set(1)

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

This approach, implemented like this, is going to result in memory leak, because old values are not GC'd, as you noted in the comment

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

@loburm @x13n FYI, this is the usecase for string metrics

AssignedNodeConfigSourceKey = "assigned_node_config_source"
// AssignedNodeConfigSourceOkayKey's metric indicates whether the currently assigned config source and the config it contains is valid
AssignedNodeConfigSourceOkayKey = "assigned_node_config_source_okay"
AssignedNodeConfigSourceStatusTrue = 1

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

Please use string values, performance considerations in this case do not outweigh the decreased readability

This comment has been minimized.

@mtaufen

mtaufen Jan 23, 2018

Author Contributor

As in, just report a string value in another label key-value pair?

Since prometheus doesn't support string valued metrics:
https://prometheus.io/docs/concepts/metric_types/
prometheus/prometheus#2227

// AssignedNodeConfigSourceKey's metric reports the config source the Kubelet believes is currently assigned to it
AssignedNodeConfigSourceKey = "assigned_node_config_source"
// AssignedNodeConfigSourceOkayKey's metric indicates whether the currently assigned config source and the config it contains is valid
AssignedNodeConfigSourceOkayKey = "assigned_node_config_source_okay"

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

Nit: IMHO ok is more widely used than okay

This comment has been minimized.

@mtaufen

mtaufen Jan 23, 2018

Author Contributor

fair enough

utillog.Infof("starting controller")

// set metrics when all is said and done
var using string

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

using is a bad name, is says nothing and spans a huge scope

Also, if you change the logic below later, it may be used uninitialized

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

I don't understand that global defer at all, why not leave setting metrics local?

@@ -101,7 +122,7 @@ var (
Help: "Interval in microseconds between relisting in PLEG.",
},
)
// Metrics of remote runtime operations.
// Metrics for remote runtime operations

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

Mostly I see a slow convergence towards comments being complete sentences, with full stop at the end and so on.

DevicePluginRegistrationCountKey = "device_plugin_registration_count"
DevicePluginAllocationLatencyKey = "device_plugin_alloc_latency_microseconds"
// Metric keys for node config sources
// AssignedNodeConfigSourceKey's metric reports the config source the Kubelet believes is currently assigned to it
AssignedNodeConfigSourceKey = "assigned_node_config_source"

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

Why the assigned_node prefix? It's little big confusing, why not just config_source or dynamic_config_source?

This comment has been minimized.

@mtaufen

mtaufen Jan 23, 2018

Author Contributor

There are three config references you'd want the Kubelet to report:

  1. What it believes its currently assigned source is.
  2. What it believes its last-known-good source is.
  3. Which of the above two it's actually using.

NodeConfigSource is the type of the ConfigSource field on the Node object and is used to point the Kubelet at a config source, hence node_config_source.

Could do:

  1. node_config_source_assigned
  2. node_config_source_last_known_good
  3. node_config_source_using (or some name other than using)
AssignedNodeConfigSourceStatusUnknown = 0
AssignedNodeConfigSourceStatusFalse = -1
// LastKnownGoodNodeConfigSourceKey's metric indicates what the Kubelet is treating as it's last-known-good config source
LastKnownGoodNodeConfigSourceKey = "last_known_good_node_config_source"

This comment has been minimized.

@crassirostris

crassirostris Jan 22, 2018

Member

Try to make these hierarchical, that helps, e.g. dynamic_config_last_known_source

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented Jan 23, 2018

Thanks @crassirostris for taking a look at this

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from 999073c to 1a81dd5 Jan 23, 2018

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented Jan 23, 2018

/retest

@mtaufen mtaufen referenced this pull request Feb 15, 2018

Open

Dynamic Kubelet Configuration #281

27 of 27 tasks complete

@mtaufen mtaufen changed the title PoC node config source metrics add dynamic config metrics May 22, 2018

ActiveConfigKey = "node_config_active"
LastKnownGoodConfigKey = "node_config_last_known_good"
ConfigErrorKey = "node_config_error"
ConfigNameLabelKey = "node_config_name" // this is a fully resolved name, e.g. an API path or a URL, "Name" is short but a little confusing name for this key...

This comment has been minimized.

@mtaufen

mtaufen May 22, 2018

Author Contributor

@dashpole any naming suggestions for this?

This comment has been minimized.

@dashpole

dashpole May 22, 2018

Contributor

SelfLink is the URL representing an object in ObjectMeta. Maybe "node_config_self_link"? This should be equivalent to the self link for the configmap, right?

This comment has been minimized.

@mtaufen

mtaufen May 22, 2018

Author Contributor

@dashpole and I settled on node_config_source, as it accurately identifies what the corresponding value describes while being reasonably generic ("local", or a SelfLink, or maybe some URL in the future).

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented May 22, 2018

/retest

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from 1faff12 to 4aa1a2e May 22, 2018

@mtaufen mtaufen added this to the v1.11 milestone May 22, 2018

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented May 22, 2018

/status approved-for-milestone

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented May 22, 2018

/remove-lifecycle stale

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 22, 2018

[MILESTONENOTIFIER] Milestone Pull Request: Up-to-date for process

@crassirostris @dashpole @dchen1107 @mtaufen

Pull Request Labels
  • sig/node: Pull Request will be escalated to these SIGs if needed.
  • priority/important-soon: Escalate to the pull request owners and SIG owner; move out of milestone after several unsuccessful escalation attempts.
  • kind/feature: New functionality.
Help

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch 3 times, most recently from fb81c29 to e951978 May 22, 2018

@mtaufen

This comment has been minimized.

Copy link
Contributor Author

mtaufen commented May 22, 2018

/retest

add dynamic config metrics
This PR exports config-releated metrics from the Kubelet.
The Guages for active, assigned, and last-known-good config can be used
to identify config versions and produce aggregate counts across several
nodes. The error-reporting Gauge can be used to determine whether a node
is experiencing a config-related error, and to prodouce an aggregate
count of nodes in an error state.

@mtaufen mtaufen force-pushed the mtaufen:kc-metric branch from e951978 to fd3432e May 22, 2018

@dashpole

This comment has been minimized.

Copy link
Contributor

dashpole commented May 22, 2018

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm label May 22, 2018

@dchen1107

This comment has been minimized.

Copy link
Member

dchen1107 commented May 23, 2018

/lgtm

@k8s-ci-robot

This comment has been minimized.

Copy link
Contributor

k8s-ci-robot commented May 23, 2018

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dashpole, dchen1107, mtaufen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-github-robot

This comment has been minimized.

Copy link
Contributor

k8s-github-robot commented May 24, 2018

Automatic merge from submit-queue (batch tested with PRs 64013, 63896, 64139, 57527, 62102). If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 731eaec into kubernetes:master May 24, 2018

18 checks passed

Submit Queue Queued to run github e2e tests a second time.
Details
cla/linuxfoundation mtaufen authorized
Details
pull-kubernetes-bazel-build Job succeeded.
Details
pull-kubernetes-bazel-test Job succeeded.
Details
pull-kubernetes-cross Skipped
pull-kubernetes-e2e-gce Job succeeded.
Details
pull-kubernetes-e2e-gce-100-performance Job succeeded.
Details
pull-kubernetes-e2e-gce-device-plugin-gpu Job succeeded.
Details
pull-kubernetes-e2e-gke Skipped
pull-kubernetes-e2e-kops-aws Job succeeded.
Details
pull-kubernetes-integration Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce Job succeeded.
Details
pull-kubernetes-kubemark-e2e-gce-big Job succeeded.
Details
pull-kubernetes-local-e2e Skipped
pull-kubernetes-local-e2e-containerized Job succeeded.
Details
pull-kubernetes-node-e2e Job succeeded.
Details
pull-kubernetes-typecheck Job succeeded.
Details
pull-kubernetes-verify Job succeeded.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.