Add k8s metadata labels to VMI metrics #3636

Merged
merged 16 commits into from
Jul 16, 2020

Contributor

@ArthurSens ArthurSens commented Jun 23, 2020

Signed-off-by: arthursens arthursens2005@gmail.com

What this PR does / why we need it:
This PR adds K8s metadata labels to VMI metrics.

This makes it easier for users to identify VMIs, aggregate metrics as they prefer, and implement the monitoring solutions that best suit their needs.

Labels

As stated in the Kubernetes Labels and Selectors concept documentation:

Labels enable users to map their own organizational structures onto system objects in a loosely coupled fashion, without requiring clients to store these mappings.
Example labels:

  • "release" : "stable", "release" : "canary"
  • "environment" : "dev", "environment" : "qa", "environment" : "production"
  • "tier" : "frontend", "tier" : "backend", "tier" : "cache"
  • "partition" : "customerA", "partition" : "customerB"
  • "track" : "daily", "track" : "weekly"

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2108

Special notes for your reviewer:
To test this I'm using the vmi-fedora example.

After deploying the VMI, I get the following metadata when describing the resource:

./cluster-up/kubectl.sh describe virtualmachineinstances/vmi-fedora

Name:         vmi-fedora
Namespace:    default
Labels:       kubevirt.io/nodeName=node01
              special=vmi-fedora

These are then translated into metric labels on every VMI metric (note that this does not happen for metrics unrelated to VMIs):

curl -k https://localhost:8443/metrics | grep kubevirt
# TYPE kubevirt_info gauge
kubevirt_info{goversion="go1.12.8",kubeversion="{gitVersion}"} 1
# HELP kubevirt_vmi_memory_resident_bytes resident set size of the process running the domain.
# TYPE kubevirt_vmi_memory_resident_bytes gauge
kubevirt_vmi_memory_resident_bytes{domain="default_vmi-fedora",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01"} 4.34688e+08
# HELP kubevirt_vmi_network_errors_total network errors.
# TYPE kubevirt_vmi_network_errors_total counter
kubevirt_vmi_network_errors_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="rx"} 0
kubevirt_vmi_network_errors_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="tx"} 0
# HELP kubevirt_vmi_network_traffic_bytes_total network traffic.
# TYPE kubevirt_vmi_network_traffic_bytes_total counter
kubevirt_vmi_network_traffic_bytes_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="rx"} 13386
kubevirt_vmi_network_traffic_bytes_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="tx"} 5070
# HELP kubevirt_vmi_network_traffic_packets_total network traffic.
# TYPE kubevirt_vmi_network_traffic_packets_total counter
kubevirt_vmi_network_traffic_packets_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="rx"} 99
kubevirt_vmi_network_traffic_packets_total{domain="default_vmi-fedora",interface="vnet0",kubernetes_vmi_label_kubevirt_io_nodeName="node01",kubernetes_vmi_label_special="vmi-fedora",name="vmi-fedora",namespace="default",node="node01",type="tx"} 58

.
.
.

Release note:

Adds Kubernetes metadata.labels as labels on VMI metrics

@kubevirt-bot kubevirt-bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Jun 23, 2020
@ArthurSens
Contributor Author

Currently working on splitting k8s_labels into multiple labels, e.g.:

VMI has the following labels:

kubevirt.io/nodeName: node01
special: vmi-fedora

current label:
k8s_labels="kubevirt.io/nodeName=node01,special=vmi-fedora"

desired labels:
k8s_label_kubevirt_io_nodeName="node01", k8s_label_special="vmi-fedora"
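
The split described above can be sketched in a few lines of Go. This is only an illustration, not the code from this PR: the helper name `labelsToMetricLabels` and the regular expression are assumptions, and the sketch uses only the standard library. It replaces every character that is not valid in a Prometheus label name with an underscore and returns names and values as parallel slices in a deterministic order.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// invalidLabelChars matches every character outside [a-zA-Z0-9_],
// i.e. every character that may not appear in a Prometheus label name.
var invalidLabelChars = regexp.MustCompile(`[^a-zA-Z0-9_]`)

// labelsToMetricLabels is a hypothetical helper (not taken from the PR).
// It turns a map of Kubernetes labels into two parallel slices:
// sanitized, prefixed metric label names and their values.
// Keys are sorted so the output order does not depend on map iteration.
func labelsToMetricLabels(prefix string, labels map[string]string) (names, values []string) {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for _, k := range keys {
		names = append(names, prefix+invalidLabelChars.ReplaceAllString(k, "_"))
		values = append(values, labels[k])
	}
	return names, values
}

func main() {
	names, values := labelsToMetricLabels("k8s_label_", map[string]string{
		"kubevirt.io/nodeName": "node01",
		"special":              "vmi-fedora",
	})
	for i := range names {
		fmt.Printf("%s=%q\n", names[i], values[i])
	}
}
```

One caveat of this approach: two distinct keys such as `a.b` and `a/b` sanitize to the same name, so a real implementation would need to decide how to handle collisions.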

@ArthurSens
Contributor Author

I'm currently getting the following error in the virt-handler logs:

{
	"component": "virt-handler",
	"level": "warning",
	"msg": "Error creating the new const metric for Desc{fqName: \"kubevirt_vmi_memory_available_bytes\", help: \"amount of usable memory as seen by the domain.\", constLabels: {}, variableLabels: [node namespace name vmi_k8s_label_kubevirt_io_nodeName vmi_k8s_label_special vmi_k8s_annotation_kubevirt_io_latest_observed_api_version vmi_k8s_annotation_kubevirt_io_storage_observed_api_version]}: inconsistent label cardinality: expected 9 label values but got 10 in []string{\"node01\", \"default\", \"vmi-fedora\", \"vdb\", \"read\", \"node01\", \"vmi-fedora\", \"v1alpha3\", \"v1alpha3\", \"v1alpha3\"}",
	"pos": "prometheus.go:148",
	"timestamp": "2020-06-23T19:30:37.315607Z"
}

Somehow "vdb" and "read" are being passed as label values for the available_memory metric; I still haven't found out where exactly.
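
The warning above means that the number of label values handed to the metric constructor does not match the number of variable labels in the descriptor. While hunting for the source of the extra values, a small pre-flight check that prints both slices side by side can help. The sketch below is an assumption for debugging purposes only: `checkCardinality` is a hypothetical helper, it uses only the standard library, and it is not part of the Prometheus client or of this PR.

```go
package main

import "fmt"

// checkCardinality is a hypothetical debugging helper: it reports an error
// when the number of label values differs from the number of label names,
// which is the condition behind the "inconsistent label cardinality" warning.
func checkCardinality(labelNames, labelValues []string) error {
	if len(labelNames) != len(labelValues) {
		return fmt.Errorf("inconsistent label cardinality: %d names %q but %d values %q",
			len(labelNames), labelNames, len(labelValues), labelValues)
	}
	return nil
}

func main() {
	names := []string{"node", "namespace", "name"}
	values := []string{"node01", "default", "vmi-fedora", "vdb"} // one value too many

	if err := checkCardinality(names, values); err != nil {
		fmt.Println("refusing to build metric:", err)
	}
}
```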

Member

@rmohr rmohr left a comment

@ArthurSens thanks for the PR. One initial question.

pkg/monitoring/vms/prometheus/prometheus.go (outdated, resolved)
@ArthurSens
Contributor Author

/retest

@ArthurSens
Contributor Author

@rmohr I didn't like declaring the metric descriptors at the beginning and then overriding everything in the update functions, but I couldn't think of a better solution, since the Describe function needs all those descriptors as well.

Any tips about that? Should I leave it as it is?

@ArthurSens
Contributor Author

/retest

@fabiand
Member

fabiand commented Jun 29, 2020

Two notes:

  • kubernetes_vmi_label_... should it be kubevirt_vmi_label_...?
  • What about backwards compatibility?

@ArthurSens
Contributor Author

ArthurSens commented Jun 29, 2020

  • kubernetes_vmi_label_... should it be kubevirt_vmi_label_...?

My thinking was that metadata.labels and metadata.annotations are Kubernetes concepts, which is why I named it that way, but I have no problem changing it if you think kubevirt_... is better.

  • What about backwards compatibility?

This PR removes the domain label, as requested in #3477; PR #3500 removes it as well.

But as we discussed in today's meeting, we shouldn't remove that label without sending a deprecation notice first.

I think we should close #3500. I would then re-add the domain label to this PR and remove it in a future PR, after the deprecation notice. Is that correct?

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jun 29, 2020
Contributor

@dhiller dhiller left a comment

/lgtm

Not an expert in prometheus, looking good from code perspective.

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2020
@dhiller
Contributor

dhiller commented Jul 9, 2020

@rmohr any objections here? Otherwise I'm going to approve it tomorrow.

@dhiller
Contributor

dhiller commented Jul 10, 2020

/approve

@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dhiller

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2020
@dhiller
Contributor

dhiller commented Jul 10, 2020

/hold need to resolve the situation around #3500, when that is done, feel free to cancel the hold

@kubevirt-bot kubevirt-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 10, 2020
Member

@xpivarc xpivarc left a comment

Maybe I am missing something but I think we have data races here.

labelPreffix = "kubernetes_vmi_label_"
annotationPreffix = "kubernetes_vmi_annotation_"

k8sLabels = []string{}
Member

Doesn't this cause a data race?

Contributor Author

good catch, thanks!

I think it was just about to be merged 😮

Contributor Author

Could you take a look and tell me whether this race condition was removed in the last commit?

},

// Metrics descriptors used at the Describe function
storageIopsDesc = prometheus.NewDesc("kubevirt_vmi_storage_iops_total", "", nil, nil)
Member

Also here, as we spawn a goroutine per VMI on the node.

Contributor Author

I see the race condition here, but I'm not sure how to solve this 😕

Any suggestions?

Contributor Author

I know they shouldn't be global variables... but how can I use them at the Describe function and also update them at the update... functions?

Member

Why do you need them to be globally defined if you override them later on anyway per metric which you push?

Member

They need to be global only if you want to pass them to Describe. AFAIK it only performs some checks on registration, so we might be safe not to use it. (Also, we modify the Desc afterward, so the checks will not help us.)

You have at least 2 options:

  1. to use private local struct where you have everything you need and pass it through functions
  2. convert the functions to methods on private local struct
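
A minimal sketch of the second option follows, under stated assumptions: the `vmiScrape` and `sample` types and the `collect` function are hypothetical, and `sample` stands in for a Prometheus metric so that the example needs only the standard library. It is not the code that was merged. The point it illustrates is that each goroutine owns its own struct instance, so the label slices are never shared between goroutines.

```go
package main

import (
	"fmt"
	"sync"
)

// sample stands in for a Prometheus metric; it is a simplification for this sketch.
type sample struct {
	name   string
	labels []string
	value  float64
}

// vmiScrape is a hypothetical per-VMI struct. Every goroutine builds its own
// instance, so no label state lives in package-level variables.
type vmiScrape struct {
	labelNames  []string
	labelValues []string
	out         chan<- sample
}

// pushMemory emits one sample using only the state owned by this struct.
func (s *vmiScrape) pushMemory(rss float64) {
	labels := make([]string, 0, len(s.labelNames))
	for i, n := range s.labelNames {
		labels = append(labels, n+"="+s.labelValues[i])
	}
	s.out <- sample{name: "kubevirt_vmi_memory_resident_bytes", labels: labels, value: rss}
}

// collect spawns one goroutine per VMI and gathers the emitted samples.
func collect(vmis map[string]map[string]string) []sample {
	// Buffered so that each goroutine can send its single sample without blocking.
	out := make(chan sample, len(vmis))
	var wg sync.WaitGroup
	for name, k8sLabels := range vmis {
		wg.Add(1)
		go func(name string, k8sLabels map[string]string) {
			defer wg.Done()
			s := &vmiScrape{out: out, labelNames: []string{"name"}, labelValues: []string{name}}
			for k, v := range k8sLabels {
				s.labelNames = append(s.labelNames, k)
				s.labelValues = append(s.labelValues, v)
			}
			s.pushMemory(1)
		}(name, k8sLabels)
	}
	wg.Wait()
	close(out)

	var samples []sample
	for smp := range out {
		samples = append(samples, smp)
	}
	return samples
}

func main() {
	samples := collect(map[string]map[string]string{
		"vmi-a": {"special": "a"},
		"vmi-b": {"special": "b"},
	})
	fmt.Println(len(samples), "samples collected")
}
```

Running a variant of this with package-level slices instead of struct fields under `go run -race` is one way to see the difference between the two designs.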

Member

@xpivarc xpivarc left a comment

Hi Arthur,
would it be possible to include a functional test?

log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", memoryAvailableDesc, err)
return
if vmStats.Memory.RSSSet {
mv, err := prometheus.NewConstMetric(
Member

Maybe we can set up the labels only where we see that RSSSet is true (and likewise for the other metrics). WDYT?

@@ -533,3 +594,12 @@ func Handler(MaxRequestsInFlight int) http.Handler {
}),
)
}

func updateLabelsAndAnnotations(vmi *k6tv1.VirtualMachineInstance) (k8sLabels []string, k8sLabelValues []string) {
Member

So we want only labels?

Signed-off-by: arthursens <arthursens2005@gmail.com>
Also, metrics' labelset will only be populated when the domain stat is set

Signed-off-by: arthursens <arthursens2005@gmail.com>
Signed-off-by: arthursens <arthursens2005@gmail.com>
@ArthurSens
Copy link
Contributor Author

@xpivarc I think the race conditions were solved and a functional test was added. Do you mind taking a look again?

Member

@xpivarc xpivarc left a comment

Race conditions are gone. Just small changes and I think it's done.

)

func tryToPushMetric(desc *prometheus.Desc, mv prometheus.Metric, err error, ch chan<- prometheus.Metric) {
if err != nil {
log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", memoryAvailableDesc, err)
log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", desc, err)
Member

Nice catch!

@@ -384,6 +444,23 @@ func updateVersion(ch chan<- prometheus.Metric) {
)
}

type vmiMetrics struct {
Member

@xpivarc xpivarc Jul 15, 2020

I think you actually don't need this, and all these Desc can be local. Please ignore this if I missed some cases. (NIT)

Otherwise, it seems good to me. Please just squash commits and ping me 👍

Contributor

@dhiller dhiller left a comment

In general looking good!

One thing I noticed (sorry for being late with this!) is that unit test coverage in the areas where the changes happened was low (only 25% of those were covered).

May I propose some additional unit tests (just a sketch, far from perfect!) on which you can elaborate?

pkg/monitoring/vms/prometheus/prometheus.go (outdated, resolved)
Signed-off-by: arthursens <arthursens2005@gmail.com>
Contributor

@dhiller dhiller left a comment

Great, thanks for your work!

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2020
Signed-off-by: arthursens <arthursens2005@gmail.com>
@kubevirt-bot kubevirt-bot removed the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2020
Contributor

@dhiller dhiller left a comment

Thanks for adding the tests!

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 16, 2020
@kubevirt-bot
Contributor

kubevirt-bot commented Jul 16, 2020

@ArthurSens: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-check-tests-for-flakes 1b51ea3 link /test pull-kubevirt-check-tests-for-flakes

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@ArthurSens
Contributor Author

/hold cancel

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 16, 2020
@kubevirt-bot kubevirt-bot merged commit 2df17e5 into kubevirt:master Jul 16, 2020
ArthurSens added a commit to ArthurSens/kubevirt that referenced this pull request Jul 24, 2020
This PR addresses a small NIT that wasn't tackled by kubevirt#3636

See also: kubevirt#3636 (comment)

Signed-off-by: arthursens <arthursens2005@gmail.com>
ArthurSens added a commit to ArthurSens/kubevirt that referenced this pull request Aug 1, 2020
This PR was originally created to address a small NIT pointed out at kubevirt#3636, which asked to remove the vmiMetrics struct.
The metric descriptors were removed, since it makes more sense for them to be local variables, but the struct was kept with new attributes.

The VMI object, its labels, and the channel are strongly correlated, just like the process of updating all metric values. The struct makes that more evident.

Signed-off-by: arthursens <arthursens2005@gmail.com>
victortoso pushed a commit to victortoso/kubevirt that referenced this pull request Aug 11, 2020
victortoso pushed a commit to victortoso/kubevirt that referenced this pull request Aug 24, 2020
EdDev pushed a commit to EdDev/kubevirt that referenced this pull request Oct 14, 2020
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL
Development

Successfully merging this pull request may close these issues.

virt-handler metrics do not integrate any k8s VMI properties (labels, names, etc)
6 participants