-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add k8s metadata labels to VMI metrics #3636
Add k8s metadata labels to VMI metrics #3636
Conversation
|
Currently working on splitting k8s_labels into multiple labels, e.g.: VMI has the following labels: kubevirt.io/nodeName: node01
special: vmi-fedoracurrent label: desired labels: |
|
Getting the following error at virt-handler logs at the moment {
"component": "virt-handler",
"level": "warning",
"msg": "Error creating the new const metric for Desc{fqName: \"kubevirt_vmi_memory_available_bytes\", help: \"amount of usable memory as seen by the domain.\", constLabels: {}, variableLabels: [node namespace name vmi_k8s_label_kubevirt_io_nodeName vmi_k8s_label_special vmi_k8s_annotation_kubevirt_io_latest_observed_api_version vmi_k8s_annotation_kubevirt_io_storage_observed_api_version]}: inconsistent label cardinality: expected 9 label values but got 10 in []string{\"node01\", \"default\", \"vmi-fedora\", \"vdb\", \"read\", \"node01\", \"vmi-fedora\", \"v1alpha3\", \"v1alpha3\", \"v1alpha3\"}",
"pos": "prometheus.go:148",
"timestamp": "2020-06-23T19:30:37.315607Z"
}Somehow passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ArthurSens thanks for the PR. One initial question.
|
/retest |
|
@rmohr I didn't like declaring metrics descriptors at the beginning and then overriding everything at the update functions... but I couldn't think of a better solution since the Any tips about that? Should I leave it as it is? |
|
/retest |
|
Two notes:
|
My thought was that
This PR removes the But as we discussed at today's meeting, we shouldn't remove that label without sending a deprecation note before. I think we should close #3500, I should re-add the domain label to this PR and then remove it in a future PR after the deprecation note, is that correct? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Not an expert in prometheus, looking good from code perspective.
|
@rmohr any objections here? Otherwise I'm going to approve it tomorrow. |
|
/approve |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: dhiller The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/hold need to resolve the situation around #3500, when that is done, feel free to cancel the hold |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I am missing something but I think we have data races here.
| labelPreffix = "kubernetes_vmi_label_" | ||
| annotationPreffix = "kubernetes_vmi_annotation_" | ||
|
|
||
| k8sLabels = []string{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this cause a data race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks!
I think it was just about to be merged 😮
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you take a look and point if this race condition was removed on this last commit?
| }, | ||
|
|
||
| // Metrics descriptors used at the Describe function | ||
| storageIopsDesc = prometheus.NewDesc("kubevirt_vmi_storage_iops_total", "", nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here. As we spawn goroutine per vmi on node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the race condition here, but I'm not sure how to solve this 😕
Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know they shouldn't be global variables... but how can I use them at the Describe function and also update them at the update... functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need them to be globally defined if you override them later on anyway per metric which you push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to be global only if you want to pass them in the Describe.AFAIK it only performs some checks on registration, so we might be safe to not use it. (Also we modify the Desc afterward so the checks will not help us).
You have at least 2 options:
- to use private local struct where you have everything you need and pass it through functions
- convert the functions to methods on private local struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Arthur,
would it be possible to include a functional test?
| }, | ||
|
|
||
| // Metrics descriptors used at the Describe function | ||
| storageIopsDesc = prometheus.NewDesc("kubevirt_vmi_storage_iops_total", "", nil, nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They need to be global only if you want to pass them in the Describe.AFAIK it only performs some checks on registration, so we might be safe to not use it. (Also we modify the Desc afterward so the checks will not help us).
You have at least 2 options:
- to use private local struct where you have everything you need and pass it through functions
- convert the functions to methods on private local struct
| log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", memoryAvailableDesc, err) | ||
| return | ||
| if vmStats.Memory.RSSSet { | ||
| mv, err := prometheus.NewConstMetric( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can only set up labels where we see that RSSSet is true. (True for other metrics). WDYT?
| @@ -533,3 +594,12 @@ func Handler(MaxRequestsInFlight int) http.Handler { | |||
| }), | |||
| ) | |||
| } | |||
|
|
|||
| func updateLabelsAndAnnotations(vmi *k6tv1.VirtualMachineInstance) (k8sLabels []string, k8sLabelValues []string) { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we want only labels?
Signed-off-by: arthursens <arthursens2005@gmail.com>
Also, metrics' labelset will only be populated when the domain stat is set Signed-off-by: arthursens <arthursens2005@gmail.com>
Signed-off-by: arthursens <arthursens2005@gmail.com>
|
@xpivarc I think the race conditions were solved and a functional test was added. Do you mind taking a look again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Race conditions are gone. Just small changes and I think it's done.
| ) | ||
|
|
||
| func tryToPushMetric(desc *prometheus.Desc, mv prometheus.Metric, err error, ch chan<- prometheus.Metric) { | ||
| if err != nil { | ||
| log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", memoryAvailableDesc, err) | ||
| log.Log.V(4).Warningf("Error creating the new const metric for %s: %s", desc, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
| @@ -384,6 +444,23 @@ func updateVersion(ch chan<- prometheus.Metric) { | |||
| ) | |||
| } | |||
|
|
|||
| type vmiMetrics struct { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you actually don't need this and all these Desc can be local. Please ignore this if I missed some cases.(NIT)
Otherwise, it seems good to me. Please just squash commits and ping me 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general looking good!
One thing I noticed was (sorry for being late for that!) the unit test coverage on the areas where the changes happened was low (only 25% of those were covered).
May I propose some additional unit tests (just a sketch, far from perfect!) on which you can elaborate?
Signed-off-by: arthursens <arthursens2005@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for your work!
/lgtm
Signed-off-by: arthursens <arthursens2005@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests!
/lgtm
|
@ArthurSens: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
/hold cancel |
This PR addresses a small NIT that wasn't tackled by kubevirt#3636 See also: kubevirt#3636 (comment) Signed-off-by: arthursens <arthursens2005@gmail.com>
This PR was originally created to address a small NIT that was pointed out at kubevirt#3636, that asked to remove the vmiMetrics struct The metrics descriptors were removed, since it makes more sense for them to be local variables, but the sturct was kept with new attributes. The VMI object, their labels, and the channel are strongly correlated, just like the proccess of updating all metrics values. The struct makes that more evident Signed-off-by: arthursens <arthursens2005@gmail.com>
This PR was originally created to address a small NIT that was pointed out at kubevirt#3636, that asked to remove the vmiMetrics struct The metrics descriptors were removed, since it makes more sense for them to be local variables, but the sturct was kept with new attributes. The VMI object, their labels, and the channel are strongly correlated, just like the proccess of updating all metrics values. The struct makes that more evident Signed-off-by: arthursens <arthursens2005@gmail.com>
This PR was originally created to address a small NIT that was pointed out at kubevirt#3636, that asked to remove the vmiMetrics struct The metrics descriptors were removed, since it makes more sense for them to be local variables, but the sturct was kept with new attributes. The VMI object, their labels, and the channel are strongly correlated, just like the proccess of updating all metrics values. The struct makes that more evident Signed-off-by: arthursens <arthursens2005@gmail.com>
This PR was originally created to address a small NIT that was pointed out at kubevirt#3636, that asked to remove the vmiMetrics struct The metrics descriptors were removed, since it makes more sense for them to be local variables, but the sturct was kept with new attributes. The VMI object, their labels, and the channel are strongly correlated, just like the proccess of updating all metrics values. The struct makes that more evident Signed-off-by: arthursens <arthursens2005@gmail.com>
Signed-off-by: arthursens arthursens2005@gmail.com
What this PR does / why we need it:
This PR adds K8s metadata labels to VMI metrics.
This facilitates users to identify VMIs and aggregate metrics as they prefer and to implement their own monitoring solutions that best suit their needs
Labels
As said at the Kubernetes Labels and Selectors concept documentation
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)format, will close the issue(s) when PR gets merged):Fixes #2108
Special notes for your reviewer:
To test this I'm using the vmi-fedora example.
After deploying the VMI, I get the following metadata when describing the resource:
Which is then translated into metric labels on every VMI metric(Note that it doesn't occur on metric not related to VMIs):
Release note: