New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add total VMs created metric #10418
Add total VMs created metric #10418
Conversation
docs/metrics.md
Outdated
@@ -81,6 +81,9 @@ The current available memory of the VM containers based on the rss. Type: Gauge. | |||
### kubevirt_vm_container_free_memory_bytes_based_on_working_set_bytes | |||
The current available memory of the VM containers based on the working set. Type: Gauge. | |||
|
|||
### kubevirt_vm_created_total | |||
Amount of VMs created, broken down by namespace. Type: Counter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created since when?
Since instal, update, controller pod restart, phase of the moon :) …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can't obviously have the counter track VMs from previous versions, but other than that Prometheus takes care of summing all counter values.
maybe this is a good opportunity to think once again about more metrics metadata, release lifecycle, version when it was added, when they are deprecated, etc... (would also help with the discussion in https://groups.google.com/g/kubevirt-dev/c/7p3q5Lo71hs/m/jIJGNJpdAwAJ)
@@ -81,6 +81,9 @@ The current available memory of the VM containers based on the rss. Type: Gauge. | |||
### kubevirt_vm_container_free_memory_bytes_based_on_working_set_bytes | |||
The current available memory of the VM containers based on the working set. Type: Gauge. | |||
|
|||
### kubevirt_vm_created_total | |||
Amount of VMs created, broken down by namespace. Type: Counter. | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And why do we need this metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
providers using KubeVirt (through insights) and users would benefit from this because it would ease the process of understanding the VM usage through time (with absolute values and increase/delta/etc functions to understand trends)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The counter appears to be incremented whenever a VM is:
- Created for the first time
- Deleted and recreated
- Reconciled the first time by a new instance of virt-controller
On a system with a lot of churn there will be a big drop when virt-controller is restarted. Maybe that's not a big deal? How is that handled by consumers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prometheus handles the counter restarting at 0, when virt-controller
is restarted.
The only issue here would be point 3: "Reconciled the first time by a new instance of virt-controller". But with:
// VM is nil or already processed
if vm == nil || len(vm.Status.Conditions) != 0 {
return
}
that I added in func NewVMCreated(vm *v1.VirtualMachine)
, since the VM would have the status conditions set, I would expect it not to happen, am I missing something?
5a0b77d
to
10f6bf8
Compare
10f6bf8
to
cc992c2
Compare
[]string{"namespace"}, | ||
) | ||
|
||
vmMap = map[string]bool{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we're essentially keeping state in the virt-controller here, which (correct me if I'm wrong) means that the created VM metric will start lying as soon as the virt-controller pod dies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this state is to prevent "concurrent" requests when the VM is first created
because the vm.Status.Conditions
are not set yet in these (which would make this function being skipped as I mentioned in #10418 (comment))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean, on a system with existing VMs, when the virt-controller dies because of maintenance or something,
the metric would reset to 0 (lie). Is that not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is expected with counters, Prometheus handles those counter resets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we would be okay with this metric not reporting the correct amount of VMs after a restart?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't put it in that way. We want the metric correctly report the number of VMs created by that instance of 'virt-controller'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you can query it generally? so the sum of all instances that ever existed, including ones that died?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@machadovilaca @sradco Hi. My apologies but I find it hard to follow the motivation behind creating this metric. From my POV we can also add metric for VM deletion, VM modification. But what is the use-case we are trying to reach here? why is it important to track the creation rate?
@enp0s3 |
@machadovilaca Thank you for the reply. I see this as an answer to |
The |
Did we check if any existing Kubernetes metric could be used for this? |
yes, we could track |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder why don't we implement this in the admitter? The admitter will process each VM exactly once(at least this hold for creation). It seems to me we will get the wrong numbers for the metric in the current implementation.
28bb15d
to
903533e
Compare
903533e
to
c2f83e5
Compare
@xpivarc good suggesting, I updated the PR to push the metric on vms-admitter |
) | ||
) | ||
|
||
func NewVMCreated(vm *v1.VirtualMachine) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reduce the vm
to namespace
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we are using a pointer I think we don't have any performance problems with copying the structure, and personally, I think it makes more sense that a function NewVMCreated
receives a VM as an argument and not a Namespace. Even if in the future we want more labels or create new metrics in the function, it would be easier
@@ -210,6 +211,10 @@ func (admitter *VMsAdmitter) Admit(ar *admissionv1.AdmissionReview) *admissionv1 | |||
reviewResponse := admissionv1.AdmissionResponse{} | |||
reviewResponse.Allowed = true | |||
|
|||
if ar.Request.Operation == admissionv1.Create { | |||
metrics.NewVMCreated(&vm) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think vm might not have the namespace set. Could you check if we populate it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a resource always comes with a namespace, even if it is not set, it gets the value of the current namespace by default
docs/metrics.md
Outdated
@@ -66,6 +66,9 @@ The current available memory of the VM containers based on the rss. Type: Gauge. | |||
### kubevirt_vm_container_free_memory_bytes_based_on_working_set_bytes | |||
The current available memory of the VM containers based on the working set. Type: Gauge. | |||
|
|||
### kubevirt_vm_created_total | |||
Amount of VMs created, broken down by namespace. Type: Counter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @fabiand pointed out, you might want to elaborate on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
e9704dd
to
1787ffc
Compare
/cc @fossedihelm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @machadovilaca! few comments below
pkg/virt-api/api.go
Outdated
// setup monitoring | ||
err = metrics.SetupMetrics() | ||
if err != nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it correct to silently return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct, added a panic
1787ffc
to
8a8e125
Compare
@machadovilaca IIUC the metric is reset every time the instance of the virt-api is restarted. |
@fossedihelm that is the expected behavior and common in Prometheus. Since we are using a counter, which is monotonically increasing, Prometheus has mechanisms to handle when a value goes back to an inferior value (with the restart of the virt-api for example, where we would start counting from zero again), what they call a 'counter restart' |
/lgtm |
@@ -210,6 +211,10 @@ func (admitter *VMsAdmitter) Admit(ar *admissionv1.AdmissionReview) *admissionv1 | |||
reviewResponse := admissionv1.AdmissionResponse{} | |||
reviewResponse.Allowed = true | |||
|
|||
if ar.Request.Operation == admissionv1.Create { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This might be a dry run, please add a check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, check for nil
s as well
Signed-off-by: João Vilaça <jvilaca@redhat.com>
8d137f9
to
bdddab0
Compare
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
@fossedihelm PTAL |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: xpivarc The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
What this PR does / why we need it:
Track the total number of VMs created by virt-controller
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Checklist
This checklist is not enforcing, but it's a reminder of items that could be relevant to every PR.
Approvers are expected to review this list.
Release note:
/cc @sradco @enp0s3