New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add downwardMetrics volume to KubeVirt to allow exposing host metrics to guests #5502
Conversation
/cc @jean-edouard I think you looked a little bit into vhostmd, do you think you could do an initial review and post your thoughts? |
/cc @vladikr |
/retest |
/test all |
/test help |
@rmohr: The specified target(s) for
Use
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/test all |
1 similar comment
/test all |
/test pull-kubevirt-e2e-k8s-1.18 |
/test all |
/retest |
2 similar comments
/retest |
/retest |
@rmohr is there any way to use vhostmd in a limited fashion without requiring it to connect to a libvirtd? If so, why is it preferable for us to mimic the vhostmd api rather than running vhostmd in virt-handler and coming up with some shim to write that into the downward volume idea you're working with? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great.
I've added some questions/comments.
I think we should verify that a VMI can migrate with this disk as expected.
return MustToMetric(value, name, unit, api.MetricContextVM) | ||
} | ||
|
||
func MustToMetric(value interface{}, name string, unit string, context api.MetricContext) api.Metric { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of curiosity, what does this Must
prefix mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Afaik the Must
prefix is used in golang to indicate that if this operation fails, we are probably dealing with a programming error and we don't want to hide it. In this case all values are explicitly passed in in a hardcoded fashion, and if we have an error here it is a programming error. Examples are: MustCompile
in the regex package, or MustParse
and similar functions for parsing quantities or e.g. label selectors in k8s.
metricspkg.MustToMetric(3, "TotalCPU", "", api.MetricContextVM), | ||
}, | ||
} | ||
for x := 0; x < 5; x++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think that if you write and read the same thing the result will be the same.
I wonder what was your concern here, why would this change over time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is mostly there to check that load and write don't modify results in an unexpected way and that the load really takes it all and does not loose anything.
metricspkg.MustToUnitlessHostMetric(cpuinfo.NumPhysicalCPU(), "NumberOfPhysicalCPUs"), | ||
) | ||
} else { | ||
log.Log.Reason(err).Info("failed to collect cpuinfo on the node") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the behavior is a bit inconsistent. When we can't parse a metric we panic, but when we can't get the metrics we just skip it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we can get an error (e.g. reading the file) which is outside of our control. In this case I want to take as much as I can and hope that things recover at some point, but I don't want to crash the app. It is not necessarily a sign of a programming error.
tests/infra_test.go
Outdated
metrics := getDownwardMetrics(vmi) | ||
timestamp := getTimeFromMetrics(metrics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Basically I fetch the metrics a first time and extract the timestamp. Then I re-fetch them a few times and expect that the timestamp has changed because the file was updated with new metrics.
Create a scraper which collects all metrics found in standard RHEL and Fedora vhostmd.conf files. Wire that scraper into virt-handler. Signed-off-by: Roman Mohr <rmohr@redhat.com>
As part of it, bump the fedora testing image with a version which contains `vm-dump-metrics`, to verify that `vm-dump-metrics` works with metrics exposed by us. Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
pkg/downwardmetrics/vhostmd/disk.go
Outdated
if err != nil { | ||
return fmt.Errorf("failed to open vhostmd disk: %v", err) | ||
} | ||
defer f.Close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to consider this here? https://www.joeshaw.org/dont-defer-close-on-writable-files/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can let it log an error. There is not action necessary regarding to recovering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr 👍 |
Signed-off-by: Roman Mohr <rmohr@redhat.com>
/retest |
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: vladikr The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@rmohr: The following test failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
/retest |
github.com/kubevirt/kubevirt/pull/5502 introduced a new Kubevirt feature named downwardMetrics to expose host metrics to guests. The feature is controlled by a feature gate named "DownwardMetrics". Enable it in the list of HCO opinionated feature gates. Fixes: https://bugzilla.redhat.com/1991691 Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
github.com/kubevirt/kubevirt/pull/5502 introduced a new Kubevirt feature named downwardMetrics to expose host metrics to guests. The feature is controlled by a feature gate named "DownwardMetrics". Enable it in the list of HCO opinionated feature gates. Fixes: https://bugzilla.redhat.com/1991691 Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
github.com/kubevirt/kubevirt/pull/5502 introduced a new Kubevirt feature named downwardMetrics to expose host metrics to guests. The feature is controlled by a feature gate named "DownwardMetrics". Enable it in the list of HCO opinionated feature gates. Fixes: https://bugzilla.redhat.com/1991691 Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
github.com/kubevirt/kubevirt/pull/5502 introduced a new Kubevirt feature named downwardMetrics to expose host metrics to guests. The feature is controlled by a feature gate named "DownwardMetrics". Enable it in the list of HCO opinionated feature gates. Fixes: https://bugzilla.redhat.com/1991691 Signed-off-by: Simone Tiraboschi <stirabos@redhat.com> Co-authored-by: Simone Tiraboschi <stirabos@redhat.com>
What this PR does / why we need it:
Most virtualization platforms provide a way to make a limited amount of host metrics visible inside guests for performance and certification reasons.
In the libvirt/qemu ecosystem, exposing metrics via vhostmd is common.
vhostmd
does not really suite the architectural needs for kubevirt, since it is designed to run together with a central libvirt instance and being connected to it. Something which kubevirt does not have.Therefore a different approach is taken here. Instead of integrating
vhostmd
, the metrics expected to be visible in the guest are collected in virt-handler and then exposed to the guest via read-only raw block devices, likevhostmd
does. It also uses the same XML format for the exposed metrics. The exposed metrics are matching the configuration on RHEL and Fedora.One can define a
dowwardMetrics
volume like this on the VMI:When this VMI is started, the
vm-dump-metrics
tool can be used to get the metrics. This is exactly how this works forvhostmd
.We only expose non-critical data from the guest itself and non-critical data from the node. No data from any other guest. Still, in case that admins feel uncomfortable with that option, the feature is moved behind a featuregate called
DownwardMetrics
. The feature gate needs to be enabled to allow using the feature.Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #
Special notes for your reviewer:
Here a fully working example VMI:
Inside the guest one can run
to get the metrics.
Release note: