Add downwardMetrics volume to KubeVirt to allow exposing host metrics to guests #5502

Merged
kubevirt-bot merged 14 commits into kubevirt:master on May 21, 2021

Conversation

rmohr
Member

@rmohr rmohr commented Apr 21, 2021

What this PR does / why we need it:

Most virtualization platforms provide a way to make a limited set of host metrics visible inside guests for performance and certification reasons.

In the libvirt/QEMU ecosystem, exposing metrics via vhostmd is common. vhostmd does not really suit the architectural needs of KubeVirt, since it is designed to run alongside a central libvirt instance and stay connected to it, which KubeVirt does not have.

Therefore a different approach is taken here. Instead of integrating vhostmd, the metrics expected to be visible in the guest are collected in virt-handler and then exposed to the guest via read-only raw block devices, like vhostmd does. The same XML format is used for the exposed metrics, and the set of exposed metrics matches the vhostmd configuration on RHEL and Fedora.

One can define a downwardMetrics volume like this on the VMI:

---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: vhostmd
  volumes:
  - downwardMetrics: {}
    name: vhostmd

When this VMI is started, the vm-dump-metrics tool can be used inside the guest to get the metrics, exactly as it works for vhostmd.

We only expose non-critical data from the guest itself and non-critical data from the node. No data from any other guest is exposed. Still, in case admins feel uncomfortable with that option, the feature is placed behind a feature gate called DownwardMetrics, which must be enabled before the feature can be used.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Special notes for your reviewer:

Here is a fully working example VMI:

---
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  labels:
    special: vmi-fedora
  name: vmi-fedora
spec:
  domain:
    devices:
      disks:
      - disk:
          bus: virtio
        name: containerdisk
      - disk:
          bus: virtio
        name: cloudinitdisk
      - disk:
          bus: virtio
        name: vhostmd
      rng: {}
    machine:
      type: ""
    resources:
      requests:
        memory: 1024M
  terminationGracePeriodSeconds: 0
  volumes:
  - containerDisk:
      image: registry:5000/kubevirt/fedora-cloud-container-disk-demo:devel
    name: containerdisk
  - cloudInitNoCloud:
      userData: |-
        #cloud-config
        password: fedora
        chpasswd: { expire: False }
    name: cloudinitdisk
  - downwardMetrics: {}
    name: vhostmd

Inside the guest one can run

$ sudo dnf install -y vm-dump-metrics
$ sudo vm-dump-metrics 
<metrics>
  <metric type="string" context="host">
    <name>HostName</name>
    <value>node01</value>
[...]
  <metric type="int64" context="host" unit="s">
    <name>Time</name>
    <value>1619008605</value>
  </metric>
  <metric type="string" context="host">
    <name>VirtualizationVendor</name>
    <value>kubevirt.io</value>
  </metric>
</metrics>

to get the metrics.
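
For readers who want to consume this output programmatically, the following is a minimal sketch of decoding it with Go's standard `encoding/xml` package. The `Metric` and `Metrics` types and the `ParseMetrics` helper are assumptions made for illustration and are not taken from this PR; the sample is a shortened copy of the output above, with the elided part closed so that it forms a complete document.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Metric mirrors one <metric> element from the output shown above.
// (Hypothetical type, not the one used in the PR.)
type Metric struct {
	Type    string `xml:"type,attr"`
	Context string `xml:"context,attr"`
	Unit    string `xml:"unit,attr,omitempty"`
	Name    string `xml:"name"`
	Value   string `xml:"value"`
}

// Metrics mirrors the <metrics> root element.
type Metrics struct {
	XMLName xml.Name `xml:"metrics"`
	Metrics []Metric `xml:"metric"`
}

// ParseMetrics decodes a vhostmd-style XML document.
func ParseMetrics(data []byte) (Metrics, error) {
	var m Metrics
	err := xml.Unmarshal(data, &m)
	return m, err
}

// sample is a shortened copy of the vm-dump-metrics output.
const sample = `<metrics>
  <metric type="string" context="host">
    <name>HostName</name>
    <value>node01</value>
  </metric>
  <metric type="int64" context="host" unit="s">
    <name>Time</name>
    <value>1619008605</value>
  </metric>
</metrics>`

func main() {
	m, err := ParseMetrics([]byte(sample))
	if err != nil {
		panic(err)
	}
	for _, metric := range m.Metrics {
		fmt.Printf("%s (%s) = %s\n", metric.Name, metric.Context, metric.Value)
	}
}
```

All values are kept as strings here; a consumer would convert them according to the `type` attribute.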

Release note:

Add downwardMetrics volume to expose a limited set of host metrics to guests

@kubevirt-bot kubevirt-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/XXL kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/build-change Categorizes PRs as related to changing build files of virt-* components labels Apr 21, 2021
@rmohr rmohr changed the title Add downwardMetrics volume to KubeVirt to allow host metrics exposure to guests Add downwardMetrics volume to KubeVirt to allow exposing host metrics to guests Apr 21, 2021
@rmohr
Member Author

rmohr commented Apr 21, 2021

/cc @jean-edouard

I think you looked into vhostmd a little bit. Could you do an initial review and post your thoughts?

@rmohr
Member Author

rmohr commented Apr 22, 2021

/cc @vladikr

@rmohr
Member Author

rmohr commented Apr 23, 2021

/retest

@rmohr
Member Author

rmohr commented Apr 23, 2021

/test all

@rmohr
Member Author

rmohr commented Apr 23, 2021

/test help

@kubevirt-bot
Contributor

@rmohr: The specified target(s) for /test were not found.
The following commands are available to trigger jobs:

  • /test pull-kubevirt-e2e-k8s-1.20
  • /test pull-kubevirt-e2e-k8s-1.20-sig-network
  • /test pull-kubevirt-e2e-k8s-1.20-sig-storage
  • /test pull-kubevirt-e2e-k8s-1.20-sig-compute
  • /test pull-kubevirt-e2e-k8s-1.20-operator
  • /test pull-kubevirt-e2e-k8s-1.20-cgroupsv2
  • /test pull-kubevirt-e2e-k8s-1.19
  • /test pull-kubevirt-e2e-k8s-1.19-sig-network
  • /test pull-kubevirt-e2e-k8s-1.19-sig-storage
  • /test pull-kubevirt-e2e-k8s-1.19-operator
  • /test pull-kubevirt-e2e-k8s-1.18
  • /test pull-kubevirt-e2e-k8s-1.17
  • /test pull-kubevirt-e2e-windows2016
  • /test pull-kubevirt-e2e-kind-1.17-sriov
  • /test pull-kubevirt-check-tests-for-flakes
  • /test pull-kubevirt-e2e-k8s-1.17-rook-ceph
  • /test pull-kubevirt-generate
  • /test pull-kubevirt-verify-rpms
  • /test pull-kubevirt-gosec
  • /test pull-kubevirt-build
  • /test pull-kubevirt-build-arm64
  • /test pull-kubevirt-unit-test
  • /test pull-kubevirt-goveralls
  • /test pull-kubevirt-apidocs
  • /test pull-kubevirt-client-python
  • /test pull-kubevirt-manifests
  • /test pull-kubevirt-prom-rules-verify

Use /test all to run the following jobs:

  • pull-kubevirt-e2e-k8s-1.20-sig-network
  • pull-kubevirt-e2e-k8s-1.20-sig-storage
  • pull-kubevirt-e2e-k8s-1.19
  • pull-kubevirt-e2e-k8s-1.19-sig-network
  • pull-kubevirt-e2e-k8s-1.19-sig-storage
  • pull-kubevirt-e2e-k8s-1.19-operator
  • pull-kubevirt-e2e-k8s-1.18
  • pull-kubevirt-e2e-windows2016
  • pull-kubevirt-e2e-kind-1.17-sriov
  • pull-kubevirt-check-tests-for-flakes
  • pull-kubevirt-e2e-k8s-1.17-rook-ceph
  • pull-kubevirt-generate
  • pull-kubevirt-verify-rpms
  • pull-kubevirt-build
  • pull-kubevirt-build-arm64
  • pull-kubevirt-unit-test
  • pull-kubevirt-apidocs
  • pull-kubevirt-client-python
  • pull-kubevirt-manifests
  • pull-kubevirt-prom-rules-verify

In response to this:

/test help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@rmohr
Member Author

rmohr commented Apr 23, 2021

/test all

@rmohr
Member Author

rmohr commented Apr 23, 2021

/test pull-kubevirt-e2e-k8s-1.18

@rmohr
Member Author

rmohr commented Apr 24, 2021

/test all

@rmohr
Member Author

rmohr commented Apr 24, 2021

/retest

@rmohr
Member Author

rmohr commented Apr 28, 2021

/retest

@stu-gott
Member

stu-gott commented May 3, 2021

/retest

@kubevirt-bot kubevirt-bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 7, 2021
@davidvossel
Member

vhostmd does not really suit the architectural needs for kubevirt, since it is designed to run together with a central libvirt instance and being connected to it. Something which kubevirt does not have.

@rmohr is there any way to use vhostmd in a limited fashion without requiring it to connect to a libvirtd? If so, why is it preferable for us to mimic the vhostmd api rather than running vhostmd in virt-handler and coming up with some shim to write that into the downward volume idea you're working with?

Member

@vladikr vladikr left a comment

Looks great.
I've added some questions/comments.
I think we should verify that a VMI can migrate with this disk as expected.

pkg/downwardmetrics/vhostmd/disk_test.go (outdated, resolved)
return MustToMetric(value, name, unit, api.MetricContextVM)
}

func MustToMetric(value interface{}, name string, unit string, context api.MetricContext) api.Metric {
Member

Just out of curiosity, what does this Must prefix mean?

Member Author

As far as I know, the Must prefix is used in Go to indicate that a failure of the operation most likely means a programming error that we don't want to hide. In this case all values are passed in explicitly and hardcoded, so an error here is a programming error. Examples are MustCompile in the regexp package, and MustParse and similar functions for parsing quantities or label selectors in k8s.
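
The convention described above can be sketched as follows. `ParsePort` and `MustParsePort` are hypothetical helpers invented for this illustration and are unrelated to the PR; `regexp.MustCompile` is the standard library function mentioned in the comment.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// ParsePort returns an error and leaves the decision to the caller.
// (Hypothetical helper for illustration.)
func ParsePort(s string) (int, error) {
	p, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid port %q: %w", s, err)
	}
	return p, nil
}

// MustParsePort panics instead of returning an error. It is meant only
// for hardcoded inputs, where a failure means the program itself is wrong.
func MustParsePort(s string) int {
	p, err := ParsePort(s)
	if err != nil {
		panic(err)
	}
	return p
}

// The standard library follows the same convention: a pattern that does
// not compile is a bug in the source code, so MustCompile panics.
var metricName = regexp.MustCompile(`^[A-Za-z]+$`)

func main() {
	fmt.Println(MustParsePort("8080"))
	fmt.Println(metricName.MatchString("TotalCPU"))
}
```

Pairing each Must variant with an error-returning one keeps the panic limited to call sites where the input is known at compile time.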

metricspkg.MustToMetric(3, "TotalCPU", "", api.MetricContextVM),
},
}
for x := 0; x < 5; x++ {
Member

I would think that if you write and read the same thing, the result will be the same.
I wonder what your concern was here: why would this change over time?

Member Author

This is mostly there to check that load and write don't modify results in an unexpected way, and that the load really takes everything and does not lose anything.

pkg/downwardmetrics/vhostmd/metrics/metrics.go (outdated, resolved)
metricspkg.MustToUnitlessHostMetric(cpuinfo.NumPhysicalCPU(), "NumberOfPhysicalCPUs"),
)
} else {
log.Log.Reason(err).Info("failed to collect cpuinfo on the node")
Member

I find the behavior a bit inconsistent. When we can't parse a metric we panic, but when we can't get the metrics we just skip them...

Member Author

Here, we can get an error (e.g. reading the file) which is outside of our control. In this case I want to take as much as I can and hope that things recover at some point, but I don't want to crash the app. It is not necessarily a sign of a programming error.

Comment on lines 105 to 106
metrics := getDownwardMetrics(vmi)
timestamp := getTimeFromMetrics(metrics)
Member

is this needed?

Member Author

I think so. Basically I fetch the metrics a first time and extract the timestamp. Then I re-fetch them a few times and expect that the timestamp has changed because the file was updated with new metrics.

rmohr added 6 commits May 14, 2021 14:07
Create a scraper which collects all metrics found in standard RHEL and
Fedora vhostmd.conf files.

Wire that scraper into virt-handler.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
As part of it, bump the fedora testing image with a version which
contains `vm-dump-metrics`, to verify that `vm-dump-metrics` works with
metrics exposed by us.

Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
Signed-off-by: Roman Mohr <rmohr@redhat.com>
if err != nil {
return fmt.Errorf("failed to open vhostmd disk: %v", err)
}
defer f.Close()
Member

Member Author

I can let it log an error. No action is necessary with regard to recovery.

Member Author

Fixed.

@vladikr
Member

vladikr commented May 18, 2021

@rmohr 👍
/lgtm

@kubevirt-bot kubevirt-bot added lgtm Indicates that a PR is ready to be merged. and removed lgtm Indicates that a PR is ready to be merged. labels May 18, 2021
Signed-off-by: Roman Mohr <rmohr@redhat.com>
@rmohr
Member Author

rmohr commented May 18, 2021

@stu-gott @vladikr fixed the minor close issue. Ready for another look. :)

@rmohr
Member Author

rmohr commented May 20, 2021

/retest

@vladikr
Member

vladikr commented May 20, 2021

/approve
/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label May 20, 2021
@kubevirt-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vladikr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 20, 2021
@kubevirt-bot
Contributor

kubevirt-bot commented May 20, 2021

@rmohr: The following test failed, say /retest to rerun all failed tests:

Test name Commit Details Rerun command
pull-kubevirt-e2e-k8s-1.18 48ba03d link /test pull-kubevirt-e2e-k8s-1.18

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rmohr
Member Author

rmohr commented May 20, 2021

/retest

@kubevirt-bot kubevirt-bot merged commit 9bc2254 into kubevirt:master May 21, 2021
tiraboschi added a commit to tiraboschi/hyperconverged-cluster-operator that referenced this pull request Aug 26, 2021
github.com/kubevirt/kubevirt/pull/5502
introduced a new Kubevirt feature named
downwardMetrics to expose host metrics
to guests.
The feature is controlled by a feature
gate named "DownwardMetrics".
Enable it in the list of HCO
opinionated feature gates.

Fixes: https://bugzilla.redhat.com/1991691

Signed-off-by: Simone Tiraboschi <stirabos@redhat.com>
kubevirt-bot pushed a commit to kubevirt/hyperconverged-cluster-operator that referenced this pull request Aug 26, 2021
kubevirt-bot pushed a commit to kubevirt-bot/hyperconverged-cluster-operator that referenced this pull request Aug 27, 2021
kubevirt-bot pushed a commit to kubevirt/hyperconverged-cluster-operator that referenced this pull request Aug 27, 2021
Co-authored-by: Simone Tiraboschi <stirabos@redhat.com>
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/build-change Categorizes PRs as related to changing build files of virt-* components lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XXL
5 participants