Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: VM metrics documentation #487

Closed
wants to merge 4 commits into from
Closed

Conversation

@fromanirh
Copy link
Member

fromanirh commented Oct 4, 2017

Related-To: #296
Signed-off-by: Francesco Romani fromani@redhat.com

Related-To: #296
Signed-off-by: Francesco Romani <fromani@redhat.com>
@fromanirh

This comment has been minimized.

Copy link
Member Author

fromanirh commented Oct 4, 2017

Proposal about how to address and solve #296

@kubevirt-bot

This comment has been minimized.

Copy link
Contributor

kubevirt-bot commented Oct 4, 2017

Can one of the admins verify this patch?

Copy link
Member

fabiand left a comment

Just a few comments, nice start!

VM expose a different, and partially overlapping, set of metrics with
respect to containers. This document explain the differences and how
those metrics are made available to the cluster, to be consumed using
the standard kubernetes facilities.

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

capital K, Kubernetes


## libvirt metrics
[libvirt](https://libvirt.org/) is the virtualization framework that
kubevirt uses; in order to describe the VM metrics in kubevirt, the

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

capital K and V, KubeVirt


In a nutshell, libvirt uses both the QEMU configuration and the standard
facilites of the linux kernel to do the resource limitation and
accounting. As prime example, libvirt uses cgroups, to do resource

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

As -> A?

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

fixed

accounting for memory, cpu usage, I/O. Thus, we can consider this subset
of metrics to fully overlap with what monitoring system (e.g.
[cAdvisor](https://github.com/google/cadvisor)) already provides.
Let's call this set `system metrics`.

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

We need to be careful here.

cgroups are used in both cases, tho there are different processes in the livirt cgroups and in the kubelet cgoups.

Not sure if it matters.

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

Not sure either, but the concept I was try to convey is that the underlying monitoring mechanism is the same, so we can expect the same metric to be delivered one way or the other; in other words, there is nothing that libvirt reports that cAdvisor can't report as well -and vice versa.
Anyway I'll keep this point in mind and will check in the future.

accounting for memory, cpu usage, I/O. Thus, we can consider this subset
of metrics to fully overlap with what monitoring system (e.g.
[cAdvisor](https://github.com/google/cadvisor)) already provides.
Let's call this set `system metrics`.

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

system metrics is not code, what about using italics?

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Oct 6, 2017

This is being called core metrics or resource metrics in kubernetes

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

OK about italics. Thanks @smarterclayton for pointing this out, will check kubernetes doc and will align the naming and the usage of terms.

| virtual CPU #x total time | hypervisor | - | - |
| network interface count | hypervisor | - | - |
| block device count | hypervisor | - | - |
| migration downtime | hypervisor | yes | |

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

Any reason for not continuing to use a - in The provided by column?

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

It was a placeholder. I was looking for the equivalent concept in kubernetes, initial research was unfruitful. Now I learned those are 'core metrics', so I can drop the column.

From now on, the overlapping metrics (aka 'system metrics') will be ignored,
and we will focus on the 'hypervisor' and 'guest agent' metrics.

## collection of VM metrics

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

capital C, Collection?

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

Will fix.

https://github.com/kubernetes/kubernetes/blob/release-1.2/docs/proposals/custom-metrics.md).
All the metrics are accessible through the libvirt API.

The collection of the VM metrics will be done by a separate pod, implemented as kubernetes DaemonSet.

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

capital K, Kubernetes


Once name is exported, it is considered part of the API stability and treated accordingly.

TODO: what about name clashes? at least report them? We can do that?

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

We'd need somebody to find clashes.

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

Ok, so there is no automated way to detect them IIUC


- **BAD**:
`VM0.migration.bps`
What `bps` is? Is it Bytes Per Second or Bits per Second? Or something else more specific?

This comment has been minimized.

Copy link
@fabiand

fabiand Oct 4, 2017

Member

Yes, eventually encoding the unit in the name might make sense.

This comment has been minimized.

Copy link
@MarSik

MarSik Oct 10, 2017

Member

bps is definitely bits per seconds btw :) Byte uses uppercase B when the unit is written properly. But it should not have its own leaf, unless you want to expose multiple different scales.

To increase resilience, each node should expose all the metrics it knows about, avoiding
a single point of failure per cluster. The resulting architecture is shown in the figure below.

![VM metrics diagram](vm-metrics.png "VM Metrics")

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 5, 2017

Author Member

wrong diagram, I (hastly) copied another one and got it wrong. Libvirt and vAdvisor must run on each node. Will update it ASAP.

Fix spelling of Kubernetes and KubeVirt, and update
the diagram to really reflect the proposed architecture.
Copy link
Member

fabiand left a comment

To sum up my thoughts - in general a good start, I don't see any conflicts of interest.

+1

@fabiand

This comment has been minimized.

Copy link
Member

fabiand commented Oct 5, 2017

I'd appreciate that we start looking into providing additional components i.e. to showcase this metrics - i.e. add an optional graphana manifest

## collection of VM metrics
A natural fit for the KubeVirt specific metrics is the [custom metrics](
https://github.com/Kubernetes/Kubernetes/blob/release-1.2/docs/proposals/custom-metrics.md).
All the metrics are accessible through the libvirt API.

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Oct 6, 2017

There can only be one custom metrics provider per cluster for now, which is going to be a prometheus gathering high value metrics), so keep that in mind

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

OK, thanks for pointing this out

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

@smarterclayton could you please point me to some k8s documentation about this limitation of only one custom metrics provider? I want to learn more (e.g. why there is this constraint, if/we it is planned to be lifted, if any other constraint exist) but I didn't find complete documentation yet. Thanks!

## VM metrics naming rules

In order to minimize the clashes and make the VM metrics name predictable, we adopt the following rules
to set the names:

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Oct 6, 2017

Are you trying to name these to be consistent with Kube? The standard format for Kubernetes is prometheus, so I would expect you to follow the conventions there for metrics that will be gathered (use labels for unique indentificstion of entities, for example).

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

I was looking for feedback about this. Is the prometheus convention good enough for us? (I'm not hinting yes or no, just asking)

This comment has been minimized.

Copy link
@MarSik

MarSik Oct 10, 2017

Member

Prometheus format is less strict than what you proposed, because you can have multiple different orthogonal labels instead of just one fixed metrics name.

## Implementation notes

To implement the VM metrics collection and reporting, we will initially
adapt the [vAdvisor project](https://github.com/KubeVirt/vAdvisor),

This comment has been minimized.

Copy link
@smarterclayton

smarterclayton Oct 6, 2017

You're talking about the agent on the node collection? Or something higher level?

This comment has been minimized.

Copy link
@fromanirh

fromanirh Oct 9, 2017

Author Member

I mean the agent on the node, to report metrics we care about and cAdvisor doesn't know


- **BAD**:
`VM0.migration.bps`
What `bps` is? Is it Bytes Per Second or Bits per Second? Or something else more specific?

This comment has been minimized.

Copy link
@MarSik

MarSik Oct 10, 2017

Member

bps is definitely bits per seconds btw :) Byte uses uppercase B when the unit is written properly. But it should not have its own leaf, unless you want to expose multiple different scales.

## VM metrics naming rules

In order to minimize the clashes and make the VM metrics name predictable, we adopt the following rules
to set the names:

This comment has been minimized.

Copy link
@MarSik

MarSik Oct 10, 2017

Member

Prometheus format is less strict than what you proposed, because you can have multiple different orthogonal labels instead of just one fixed metrics name.


- It always export all the metrics it knows about, even if the real data is not yet available.
Such metrics report the special value [TODO: which one? does this
concept exist in Kubernetes?] until the real data is available.

This comment has been minimized.

Copy link
@MarSik

MarSik Oct 10, 2017

Member

It might be possible to report the last modification date too, Prometheus format supports timestamps.

This patch reorganize the content of the vm-metrics proposal, aligning
the naming scheme to the one used in
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md
Even though the key concepts are the same as the previous version,
almost all the document has been rewritten to reframe the concepts
in the framework of the monitoring architecture proposal.
@fromanirh

This comment has been minimized.

Copy link
Member Author

fromanirh commented Oct 12, 2017

ping?

Reference the Prometheus naming rules, add note about
the intended usage of Prometheus metric labels.
Highlight the pending items about the collection side.
Copy link
Member

stu-gott left a comment

The general message of this seems fine to me. There are a large number of misplaced newlines in this document. This won't show up in a generated web page, but looks odd when viewing in a text editor.


The key components of the monitoring pipeline are:

- The KubeVirt clustr agent: we can use a standard Prometheus server instance

This comment has been minimized.

Copy link
@stu-gott

stu-gott Oct 26, 2017

Member

typo? clustr->cluster


## Final diagram

The following diagram illustrates how the KyubeVirt monitoring pipeline will look like

This comment has been minimized.

Copy link
@stu-gott

stu-gott Oct 26, 2017

Member

Typo: KyubeVirt

Copy link
Member

rmohr left a comment

Looks nice over all and it also seems to take into account where k8s wants to go.

What are your plans for integration right now? Using the common plane with custom metrics? Are you also planning to ship our metrics on two different streams? If yes, are there plans based on the k8s proposal, to allow "add-on core metrics"?

to set the names:

1. **TO BE CHECKED** If namespaces are available, they should be used: each VM has its own namespace - the VM name or UUID,
to be decided. Otherwise, a metric name should include the VM identifier (name or UUID) as prefix.

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

In case of Prometheus that would be labels, not a metric prefix, right?

accounting for memory, cpu usage, I/O.

Libvirtd provides metrics that roughly overlaps with Kubernetes
[core metrics](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/instrumentation/monitoring_architecture.md)

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

Looks like in that proposal, the core metrics should not be extensible by design, so we can't add core metrics which might be relevant for VMs only there?


The normalization features will be controlled by node agent options (e.g. command line switches).

### Metric naming rules

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

That looks good.


The options to implement the KubeVirt node agent are:

#### vAdvisor

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

If we really only export prometheus metrics, it might not even be worth to temporarily use vAdvisor. Although I thought it might make sense in the past. Collecting and exposing prometheus metrics is extremely simple in go.

[Initial experimentations](https://github.com/fromanirh/go-vAdvisor) gave promising results about
the development speed

#### collectd

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

I would try to focus on the mostly used tools which are afaik heapster, prometheus and cAdvisor. snapd and collectd seem to be mentioned quite regularly too. I don't have enough insight in real world deployments to see if snapd and collectd are important enough right now. Anyone?


## KubeVIRT API Adapters: VM metrics naming rules

The task of the KubeVirt HPA API adapter is to expose the metrics collected by the cluster agent

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

If HPA stands for Horizontal Pod Autoscaler, I am a little bit confused. HPA is for sure a consumer of Heapster/prometheus metrics, but how is that related to this proposal? Maybe I misunderstand HPA? Or are you thinking about scaling VMs based on that?

The following diagram illustrates how the KyubeVirt monitoring pipeline will look like
with all the components described in this document deployed

![VM metrics monitoring pipeline example](vm-metrics-example.png "VM Metrics monitoring pipeline example")

This comment has been minimized.

Copy link
@rmohr

rmohr Nov 3, 2017

Member

Could you add existing component names (if they exist) to the boxes? E.g. "Monitoring CLuster Agent" probably relates to Heapster or Prometheus?

@kubevirt-bot

This comment has been minimized.

Copy link
Contributor

kubevirt-bot commented Dec 6, 2017

Can one of the admins verify this patch?

1 similar comment
@kubevirt-bot

This comment has been minimized.

Copy link
Contributor

kubevirt-bot commented Dec 6, 2017

Can one of the admins verify this patch?

@fabiand

This comment has been minimized.

Copy link
Member

fabiand commented Apr 3, 2018

Thanks for the work.

I'm closing this PR for now, we can revive it once this becomes relevant again.

@fabiand fabiand closed this Apr 3, 2018
@kubevirt kubevirt deleted a comment from yanirq Aug 20, 2018
@fromanirh fromanirh deleted the fromanirh:vm-metrics branch Nov 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants
You can’t perform that action at this time.