Add cri and container_runtime to shoot_node_info #61

Merged: wyb1 merged 2 commits into gardener:master on Aug 18, 2021

voelzmo
Member

@voelzmo voelzmo commented Aug 16, 2021

What this PR does / why we need it:
This adds the worker pool's CRI configuration to the garden_shoot_node_info metric. Example output:

$ curl http://localhost:2719/metrics | grep garden_shoot_node_info

# HELP garden_shoot_node_info Information about the nodes in a Shoot.
# TYPE garden_shoot_node_info gauge
garden_shoot_node_info{container_runtimes="",cri="docker (default)",image="flatcar",name="marco-test-shoot",project="dev",version="2765.2.6",worker_group="fc-docker"} 0
garden_shoot_node_info{container_runtimes="gvisor",cri="containerd",image="flatcar",name="marco-test-shoot",project="dev",version="2765.2.6",worker_group="fc-contd-new"} 0

The following cases are considered:

  • The node pool CRI is configured explicitly, with either .spec.provider.workers[].cri.name: docker or .spec.provider.workers[].cri.name: containerd. This value is put directly into the cri label.
  • The node pool CRI is not configured explicitly, i.e. .spec.provider.workers[].cri is nil. In this case, we insert docker (default) into the label to distinguish it from the explicit configuration above. This can only happen for k8s versions < 1.22; from 1.22 on, a nil value is defaulted to containerd.
  • The list of container runtimes at .spec.provider.workers[].cri.containerruntimes[] can be nil or contain an arbitrary number of elements. The type field values are concatenated with ", " and put into the container_runtimes label.
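The three cases can be sketched in Go roughly as follows. This is a minimal illustration, not the collector code from this PR: the `ContainerRuntime`, `CRI` and `Worker` types are simplified, hypothetical stand-ins for the Gardener API types, and `criLabels` is a made-up helper name.

```go
package main

import (
	"fmt"
	"strings"
)

// ContainerRuntime, CRI and Worker are simplified, hypothetical stand-ins for
// the Gardener API types; only the fields discussed above are modeled.
type ContainerRuntime struct {
	Type string
}

type CRI struct {
	Name              string
	ContainerRuntimes []ContainerRuntime
}

type Worker struct {
	Name string
	CRI  *CRI
}

// criLabels returns the values for the cri and container_runtimes labels.
func criLabels(w Worker) (cri string, runtimes string) {
	if w.CRI == nil {
		// Not configured explicitly: mark the implicit docker default.
		return "docker (default)", ""
	}
	types := make([]string, 0, len(w.CRI.ContainerRuntimes))
	for _, cr := range w.CRI.ContainerRuntimes {
		types = append(types, cr.Type)
	}
	// A nil or empty runtime list joins to the empty string.
	return w.CRI.Name, strings.Join(types, ", ")
}

func main() {
	workers := []Worker{
		{Name: "fc-docker"},
		{Name: "fc-contd-new", CRI: &CRI{
			Name:              "containerd",
			ContainerRuntimes: []ContainerRuntime{{Type: "gvisor"}},
		}},
	}
	for _, w := range workers {
		cri, runtimes := criLabels(w)
		fmt.Printf("worker_group=%q cri=%q container_runtimes=%q\n", w.Name, cri, runtimes)
	}
}
```

The two example workers mirror the `fc-docker` and `fc-contd-new` pools from the sample output above.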

Which issue(s) this PR fixes:
related gardener/gardener#4110

Special notes for your reviewer:
We probably want a dashboard for this, just for convenience – I'm just not sure how to develop that, tbh.

Release note:

`shoot_node_info` contains two new labels: `cri`, containing the worker pool's CRI name set at `.spec.provider.workers[].cri.name`, and `container_runtimes`, containing a comma-separated list of all supported runtimes configured at `.spec.provider.workers[].cri.containerruntimes[]`.

This gives an overview of how the number of Shoots with `containerd` and `docker` pools develops over time.
@gardener-robot gardener-robot added needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Aug 16, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 16, 2021
@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Aug 16, 2021
Contributor

@istvanballok istvanballok left a comment

minor: I did a minor edit of the description.

@@ -294,6 +306,8 @@ func (c gardenMetricsCollector) collectShootNodeMetrics(shoot *gardenv1beta1.Sho
	worker.Name,
	worker.Machine.Image.Name,
	*worker.Machine.Image.Version,
	criName,
	strings.Join(containerRuntimes, ", "),
Contributor

I'm not so sure about adding a list into a label value. I think it makes more sense to create a new timeseries for each container runtime. @istvanballok wdyt?

Contributor

Based on the PR description:

container_runtimes is the (comma-separated) list of all supported runtimes

then I think it is fine to have the list in a single metric label, because the full list is the one that is supported. However, we should at least make sure that the sorting is consistent.

On the other hand, I'm not sure what the semantics of the supported runtimes really are. Maybe it would be sufficient to expose only the runtime that is actually used?

Member Author

Based on our call, I sorted the list of container runtimes to keep the order stable.
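Why the stable order matters: in Prometheus, every distinct label value identifies a separate time series, so the same set of runtimes listed in a different order would show up as a different series. A minimal sketch of the sort-then-join step follows; `runtimesLabel` is a hypothetical helper name and `example-runtime` is a placeholder runtime type, neither is taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// runtimesLabel joins runtime type names into one label value. It sorts a copy
// so the caller's slice is left untouched and the result does not depend on
// the order in which the runtimes were configured.
func runtimesLabel(types []string) string {
	sorted := append([]string(nil), types...)
	sort.Strings(sorted)
	return strings.Join(sorted, ", ")
}

func main() {
	a := runtimesLabel([]string{"gvisor", "example-runtime"})
	b := runtimesLabel([]string{"example-runtime", "gvisor"})
	// Both inputs describe the same set, so both yield the same label value.
	fmt.Println(a == b, a)
}
```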

var containerRuntimes []string

if worker.CRI == nil {
	criName = "docker (default)"
Contributor

Instead of using docker (default) here, can we just use (default)? If I understand correctly, from k8s 1.22 on the default will be containerd, but the metric would still say docker (default), which would be incorrect.

Member Author

As discussed just now: Shoots with k8s >= 1.22 will always have a value for .spec.provider.workers[].cri.name. The defaulting logic we added with gardener/gardener#4222 takes care of the case where the user didn't specify anything.

@gardener-robot-ci-3 gardener-robot-ci-3 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 17, 2021
@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Aug 17, 2021
Contributor

@wyb1 wyb1 left a comment

/lgtm

As discussed, once there are more container_runtimes we can look into this again and possibly create a new metric, to avoid putting a list into a label.
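For reference, the alternative raised in this review (one time series per container runtime instead of a joined list) could look roughly like the sketch below. It is purely illustrative and not part of this PR: `labelSet`, `perRuntimeSeries` and the singular `container_runtime` label are hypothetical names, and `example-runtime` is a placeholder.

```go
package main

import "fmt"

// labelSet is a hypothetical stand-in for the label values of one time series.
type labelSet struct {
	WorkerGroup      string
	CRI              string
	ContainerRuntime string
}

// perRuntimeSeries emits one label set per container runtime instead of one
// joined list. A pool without extra runtimes still yields a single series with
// an empty ContainerRuntime value, so the pool does not disappear.
func perRuntimeSeries(workerGroup, cri string, runtimes []string) []labelSet {
	if len(runtimes) == 0 {
		return []labelSet{{WorkerGroup: workerGroup, CRI: cri}}
	}
	series := make([]labelSet, 0, len(runtimes))
	for _, rt := range runtimes {
		series = append(series, labelSet{WorkerGroup: workerGroup, CRI: cri, ContainerRuntime: rt})
	}
	return series
}

func main() {
	for _, s := range perRuntimeSeries("fc-contd-new", "containerd", []string{"gvisor", "example-runtime"}) {
		fmt.Printf("%+v\n", s)
	}
}
```

The trade-off is that each label value stays atomic and easy to filter on, at the cost of more series per worker pool.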

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging and removed needs/review Needs review labels Aug 18, 2021
@wyb1 wyb1 merged commit 9e8049a into gardener:master Aug 18, 2021
Labels
needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) reviewed/lgtm Has approval for merging size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py)

7 participants