-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cri and container_runtime to shoot_node_info #61
Conversation
Such that we can get an overview on how the amount of Shoots with `containerd` and `docker` pools develop over time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: I did a minor edit of the description.
@@ -294,6 +306,8 @@ func (c gardenMetricsCollector) collectShootNodeMetrics(shoot *gardenv1beta1.Sho | |||
worker.Name, | |||
worker.Machine.Image.Name, | |||
*worker.Machine.Image.Version, | |||
criName, | |||
strings.Join(containerRuntimes, ", "), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not so sure about adding a list into a label value. I think it makes more sense to create a new timeseries for each container runtime. @istvanballok wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the PR description:
container_runtimes
is the (comma-separated) list of all supported runtimes
then I think it is fine to have the list in a single metric label, because the full list is the one that is supported. However, we should at least make sure that the sorting is consistent.
On the other hand, I'm not sure what the semantics of the supported runtimes really is - maybe it would be sufficient to expose only the used runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on our call, I sorted the list of container runtimes to keep the order stable.
var containerRuntimes []string | ||
|
||
if worker.CRI == nil { | ||
criName = "docker (default)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of using docker (default)
here, can we just use (default)
? If I understand correctly after k8s 1.22 the default will be containerd but the metric would still say docker (default)
which is incorrect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed just now: Shoots with k8s >= 1.22 will always have a value for .spec.provider.workers[].cri.name
. The defaulting logic we added with gardener/gardener#4222 takes care of the case where the user didn't specify anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
As discussed once there are more container_runtimes
we can look into this again and possibly create a new metric to prevent adding a list into a label.
What this PR does / why we need it:
This adds information about the CRI worker pool configuration to
shoot_node_info
, looking like thisThe following cases are considered:
.spec.provider.workers[].cri.name: docker
or.spec.provider.workers[].cri.name: containerd
. This value is directly put into thecri
label.spec.provider.workers[].cri: nil
. In this case, we insertdocker (default)
in the label to distinguish from the explicit configuration option above. This can only be the case for k8s versions < 1.22 – afterwards a value ofnil
is defaulted tocontainerd
..spec.provider.workers[].cri.containerruntimes[]
can be nil or contain an arbitrary number of elements. Thetype
field values will be concatenated with ", " and put into thecontainer_runtimes
label.Which issue(s) this PR fixes:
related gardener/gardener#4110
Special notes for your reviewer:
We probably want a dashboard for this, just for convenience – I'm just not sure how to develop that, tbh.
Release note: