Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handler: Expose nmstatectl stats as k8s metrics #1221

Merged
merged 1 commit into from
Apr 17, 2024

Conversation

qinqon
Copy link
Member

@qinqon qinqon commented Dec 11, 2023

Is this a BUG FIX or a FEATURE ?:
/kind enhancement

What this PR does / why we need it:
Now that nmstatectl is able to calculate some useful stats from network configuration [1], we can bubble them up and expose them as k8s metrics so k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
nmstatectl st and also create a new deployment nmstate-metrics that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

This is an example of nmstate feature stat

kubernetes_nmstate_features_applied{name="dhcpv4-custom-hostname"} 1

Depends on nmstate 2.2.20, looks like it's build but still not present at centos 9 stream

[1] nmstate/nmstate#2420

TODO:

  • Compare old and new nncp stats to be able to decrease conunters.

Release note:

Expoxe statistics generated from `nmstatectl stats`

@kubevirt-bot kubevirt-bot added kind/enhancement release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. size/L labels Dec 11, 2023
@qinqon qinqon force-pushed the expose-nmstatectl-stats branch 5 times, most recently from 6aede34 to 39570e3 Compare December 15, 2023 11:44
@sradco
Copy link

sradco commented Dec 17, 2023

@machadovilaca @avlitman please see if you can help in reviewing this PR.

pkg/monitoring/metrics.go Outdated Show resolved Hide resolved
pkg/monitoring/metrics.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics_test.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics.go Outdated Show resolved Hide resolved
pkg/nmstatectl/nmstatectl.go Show resolved Hide resolved
deploy/handler/operator.yaml Outdated Show resolved Hide resolved
deploy/handler/role.yaml Show resolved Hide resolved
test/e2e/handler/metrics_test.go Outdated Show resolved Hide resolved
@qinqon qinqon force-pushed the expose-nmstatectl-stats branch 5 times, most recently from 13c356a to 00cc20b Compare December 21, 2023 12:51
@qinqon
Copy link
Member Author

qinqon commented Dec 21, 2023

@avlitman @machadovilaca @sradco can you take another look ?

Comment on lines 231 to 236
for _, t := range stats.Topology {
monitoring.ApplyTopologyTotal.WithLabelValues(t).Inc()
}
for _, f := range stats.Features {
monitoring.ApplyFeaturesTotal.WithLabelValues(f).Inc()
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://github.com/nmstate/kubernetes-nmstate/pull/1221/files#r1431131839, as I understand, I'm afraid this will have unbound cardinality as it will contain network state information like hostnames/interfaces/address/ports, is that correct?

Copy link
Member Author

@qinqon qinqon Dec 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://github.com/nmstate/kubernetes-nmstate/pull/1221/files#r1431131839, as I understand, I'm afraid this will have unbound cardinality as it will contain network state information like hostnames/interfaces/address/ports, is that correct?

The cardinality is not unbound, is not like we are going to count "mycluster1" and "eth1", the topology is a combination of enumerations (that's a little unbound but not that much) and the features is a list of enums too.

Can this be a problem ?

Copy link

@sradco sradco Dec 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, labels need to have a limited values set. Except for specific labels like VM names, pod names, namespaces and other resources etc.
See note: https://prometheus.io/docs/practices/naming/#labels

Copy link
Member Author

@qinqon qinqon Jan 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this is an example of the nmstate stdats

topology:
- static_ip4,static_ip6 -> linux-bridge -> bond -> ethernet
- static_ip4,static_ip6 -> vlan -> linux-bridge -> bond -> ethernet
features:
- sriov
- mac-based-identifier

For "features" is clear that is limited on the enum variants, maybe the main problem would be "topology".

@sradco @machadovilaca do you know what kind of limits do we have for labels bound ?

I read the following at the docs

CAUTION: Remember that every unique combination of key-value label pairs represents a new time series, which can dramatically increase the amount of data stored. Do not use labels to store dimensions with high cardinality (many different label values), such as user IDs, email addresses, or other unbounded sets of values.

But I don't see specific limits, also topology is going to be smaller than something like generated ID or pod names.

pkg/monitoring/metrics.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics_test.go Show resolved Hide resolved
test/e2e/handler/metrics_test.go Outdated Show resolved Hide resolved
test/e2e/handler/metrics_test.go Show resolved Hide resolved
@kubevirt-bot
Copy link
Collaborator

@machadovilaca: changing LGTM is restricted to collaborators

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@sradco
Copy link

sradco commented Dec 21, 2023

@qinqon I propose you to also add docs generator for the operator. I think this is really useful.

We have the automation, so that when the user adds a PR with a new metric, the test runs and checks if the metric is already documented. If not the user is asked to run make generate and this automatically updated the PR with the change to the metrics.md file with the new metric, description and type.

See an example to the metrics.md file here
https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/docs/metrics.md
and the docs generator is here
https://github.com/kubevirt/hyperconverged-cluster-operator/blob/main/tools/metricsdocs/metricsdocs.go
(Note: we plan to move it to /monitoring/tools/ )

@kubevirt-bot kubevirt-bot added the dco-signoff: no Indicates the PR's author has not DCO signed all their commits. label Dec 21, 2023
@qinqon qinqon force-pushed the expose-nmstatectl-stats branch 6 times, most recently from e813553 to 678209e Compare April 16, 2024 05:36
@qinqon
Copy link
Member Author

qinqon commented Apr 16, 2024

/hold cancel

Now is ready

@kubevirt-bot kubevirt-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Apr 16, 2024
@qinqon
Copy link
Member Author

qinqon commented Apr 17, 2024

/retest

#
set -e

linter_image_tag="v0.0.3"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance this could be consumed from the ENV and fallback to 0.0.3 in case nothing is defined? I have in mind make promlint-check so that 0.0.3 would be defined in the Makefile as promlinter_version and passed down to this script

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense

Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
@mkowalski
Copy link
Member

/lgtm

@kubevirt-bot kubevirt-bot added the lgtm Indicates that a PR is ready to be merged. label Apr 17, 2024
@qinqon
Copy link
Member Author

qinqon commented Apr 17, 2024

/approve

@qinqon
Copy link
Member Author

qinqon commented Apr 17, 2024

/retest

@kubevirt-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qinqon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kubevirt-bot kubevirt-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 17, 2024
@qinqon
Copy link
Member Author

qinqon commented Apr 17, 2024

/retest

@kubevirt-bot kubevirt-bot merged commit bf889c0 into nmstate:main Apr 17, 2024
8 checks passed
mkowalski pushed a commit to mkowalski/kubernetes-nmstate that referenced this pull request Apr 19, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request Apr 24, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request Apr 30, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request May 8, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request May 22, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request May 23, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
metal-net-cloner-bot bot pushed a commit to openshift-networking/kubernetes-nmstate that referenced this pull request May 27, 2024
Now that nmstatectl is able to calculate some useful stats from network
configuration [1], we can bubble them up and expose them as k8s metrics so
k-nmstate users can digg on them using prometheus, graphana or the like.

This change add a new "Features" under nnce Status with the output of
`nmstatectl st` and also create a new deployment `nmstate-metrics` that
will gather the NNCEs features and reflecta that at a cluster wide
gaugue prometheus metric.

[1] nmstate/nmstate#2420

Signed-off-by: Enrique Llorente <ellorent@redhat.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. kind/enhancement lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants