Problematic dimension handling of network metrics for host network pods/interfaces #2615
Comments
Completely agree that the current behavior is problematic. We might be able to do this by inspecting the container definition for each container runtime. E.g., for Docker, we can tell whether a container uses host networking: cadvisor/container/docker/handler.go Line 75 in 65fa5b4
I don't think we can do it generically, so it would be a per-runtime integration change.
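A minimal sketch of the per-runtime idea for Docker, assuming each container's `HostConfig.NetworkMode` string is already available. Docker uses the value `host` for host networking and `container:<id>` when a container joins another container's network namespace (as Kubernetes app containers do with the pod sandbox under Docker), so the check has to follow that reference. The `resolveHostNetwork` function and the `modes` map are hypothetical illustrations, not cAdvisor's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveHostNetwork reports whether the container with the given ID ends up
// in the host network namespace. modes maps container IDs to Docker
// NetworkMode strings (hypothetical input; in practice this would come from
// the container's inspect data). "container:<id>" references are followed,
// with a hop limit to guard against cycles.
func resolveHostNetwork(id string, modes map[string]string) (bool, error) {
	const maxHops = 8
	start := id
	for hop := 0; hop < maxHops; hop++ {
		mode, ok := modes[id]
		if !ok {
			return false, fmt.Errorf("unknown container %q", id)
		}
		switch {
		case mode == "host":
			return true, nil
		case strings.HasPrefix(mode, "container:"):
			// Shares another container's namespace: follow the reference.
			id = strings.TrimPrefix(mode, "container:")
		default:
			// "bridge", "none", user-defined networks, ...
			return false, nil
		}
	}
	return false, fmt.Errorf("network mode chain starting at %q is too long", start)
}

func main() {
	modes := map[string]string{
		"sandbox-a": "host",                // sandbox of a hostNetwork pod
		"app-a":     "container:sandbox-a", // app container joining it
		"sandbox-b": "bridge",              // sandbox with its own namespace
		"app-b":     "container:sandbox-b",
	}
	for _, id := range []string{"app-a", "app-b"} {
		isHost, err := resolveHostNetwork(id, modes)
		fmt.Println(id, isHost, err)
	}
}
```

Other runtimes would need their own equivalent of this lookup, which is why the change is per-runtime.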
The only other consideration is backwards compatibility... This makes sense for Prometheus metrics, but I'm not sure it makes sense for the summary API. I guess we could either re-insert node-level usage for pods that have
I do agree on backwards compatibility. My preference would be the first, as the latter would still violate the "sum of all must make sense" rule.
Wouldn't it be possible to simply add a new label (`container_network_type`) instead of a new metric? It would then at least be easy to filter the pods that use host networking.
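To illustrate the label-based proposal, here is a sketch that renders one sample in the Prometheus text exposition format with the extra `container_network_type` label. The label name comes from the comment above; the values `host` and `pod`, the helpers `formatSample` and `networkType`, and the sample label set are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// escapeLabelValue applies the text-format escaping rules for label values:
// backslash, double quote and line feed must be escaped.
func escapeLabelValue(v string) string {
	r := strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)
	return r.Replace(v)
}

// formatSample renders one sample line, with labels sorted by name so the
// output is deterministic.
func formatSample(name string, labels map[string]string, value float64) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, fmt.Sprintf(`%s="%s"`, k, escapeLabelValue(labels[k])))
	}
	return fmt.Sprintf("%s{%s} %s", name, strings.Join(parts, ","),
		strconv.FormatFloat(value, 'g', -1, 64))
}

// networkType returns the hypothetical value of the proposed label.
func networkType(hostNetwork bool) string {
	if hostNetwork {
		return "host"
	}
	return "pod"
}

func main() {
	labels := map[string]string{
		"namespace":              "kube-system",
		"pod":                    "node-agent-x7k2p",
		"interface":              "eth0",
		"container_network_type": networkType(true),
	}
	fmt.Println(formatSample("container_network_receive_bytes_total", labels, 123456))
}
```

With such a label, a query could exclude host-network series with a matcher like `container_network_type!="host"` instead of relying on interface names.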
A label to distinguish between host-network Pods and normal ones, as proposed by @Jean-Daniel, would already make things easier. Currently, I distinguish between the two kinds of Pods as follows (Prometheus):
The above calculates the total receive data rate of the cluster. The first line represents the node network interfaces (which are used by the host-network Pods). The second line represents the normal Pods (i.e., non-host-network). It happens in my cluster that all normal Pods have an This solution is certainly not generally applicable, since network interfaces may be named differently in different clusters, or there may be multiple network interfaces per Pod. So a label indicating whether a Pod is in the host network would already make things easier. However, this still wouldn't satisfy the "sum must make sense" principle mentioned by @brancz. So omitting the host-network Pods from the metric entirely and having only a single set of time series for a node's network interfaces might make sense.
Pods with A solution is to sum up all the network interfaces by pod, but I'm unsure whether that is the correct way to handle a pod with
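The "sum all interfaces by pod" approach amounts to a group-by over the pod dimension, similar in spirit to `sum by (pod) (...)` in PromQL. A rough sketch with made-up numbers shows why it is questionable for host-network pods: such a pod's series are the node's own interfaces, so its per-pod total mixes in node traffic and, through host-side veth devices, traffic of other pods. The `sample` type and all values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// sample is a hypothetical, simplified view of one
// container_network_receive_bytes_total series.
type sample struct {
	Pod       string
	Interface string
	Bytes     float64
}

// sumByPod adds up all interfaces of each pod, collapsing the
// interface dimension.
func sumByPod(samples []sample) map[string]float64 {
	totals := make(map[string]float64)
	for _, s := range samples {
		totals[s.Pod] += s.Bytes
	}
	return totals
}

func main() {
	samples := []sample{
		{"web-1", "eth0", 500},
		// A host-network pod sees every interface in the node's namespace,
		// including the host-side veth that carries web-1's traffic.
		{"agent-1", "eth0", 10000},
		{"agent-1", "veth1234", 500},
		{"agent-1", "lo", 200},
	}
	totals := sumByPod(samples)
	pods := make([]string, 0, len(totals))
	for p := range totals {
		pods = append(pods, p)
	}
	sort.Strings(pods)
	for _, p := range pods {
		fmt.Printf("%s %.0f\n", p, totals[p])
	}
}
```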
google#2615 describes a working-as-intended (WAI) cAdvisor behavior for pods with `hostNetwork: true`. It should be mentioned in the documentation so that people don't spend days testing and searching before finally finding it in the issue list.
Prometheus convention is that any sum over a metric's dimensions should make sense. Metrics exposed by cAdvisor violate this in several places, but here I would like to focus on one specific case: Kubernetes Pods with `hostNetwork: true`. The particularly confusing result is that the sum of network traffic over all containers counts the node's traffic multiple times, once for every container/pod with host networking enabled.
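A small worked example of the double counting, with made-up numbers: the node's root network namespace received 10,000 bytes, three host-network pods each report those same counters, and two regular pods report 500 and 300 bytes. A naive sum counts the node's traffic three times; counting each network namespace once (keyed by a hypothetical `NetNS` field) gives a sum in which the host namespace contributes a single time. The types are illustrative, not cAdvisor's data model.

```go
package main

import "fmt"

// containerNet is a hypothetical per-container record: the network namespace
// it lives in and the receive-bytes counter reported for it.
type containerNet struct {
	Pod   string
	NetNS string // "host" for hostNetwork pods
	Bytes float64
}

// naiveSum adds every container's counter, as a sum over all series would.
func naiveSum(cs []containerNet) float64 {
	var total float64
	for _, c := range cs {
		total += c.Bytes
	}
	return total
}

// perNamespaceSum counts each network namespace once, so pods sharing the
// host namespace contribute the node's counters a single time.
func perNamespaceSum(cs []containerNet) float64 {
	seen := make(map[string]bool)
	var total float64
	for _, c := range cs {
		if seen[c.NetNS] {
			continue
		}
		seen[c.NetNS] = true
		total += c.Bytes
	}
	return total
}

func main() {
	cs := []containerNet{
		{"agent-a", "host", 10000},
		{"agent-b", "host", 10000},
		{"agent-c", "host", 10000},
		{"web-1", "ns-web-1", 500},
		{"web-2", "ns-web-2", 300},
	}
	fmt.Println(naiveSum(cs))        // 30800
	fmt.Println(perNamespaceSum(cs)) // 10800
}
```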
I don't know cAdvisor's relationship with Kubernetes well enough to say whether cAdvisor even knows about this, but a potential solution could be to exclude network metrics for containers that use host networking, and have one separate set of series just for host networking (or leave that to an entirely separate component like node_exporter).
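A sketch of that proposal, assuming the exporter knows which containers use host networking and has the node's own interface counters available: network series of host-network containers are dropped, and the node's interfaces are emitted once with a hypothetical `Scope` of `host`. The types, field names, and scope values are illustrative assumptions, not existing cAdvisor behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// containerStats is a hypothetical per-container input.
type containerStats struct {
	Pod         string
	HostNetwork bool
	RxBytes     map[string]float64 // interface name -> receive bytes
}

// netSeries is a hypothetical exposed series.
type netSeries struct {
	Scope     string // "pod" or "host"
	Pod       string // empty for host series
	Interface string
	RxBytes   float64
}

func sortedKeys(m map[string]float64) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// exposeNetwork emits per-pod series only for containers with their own
// network namespace, then the node's interfaces exactly once.
func exposeNetwork(cs []containerStats, node map[string]float64) []netSeries {
	var out []netSeries
	for _, c := range cs {
		if c.HostNetwork {
			continue // covered by the node-level series below
		}
		for _, iface := range sortedKeys(c.RxBytes) {
			out = append(out, netSeries{"pod", c.Pod, iface, c.RxBytes[iface]})
		}
	}
	for _, iface := range sortedKeys(node) {
		out = append(out, netSeries{"host", "", iface, node[iface]})
	}
	return out
}

func main() {
	node := map[string]float64{"eth0": 10000}
	cs := []containerStats{
		{"agent-a", true, node},
		{"agent-b", true, node},
		{"web-1", false, map[string]float64{"eth0": 500}},
	}
	var total float64
	for _, s := range exposeNetwork(cs, node) {
		fmt.Printf("%s pod=%q interface=%q %.0f\n", s.Scope, s.Pod, s.Interface, s.RxBytes)
		total += s.RxBytes
	}
	fmt.Printf("sum over all series: %.0f\n", total)
}
```

With these made-up numbers the sum over all emitted series is 10500, whereas exposing the node's counters under both host-network pods would have given 20500.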
@dashpole