kubelet: CAdvisor sometimes injects old Pod metrics from an old namespaces #85852
@@ -317,6 +317,8 @@ func removeTerminatedContainerInfo(containerInfo map[string]cadvisorapiv2.Contai
 			podRef:        buildPodRef(cinfo.Spec.Labels),
 			containerName: kubetypes.GetContainerName(cinfo.Spec.Labels),
 		}
+		// Clear the UID since the container can be created in a new namespace.
+		cinfoID.podRef.UID = ""
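The effect of clearing the UID in the grouping key can be sketched as follows. This is a simplified, self-contained illustration and not the kubelet code: the `podRef`, `containerID`, and `containerInfo` types, the `latestPerContainer` and `sampleInfos` helpers, and the sample cgroup paths are all hypothetical stand-ins, and the "keep the newest by creation time" rule is an assumption about how duplicates are resolved.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// podRef and containerID are simplified stand-ins used only for this sketch.
type podRef struct {
	Name, Namespace, UID string
}

type containerID struct {
	podRef        podRef
	containerName string
}

// containerInfo is a hypothetical record for one container reported by the stats backend.
type containerInfo struct {
	cgroup  string
	ref     podRef
	name    string
	created time.Time
}

// latestPerContainer groups containers by containerID and keeps the newest
// entry of each group. When clearUID is true, the UID is dropped from the
// grouping key so that old and new instances land in the same group.
func latestPerContainer(infos []containerInfo, clearUID bool) []string {
	groups := make(map[containerID][]containerInfo)
	for _, ci := range infos {
		id := containerID{podRef: ci.ref, containerName: ci.name}
		if clearUID {
			id.podRef.UID = ""
		}
		groups[id] = append(groups[id], ci)
	}

	kept := make([]string, 0, len(groups))
	for _, group := range groups {
		newest := group[0]
		for _, ci := range group[1:] {
			if ci.created.After(newest.created) {
				newest = ci
			}
		}
		kept = append(kept, newest.cgroup)
	}
	sort.Strings(kept) // deterministic order for printing
	return kept
}

// sampleInfos returns a terminated container and its replacement, which share
// a pod name and namespace but carry different UIDs.
func sampleInfos() []containerInfo {
	start := time.Unix(1000, 0)
	return []containerInfo{
		{cgroup: "/old/web", ref: podRef{Name: "web", Namespace: "default", UID: "uid-old"}, name: "app", created: start},
		{cgroup: "/new/web", ref: podRef{Name: "web", Namespace: "default", UID: "uid-new"}, name: "app", created: start.Add(time.Minute)},
	}
}

func main() {
	fmt.Println(latestPerContainer(sampleInfos(), false)) // [/new/web /old/web]
	fmt.Println(latestPerContainer(sampleInfos(), true))  // [/new/web]
}
```

With the UID in the key, the two instances form separate groups and both survive; with the UID cleared, they collapse into one group and only the newer one is kept.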
So buildPodRef will have set the podRef.UID, and we then strip it. Won't every result in infos in ListPodCPUAndMemoryStats have an empty UID?
Which, if I look at this code, makes me think the pod ref in pod stats will always have an empty UID.
This is ok, since removeTerminatedContainerInfo only sets up the slice to iterate over. This line iterates over the active containers and initializes podToStats, which contains the correct UID for the container namespace:
for key, cinfo := range infos {
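The point made above can be illustrated with a small sketch: the UID is cleared only on a local copy of the key used for filtering, so a later loop that rebuilds the pod reference from the container's labels still sees the real UID. Everything here is hypothetical and simplified for illustration: `refFromLabels` stands in for `buildPodRef`, the label keys are made up, and the filter keeps whichever duplicate it sees first rather than the newest one.

```go
package main

import "fmt"

// podRef is a simplified stand-in used only for this sketch.
type podRef struct {
	Name, Namespace, UID string
}

type labels map[string]string

// refFromLabels is a hypothetical stand-in for buildPodRef; the label keys are invented.
func refFromLabels(l labels) podRef {
	return podRef{Name: l["name"], Namespace: l["namespace"], UID: l["uid"]}
}

// filterContainers drops duplicates using a key whose UID has been cleared.
// Only the local key is modified; the labels themselves are left untouched.
func filterContainers(infos map[string]labels) map[string]labels {
	seen := make(map[podRef]bool)
	out := make(map[string]labels)
	for cgroup, l := range infos {
		key := refFromLabels(l)
		key.UID = "" // affects this copy only
		if seen[key] {
			continue
		}
		seen[key] = true
		out[cgroup] = l
	}
	return out
}

// buildPodToStats iterates over the filtered containers and keys the result
// by a freshly built pod reference, which still carries the UID.
func buildPodToStats(infos map[string]labels) map[podRef][]string {
	podToStats := make(map[podRef][]string)
	for cgroup, l := range filterContainers(infos) {
		ref := refFromLabels(l)
		podToStats[ref] = append(podToStats[ref], cgroup)
	}
	return podToStats
}

func main() {
	infos := map[string]labels{
		"/new/web": {"name": "web", "namespace": "default", "uid": "uid-new"},
	}
	for ref := range buildPodToStats(infos) {
		fmt.Printf("%s/%s uid=%q\n", ref.Namespace, ref.Name, ref.UID) // default/web uid="uid-new"
	}
}
```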
ah ok, i missed that.
ah missed that! thanks for clarifying.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: derekwaynecarr, rphillips
fyi @dashpole
Why does creating a container in a new runtime namespace change the pod UID?
@dashpole that isn't the pod uid... it is the container namespace UID.
Same logic that is in the cri stats provider.
@rphillips I can't seem to figure out why that would be the case... I sshed into a node in a running cluster, and verified with
/hold
No problem. Looking into this. Note: we are running crio.
What type of PR is this?
/kind bug
What this PR does / why we need it:
The CAdvisor stats backend has a bug where it does not correctly handle a pod that terminates in an initial namespace and is then created in a new namespace. This is due to cinfoID.podRef.UID being used within the map key {PodName, PodNamespace, PodUID}. Because the UID differs between the two, this map key treats the newly created pod in the different runtime namespace (UID) and the terminated pod in the initial namespace as separate entries, so both are injected into the list and duplicate metrics are returned.
Which issue(s) this PR fixes:
500 errors in /metrics endpoint
https://bugzilla.redhat.com/show_bug.cgi?id=1748073
Special notes for your reviewer:
/cc @kubernetes/sig-node-pr-reviews
/cc @Random-Liu @derekwaynecarr @sjenning
Does this PR introduce a user-facing change?:
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: