
k8sattributesprocessor fails to consistently fetch pod attributes #13119

Closed
rajatvig opened this issue Aug 9, 2022 · 12 comments
Assignees
dmitryax

Labels
bug (Something isn't working), priority:p2 (Medium), processor/k8sattributes (k8s Attributes processor)

Comments

@rajatvig

rajatvig commented Aug 9, 2022

Describe the bug

Since the 0.55.0 release of the Collector, the k8sattributesprocessor seems to behave a little inconsistently when it comes to fetching pod attributes for resource metrics. It was working fine in the 0.54.0 release.

Looking at the commit log between the two releases, #8465 is the only change I can see that could have altered the behaviour, but I am unable to pinpoint what exactly is wrong.

For a deployment with 2 pods, one collector always looks up the metadata for one pod, while the other pod's telemetry goes to another collector and never gets anything.

Restarting the collector pod or the deployment pod moves the problem to another collector pod and it just keeps bouncing that way.

With debug logging turned on, I do see the requests land at the collector, but it is not able to find the pod in its in-memory list. RBAC is fine, as some pods are getting the attributes, and the configuration did not change between the releases.
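(For reference, debug logging was enabled through the collector's own telemetry settings, roughly like this — a sketch of the standard service.telemetry section rather than our full config:)

service:
  telemetry:
    logs:
      level: debug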

I tried adding pod_association rules, but that did not help either.
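The rules we tried looked roughly like this (a sketch only; on 0.55/0.57 each rule is a flat from/name entry, while newer releases wrap each rule in a sources list):

k8sattributes:
  auth_type: serviceAccount
  passthrough: false
  pod_association:
    # associate telemetry to a pod by the k8s.pod.ip resource attribute first,
    # then fall back to the source IP of the incoming connection
    - from: resource_attribute
      name: k8s.pod.ip
    - from: connection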

What version did you use?
Version: 0.55.0

We upgraded to 0.57.2 and it has not helped, though the misses are slightly fewer.

What config did you use?

"k8sattributes":
  "auth_type": "serviceAccount"
  "passthrough": false

Environment
Using it on GKE with the vanilla upstream Docker image.
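For completeness, the RBAC follows the processor documentation, i.e. roughly this ClusterRole (a sketch; the metadata name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  # the processor watches pods and namespaces to build its metadata cache
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "watch", "list"]
  # replicasets are needed to resolve the owning deployment name
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "watch", "list"]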

@rajatvig rajatvig added the bug (Something isn't working) label Aug 9, 2022
@dmitryax
Member

dmitryax commented Aug 9, 2022

Hi @rajatvig. Thanks for reporting the issue.

It's possible that the issue is caused by #8465. It changed the representation of the internal cache keys.

@rajatvig if you can provide any more guidance on how to reproduce the issue, that would be awesome.

@sumo-drosiek please let us know if you have a chance to look into this. Otherwise I can take it.

@dmitryax dmitryax added the priority:p2 (Medium) and processor/k8sattributes (k8s Attributes processor) labels Aug 9, 2022
@rajatvig
Author

rajatvig commented Aug 9, 2022

@dmitryax What we have is a very straightforward OTel Collector setup: collectors running as DaemonSets with passthrough on, which send to collectors running as a gateway that add the rest of the metadata. The RBAC is as per the documentation, and there is very little load in some of the clusters.

Podinfo is the sample application we have deployed, and we see misses for one of its two pods almost all the time until we restart the collector or the pod.

The configuration as I noted is just

"k8sattributes":
  "auth_type": "serviceAccount"
  "passthrough": false

I suspect it is the internal cache, as I see the augmentation requests for both pods arrive (at different collectors), and one pod is always found while the other never is.

For larger clusters, the miss rate is higher and is spread across all gateway collector pods.
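For context, the split between the two tiers is roughly the following (a sketch; pipeline wiring and exporter endpoints are omitted, and names are illustrative, not our exact config):

# DaemonSet agent: passthrough only tags telemetry with the pod IP, no API lookups
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: true

# Gateway: looks up the pod metadata keyed off that IP
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false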

@dmitryax
Member

dmitryax commented Aug 9, 2022

Thank you for providing the details, @rajatvig. I will take a look into this.

@dmitryax dmitryax self-assigned this Aug 9, 2022
@sumo-drosiek
Member

> @sumo-drosiek please let us know if you have a chance to look into this. Otherwise I can take it.

@dmitryax I was off until today and will be off next week, so not much time to investigate :/
Thank you for looking into this

@hanikesn

hanikesn commented Sep 8, 2022

I can confirm this issue; we had to revert back to 0.54, as probably 2/3 of our traces didn't have k8s attributes attached.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@dmitryax
Member

@rajatvig @hanikesn can you please check if you still experience the issue in the latest collector versions?

@dmitryax dmitryax removed the Stale label Nov 10, 2022
@rajatvig
Author

This is fixed on our end.

@iateya

iateya commented Nov 25, 2022

@rajatvig
Which collector version did you test the fix on?
I am still seeing the issue with versions 0.64 and 0.66.

@rajatvig
Author

We are on 0.64 now and the issue isn't there anymore.

@trexx

trexx commented Mar 1, 2023

Hi there, we are on 0.71 of the collectors, and I see this behaviour.
In our Kubernetes cluster we have collectors as sidecars to our workloads, which forward on to cluster collectors that do the enrichment.
I can see that sometimes the k8s.pod.ip resource attribute is the only attribute that gets added, leading me to assume that the cluster collectors doing the enrichment are failing.

We can see this in Grafana with a query like target{k8s_namespace_name="",k8s_pod_ip!=""}, which results in a graph like the one below.
[screenshot: Grafana graph of the query above]

Gaps in the graph are where the metric was enriched by the collectors and had the namespace attribute populated. I also thought it could be the collectors hitting resource limits, but they are far from reaching them.
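A rough way to put a number on the miss rate with the same metric (assuming the series layout above) is something like count(target{k8s_namespace_name="",k8s_pod_ip!=""}) / count(target{k8s_pod_ip!=""}), i.e. the fraction of series that only carry the pod IP and never got enriched.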

@cbertinato

This issue appears to crop up even for the latest versions. Should this be closed?
