
k8sattributesprocessor fails to consistently fetch pod attributes #13119

Closed
rajatvig opened this issue Aug 9, 2022 · 12 comments
Assignees
dmitryax

Labels
bug (Something isn't working), priority:p2 (Medium), processor/k8sattributes (k8s Attributes processor)

Comments

@rajatvig

rajatvig commented Aug 9, 2022

Describe the bug

Since the 0.55.0 release of the Collector, the k8sattributesprocessor seems to behave a little inconsistently when it comes to fetching pod attributes for resource metrics. It was working fine in the 0.54.0 release.

Looking at the commit log between the two releases, #8465 is the only change I can see that could have altered the behaviour, but I am unable to pinpoint what exactly is wrong.

For a deployment with 2 pods, one collector always looks up the metadata for one pod, while the other pod's telemetry goes to another collector and never gets anything.

Restarting the collector pod or the deployment pod moves the problem to another collector pod and it just keeps bouncing that way.

With debug logging turned on, I do see the requests land at the collector, but it is not able to find the pod in its in-memory list. RBAC is fine, as some pods are getting the attributes, and the configuration did not change between the releases.
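(For reference, debug logging was enabled through the collector's own telemetry settings, roughly like this — a sketch of the standard service.telemetry section rather than our full config:)

service:
  telemetry:
    logs:
      level: debug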

I tried adding pod_association rules, but that did not help either.
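The rules we tried looked roughly like this (a sketch only; on 0.55/0.57 each rule is a flat from/name entry, while newer releases wrap each rule in a sources list):

k8sattributes:
  auth_type: serviceAccount
  passthrough: false
  pod_association:
    # associate telemetry to a pod by the k8s.pod.ip resource attribute first,
    # then fall back to the source IP of the incoming connection
    - from: resource_attribute
      name: k8s.pod.ip
    - from: connection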

What version did you use?
Version: 0.55.0

We upgraded to 0.57.2 and it has not helped, though the misses are slightly fewer.

What config did you use?

"k8sattributes":
  "auth_type": "serviceAccount"
  "passthrough": false

Environment
Using it on GKE with the vanilla upstream Docker image.
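For completeness, the RBAC follows the processor documentation, i.e. roughly this ClusterRole (a sketch; the metadata name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector
rules:
  # the processor watches pods and namespaces to build its metadata cache
  - apiGroups: [""]
    resources: ["pods", "namespaces"]
    verbs: ["get", "watch", "list"]
  # replicasets are needed to resolve the owning deployment name
  - apiGroups: ["apps"]
    resources: ["replicasets"]
    verbs: ["get", "watch", "list"]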

@rajatvig rajatvig added the bug (Something isn't working) label Aug 9, 2022
@dmitryax
Member

dmitryax commented Aug 9, 2022

Hi @rajatvig. Thanks for reporting the issue.

It's possible that the issue is caused by #8465. It changed the representation of the internal cache keys.

@rajatvig if you can provide any more guidance on how to reproduce the issue, that would be awesome.

@sumo-drosiek please let us know if you have a chance to look into this. Otherwise I can take it.

@dmitryax dmitryax added the priority:p2 (Medium) and processor/k8sattributes (k8s Attributes processor) labels Aug 9, 2022
@rajatvig
Author

rajatvig commented Aug 9, 2022

@dmitryax What we have is a very straightforward OTel Collector setup: collectors running as DaemonSets with passthrough on, which send to collectors running as a gateway that add the rest of the metadata. The RBAC is as per the documentation, and there is very little load in some of the clusters.

Podinfo is the sample application we have deployed, and we see misses for one of its two pods almost all the time until we restart the collector or the pod.

The configuration as I noted is just

"k8sattributes":
  "auth_type": "serviceAccount"
  "passthrough": false

I suspect it is the internal cache, as I see the augmentation requests for both pods arrive (at different collectors), and one pod is always found while the other never is.

For larger clusters, the miss rate is higher and is spread across all gateway collector pods.
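For context, the split between the two tiers is roughly the following (a sketch; pipeline wiring and exporter endpoints are omitted, and names are illustrative, not our exact config):

# DaemonSet agent: passthrough only tags telemetry with the pod IP, no API lookups
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: true

# Gateway: looks up the pod metadata keyed off that IP
processors:
  k8sattributes:
    auth_type: serviceAccount
    passthrough: false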

@dmitryax
Member

dmitryax commented Aug 9, 2022

Thank you for providing the details, @rajatvig. I will take a look into this.

@dmitryax dmitryax self-assigned this Aug 9, 2022
@sumo-drosiek
Member

> @sumo-drosiek please let us know if you have a chance to look into this. Otherwise I can take it.

@dmitryax I was off until today and will be off next week, so not much time to investigate :/
Thank you for looking into this

@hanikesn

hanikesn commented Sep 8, 2022

I can confirm this issue; we had to revert back to 0.54, as probably 2/3 of our traces didn't have k8s attributes attached.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Nov 10, 2022
@dmitryax
Member

@rajatvig @hanikesn can you please check if you still experience the issue in the latest collector versions?

@dmitryax dmitryax removed the Stale label Nov 10, 2022
@rajatvig
Author

This is fixed on our end.

@iateya

iateya commented Nov 25, 2022

@rajatvig
Which collector version did you test the fix on?
I am still seeing the issue with versions 0.64 and 0.66.

@rajatvig
Author

We are on 0.64 now and the issue isn't there anymore.

@trexx

trexx commented Mar 1, 2023

Hi there, we are on 0.71 of the collectors, and I see this behaviour.
In our Kubernetes cluster we have collectors as sidecars to our workloads, which forward on to cluster collectors that do the enrichment.
I can see that sometimes the k8s.pod.ip resource attribute is the only attribute that gets added, leading me to assume that the cluster collectors doing the enrichment are failing.

We can see this in Grafana with a query like target{k8s_namespace_name="",k8s_pod_ip!=""}, which results in a graph like the one below.
[screenshot: Grafana graph of the query above]

Gaps in the graph are where the metric was enriched by the collectors and had the namespace attribute populated. I also thought it could be the collectors hitting resource limits, but they are far from reaching them.
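A rough way to put a number on the miss rate with the same metric (assuming the series layout above) is something like count(target{k8s_namespace_name="",k8s_pod_ip!=""}) / count(target{k8s_pod_ip!=""}), i.e. the fraction of series that only carry the pod IP and never got enriched.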

@cbertinato

This issue appears to crop up even for the latest versions. Should this be closed?
