Telemetry access logging: add cache #39472

Open
howardjohn opened this issue Jun 15, 2022 · 5 comments
Labels: area/extensions and telemetry, kind/enhancement, lifecycle/staleproof

Comments

@howardjohn
Member

In the old-style access logs, we had caching of the generated config. In Telemetry, we lost this.

In real prod clusters, we saw 50% of Istiod CPU spent just on this part. It would be good to add a cache here.

I did some initial work here but got distracted. I might be able to complete it, but if anyone else wants to pick it up, even better.
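
For readers following along, here is a minimal sketch of the kind of cache being asked for. It is not Istio's implementation: `accessLogKey`, `accessLogConfig`, `accessLogCache`, and `buildAccessLog` are hypothetical names invented for illustration, and the builder is a cheap stand-in for the real, expensive config generation. The idea is only that many proxies sharing the same inputs should trigger one build rather than one build per proxy.

```go
package main

import (
	"fmt"
	"sync"
)

// accessLogKey is a hypothetical cache key. It should contain every input
// that influences the generated config, and nothing else.
type accessLogKey struct {
	Provider string
	Format   string
}

// accessLogConfig stands in for the expensive-to-generate config.
type accessLogConfig struct {
	Rendered string
}

// accessLogCache memoizes generated configs so that each distinct key is
// built at most once.
type accessLogCache struct {
	mu      sync.Mutex
	entries map[accessLogKey]*accessLogConfig
	builds  int // how many times the builder actually ran
}

func newAccessLogCache() *accessLogCache {
	return &accessLogCache{entries: map[accessLogKey]*accessLogConfig{}}
}

// get returns the cached config for key, calling build only on a miss.
// The lock is held during the build to keep the sketch simple.
func (c *accessLogCache) get(key accessLogKey, build func(accessLogKey) *accessLogConfig) *accessLogConfig {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cfg, ok := c.entries[key]; ok {
		return cfg
	}
	c.builds++
	cfg := build(key)
	c.entries[key] = cfg
	return cfg
}

// buildAccessLog is a placeholder for the real (costly) generation step.
func buildAccessLog(key accessLogKey) *accessLogConfig {
	return &accessLogConfig{Rendered: key.Provider + ":" + key.Format}
}

func main() {
	cache := newAccessLogCache()
	key := accessLogKey{Provider: "envoy", Format: "%START_TIME%"}
	// Simulate many proxies that all share the same inputs.
	for i := 0; i < 5000; i++ {
		cache.get(key, buildAccessLog)
	}
	fmt.Println("builds:", cache.builds) // prints "builds: 1"
}
```

A real cache would also need to be invalidated whenever the underlying Telemetry or mesh config changes; that part is omitted here.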

@zirain
Member

zirain commented Jun 16, 2022

I will take a look at this.

@zirain self-assigned this Jun 16, 2022
@zirain
Member

zirain commented Jun 17, 2022

> In real prod clusters, we saw 50% of Istiod CPU spent just on this part. It would be good to add a cache here.

Can you share more details about this? What is the cluster's size? How many workloads? How many Telemetry resources are in the cluster?

buildAccessLogFromTelemetry taking 50% of Istiod's CPU is beyond my understanding.

@howardjohn
Member Author

> Can you share more details about this? What is the cluster's size? How many workloads? How many Telemetry resources are in the cluster?

Just a single Telemetry resource and 5000 proxies. Access log config is EXTREMELY expensive to generate since it's so verbose and has so many fields (envoyproxy/envoy#21718).

I created a WIP in https://github.com/istio/istio/compare/master...howardjohn:pilot/telemetry-cache?expand=1 but it's not very good; it's hard to handle the packages the way it's done here. I think we may not want to cache it as part of Telemetries in model...

@zirain
Member

zirain commented Jun 22, 2022

If you change the provider from envoy-json to envoy, do you still see the same CPU cost?

Can you share the pprof profile?

@zirain
Member

zirain commented Jul 21, 2022

A long-term goal is "ecds: add support for access log extensions".

@istio-policy-bot added the lifecycle/stale label Jan 17, 2023
@zirain added the lifecycle/staleproof label and removed the lifecycle/stale label Jan 31, 2023

3 participants