Performance: high control plane cost to handle endpoint churn #11293

@howardjohn

Pods scaling up and down is a pretty common operation, so it ought to be handled efficiently. However, because of snapshotPerClient, we build a full XdsSnapWrapper for every connected client on every change.

If we have 10 XDS connections and 100 services with 10 endpoints each, with, say, 1 pod change per second, this gets expensive. Each pod change recomputes 100 clusters and 100 CLAs, with 10 endpoints each, for every connection, every second. So we end up querying roughly 20,000 entries from krt per second (10 connections × 200 resources × 10 endpoints).
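The arithmetic in this example can be written out as a small Go model. This is only a back-of-the-envelope sketch: the `workload` type and `entriesPerSecond` function are hypothetical names made up for illustration and are not part of the control plane code. It assumes each rebuilt cluster and CLA reads every endpoint of its service once.

```go
package main

import "fmt"

// workload describes the hypothetical scenario from the issue text.
// These names are illustrative only; they do not exist in the codebase.
type workload struct {
	connections         int // connected xDS clients
	services            int // services known to the control plane
	endpointsPerService int // endpoints (pods) behind each service
	churnPerSecond      int // pod changes per second
}

// resourceTypesPerService counts the per-service resources rebuilt on each
// change: one cluster and one CLA.
const resourceTypesPerService = 2

// entriesPerSecond estimates how many endpoint entries are read per second
// when every change rebuilds the full snapshot for every connected client.
func entriesPerSecond(w workload) int {
	perSnapshot := w.services * resourceTypesPerService * w.endpointsPerService
	return perSnapshot * w.connections * w.churnPerSecond
}

func main() {
	w := workload{connections: 10, services: 100, endpointsPerService: 10, churnPerSecond: 1}
	fmt.Println(entriesPerSecond(w)) // prints 20000
}
```

Under this model the cost grows linearly in each factor, so doubling either the number of connections or the churn rate doubles the work, even though a single pod change only touches one service.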

In pprof this looks like the following (the rest is all GC):

[Image: pprof profile]

Labels: stale