Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[stats, xds] disable per-resource config update stats tracking, add per-API update_duration tracking (if necessary). #23979

Open
stevenzzzz opened this issue Nov 14, 2022 · 10 comments
Labels
area/perf area/stats enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue

Comments

@stevenzzzz
Copy link
Contributor

Title: disable per-resource update_duration tracking, add per-API update_duration tracking (if necessary).

Description: right now there is a histogram tracking per-resource update duration, i.e. how long it takes Envoy to consume and update a resource's config.

This stat is of less value when there are tons of resources in Envoy, what matters more is how long does Envoy take to load each discoveryResponse.

OTOH, histograms are known to consume lots of RAM and CPU.

We should allow users to disable per-resource update-duration tracking (with a runtime flag) and maybe add a per-API level update-duration tracking instead.

@stevenzzzz stevenzzzz added enhancement Feature requests. Not bugs or questions. triage Issue requires triage labels Nov 14, 2022
@stevenzzzz
Copy link
Contributor Author

@jmarantz @pradeepcrao

@stevenzzzz
Copy link
Contributor Author

@yanavlasov

@stevenzzzz stevenzzzz changed the title [stats, xds] disable per-resource update_duration tracking, add per-API update_duration tracking (if necessary). [stats, xds] disable per-resource config update stats tracking, add per-API update_duration tracking (if necessary). Nov 14, 2022
@zuercher zuercher added area/perf area/stats and removed triage Issue requires triage labels Nov 14, 2022
@stevenzzzz
Copy link
Contributor Author

@adisuissa could you no-stalebot this issue?

@adisuissa adisuissa added the no stalebot Disables stalebot from closing an issue label Dec 14, 2022
@pradeepcrao
Copy link
Contributor

Hi @stevenzzzz are you talking about this histogram? @adisuissa, is my understanding correct that we create only one Subscription object for every type of resource config and not for every instance of a resource? Are we expecting to see very many of these created in an Envoy process?

@adisuissa
Copy link
Contributor

Hi @stevenzzzz are you talking about this histogram? @adisuissa, is my understanding correct that we create only one Subscription object for every type of resource config and not for every instance of a resource? Are we expecting to see very many of these created in an Envoy process?

Depends on the resource type. For CDS and LDS for example, there will be a single subscription.
For EDS for example there will be multiple subscription objects (one for each cluster).

@pradeepcrao
Copy link
Contributor

Thanks Adi, makes sense.

@stevenzzzz
Copy link
Contributor Author

yes.
For RDS/EDS/VHDS or any "leaf XDS", it's per-resource, which is not ideal, and probably less meaningful.

Instead, I think we should trace the performance at "per-XDS response level".

@adisuissa
Copy link
Contributor

Instead, I think we should trace the performance at "per-XDS response level".

I'm not sure if this claim is correct, IIUC the "per-xDS response level". For example, if Envoy has 3 routes, and only 2 are being constantly updated, then if we just update when was the last RDS update that came, we lose the information on when the other route was updated.

@stevenzzzz
Copy link
Contributor Author

I am talking about the update_duration etc, which I think is very expensive and not very useful to a cloud proxy. I think we can have an option to allow user to choose which to track: per-resource level or "per-response" level.

@adisuissa
Copy link
Contributor

I am talking about the update_duration etc, which I think is very expensive and not very useful to a cloud proxy. I think we can have an option to allow user to choose which to track: per-resource level or "per-response" level.

Ah, thanks for the clarification. I read the title as "disable per-resource config update stats tracking", which I understood as more generic than just update_duration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/perf area/stats enhancement Feature requests. Not bugs or questions. no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

No branches or pull requests

4 participants