Additional host_weight metric for per_endpoint_stats #33006

Closed
jrauschenbusch opened this issue Mar 20, 2024 · 5 comments
Labels
area/stats, enhancement, stale

Comments

@jrauschenbusch

Title: Additional host_weight metric for per_endpoint_stats

Description:

For analyzing load balancing behavior, it would be useful to have a host_weight metric per endpoint.

It's already possible to enable detailed endpoint metrics using track_cluster_stats.per_endpoint_stats. Only the calculated weight per endpoint is missing.

Use case: In my scenario, I observe a drop in throughput when a new host is added to the upstream cluster (scale-up). The drops still appear despite following all the best practices (active health checks, round robin LB with slow start, pre-warmed HTTP handler, k8s readiness probes). It seems that traffic to the old cluster members is already reduced, but the new host does not pick up the difference.
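
To illustrate why the calculated weight would be worth seeing during scale-up, here is a minimal sketch of how slow start scales a new host's weight. It assumes the formula as I understand it from Envoy's slow start documentation: `new_weight = weight * max(min_weight_percent, time_factor ^ (1 / aggression))`. The function name and parameters are hypothetical and are not Envoy's internal code. `min_weight_percent` is passed as a fraction (0.1 for 10%).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper, not part of Envoy: approximates the effective weight
// of a host that entered the cluster `seconds_since_start` seconds ago.
double slowStartWeight(double weight, double seconds_since_start,
                       double window_seconds, double aggression,
                       double min_weight_percent) {
  assert(window_seconds > 0.0 && aggression > 0.0);
  // The elapsed time is clamped to at least one second.
  const double time_factor =
      std::max(seconds_since_start, 1.0) / window_seconds;
  if (time_factor >= 1.0) {
    return weight; // Slow start window has elapsed: full weight applies.
  }
  // Scale the weight, but never below the configured minimum fraction.
  return weight *
         std::max(min_weight_percent, std::pow(time_factor, 1.0 / aggression));
}
```

With a 60 second window and aggression 1.0, a host with weight 100 would sit at roughly 50 after 30 seconds under this sketch. A per-endpoint host_weight gauge would let you check whether the real value follows that curve.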


@jrauschenbusch added the enhancement and triage labels Mar 20, 2024
@jrauschenbusch
Author

@ggreenway Fortunately, you've already implemented the per_endpoint_stats feature. Is there an easy way to add a host_weight metric to this set?

@ggreenway
Contributor

I don't think it would be difficult. We may want to add a config knob for additional host stats, given that enabling them already creates A LOT more total published metrics.

Then you can add another chunk of code like this one to publish:

// Add synthetic "healthy" gauge.
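
The idea above (an extra gauge next to the synthetic "healthy" one, gated by a config knob) could look roughly like the following. This is a standalone sketch and not Envoy's code: `HostSnapshot`, `endpointGauges`, the `emit_host_weight` flag, and the `cluster.<name>.endpoint.<address>.<stat>` naming are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified view of a host; Envoy's real host type differs.
struct HostSnapshot {
  std::string address;
  bool healthy;
  uint32_t weight;
};

using Gauge = std::pair<std::string, uint64_t>;

// Builds the synthetic per-endpoint gauges for one cluster. The
// `emit_host_weight` flag stands in for the proposed config knob, so the
// extra metric is only published when explicitly requested.
std::vector<Gauge> endpointGauges(const std::string& cluster,
                                  const std::vector<HostSnapshot>& hosts,
                                  bool emit_host_weight) {
  std::vector<Gauge> out;
  for (const auto& host : hosts) {
    // Assumed stat name layout; check the real per-endpoint names before relying on it.
    const std::string prefix =
        "cluster." + cluster + ".endpoint." + host.address + ".";
    out.emplace_back(prefix + "healthy",
                     static_cast<uint64_t>(host.healthy ? 1 : 0));
    if (emit_host_weight) {
      out.emplace_back(prefix + "host_weight",
                       static_cast<uint64_t>(host.weight));
    }
  }
  return out;
}
```

Keeping the weight behind its own flag means that clusters which already enable per-endpoint stats would not see their metric count grow unless they opt in.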

@ggreenway added the area/stats label and removed the triage label Mar 21, 2024
@jrauschenbusch
Author

I know that exposing even more metrics can lead to cardinality issues in a TSDB, so this is definitely not a feature for a production environment. It's more for load tests in a non-production environment with a predefined set of Envoy instances and upstream members.


This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

@github-actions bot added the stale label Apr 21, 2024

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

@github-actions bot closed this as not planned Apr 28, 2024
2 participants