Additional `host_weight` metric for per_endpoint_stats #33006

jrauschenbusch · 2024-03-20T16:15:33Z

Title: Additional host_weight metric for per_endpoint_stats

Description:

For analyzing load balancing behavior, it would be useful to have a host_weight metric per endpoint.

It's already possible to enable detailed endpoint metrics using track_cluster_stats.per_endpoint_stats. Only the calculated weight per endpoint is missing.

Use case: In my scenario I observe a drop in throughput when a new host is added to the upstream cluster (scale-up). Despite following all best practices (active health checks, round robin LB with slow start, pre-warmed HTTP handler, k8s readiness probes), the drops still occur. It seems that traffic to the old cluster members is already reduced, but the difference is not picked up by the new host.
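To illustrate what a host_weight gauge would reveal during scale-up, here is a minimal sketch of how the effective weight of a new host might ramp up under slow start. It assumes the scaling formula described in Envoy's slow start documentation (weight scaled by `max(min_weight_percent, time_factor ^ (1 / aggression))`); `slowStartWeight` and its parameter names are hypothetical and not part of Envoy's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not an Envoy API): approximates the effective load
// balancing weight of a host that entered the cluster `seconds_since_start`
// seconds ago, assuming the slow start formula from Envoy's documentation.
double slowStartWeight(double configured_weight, double seconds_since_start,
                       double window_seconds, double aggression = 1.0,
                       double min_weight_percent = 0.10) {
  assert(window_seconds > 0.0 && aggression > 0.0);
  // Outside the slow start window the host gets its full configured weight.
  if (seconds_since_start >= window_seconds) {
    return configured_weight;
  }
  // time_factor grows linearly from ~0 to 1 across the window.
  const double time_factor =
      std::max(seconds_since_start, 1.0) / window_seconds;
  // The scale never drops below the configured minimum percentage.
  const double scale =
      std::max(min_weight_percent, std::pow(time_factor, 1.0 / aggression));
  return configured_weight * scale;
}
```

With a 60 second window and a configured weight of 100, this sketch yields a weight of 50 after 30 seconds. A per-endpoint host_weight gauge would make such a ramp directly observable next to the throughput of the existing hosts.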



jrauschenbusch · 2024-03-21T13:34:02Z

@ggreenway Since you implemented the per_endpoint_stats feature: is there an easy way to add a host_weight metric to this set?

ggreenway · 2024-03-21T15:41:51Z

I don't think it would be difficult. We may want to add a config knob for additional host stats, given that enabling them already creates A LOT more total published metrics.

Then you can add another chunk of code like this one to publish:

envoy/source/common/upstream/host_utility.cc, line 228 in c3da130:

// Add synthetic "healthy" gauge.
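The suggestion above can be sketched as a standalone model: for every endpoint, emit the synthetic "healthy" gauge and, behind an opt-in flag, an additional host_weight gauge. This is not Envoy's actual code. `EndpointSnapshot`, `buildEndpointGauges`, the `emit_host_weight` flag, and the gauge naming scheme are all hypothetical and only illustrate the shape of the change.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for an upstream host; not Envoy's Host interface.
struct EndpointSnapshot {
  std::string address;        // e.g. "10.0.0.1_8080"
  bool healthy;               // result of health checking
  uint32_t effective_weight;  // weight currently used by the load balancer
};

// Builds a map of gauge name -> value for all endpoints of one cluster.
// `emit_host_weight` models the proposed config knob for additional host
// stats, so the extra gauges are only published when explicitly enabled.
std::map<std::string, uint64_t> buildEndpointGauges(
    const std::string& cluster, const std::vector<EndpointSnapshot>& endpoints,
    bool emit_host_weight) {
  assert(!cluster.empty());
  std::map<std::string, uint64_t> gauges;
  for (const auto& endpoint : endpoints) {
    // Illustrative naming scheme; the real stat names may differ.
    const std::string prefix =
        "cluster." + cluster + ".endpoint." + endpoint.address + ".";
    // Synthetic "healthy" gauge: 1 if healthy, 0 otherwise.
    gauges[prefix + "healthy"] = endpoint.healthy ? 1 : 0;
    // Proposed additional gauge, gated by the opt-in flag.
    if (emit_host_weight) {
      gauges[prefix + "host_weight"] = endpoint.effective_weight;
    }
  }
  return gauges;
}
```

Gating the new gauge behind a flag keeps the default metric count unchanged, which addresses the concern that per-endpoint stats already multiply the total number of published metrics.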

jrauschenbusch · 2024-03-22T06:22:15Z

I know that exposing even more metrics can lead to cardinality issues within a TSDB, so this is definitely not a feature for a production environment. It's more for load tests in a non-production environment with a pre-defined set of Envoy instances and upstream members.

github-actions · 2024-04-21T08:01:07Z

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions · 2024-04-28T08:01:26Z

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

jrauschenbusch added the enhancement and triage labels · Mar 20, 2024

ggreenway added the area/stats label and removed the triage label · Mar 21, 2024

github-actions bot added the stale label · Apr 21, 2024

github-actions bot closed this as not planned · Apr 28, 2024
