
gRPC client might not be properly balancing requests #4274

Closed
jpkrohling opened this issue Oct 27, 2021 · 3 comments · Fixed by #10010
Labels: bug (Something isn't working)

Comments

@jpkrohling
Member

As reported in the Jaeger issue tracker, the OpenTelemetry Collector might not be properly load balancing requests across Jaeger Collector replicas.

Reference: jaegertracing/jaeger#1678

@Adaendra

Adaendra commented Dec 2, 2023

The issue still happens with version v0.88.0.

@harpunius

We also have this problem with version 0.92.0.

@jpkrohling
Member Author

I had some time to play with this today, and I can confirm that the current version of the Collector doesn't properly load-balance requests across backends. To reproduce this, here are the steps:

k3d cluster create
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
kubectl wait --for=condition=Available deployments/cert-manager -n cert-manager

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
kubectl wait --for=condition=Available deployments/opentelemetry-operator-controller-manager -n opentelemetry-operator-system

kubectl create -f https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.73.2/bundle.yaml
kubectl wait --for=condition=Available deployments/prometheus-operator -n default

kubectl apply -f https://gist.githubusercontent.com/jpkrohling/ddb7d0074fa7858602302897ad495f35/raw/8527ccf2842bb3ebba18c89bf4573ee713ddced3/resources-current.yaml
kubectl wait --for=condition=Available deployments/client-collector -n observability

kubectl port-forward -n observability service/client-collector 4317:4317
kubectl port-forward -n observability service/prometheus-operated 9090:9090

telemetrygen traces --otlp-insecure --rate 1000 --duration 5m

Then open http://localhost:9090/graph?g0.expr=sum%20by%20(instance)%20(rate(otelcol_receiver_accepted_spans%7B%7D%5B1m%5D))&g0.tab=0&g0.display_mode=stacked&g0.show_exemplars=0&g0.range_input=30m (this should open Prometheus with the query sum by (instance) (rate(otelcol_receiver_accepted_spans{}[1m])) and a stacked graph).

You'll see that only one of the servers is receiving spans.
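
For what it's worth, the unpatched behavior corresponds roughly to a plain grpc-go dial like the sketch below (an approximation, assuming grpc-go defaults; the endpoint is the headless service from the gist): the dns:/// resolver returns every pod IP behind the headless service, but with no balancer configured gRPC falls back to its default pick_first policy and sends all RPCs over a single connection.

    package main

    import (
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // The dns:/// resolver returns all pod IPs behind the headless service,
        // but with no load-balancing config gRPC defaults to pick_first:
        // it connects to the first resolved address and sends every RPC there.
        conn, err := grpc.Dial(
            "dns:///server-collector-headless.observability:4317",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
        )
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        // All exporter traffic over conn ends up on a single backend replica.
    }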

Then, deploy the patched Collector and run telemetrygen again:

kubectl apply -f https://gist.githubusercontent.com/jpkrohling/ddb7d0074fa7858602302897ad495f35/raw/8527ccf2842bb3ebba18c89bf4573ee713ddced3/resources-patched.yaml
kubectl wait --for=condition=Available deployments/client-collector -n observability

telemetrygen traces --otlp-insecure --rate 1000 --duration 5m

This should show that the three backend collectors are receiving an even share of the load. I ran the tests in reverse order in the following image, but for reference, here's what I see:

[image: Prometheus stacked graph of per-instance span rates for both test runs]

There's one bug in the OTLP exporter's validation routine that prevents this from working properly, but other than that, it works with a Collector config like this, as seen in the gist I used above:

    exporters:
      otlp:
        endpoint: dns:///server-collector-headless.observability:4317
        balancer_name: round_robin
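
In grpc-go terms, that config corresponds roughly to dialing with a round_robin service config, something like the sketch below (an approximation, assuming grpc-go defaults otherwise): the client keeps a subchannel per resolved pod IP and spreads RPCs across them.

    package main

    import (
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // balancer_name: round_robin in the Collector config maps to the
        // round_robin load-balancing policy in the gRPC service config.
        conn, err := grpc.Dial(
            "dns:///server-collector-headless.observability:4317",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
        )
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        // RPCs are now spread across all addresses resolved for the headless service.
    }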

codeboten pushed a commit that referenced this issue Apr 22, 2024
Fixes #4274

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>