
gRPC client might not be properly balancing requests #4274

Closed
jpkrohling opened this issue Oct 27, 2021 · 3 comments · Fixed by #10010
Labels: bug (Something isn't working)

Comments

@jpkrohling
Member

As reported in the Jaeger issue tracker, the OpenTelemetry Collector might not be properly load balancing requests across Jaeger Collector replicas.

Reference: jaegertracing/jaeger#1678

@Adaendra

Adaendra commented Dec 2, 2023

The issue still happens with version v0.88.0.

@harpunius

We also have this problem with version 0.92.0.

@jpkrohling
Member Author

I had some time to play with this today, and I can confirm that the current version of the Collector doesn't properly load-balance requests across backends. To reproduce this, here are the steps:

k3d cluster create
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.14.4/cert-manager.yaml
kubectl wait --for=condition=Available deployments/cert-manager -n cert-manager

kubectl apply -f https://github.com/open-telemetry/opentelemetry-operator/releases/latest/download/opentelemetry-operator.yaml
kubectl wait --for=condition=Available deployments/opentelemetry-operator-controller-manager -n opentelemetry-operator-system

kubectl create -f https://github.com/prometheus-operator/prometheus-operator/releases/download/v0.73.2/bundle.yaml
kubectl wait --for=condition=Available deployments/prometheus-operator -n default

kubectl apply -f https://gist.githubusercontent.com/jpkrohling/ddb7d0074fa7858602302897ad495f35/raw/8527ccf2842bb3ebba18c89bf4573ee713ddced3/resources-current.yaml
kubectl wait --for=condition=Available deployments/client-collector -n observability

kubectl port-forward -n observability service/client-collector 4317:4317
kubectl port-forward -n observability service/prometheus-operated 9090:9090

telemetrygen traces --otlp-insecure --rate 1000 --duration 5m

Then open http://localhost:9090/graph?g0.expr=sum%20by%20(instance)%20(rate(otelcol_receiver_accepted_spans%7B%7D%5B1m%5D))&g0.tab=0&g0.display_mode=stacked&g0.show_exemplars=0&g0.range_input=30m (this should open Prometheus with the query sum by (instance) (rate(otelcol_receiver_accepted_spans{}[1m])) and a stacked graph).

You'll see that only one of the servers is receiving spans.
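
For what it's worth, the unpatched behavior corresponds roughly to a plain grpc-go dial like the sketch below (an approximation, assuming grpc-go defaults; the endpoint is the headless service from the gist): the dns:/// resolver returns every pod IP behind the headless service, but with no balancer configured gRPC falls back to its default pick_first policy and sends all RPCs over a single connection.

    package main

    import (
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // The dns:/// resolver returns all pod IPs behind the headless service,
        // but with no load-balancing config gRPC defaults to pick_first:
        // it connects to the first resolved address and sends every RPC there.
        conn, err := grpc.Dial(
            "dns:///server-collector-headless.observability:4317",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
        )
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        // All exporter traffic over conn ends up on a single backend replica.
    }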

Then, deploy the patched Collector and run telemetrygen again:

kubectl apply -f https://gist.githubusercontent.com/jpkrohling/ddb7d0074fa7858602302897ad495f35/raw/8527ccf2842bb3ebba18c89bf4573ee713ddced3/resources-patched.yaml
kubectl wait --for=condition=Available deployments/client-collector -n observability

telemetrygen traces --otlp-insecure --rate 1000 --duration 5m

This should show that the three backend collectors are receiving an even share of the load. I ran the tests in reverse order in the following image, but for reference, here's what I see:

[image: Prometheus stacked graph of per-instance span rates for both test runs]

There's one bug in the OTLP exporter's validation routine that prevents this from working properly, but other than that, it works with a Collector config like this, as seen in the gist I used above:

    exporters:
      otlp:
        endpoint: dns:///server-collector-headless.observability:4317
        balancer_name: round_robin
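
In grpc-go terms, that config corresponds roughly to dialing with a round_robin service config, something like the sketch below (an approximation, assuming grpc-go defaults otherwise): the client keeps a subchannel per resolved pod IP and spreads RPCs across them.

    package main

    import (
        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    func main() {
        // balancer_name: round_robin in the Collector config maps to the
        // round_robin load-balancing policy in the gRPC service config.
        conn, err := grpc.Dial(
            "dns:///server-collector-headless.observability:4317",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
        )
        if err != nil {
            panic(err)
        }
        defer conn.Close()
        // RPCs are now spread across all addresses resolved for the headless service.
    }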

codeboten pushed a commit that referenced this issue Apr 22, 2024
Fixes #4274

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>