
Otel collector fails to establish a connection to jaeger #33995

Closed
mpbnkg opened this issue Jul 9, 2024 · 12 comments
Labels: bug (Something isn't working), needs triage (New item requiring triage)

Comments


mpbnkg commented Jul 9, 2024

Component(s)

No response

What happened?

When I try to export traces to Jaeger through the OTel Collector's OTLP exporter, it fails with the following error. What could be causing this, given that it was working a few weeks ago?

2024-07-09T18:35:39.864Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.25.0.2:4317: connect: connection refused\"", "interval": "3.136185958s"}

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "9090:9090" # Prometheus exporter port
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "6831:6831/udp" # UDP port for Jaeger agent
      - "16686:16686" # Web UI
      - "14268:14268" # HTTP port for spans

otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: otel-collector:4317

processors:

extensions:
  health_check: {}

exporters:
  logging:
    loglevel: debug

  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [logging]
    logs:
      receivers: [otlp]
      exporters: [logging]

Collector version

latest

Environment information

Environment

OS: MAC

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: otel-collector:4317

processors:

extensions:
  health_check: {}

exporters:
  logging:
    loglevel: debug

  otlp:
    endpoint: jaeger:4317
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      exporters: [logging]
    logs:
      receivers: [otlp]
      exporters: [logging]

Log output

2024-07-09T18:35:39.864Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.25.0.2:4317: connect: connection refused\"", "interval": "3.136185958s"}

Additional context

No response

mpbnkg added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Jul 9, 2024
@PedemonteGiacomo

I also get the same error, and I'm not able to fix it.


mpbnkg commented Jul 10, 2024

@PedemonteGiacomo It seems like it's an issue with the latest version. I'm sure this code was working a few weeks ago, but I haven't used it since then. When I re-ran it today, it started failing for reasons I can't understand. Hopefully, we will get a response from the maintainers soon.


PedemonteGiacomo commented Jul 10, 2024

@mpbnkg It was working fine for me too with the jaeger:4317 endpoint. The connection refused error seems to come from the Jaeger side. To test this, I changed the exporter endpoint to jaeger:14250; the connection is established, but when I send traces through the collector they are dropped and not exported correctly.
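For context, a sketch of that change; my understanding (an assumption, not confirmed in this thread) is that 14250 is Jaeger's native gRPC collector port rather than an OTLP endpoint, which would explain why the connection succeeds but the spans are dropped:

exporters:
  otlp:
    # jaeger:14250 speaks Jaeger's own gRPC collector protocol, not OTLP (assumption),
    # so the dial succeeds but OTLP exports are not accepted and the traces are dropped.
    endpoint: jaeger:14250
    tls:
      insecure: true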

@PedemonteGiacomo

I've checked that the error remains even if I use an older version of the image. So now my question is: is the error related to the otel-collector?


mpbnkg commented Jul 11, 2024

@PedemonteGiacomo Hopefully, we'll get some answers soon here :)

@crobert-1 (Member)

You may need to configure depends_on in your docker compose file. This may just be a timing issue of trying to connect to a container that's not ready yet.

For reference, I'm running the Collector demo which has a very similar format and setup to your example here, and it's working locally for me. You're welcome to try it out and see what could be the impacting differences if depends_on doesn't fix it.
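For illustration, a minimal sketch of that idea: a plain depends_on only waits for the Jaeger container to be created, so waiting for actual readiness needs condition: service_healthy plus a healthcheck on the jaeger service. The probe command below is hypothetical and assumes the image ships wget, which may not be true for every tag:

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC receiver
    depends_on:
      jaeger:
        condition: service_healthy # wait for the healthcheck below, not just container start
  jaeger:
    image: jaegertracing/all-in-one:latest
    healthcheck:
      # Hypothetical readiness probe against Jaeger's admin port (14269); assumes wget exists in the image.
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:14269/"]
      interval: 5s
      timeout: 3s
      retries: 10
    ports:
      - "16686:16686" # Web UI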


mpbnkg commented Jul 12, 2024

@crobert-1 Same thing even after I added depends_on: jaeger to the otel-collector section! Error:

info    exporterhelper/retry_sender.go:118      Exporting failed. Will retry the request after interval.  {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 172.26.0.2:4317: connect: connection refused\"", "interval": "28.015591359s"}.

Docker-compose

services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC receiver
      - "9090:9090" # Prometheus exporter port
    depends_on:
      - jaeger
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "6831:6831/udp" # UDP port for Jaeger agent
      - "16686:16686" # Web UI
      - "14268:14268" # HTTP port for spans

@PedemonteGiacomo

As indicated by the error, the depends_on directive isn't the root cause of the problem, since the pod/container IP is resolved successfully. The issue seems to be that Jaeger is refusing the connection.

To troubleshoot, I will attempt to use an older version of Jaeger and will update you on the results.

Additionally, @crobert-1 has raised an interesting point: another possible cause of failure could be a missing configuration in our otel-collector setup. The relevant configuration code is:

exporters:
  zipkin:
    endpoint: "http://zipkin-all-in-one:9411/api/v2/spans"
    format: proto

  otlp:
    endpoint: jaeger-all-in-one:4317
    tls:
      insecure: true

You can review the complete configuration code here

Please let me know if you have any further insights or suggestions.


mpbnkg commented Jul 12, 2024

@PedemonteGiacomo Zipkin doesn't have anything to do with Jaeger. It's a completely different tool, so we definitely don't need Zipkin.

To troubleshoot, I will attempt to use an older version of Jaeger and will update you on the results.

Yep, let me know please, thanks!


PedemonteGiacomo commented Jul 12, 2024

Yes @mpbnkg, you're absolutely right about zipkin, sorry for the confusion.

I changed the Jaeger image to an older version; more precisely, I chose an image that was not pushed within the last 10 days, to be sure it predates the appearance of the issue we are investigating.

(screenshot)

And my traces are successfully received by Jaeger:

(screenshot)

So I think this is a Jaeger problem, introduced by a change in the latest (or one of the latest) jaeger-all-in-one images.
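In case it helps anyone else, a sketch of pinning the Jaeger image instead of using latest; the 1.57.0 tag here is only an illustrative example of a tag published before this started happening, not a confirmed fix:

  jaeger:
    # Pin an explicit tag instead of latest; 1.57.0 is an illustrative choice,
    # pick whichever tag predates the regression you observed.
    image: jaegertracing/all-in-one:1.57.0
    ports:
      - "16686:16686" # Web UI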


mpbnkg commented Jul 12, 2024

Oh, wow... Look at that, thanks :) Marking this ticket as resolved.

@crobert-1 (Member)

Thanks for following up, glad to hear this was figured out!
