[exporter/loadbalancing] otlp exporter fails to send Metrics to OTEL Collector backend loadbalancer #32026

Closed
nadavleva opened this issue Mar 28, 2024 · 4 comments
Labels
exporter/loadbalancing question Further information is requested

Comments

@nadavleva

Component(s)

exporter/loadbalancing

What happened?

Description

I am trying out the load balancing exporter to learn its behavior and to set up OTEL for HA.
I have an OTEL Collector (the data producer) running in a Kubernetes cluster, with an OTLP exporter sending logs and metrics to a backend OTEL environment.
The backend environment runs in docker-compose. Its OTEL Collector (otel-backend) receives traffic from
the data producer and forwards both metrics and logs through the load balancing exporter.
For metrics, the data producer reports errors when exporting to the OTEL backend, and no metrics are sent.
For logs, everything is sent without any issues or errors.

Steps to Reproduce

  1. Send OTEL metrics to the backend OTEL Collector.
  2. The OTEL backend has a load balancing exporter that sends the metrics to three OTEL Collectors.

Expected Result

Metrics data is sent from OTEL to the backend OTEL Collector and distributed according to the load balancer.

Actual Result

Metrics are not sent to the OTEL backend, and the following error is printed:

{"kind": "exporter", "data_type": "metrics", "name": "debug/1"}
2024-03-28T14:43:47.632Z error exporterhelper/retry_sender.go:126 Exporting failed. The error is not retryable. Dropping data. {"kind": "exporter", "data_type": "metrics", "name": "otlp/4568d48e-9c24-414e-97ec-3cf30282d9d7", "error": "Permanent error: rpc error: code = Unknown desc = unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable 
to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name; unable to get service name", "dropped_items": 116}
go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
go.opentelemetry.io/collector/exporter@v0.91.0/exporterhelper/retry_sender.go:126
go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
go.opentelemetry.io/collector/exporter@v0.91.0/exporterhelper/metrics.go:170
go.opentelemetry.io/collector/exporter/exporterhelper.(*queueSender).consume
go.opentelemetry.io/collector/exporter@v0.91.0/exporterhelper/queue_sender.go:115
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue[...]).Consume
go.opentelemetry.io/collector/exporter@v0.91.0/exporterhelper/internal/bounded_memory_queue.go:55
go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*QueueConsumers[...]).Start.func1
go.opentelemetry.io/collector/exporter@v0.91.0/exporterhelper/internal/consumers.go:43
2024-03-28T14:43:49.415Z info MetricsExporter {"kind": "exporter", "data_type": "metrics", "name": "debug/1", "resource metrics": 75, "metrics": 75, "data points": 75}

Collector version

opentelemetry-collector-contrib:0.91.0

Environment information

Environment

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"

OpenTelemetry Collector configuration

extensions:
  health_check/1:
      endpoint: 0.0.0.0:13133
      tls:
          ca_file: "${OTEL_TLS_CA_FILE}"
          cert_file: "${OTEL_TLS_CERT_FILE}"
          key_file: "${OTEL_TLS_KEY_FILE}"
          reload_interval: 100000
      path: "/health/status"

  file_storage:
      directory: "${OTEL_FILE_STORAGE}"
      timeout: 1s
      compaction:
        on_start: true
        directory: /tmp/
        max_transaction_size: 65536

receivers:
  otlp/ltm:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
        tls:
          client_ca_file: "${OTEL_TLS_CA_FILE}"
          cert_file: "${OTEL_TLS_CERT_FILE}"
          key_file: "${OTEL_TLS_KEY_FILE}"
          min_version: "1.2"
          reload_interval: 100000
      http:
        endpoint: 0.0.0.0:4318
        tls:
          client_ca_file: "${OTEL_TLS_CA_FILE}"
          cert_file: "${OTEL_TLS_CERT_FILE}"
          key_file: "${OTEL_TLS_KEY_FILE}"
          min_version: "1.2"
          reload_interval: 100000
  otlp/waf:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4319
        tls:
          client_ca_file: "${OTEL_TLS_CA_FILE}"
          cert_file: "${OTEL_TLS_CERT_FILE}"
          key_file: "${OTEL_TLS_KEY_FILE}"
          min_version: "1.2"
          reload_interval: 100000
      http:
        endpoint: 0.0.0.0:4320
        tls:
          client_ca_file: "${OTEL_TLS_CA_FILE}"
          cert_file: "${OTEL_TLS_CERT_FILE}"
          key_file: "${OTEL_TLS_KEY_FILE}"
          min_version: "1.2"
          reload_interval: 100000

processors:
  memory_limiter:
    check_interval: 2ms
    limit_mib: 800
    spike_limit_mib: 150
  batch:

exporters:
  otlp/4568d48e-9c24-414e-97ec-3cf30282d9d7:
    endpoint: 10.240.107.236:43117
    sending_queue:
        enabled: true
        queue_size: "5"
    retry_on_failure:
        enabled: true
        initial_interval: 2s
        max_elapsed_time: 6s
    tls:
        insecure: true
  # Enable on service pipeline for debugging output data/format
  file:
    path: /var/log/otlp.json

  # Console log verbosity levels: detailed|normal|basic
  debug:
    verbosity: basic
    sampling_initial: 0
    sampling_thereafter: 0

  debug/1:
    verbosity: detailed
service:
  pipelines:
    metrics:
      receivers: [ otlp/ltm, otlp/waf]
      processors: [ memory_limiter, batch ]
      exporters: [debug/1, otlp/4568d48e-9c24-414e-97ec-3cf30282d9d7]
    logs:
      receivers: [ otlp/ltm, otlp/waf]
      processors: [ memory_limiter, batch ]
      exporters: [debug]

  extensions: [health_check/1, file_storage]

Additional context

Here is the backend OTEL Collector configuration

backend_otel_config.yaml.json

The docker-compose layout for the backend runs one OTEL Collector as the backend and three OTEL Collectors as the destinations of the backend's static load balancer.

docker-compose.yaml.json

Some notes:

  1. The setup works well for logs.
  2. When the OTEL Collector (data producer) is changed to send metrics directly to one of the three OTEL Collectors (adomock containers), metrics are exported without errors.
@nadavleva nadavleva added bug Something isn't working needs triage New item requiring triage labels Mar 28, 2024
Contributor

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@crobert-1
Member

This error occurs because the load balancing exporter is trying to use the service.name resource attribute to route the metrics to their destination (service.name is the default routing key for metrics when none is specified, as described in the README). Your metrics apparently do not have the service.name resource attribute set, which triggers this error.

Do you have a preferred routing key? If not, you could use metric, which simply routes by metric name. If you want to keep the default behavior, we'd have to investigate why the service.name resource attribute isn't included with your metrics.
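
As a rough sketch, switching the routing key on the backend collector could look something like the fragment below. It assumes a static resolver, as described in the issue. The hostnames, the plaintext TLS setting, and the otlp receiver name are placeholders, since the attached backend config isn't reproduced in this thread.

```yaml
exporters:
  loadbalancing:
    routing_key: metric          # route by metric name instead of service.name
    protocol:
      otlp:
        tls:
          insecure: true         # assumption: plaintext between collectors
    resolver:
      static:
        hostnames:               # placeholder addresses for the three downstream collectors
          - collector-1:4317
          - collector-2:4317
          - collector-3:4317

service:
  pipelines:
    metrics:
      receivers: [otlp]          # assumption: an otlp receiver defined elsewhere in the config
      exporters: [loadbalancing]
```

With this key, all data points for a given metric name should land on the same downstream collector, regardless of which service produced them.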

You can use a debug exporter instead of the load balancing exporter to show detailed information on the metrics coming through to confirm the resource attribute is missing.
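
For example, a temporary metrics pipeline on the backend collector along these lines should be enough for that check. This is only a sketch; the otlp receiver name is an assumption about the backend config.

```yaml
# Temporary setup: send metrics to a debug exporter instead of the
# load balancing exporter so the incoming resource attributes are printed.
exporters:
  debug:
    verbosity: detailed          # detailed output includes resource attributes

service:
  pipelines:
    metrics:
      receivers: [otlp]          # assumption: the backend's existing OTLP receiver
      exporters: [debug]
```

In the collector output, look for service.name among the resource attributes of the incoming metrics. If it is absent, that would explain the "unable to get service name" error.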

@crobert-1 crobert-1 added question Further information is requested and removed bug Something isn't working needs triage New item requiring triage labels Mar 28, 2024
@nadavleva
Author

Thank you crobert-1, I tried your suggestion and it is working.

@crobert-1
Member

Great to hear it! I'm going to close the issue for now, but feel free to reopen or file a new bug if something else comes up!
