[spanmetrics connector] Collector pod suddenly evicted and 'calls_total' unexpectedly increased #31025

pingping95 · 2024-02-03T15:09:49Z

Component(s)

connector/spanmetrics

What happened?

Description

The calls_total metric value rose sharply, as shown by this query:

sum by (http_status_code) (rate(calls_total{service_name="my-app"}[1m]))

There were no client calls. The service only received HTTP calls from the Kubernetes health check probes.
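The query applies rate() to calls_total, which this config exports as a cumulative counter (aggregation_temporality is AGGREGATION_TEMPORALITY_CUMULATIVE). The following is a simplified Python sketch of that computation, not Prometheus's implementation. It assumes one sample per metrics_flush_interval (15s) and skips Prometheus's extrapolation to the window edges. It shows that a single counter reset, such as a restarted or evicted collector pod, is treated as a restart from zero and does not by itself inflate the rate.

```python
from typing import List, Tuple


def simple_rate(samples: List[Tuple[float, float]]) -> float:
    """Per-second increase of a cumulative counter over the given samples.

    samples: (timestamp_seconds, counter_value) pairs, oldest first.
    A drop in value is treated as a counter reset, so the post-reset value
    is counted as new increase. Unlike Prometheus, no edge extrapolation.
    """
    if len(samples) < 2:
        return 0.0
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        if cur >= prev:
            increase += cur - prev
        else:
            # Counter reset: assume the counter restarted from zero.
            increase += cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else 0.0
```

If samples from two independent sources were interleaved into one series (a hypothetical case, not confirmed in this issue), for example [(0, 100), (15, 5), (30, 102), (45, 7)], every alternation looks like a reset and the computed rate jumps to about 2.4 calls/s even though each source grew by only 2.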

Steps to Reproduce

Expected Result

Actual Result

Collector version

0.92.0

Environment information

Environment

Amazon Linux 2
Kubernetes(EKS) 1.28

OpenTelemetry Collector configuration

config:

  # RECEIVER
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
        http:
          endpoint: ${env:MY_POD_IP}:4318
    prometheus:
      config:
        scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
              - targets:
                  - ${env:MY_POD_IP}:8888

  # PROCESSOR
  processors:
    batch: {}
    memory_limiter:
      check_interval: 5s
      limit_percentage: 80
      spike_limit_percentage: 30

    filter:
      metrics:
        datapoint:
          - 'attributes["span.kind"] != "SPAN_KIND_SERVER"'


  # EXPORTER
  exporters:
    debug:
      verbosity: detailed

    prometheusremotewrite:
      endpoint: https://mimir.xxxxx.xxx/api/v1/push
      target_info:
        enabled: true
      external_labels:
        test: true

  # SPAN_KIND   : CLIENT, SERVER, INTERNAL, PUBLISHER, CONSUMER
  # STATUS_CODE : UNSET, OK, ERROR

  # CONNECTOR
  connectors:
    # 1. SERVER Metrics
    spanmetrics:
      histogram:
        explicit:
          buckets: [50ms, 100ms, 200ms, 500ms, 1s, 5s]
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
      dimensions:
        - name: http.method
          default: GET
        - name: http.route
        - name: http.status_code
        - name: cluster

      exemplars:
        enabled: true
      events:
        enabled: true
        dimensions:
          - name: exception.type
          - name: exception.message
      resource_metrics_key_attributes:
        - service.name

  # EXTENSIONS
  extensions:
    health_check:
      endpoint: ${env:MY_POD_IP}:13133
    memory_ballast:
      size_in_percentage: 40

  # SERVICE
  service:
    extensions:
      - health_check
      - memory_ballast

    pipelines:
      traces:
        receivers:
          - otlp
        processors:
          - memory_limiter
          - batch
        exporters:
          - spanmetrics

      metrics:
        receivers:
          - spanmetrics
        processors:
          - memory_limiter
          - batch
          - filter
        exporters:
          - prometheusremotewrite
          - debug
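The connector and filter settings above can be modeled roughly as follows. This is a simplified Python sketch, not the connector's implementation. It assumes the connector's default dimensions (service.name, span.name, span.kind, status.code) plus the configured ones, and it ignores resource grouping, histograms, exemplars and events. It treats the filter processor condition as dropping every datapoint whose span.kind is not SPAN_KIND_SERVER. The Span class, CumulativeCalls class and sample values are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Configured dimensions: (attribute name, optional default value).
DIMENSIONS: List[Tuple[str, Optional[str]]] = [
    ("http.method", "GET"),
    ("http.route", None),
    ("http.status_code", None),
    ("cluster", None),
]


@dataclass
class Span:
    service_name: str
    name: str
    kind: str         # e.g. "SPAN_KIND_SERVER"
    status_code: str  # e.g. "STATUS_CODE_UNSET"
    attributes: Dict[str, str] = field(default_factory=dict)


def metric_key(span: Span) -> Tuple[Tuple[str, str], ...]:
    """Build the label set a span contributes to calls_total."""
    labels = [
        ("service.name", span.service_name),
        ("span.name", span.name),
        ("span.kind", span.kind),
        ("status.code", span.status_code),
    ]
    for name, default in DIMENSIONS:
        value = span.attributes.get(name, default)
        if value is not None:  # missing attribute and no default: omit label
            labels.append((name, value))
    return tuple(labels)


class CumulativeCalls:
    """Cumulative calls_total counters: totals only grow between flushes."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def consume(self, spans: List[Span]) -> None:
        for span in spans:
            self.counts[metric_key(span)] += 1

    def flush(self, keep_kind: str = "SPAN_KIND_SERVER") -> Dict[tuple, int]:
        # Models the filter condition attributes["span.kind"] != "SPAN_KIND_SERVER":
        # matching datapoints are dropped, but the running totals are kept.
        return {k: v for k, v in self.counts.items()
                if dict(k)["span.kind"] == keep_kind}
```

Because temporality is cumulative, each flush re-exports the running totals, and only rate() over those totals reflects new calls. A span without http.method falls back to the configured default GET, while a span without a cluster attribute gets no cluster label. The Prometheus remote-write exporter turns dots in label names into underscores, which is why the query uses http_status_code.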

Log output

No output

Additional context

I suspect the cause of this strange behavior is the 'exemplars' setting. Is that right?

Could anyone offer advice on this?

I plan to migrate from the Tempo metrics-generator to the OpenTelemetry Collector's spanmetrics connector, with this layout:

load-balancing exporter -> spanmetrics connector otel collector
                        -> tail_sampling otel collector
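In this layout the load-balancing exporter decides which spanmetrics collector receives each span. The sketch below assumes the exporter's routing_key option, which as far as I know defaults to traceID and can be set to service. For brevity it replaces the exporter's consistent-hash ring with a plain hash-modulo, and the backend names are hypothetical. It shows that with trace ID routing, one service's spans land on several replicas, each of which keeps its own calls_total counters, whereas service routing keeps them on a single replica.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical spanmetrics collector replicas.
BACKENDS = ["spanmetrics-0", "spanmetrics-1", "spanmetrics-2"]


def pick_backend(key: str, backends: List[str]) -> str:
    # Plain hash-modulo routing; the real exporter uses a consistent hash ring.
    digest = hashlib.sha256(key.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


def route(spans: List[Tuple[str, str]], routing_key: str) -> Dict[str, Set[str]]:
    """Map each service to the set of backends that receive its spans.

    spans: (trace_id, service_name) pairs.
    routing_key: "traceID" or "service".
    """
    seen: Dict[str, Set[str]] = defaultdict(set)
    for trace_id, service in spans:
        key = trace_id if routing_key == "traceID" else service
        seen[service].add(pick_backend(key, BACKENDS))
    return dict(seen)
```

Whether this relates to the behavior reported here is not established in the issue; it is one thing to check when running more than one spanmetrics replica.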


github-actions · 2024-02-03T15:10:03Z

Pinging code owners:

connector/spanmetrics: @portertech

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions · 2024-04-04T03:29:13Z

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

connector/spanmetrics: @portertech @Frapschen

See Adding Labels via Comments if you do not have permissions to add labels yourself.

pingping95 added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Feb 3, 2024.

github-actions bot added the connector/spanmetrics label Feb 3, 2024


github-actions bot added the Stale label Apr 4, 2024

