
[exporter/datadogexporter] Wrong ddtags on logs #17398

Closed
jmichalek132 opened this issue Jan 5, 2023 · 3 comments
Labels
data:logs, exporter/datadog, priority:p2, Stale

Comments

@jmichalek132
Contributor

Component(s)

exporter/datadog

What happened?

Description

When batching is enabled, a certain number of log lines end up with Datadog tags containing metadata from the wrong k8s pod.
The problem lies in this line:

tags := datadog.PtrString(payload[0].GetDdtags())

We copy the tags only from the first item in the payload and set them as the Datadog tags for all log lines in the payload. However, when batching is enabled, the payload can contain logs from multiple different pods. When this happens, they all share the same tags even though they come from different pods, making the tags incorrect.
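
For illustration, with batching enabled the batched payload can mix items from different pods (the tag values below are made up from the pods seen in this issue), yet only the first item's tags are forwarded for the entire request:

	// Hypothetical batched payload (illustrative values only):
	//   payload[0].GetDdtags() == "pod_name:kube-proxy-8rfkc,kube_namespace:kube-system,..."
	//   payload[1].GetDdtags() == "pod_name:cluster-autoscaler-bc567684b-stcgs,kube_namespace:kube-system,..."
	// Only payload[0]'s tags become the request-level ddtags, so payload[1] ends up
	// submitted with the kube-proxy pod's metadata.
	tags := datadog.PtrString(payload[0].GetDdtags())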

By setting a breakpoint inside this function, we can inspect the payload being sent and verify the issue.

_, r, err := s.api.SubmitLog(ctx, payload, opts)

Snippet containing the function:

func (a *LogsApi) SubmitLog(ctx _context.Context, body []HTTPLogItem, o ...SubmitLogOptionalParameters) (interface{}, *_nethttp.Response, error) {
	req, err := a.buildSubmitLogRequest(ctx, body, o...)
	if err != nil {
		var localVarReturnValue interface{}
		return localVarReturnValue, nil, err
	}

	return a.submitLogExecute(req)
}

This screenshot shows the issue when inspecting the payload being sent with a debugger.
[Screenshot: Screen Shot 2023-01-05 at 17 41 03]
The body contains logs from different pods with different ddtags, while the o parameter contains only one set of ddtags. The ddtags in o appear to take precedence.
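
One way to avoid the mismatch would be to group the batched items by their ddtags and issue one SubmitLog call per group. This is only a minimal sketch of that idea, not necessarily what #17399 implements; type names follow the snippet above, and it assumes SubmitLogOptionalParameters exposes a Ddtags field, which the exporter's use of payload[0].GetDdtags() suggests.

func submitGroupedByTags(ctx context.Context, api *LogsApi, payload []HTTPLogItem) error {
	// Group items so that every request only contains logs sharing the same ddtags.
	groups := make(map[string][]HTTPLogItem)
	for _, item := range payload {
		groups[item.GetDdtags()] = append(groups[item.GetDdtags()], item)
	}
	for tags, items := range groups {
		// Assumed field: Ddtags on SubmitLogOptionalParameters; PtrString is the same
		// helper used in the line quoted above.
		opts := SubmitLogOptionalParameters{Ddtags: datadog.PtrString(tags)}
		if _, _, err := api.SubmitLog(ctx, items, opts); err != nil {
			return err
		}
	}
	return nil
}

With this approach the request-level ddtags always match every log line in that request, at the cost of extra HTTP requests when a batch spans many pods.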

Steps to Reproduce

We run the OpenTelemetry Collector as a DaemonSet to collect logs, traces, and metrics, and use OTLP to send them to another OpenTelemetry Collector running as a Deployment for additional processing.

Expected Result

Properties such as the pod name in the tags match the pod name in the attributes.

Actual Result

Example log line exported from the Datadog UI after ingestion:
You can see, for example, that the pod name in the tags (pod_name:kube-proxy-8rfkc) doesn't match the pod name in the attributes (cluster-autoscaler-bc567684b-stcgs); the pod name in the attributes is the correct one.

{
  "id": "AgAAAYWBW3GXb6Tm_wAAAAAAAAAYAAAAAEFZV0JXNHhwQUFCdE9ZbW5xcGY0SVFBSQAAACQAAAAAMDE4NTgxNWItYWE5Ny00MGUxLTgyNTUtOGU2MWUwZWZiZGYx",
  "content": {
    "timestamp": "2023-01-05T09:55:31.863Z",
    "tags": [
      "container_id:31019c6707ad82a41d7ce4464ddaf81c532c7f9680cbc3d114dad0a97508ecfe",
      "source:undefined",
      "image_name:censored",
      "image_tag:v1.22.6-eksbuild.1",
      "kube_container_name:kube-proxy",
      "kube_namespace:kube-system",
      "otel_source:datadog_exporter",
      "container_name:kube-proxy",
      "pod_name:kube-proxy-8rfkc",
      "env:dev",
      "region:eu1"
    ],
    "host": "i-0c3c96c055c6ec25c",
    "service": "aws-cluster-autoscaler",
    "message": "I0105 09:55:31.863856       1 scale_down.go:918] No candidates for scale down",
    "attributes": {
      "hostname": "censored",
      "@timestamp": "2023-01-05T09:55:31Z",
      "container_name": "aws-cluster-autoscaler",
      "log": {
        "iostream": "stderr",
        "file": {
          "path": "/var/log/pods/kube-system_cluster-autoscaler-bc567684b-stcgs_1e69f29c-474d-4c2a-b0eb-df9c721f5973/aws-cluster-autoscaler/4.log"
        }
      },
      "service": "aws-cluster-autoscaler",
      "otel": {
        "timestamp": "1672912531863900731"
      },
      "time": "2023-01-05T09:55:31.863900731Z",
      "logtag": "F",
      "pod_name": "cluster-autoscaler-bc567684b-stcgs",
      "status": ""
    }
  }
}

Collector version

v0.67.0

Environment information

Environment

OS: (e.g., "Ubuntu 20.04")
Compiler(if manually compiled): (e.g., "go 14.2")

k8s version: Server Version: version.Info{Major:"1", Minor:"23+", GitVersion:"v1.23.13-eks-fb459a0", GitCommit:"55bd5d5cb7d32bc35e4e050f536181196fb8c6f7", GitTreeState:"clean", BuildDate:"2022-10-24T20:35:40Z", GoVersion:"go1.17.13", Compiler:"gc", Platform:"linux/amd64"}
Helm chart used

OpenTelemetry Collector configuration

# Configuration of opentelemetry collector running as deployment
    exporters:
      datadog:
        api:
          key: ${DD_API_KEY}
          site: datadoghq.eu
        host_metadata:
          enabled: true
          hostname_source: first_resource
        metrics:
          resource_attributes_as_tags: true
        traces:
          span_name_as_resource_name: false
      logging:
        loglevel: info
    extensions:
      health_check: {}
      memory_ballast:
        size_in_percentage: 20
      zpages: {}
    processors:
      batch:
        send_batch_size: 5
        timeout: 10s
      batch/logs:
        send_batch_max_size: 1000
        send_batch_size: 100
        timeout: 10s
      memory_limiter:
        check_interval: 5s
        limit_percentage: 80
        spike_limit_percentage: 25
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    service:
      extensions:
      - health_check
      - memory_ballast
      - zpages
      pipelines:
        logs:
          exporters:
          - datadog
          processors:
          - batch
          receivers:
          - otlp
      telemetry:
        metrics:
          address: 0.0.0.0:8888
---
# Configuration of opentelemetry collector running as daemonset
    exporters:
      logging: {}
      otlp:
        endpoint: opentelemetry-cluster:4317
        tls:
          insecure: true
    extensions:
      health_check: {}
      k8s_observer:
        node: ${K8S_NODE_NAME}
        observe_nodes: true
        observe_pods: true
      memory_ballast:
        size_in_percentage: 20
        size_mib: "819"
      zpages: {}
    processors:
      batch:
        send_batch_max_size: 200
        send_batch_size: 10
        timeout: 10s
      memory_limiter:
        check_interval: 5s
        limit_mib: 1638
        spike_limit_mib: 512
      resourcedetection/eks:
        detectors:
        - eks
        override: false
        timeout: 2s
    receivers:
      filelog:
        exclude:
        - /var/log/pods/**/opentelemetry-collector/*.log
        - /var/log/pods/**/linkerd-proxy/*.log
        - /var/log/pods/datadog_*/**/*.log
        include:
        - /var/log/pods/**/**/*.log
        include_file_name: false
        include_file_path: true
        operators:
        - id: parser-containerd
          regex: ^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$
          timestamp:
            layout: '%Y-%m-%dT%H:%M:%S.%LZ'
            parse_from: attributes.time
          type: regex_parser
        - default: extract_metadata_from_filepath
          id: get-format
          routes:
          - expr: attributes.log matches "^{\".*\":\".*\",\".*\":\".*\"}+"
            output: logback_json_parser
          type: router
        - id: logback_json_parser
          output: trace
          parse_from: attributes.log
          parse_to: body
          type: json_parser
        - id: trace
          output: severity_parser
          type: trace_parser
        - id: severity_parser
          parse_from: body.level
          type: severity_parser
        - id: extract_metadata_from_filepath
          parse_from: attributes["log.file.path"]
          regex: ^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$
          type: regex_parser
        - from: attributes.stream
          to: attributes["log.iostream"]
          type: move
        - from: attributes.container_name
          output: copycontainer
          to: resource["k8s.container.name"]
          type: move
        - from: resource["k8s.container.name"]
          id: copycontainer
          output: copycontainertosource
          to: resource["container.name"]
          type: copy
        - from: resource["k8s.container.name"]
          id: copycontainertosource
          to: resource["source"]
          type: copy
        - from: attributes.namespace
          to: resource["k8s.namespace.name"]
          type: move
        - from: attributes.pod_name
          to: resource["k8s.pod.name"]
          type: move
        - from: attributes.restart_count
          to: resource["k8s.container.restart_count"]
          type: move
        - from: attributes.uid
          to: resource["k8s.pod.uid"]
          type: move
        start_at: end
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
    service:
      extensions:
      - k8s_observer
      - health_check
      - memory_ballast
      - zpages
      pipelines:
        logs:
          exporters:
          - otlp
          processors:
          - batch
          receivers:
          - filelog
      telemetry:
        metrics:
          address: 0.0.0.0:8888
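
For reference, the two regexes in the filelog operators above can be sanity-checked in isolation against a containerd log line and the pod log path seen in this issue (a standalone Go snippet for verification only, not part of the collector configuration; the sample log line is reconstructed from the exported log record above):

package main

import (
	"fmt"
	"regexp"
)

// printNamed prints every named capture group of re and what it matched in s.
func printNamed(re *regexp.Regexp, s string) {
	m := re.FindStringSubmatch(s)
	if m == nil {
		fmt.Println("no match")
		return
	}
	for i, name := range re.SubexpNames() {
		if i == 0 || name == "" {
			continue
		}
		fmt.Printf("%s=%q\n", name, m[i])
	}
}

func main() {
	// parser-containerd regex from the filelog config above.
	lineRe := regexp.MustCompile(`^(?P<time>[^ ^Z]+Z) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) ?(?P<log>.*)$`)
	// extract_metadata_from_filepath regex from the filelog config above.
	pathRe := regexp.MustCompile(`^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]+)\/(?P<container_name>[^\._]+)\/(?P<restart_count>\d+)\.log$`)

	line := `2023-01-05T09:55:31.863900731Z stderr F I0105 09:55:31.863856       1 scale_down.go:918] No candidates for scale down`
	path := `/var/log/pods/kube-system_cluster-autoscaler-bc567684b-stcgs_1e69f29c-474d-4c2a-b0eb-df9c721f5973/aws-cluster-autoscaler/4.log`

	printNamed(lineRe, line) // expect time, stream=stderr, logtag=F, log=<message>
	printNamed(pathRe, path) // expect namespace, pod_name, uid, container_name, restart_count
}

Running this shows, for example, pod_name="cluster-autoscaler-bc567684b-stcgs" extracted from the file path, which is the (correct) pod name seen in the attributes of the exported log record.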

Log output

No response

Additional context

I can provide more details if necessary.

jmichalek132 added the bug and needs triage labels on Jan 5, 2023
@github-actions
Contributor

github-actions bot commented Jan 5, 2023

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label on Mar 13, 2023
@mx-psi
Member

mx-psi commented Mar 13, 2023

This was fixed by #17399

mx-psi closed this as completed on Mar 13, 2023