[receiver/k8sobjectsreceiver] watcher not restarting when kubernetes hangs up. #18078

Closed
stokerjon opened this issue Jan 27, 2023 · 6 comments
Labels: bug (Something isn't working), priority:p2 (Medium), receiver/k8sobjects

@stokerjon

Component(s)

receiver/k8sobjects

What happened?

Description

Kubernetes periodically hangs up watch connections, the receiver does not then restart them.

Steps to Reproduce

Create a k8sobjectsreceiver watching for kubernetes events.

Expected Result

k8sobjectsreceiver continues watching for events and, when Kubernetes hangs up the connection, restarts the watch.

Actual Result

Kubernetes hangs up the k8sobjectsreceiver watch, but the receiver does not restart it.
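
For context, the expected behavior boils down to wrapping the dynamic client's Watch call in a retry loop and re-establishing the watch whenever the result channel closes. Below is only a minimal sketch of that pattern; gvr, handleEvent, and the surrounding wiring are placeholders for illustration, not the receiver's actual code.

package sketch

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/watch"
    "k8s.io/client-go/dynamic"
)

// watchLoop keeps a watch open on the given resource and re-establishes it
// whenever the API server closes the connection. gvr, handleEvent and the
// surrounding wiring are placeholders for illustration only.
func watchLoop(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, handleEvent func(watch.Event)) error {
    for {
        w, err := client.Resource(gvr).Watch(ctx, metav1.ListOptions{})
        if err != nil {
            return err
        }
    events:
        for {
            select {
            case <-ctx.Done():
                w.Stop()
                return ctx.Err()
            case ev, ok := <-w.ResultChan():
                if !ok {
                    // The server hung up (e.g. its request timeout elapsed);
                    // restart the watch instead of giving up silently.
                    w.Stop()
                    break events
                }
                handleEvent(ev)
            }
        }
    }
}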

Collector version

0.68.0

Environment information

Environment

OS: Amazon Linux

OpenTelemetry Collector configuration

exporters:
  splunk_hec/platform_logs:
    disable_compression: true
    endpoint: OMITTED
    index: OMITTED
    max_connections: 200
    profiling_data_enabled: false
    retry_on_failure:
      enabled: true
      initial_interval: 5s
      max_elapsed_time: 300s
      max_interval: 30s
    sending_queue:
      enabled: true
      num_consumers: 10
      queue_size: 5000
    source: kubernetes
    splunk_app_name: splunk-otel-collector
    splunk_app_version: 0.68.0
    timeout: 10s
    tls:
      insecure_skip_verify: false
    token: OMITTED
extensions:
  health_check: null
  memory_ballast:
    size_mib: OMITTED
processors:
  batch: null
  memory_limiter:
    check_interval: 2s
    limit_mib: OMITTED
  resource:
    attributes:
    - action: insert
      key: metric_source
      value: kubernetes
    - action: upsert
      key: k8s.cluster.name
      value: OMITTED
  resource/add_environment:
    attributes:
    - action: insert
      key: deployment.environment
      value: OMITTED
  resourcedetection:
    detectors:
    - env
    - eks
    - ec2
    - system
    override: true
    timeout: 10s
  transform/add_sourcetype:
    log_statements:
    - context: log
      statements:
      - set(resource.attributes["com.splunk.sourcetype"], Concat(["kube:object:", attributes["k8s.resource.name"]], ""))
receivers:
  k8sobjects:
    auth_type: serviceAccount
    objects:
    - group: events.k8s.io
      mode: watch
      name: events
service:
  extensions:
  - health_check
  - memory_ballast
  pipelines:
    logs/objects:
      exporters:
      - splunk_hec/platform_logs
      processors:
      - memory_limiter
      - batch
      - resourcedetection
      - resource
      - transform/add_sourcetype
      - resource/add_environment
      receivers:
      - k8sobjects

Log output

2023-01-27T10:12:45.167Z        warn    k8sobjectsreceiver@v0.68.0/receiver.go:162      Watch channel closed unexpectedly       {"kind": "receiver", "name": "k8sobjects", "pipeline": "logs", "resource": "events.k8s.io/v1, Resource=events"}

Additional context

No response

stokerjon added the bug (Something isn't working) and needs triage (New item requiring triage) labels on Jan 27, 2023
@github-actions
Contributor

Pinging code owners:

  • receiver/k8sobjects: @dmitryax @harshit-splunk

See Adding Labels via Comments if you do not have permissions to add labels yourself.

atoulme added the priority:p2 (Medium) label and removed the needs triage (New item requiring triage) label on Jan 27, 2023
@nicolastakashi
Contributor

Hey @atoulme and @stokerjon, I faced the same issue, and I'm available to help with the fix.

I'm not sure how to force-reproduce the issue; do you know how we can simulate this behavior?

@atoulme
Contributor

atoulme commented Feb 19, 2023

No idea. Maybe create a mock k8s object api and restart it?
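
One way to simulate the hang-up without a real API server is client-go's fake dynamic client with a prepended watch reactor: calling Stop() on the FakeWatcher closes its result channel, which is what the receiver sees when the server hangs up. This is only a rough sketch under that assumption; the resource name and helper are illustrative, not the receiver's existing test code.

package sketch

import (
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/watch"
    dynamicfake "k8s.io/client-go/dynamic/fake"
    k8stesting "k8s.io/client-go/testing"
)

// newFakeEventsClient returns a fake dynamic client whose Watch calls for
// "events" hand back the returned FakeWatcher. Calling Stop() on that watcher
// closes its result channel, mimicking the API server hanging up. A full test
// would hand out a fresh watcher per Watch call and count the calls to assert
// that the watch gets re-established.
func newFakeEventsClient() (*dynamicfake.FakeDynamicClient, *watch.FakeWatcher) {
    client := dynamicfake.NewSimpleDynamicClient(runtime.NewScheme())
    watcher := watch.NewFake()
    client.PrependWatchReactor("events", k8stesting.DefaultWatchReactor(watcher, nil))
    return client, watcher
}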

@stokerjon
Author

Setting a short timeoutSeconds in ListOptions for the watch should have the same effect.
The hang-up isn't random; it's happening every 30 minutes, so it's probably a default timeout in the API server that hasn't been accounted for.
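
For reference, requesting a short server-side timeout could look like the sketch below: TimeoutSeconds in metav1.ListOptions asks the API server to end the watch request after that many seconds, which reproduces the hang-up in seconds rather than waiting for the observed ~30-minute interval (which is consistent with kube-apiserver's --min-request-timeout defaulting to 1800 seconds for watch requests). The client and gvr arguments here are assumed placeholders, not the receiver's actual code.

package sketch

import (
    "context"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/apimachinery/pkg/watch"
    "k8s.io/client-go/dynamic"
)

// watchWithShortTimeout asks the API server to end the watch request after
// timeoutSeconds, so the "watch channel closed" path can be exercised quickly
// instead of waiting for the server's default request timeout.
// client and gvr are assumed to be wired up elsewhere.
func watchWithShortTimeout(ctx context.Context, client dynamic.Interface, gvr schema.GroupVersionResource, timeoutSeconds int64) (watch.Interface, error) {
    return client.Resource(gvr).Watch(ctx, metav1.ListOptions{
        TimeoutSeconds: &timeoutSeconds,
    })
}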

@nicolastakashi
Contributor

nicolastakashi commented Feb 19, 2023

I managed it!
I'm just adjusting the unit tests before opening a PR; if you want, you can assign this issue to me, @stokerjon.

@dmitryax
Member

Thanks @nicolastakashi
