Internal PoC #1108

@ralphm

Problem

We do not have a recommended way to make K8s logs available to the logs view in the Netdata UI.

Description

As we continue to improve our logging support, customers are asking us to support logging use cases beyond native logging into the systemd journal and the Windows Event Log. This internal PoC defines a recommended way to provide access to K8s container logs, in clearly defined phases.

Definition of Done

The definition of done for each phase is:

  • Any required components (Netdata plugins, collectors, etc., and external tools) must be available for installation by a user. For the Netdata Agent, that means they must be in a released (nightly) version.
  • A working Helm configuration for deployment.
  • Any configuration and installation steps are documented in Learn. If this depends on a nightly version, the document should reflect that.

Concerns

  • Ease of install
  • Ease of configuration
  • Robustness of the setup
  • Ability to retain access to logs even when a given K8s node is no longer available.

Importance

must have

Value proposition

  1. Recommended way for users to see k8s container logs

Proposed implementation

Phases

Phase 1: no OTel, no centralization
Constraints
  • No dependency on otel.plugin or the Netdata Distribution of OpenTelemetry Collector (NDOC)
  • No logs centralization
Suggestions
  1. A DaemonSet that:

    • Tails the JSON container logs in /var/log/containers/*.log
    • Passes them through log2journal json to extract all JSON fields and emit a journal entry
    • Passes them through systemd-cat-native --namespace k8s to index them into the local journal
  2. ???
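
To make the middle step of suggestion 1 concrete, here is a minimal Python sketch of turning one JSON container log line into a journal entry in Journal Export Format (`KEY=value` lines terminated by a blank line). It is an illustration only, not how `log2journal` is implemented. The function names `flatten` and `to_journal_entry` are hypothetical, and the field naming (uppercased keys, nested keys joined with `_`) is an assumption that may differ from what `log2journal json` actually emits.

```python
import json


def flatten(obj, prefix=""):
    """Yield (FIELD_NAME, value) pairs for every leaf of a nested JSON object."""
    for key, value in obj.items():
        name = f"{prefix}{key}".upper()
        # Journal field names are limited to uppercase letters, digits and
        # underscores, so every other character is replaced (assumed policy).
        name = "".join(c if c.isascii() and c.isalnum() else "_" for c in name)
        if isinstance(value, dict):
            yield from flatten(value, prefix=f"{name}_")
        else:
            yield name, value


def to_journal_entry(line):
    """Convert one JSON log line into a text-only Journal Export Format entry."""
    record = json.loads(line)
    fields = []
    for name, value in flatten(record):
        text = value if isinstance(value, str) else json.dumps(value)
        # Simplification: the export format has a binary-safe framing for
        # values containing newlines; this sketch flattens them to spaces.
        text = text.rstrip("\n").replace("\n", " ")
        fields.append(f"{name}={text}")
    # A blank line terminates the entry.
    return "\n".join(fields) + "\n\n"


# Hypothetical sample line in the shape of a JSON container log.
sample = '{"log":"hello\\n","stream":"stdout","time":"2024-01-01T00:00:00Z"}'
print(to_journal_entry(sample), end="")
```

In the proposed DaemonSet, this role is filled by `log2journal json`, with its output piped into `systemd-cat-native`; the sketch only shows the shape of the data passing between the two.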

Phase 2: centralized logging server
Constraints
  • Centralized logging with the Netdata parent
Suggestions
  1. systemd-journal-remote pod

    • Create a pod that is co-located with the parent Agent to run the systemd-journal-remote service.
    • Point the DaemonSet pod of Phase 1 to this new pod.
    • Consider whether this needs host networking.
  2. Local logging and then forwarding

    The idea here is to make this analogous to ingesting metrics in a child Agent and then streaming them to the Parent. The upshot is that logs from nodes that get shut down are not lost, including all of the non-container logs. The downside is more complexity.

    • Requires the systemd-journal-remote pod from suggestion 1.
    • Also requires the local systemd journal on each K8s node to forward its logs to the centralized logging setup.
Phase 3: NDOC
Constraints
  • Use the Netdata Distribution of OpenTelemetry Collector (NDOC) instead of the custom DaemonSet for tailing logs.
  • Dynamic Configuration of NDOC is not in scope
Suggestions
  1. Hook up NDOC and otel.plugin

    • Use otel.plugin to ingest OpenTelemetry logs arriving at its gRPC endpoint into the systemd journal.
    • Use the Netdata Distribution of OpenTelemetry Collector (NDOC) to create a pipeline that:
      • Tails logs using the filelog receiver.
      • Exports them to otel.plugin using the otlp exporter.
      • Possibly uses the batch processor.
    • NDOC should be launched by the Agent, so everything is in the same pod.
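
As a rough illustration of the pipeline described above, the following Python sketch builds a Collector configuration that wires `filelog` to `batch` to `otlp`. Several things here are assumptions rather than decisions from this issue: the helper name `build_pipeline_config` is hypothetical, the endpoint `127.0.0.1:4317` assumes otel.plugin listens on the conventional OTLP gRPC port inside the same pod, and TLS is disabled on the assumption that traffic stays pod-local. The config is printed as JSON only to avoid a third-party YAML dependency; the Collector normally reads YAML, and JSON should be parseable as YAML.

```python
import json


def build_pipeline_config(otlp_endpoint="127.0.0.1:4317"):
    """Return a Collector config dict for a logs pipeline: filelog -> batch -> otlp."""
    return {
        "receivers": {
            # Tail the container log files on the node.
            "filelog": {"include": ["/var/log/containers/*.log"]},
        },
        # Empty settings mean the batch processor runs with its defaults.
        "processors": {"batch": {}},
        "exporters": {
            "otlp": {
                # Assumed address of otel.plugin's gRPC endpoint.
                "endpoint": otlp_endpoint,
                # Assumption: plaintext is acceptable within one pod.
                "tls": {"insecure": True},
            },
        },
        "service": {
            "pipelines": {
                "logs": {
                    "receivers": ["filelog"],
                    "processors": ["batch"],
                    "exporters": ["otlp"],
                },
            },
        },
    }


print(json.dumps(build_pipeline_config(), indent=2))
```

Parsing the container log format into structured fields (for example with filelog operators) is left out here and would need to be worked out as part of this phase.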

Stretch goals:

  • Configuring the NDOC pipeline in the UI using dynamic configuration
