
EFK: Add logs aggregation layer based on fluentd #51

Closed
ricsanfre opened this issue Jun 20, 2022 · 5 comments · Fixed by #59
Labels: enhancement (New feature or request)
ricsanfre commented Jun 20, 2022

Enhancement Request

Add logs aggregation layer to logging architecture. From this layer logs can be aggregated, filtered and routed to different destinations to further processing (elasticsearch, kafka, s3, etc.)

[Figure: log aggregation architecture diagram]

source: Common architecture patterns with fluentd and fluentbit

Implementation details

The log aggregation layer can be based on fluentd or fluentbit. Both can be used as log forwarders and log aggregators (see the fluentbit documentation); the only difference is the number of available plugins (input, output, etc.).

Fluentbit does not support a Kafka input plugin (only output), while fluentd supports Kafka integration as both input and output. Fluentd should therefore be the right choice for the log aggregation layer, in case the logging architecture evolves in the future to include a Kafka cluster as a buffering mechanism between log forwarders and log aggregators.
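As a sketch of how the aggregator could publish logs to a Kafka topic, using the `kafka2` output from fluent-plugin-kafka (the broker address and topic name below are assumptions, not part of the current architecture):

```
<match **>
  # Requires fluent-plugin-kafka installed in the aggregator image
  @type kafka2
  # Hypothetical broker address and topic
  brokers kafka.picluster.svc:9092
  default_topic logs
  <format>
    @type json
  </format>
  <buffer topic>
    # File buffer so pending chunks survive output errors
    @type file
    path /fluentd/log/kafka-buffer
    flush_interval 3s
  </buffer>
</match>
```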

[Figure: Kafka as log buffering]

source: One Year of Log Management at Vinted

Changes to the current logging architecture:

  • Fluentbit collectors need to be reconfigured to forward logs to the fluentd aggregator.
  • Fluentd needs to be deployed in the kubernetes cluster, not as a daemonset but as a deployment. See an example of how to do it here.
  • The fluentd aggregator needs to be configured to forward logs to elasticsearch.
  • The fluentd aggregator can also be configured to forward logs to a Kafka topic.
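For the first change, a minimal fluentbit forward output on the collectors could look like this (the `fluentd` service name is an assumption about how the aggregator is exposed inside the cluster):

```
[OUTPUT]
    # Send all collected records to the fluentd aggregator
    Name   forward
    Match  *
    Host   fluentd
    Port   24224
```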
@ricsanfre ricsanfre added the enhancement New feature or request label Jun 20, 2022
@ricsanfre ricsanfre added this to the backlog milestone Jun 20, 2022
ricsanfre commented

Fluentd also needs to be configured to export Prometheus metrics. See the [fluentd documentation](https://docs.fluentd.org/monitoring-fluentd/monitoring-prometheus).
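Following the fluentd documentation, a sketch of the metrics configuration (requires fluent-plugin-prometheus; port 24231 matches the container port used in the deployment below):

```
# Expose a /metrics endpoint for Prometheus scraping
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>
# Collect fluentd internal metrics
<source>
  @type prometheus_monitor
</source>
# Collect metrics about output plugins (buffer sizes, retries, etc.)
<source>
  @type prometheus_output_monitor
</source>
```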

ricsanfre commented Jul 1, 2022

About exposing fluentd forwarder service to collect logs outside the cluster

Make the fluentd forwarder port available from outside the cluster for collecting logs coming from external hosts (i.e. the gateway) and remove the current exposure of the ES service.
Communications with the exposed fluentd service must be secured: TLS needs to be enabled on the exposed service to encrypt the communications, and an authentication mechanism needs to be activated.
Within the cluster, Linkerd is already encrypting inter-pod communications, but an authentication mechanism must still be provided between the fluentbit forwarders and the fluentd aggregator.

For receiving logs from outside the cluster, TLS needs to be enabled anyway. The TLS certificate can be automatically generated by cert-manager.

<source>
  @type forward
  port 24224
  bind 0.0.0.0
  <transport tls>
    cert_path /fluentd/certs/tls.crt
    private_key_path /fluentd/certs/tls.key
  </transport>
  <security>
    self_hostname fluentd-aggregator
    shared_key s1cret0
  </security>
</source>
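The matching client-side configuration on the fluentbit forwarders could look like the following sketch (the CA file path and `Self_Hostname` value are assumptions; the shared key must match the one configured on the aggregator):

```
[OUTPUT]
    Name           forward
    Match          *
    Host           fluentd.picluster.ricsanfre.com
    Port           24224
    Self_Hostname  fluentbit
    Shared_Key     s1cret0
    tls            on
    tls.verify     on
    # CA used to sign the aggregator certificate (path is an assumption)
    tls.ca_file    /fluent-bit/certs/ca.crt
```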

ricsanfre commented Jul 1, 2022

About TLS certificate generation and loading in the fluentd pod

  1. Generate the fluentd TLS certificate with cert-manager, using the custom cluster CA.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: fluentd-tls
  namespace: k3s-logging
spec:
  # Secret names are always required.
  secretName: fluentd-tls
  duration: 2160h # 90d
  renewBefore: 360h # 15d
  commonName: fluentd.picluster.ricsanfre.com
  isCA: false
  privateKey:
    algorithm: ECDSA
    size: 256
  usages:
    - server auth
    - client auth
  # At least one of a DNS Name, URI, or IP address is required.
  dnsNames:
    - fluentd.picluster.ricsanfre.com
  # ClusterIssuer: ca-issuer.
  issuerRef:
    name: ca-issuer
    kind: ClusterIssuer
    group: cert-manager.io

cert-manager will create a TLS Secret:

apiVersion: v1
kind: Secret
metadata:
  name: fluentd-tls
  namespace: k3s-logging
data:
  tls.crt: base64 encoded cert
  tls.key: base64 encoded key
type: kubernetes.io/tls
  2. That certificate can be mounted in the fluentd pod as a volume at /fluentd/certs

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        app: fluentd
      name: fluentd
      namespace: k3s-logging
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: fluentd
      template:
        metadata:
          labels:
            app: fluentd
        spec:
          containers:
          - image: "{{ efk_fluentd_aggregator_image }}"
            imagePullPolicy: Always
            name: fluentd
            env:
              # Elastic operator creates elastic service name with format cluster_name-es-http
              - name:  FLUENT_ELASTICSEARCH_HOST
                value: efk-es-http
                # Default elasticsearch default port
              - name:  FLUENT_ELASTICSEARCH_PORT
                value: "9200"
              # Elasticsearch user
              - name: FLUENT_ELASTICSEARCH_USER
                value: "elastic"
              # Elastic operator stores elastic user password in a secret
              - name: FLUENT_ELASTICSEARCH_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: "efk-es-elastic-user"
                    key: elastic
              # Setting a index-prefix for fluentd. By default index is logstash
              - name:  FLUENT_ELASTICSEARCH_INDEX_NAME
                value: fluentd
              - name: FLUENT_ELASTICSEARCH_LOG_ES_400_REASON
                value: "true"
            ports:
            - containerPort: 24224
              name: forward
              protocol: TCP
            - containerPort: 24231
              name: prometheus
              protocol: TCP
            volumeMounts:
            - mountPath: /fluentd/etc
              name: config
              readOnly: true
            - mountPath: "/fluentd/certs"
              name: fluentd-tls
              readOnly: true
          volumes:
          - configMap:
              defaultMode: 420
              name: fluentd-config
            name: config
          - name: fluentd-tls
            secret:
              secretName: fluentd-tls

@ricsanfre ricsanfre modified the milestones: backlog, release 1.4 Jul 19, 2022
ricsanfre commented

About production ready forwarder/aggregator configuration

  1. Fluentbit and fluentd filesystem buffering mechanisms should be enabled.

  2. The fluentd aggregator should be deployed in HA: a Kubernetes deployment with several replicas. Kubernetes HPA (Horizontal Pod Autoscaler) could be configured to automatically scale the number of replicas.

  3. Fluentd could be deployed as a Statefulset instead of a Deployment, with a dedicated PVC for the disk buffer. This way, if a pod is terminated, the buffered data is not lost.
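For point 1 on the fluentd side, a sketch of a file-based buffer section on the elasticsearch output (the buffer path, sizes and intervals are assumptions to be tuned):

```
<match **>
  @type elasticsearch
  host efk-es-http
  port 9200
  <buffer>
    # Persist chunks to disk so they survive a pod restart when backed by a PVC
    @type file
    path /fluentd/log/buffer
    flush_thread_count 2
    flush_interval 5s
    retry_type exponential_backoff
    retry_max_interval 30
    chunk_limit_size 8MB
    total_limit_size 512MB
    # Block input instead of dropping records when the buffer is full
    overflow_action block
  </buffer>
</match>
```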

ricsanfre commented

About the use of the official fluentd helm chart

The official fluentd helm chart also supports deploying fluentd as a deployment or statefulset instead of a daemonset. When deployed as a deployment, HPA is also supported.

values.yml could be something like this:

# Deploy fluentd as deployment
kind: "Deployment"
# Number of replicas
replicaCount: 1
# Enabling HPA
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 100
  targetCPUUtilizationPercentage: 80

# Do not create serviceAccount, RBAC and podSecurityPolicy objects
serviceAccount:
  create: false
rbac:
  create: false
podSecurityPolicy:
  enabled: false

## Additional environment variables to set for fluentd pods
env:
  ...

# Volumes and VolumeMounts (only configuration files and certificates)
volumes:
- name: etcfluentd-main
  configMap:
    name: fluentd-main
    defaultMode: 0777
- name: etcfluentd-config
  configMap:
    name: fluentd-config
    defaultMode: 0777
- name: fluentd-tls
  secret:
    secretName: fluentd-tls

volumeMounts:
- name: etcfluentd-main
  mountPath: /etc/fluent
- name: etcfluentd-config
  mountPath: /etc/fluent/config.d/
- mountPath: /fluentd/certs
  name: fluentd-tls
  readOnly: true

service:
  type: "ClusterIP"
  annotations: {}
  # loadBalancerIP:
  # externalTrafficPolicy: Local
  ports:
  - name: "forwarder"
    protocol: TCP
    containerPort: 24224
  - name: prometheus
    containerPort: 24231
    protocol: TCP

## Fluentd list of plugins to install
##
plugins: []
# - fluent-plugin-out-http

## Add fluentd config files from K8s configMaps
##
configMapConfigs:
  - fluentd-prometheus-conf
# - fluentd-systemd-conf

## Fluentd configurations:
##
fileConfigs:
  01_sources.conf: |-
    ## logs from podman
    <source>
      @type tail
      @id in_tail_container_logs
      @label @KUBERNETES
      path /var/log/containers/*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      read_from_head true
      <parse>
        @type multi_format
        <pattern>
          format json
          time_key time
          time_type string
          time_format "%Y-%m-%dT%H:%M:%S.%NZ"
          keep_time_key false
        </pattern>
        <pattern>
          format regexp
          expression /^(?<time>.+) (?<stream>stdout|stderr)( (.))? (?<log>.*)$/
          time_format '%Y-%m-%dT%H:%M:%S.%NZ'
          keep_time_key false
        </pattern>
      </parse>
      emit_unmatched_lines true
    </source>
  02_filters.conf: |-
    <label @KUBERNETES>
      <match kubernetes.var.log.containers.fluentd**>
        @type relabel
        @label @FLUENT_LOG
      </match>
      # <match kubernetes.var.log.containers.**_kube-system_**>
      #   @type null
      #   @id ignore_kube_system_logs
      # </match>
      <filter kubernetes.**>
        @type kubernetes_metadata
        @id filter_kube_metadata
        skip_labels false
        skip_container_metadata false
        skip_namespace_metadata true
        skip_master_url true
      </filter>
      <match **>
        @type relabel
        @label @DISPATCH
      </match>
    </label>
  03_dispatch.conf: |-
    <label @DISPATCH>
      <filter **>
        @type prometheus
        <metric>
          name fluentd_input_status_num_records_total
          type counter
          desc The total number of incoming records
          <labels>
            tag ${tag}
            hostname ${hostname}
          </labels>
        </metric>
      </filter>
      <match **>
        @type relabel
        @label @OUTPUT
      </match>
    </label>
  04_outputs.conf: |-
    <label @OUTPUT>
      <match **>
        @type elasticsearch
        host "elasticsearch-master"
        port 9200
        path ""
        user elastic
        password changeme
      </match>
    </label>
