bug: OpenSearchErrorHandler - 400 #702

Closed
kaiohenricunha opened this issue Apr 19, 2023 · 5 comments
Labels
good first issue Good for newcomers

Comments

@kaiohenricunha
Contributor

kaiohenricunha commented Apr 19, 2023

Describe the issue

I deployed a fluentbit-fluentd solution using the fluent-operator's chart. Most of the logs are being properly indexed on OpenSearch and enriched with Kubernetes metadata.

But many logs are being rejected with the following warning:

2018-11-13 11:16:38 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error="400 - Rejected by OpenSearch" location=nil tag="fluent.warn" time=2018-11-13 11:16:27.978851140 +0000 record={"error"=>"#<Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError: 400 - Rejected by OpenSearch>", "location"=>nil, "tag"=>"app.fluentcat", "time"=>2018-11-13 11:16:17.492985640 +0000, "record"=>{"message"=>"\xFF\xAD"}, "message"=>"dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error=\"400 - Rejected by OpenSearch\" location=nil tag=\"app.fluentcat\" time=2018-11-13 11:16:17.492985640 +0000 record={\"message\"=>\"\\xFF\\xAD\"}"}

Worse, these logs are not even discarded; Fluentd keeps retrying to send them, which causes a flood of rejected-log warnings.
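As a stopgap rather than a fix, the rejected records could at least be pulled out of the default warn dump. Assuming they are re-emitted as Fluentd error events, which is what the "dump an error event" message indicates, a raw Fluentd config section like this sketch (untested) would write them to a local file instead:

# Untested sketch: catch Fluentd error events (the "dump an error event" records)
# and write them to a file instead of the default warn-log dump.
<label @ERROR>
  <match **>
    @type file
    path /fluentd/log/rejected-records
  </match>
</label>

This only changes where the rejected records end up; it does not address the mapping problem itself.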

I set log_os_400_reason to true in my ClusterOutput definition, and it now reports the exact reason for the error:

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterOutput
metadata:
  name: cluster-output-opensearch
  labels:
    output.fluentd.fluent.io/enabled: "true"
    output.fluentd.fluent.io/tenant: "core"
spec:
  outputs:
    - customPlugin:
        config: |
          <match **>
            @type opensearch
            host "${FLUENT_OPENSEARCH_HOST}"
            port 443
            logstash_format  true
            logstash_prefix logs
            scheme https
            log_os_400_reason true
            @log_level ${FLUENTD_OUTPUT_LOGLEVEL:=debug}
            <endpoint>
              url "https://${FLUENT_OPENSEARCH_HOST}"
              region "${FLUENT_OPENSEARCH_REGION}"
              assume_role_arn "#{ENV['AWS_ROLE_ARN']}"
              assume_role_web_identity_token_file "#{ENV['AWS_WEB_IDENTITY_TOKEN_FILE']}"
            </endpoint>
          </match>

The warning now prints:

2023-04-19 15:10:36 +0000 [warn]: #0 dump an error event: error_class=Fluent::Plugin::OpenSearchErrorHandler::OpenSearchError error="400 - Rejected by OpenSearch [error type]: mapper_parsing_exception [reason]: 'object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value'" location=nil tag="kube.var.log.containers.XXXXXXX.log" time=2023-04-19 15:10:34.715435870 +0000 record={"log"=>"XXXXXXXXXXX", "kubernetes"=>{"pod_name"=>"XXXXX", "namespace_name"=>"XXXX", "labels"=>{"app"=>"XXXX", "pod-template-hash"=>"XXXX", "release"=>"XXXX"}, "annotations"=>{"checksum/config"=>"XXXX", "checksum/secrets"=>"XXXX", "kubernetes.io/psp"=>"eks.privileged"}, "container_name"=>"XXXX", "docker_id"=>"XXXX", "container_image"=>"docker.io/XXXX/XXXXX"}}
2023-04-19 15:10:36 +0000 [debug]: #0 Indexed (op = index), 656 bad_argument

So the reason seems to be a mapping collision:

"[error type]: mapper_parsing_exception [reason]: 'object mapping for [kubernetes.labels.app] tried to parse field [app] as object, but found a concrete value"

My logs also have a kubernetes.labels.app.kubernetes.io/foo label, which conflicts with kubernetes.labels.app. It has been discussed here; the collision is illustrated below.
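To make the collision concrete, here are the two shapes OpenSearch ends up seeing (label values are placeholders). The first document maps kubernetes.labels.app as a concrete value; in the second, the dotted label key app.kubernetes.io/foo is expanded into an object under the same app field, which is exactly what the mapper refuses:

{"kubernetes": {"labels": {"app": "my-service"}}}

{"kubernetes": {"labels": {"app": {"kubernetes": {"io/foo": "bar"}}}}}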

The Fluent Bit OpenSearch output CRD does have the suggested replaceDots option, but that only helps when Fluent Bit sends logs directly to OpenSearch.

I'm using forward as my Fluent Bit ClusterOutput to ship logs from Fluent Bit to Fluentd and then on to OpenSearch, and the forward output has no replaceDots parameter.
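For reference, this is roughly where replaceDots lives on the direct Fluent Bit to OpenSearch path; the field names follow the fluentbit.fluent.io/v1alpha2 CRD and may differ between operator versions, so treat it as a sketch only:

# Sketch only: direct Fluent Bit -> OpenSearch output where replaceDots applies.
apiVersion: fluentbit.fluent.io/v1alpha2
kind: ClusterOutput
metadata:
  name: opensearch-direct
  labels:
    fluentbit.fluent.io/enabled: "true"
spec:
  match: "*"
  opensearch:
    host: my-opensearch-endpoint   # placeholder
    port: 443
    replaceDots: true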

I tried dynamic index templates, as the fluent-plugin-opensearch documentation suggests for this error: https://github.com/fluent/fluent-plugin-opensearch/blob/main/README.Troubleshooting.md#random-400---rejected-by-opensearch-is-occured-why

That doesn't work either.

I then tried a different index template, explicitly mapping the troublesome label:

PUT _template/logs
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "index": {
        "number_of_shards": "3",
        "number_of_replicas": "1",
        "refresh_interval": "30s"
      }
    },
    "mappings": {
      "_source": {
        "enabled": true
      },
      "properties": {
        "created_at": {
          "format": "EEE MMM dd HH:mm:ss Z YYYY",
          "type": "date"
        },
        "host_name": {
          "type": "keyword"
        },
        "kubernetes": {
          "properties": {
            "labels": {
              "properties": {
                "app": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}

It solved the error for kubernetes.labels.app, but the same error started appearing for other labels, such as:

mapper_parsing_exception [reason]: 'Could not dynamically add mapping for field [app.kubernetes.io/component]. Existing mapping for [kubernetes.labels.app] must be of type object but found [text].'
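Checking how the field actually got mapped makes the conflict visible; a field-mapping query like this (the index name is just an example matching my logstash_prefix) shows whether kubernetes.labels.app ended up as a concrete type or an object:

GET logs-2023.04.19/_mapping/field/kubernetes.labels.app*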

What would be the equivalent of replaceDots for a Fluentd ClusterOutput?
Is there any other viable solution?

To Reproduce

  1. Install the operator via its Helm chart with Fluent Bit and Fluentd enabled.
  2. Fluent Bit should have forward as its ClusterOutput, sending logs to the Fluentd service.
  3. Fluent Bit should have kubernetes.annotations.enabled and kubernetes.labels.enabled set to true.
  4. Run some pods labeled with kubernetes.labels.app and others labeled with kubernetes.labels.app.kubernetes.io/foo (example pods below).
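For step 4, a pair of throwaway pods like these is enough to produce the two conflicting label shapes once both are emitting logs (names, images, and label values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: plain-app-label
  labels:
    app: my-service             # concrete "app" label
spec:
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "while true; do echo hello; sleep 5; done"]
---
apiVersion: v1
kind: Pod
metadata:
  name: prefixed-app-label
  labels:
    app.kubernetes.io/foo: bar  # dotted "app.kubernetes.io/..." label
spec:
  containers:
    - name: main
      image: busybox
      command: ["sh", "-c", "while true; do echo hello; sleep 5; done"]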

Expected behavior

Logs being shipped to OpenSearch regardless of their labels. No OpenSearchErrorHandler - 400.

Your Environment

- Fluent Operator version: 2.1.0
- Container Runtime: docker
- Operating system: Linux
- Kernel version: 5.4.0-135-generic

How did you install fluent operator?

Helm chart.

Additional context

No response

@leonardolacerdaatlantico

I'm also experiencing this issue! Any ideas? Thanks

@benjaminhuo added the good first issue label on Apr 21, 2023
@ArthurSudbrackIbarra

Any ideas on what might be happening here?

@kaiohenricunha
Contributor Author

kaiohenricunha commented Apr 24, 2023

Issue resolved by adding a custom dedot Fluentd ClusterFilter:

apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFilter
metadata:
  labels:
    filter.fluentd.fluent.io/enabled: "true"
    filter.fluentd.fluent.io/name: "de-dot"
  name: de-dot
spec:
  filters:
    - customPlugin:
        config: |
          <filter **>
            @type dedot
            de_dot_separator _
            de_dot_nested ${FLUENTD_DEDOT_NESTED:=true}
          </filter>
---
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFluentdConfig
metadata:
  labels:
    config.fluentd.fluent.io/enabled: "true"
  name: cluster-fluentd-config
spec:
  clusterFilterSelector:
    matchLabels:
      filter.fluentd.fluent.io/enabled: "true"
      filter.fluentd.fluent.io/name: "de-dot"
  clusterOutputSelector:
    matchLabels:
      output.fluentd.fluent.io/enabled: "true"
      output.fluentd.fluent.io/tenant: "core"
  watchedNamespaces: [] # watches all namespaces when empty
---

This is because field names containing dots can create ambiguity in certain data structures. For example, logs with the label kubernetes.labels.statefulset.kubernetes.io/pod-name will collide with the label kubernetes.labels.statefulset.kubernetes.io/pod-name.keyword.

In this case, OpenSearch expands the dotted field name into a nested object path, so it expects the JSON of this log to have the following structure (once for each of the two colliding field names):

{
  "kubernetes": {
    "labels": {
      "statefulset": {
        "kubernetes": {
          "io": {
            "pod-name": "some_value"
          }
        }
      }
    }
  }
}

This misinterpretation can cause unexpected behavior during indexing and querying, and might result in data loss or errors. By replacing dots with underscores using the de_dot filter, we can avoid such ambiguity and ensure that the field name is correctly interpreted.

After applying the dedot filter, it becomes:

{
  "kubernetes": {
    "labels": {
      "statefulset_kubernetes_io/pod-name": "some_pod_name",
      "statefulset_kubernetes_io/pod-name.keyword": "some_pod_name_keyword"
    }
  }
}

Dedot filter parameters would be a nice feature to have in the fluent-operator.
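Something along these lines is what I have in mind, where dedot would be a first-class filter in the CRD instead of a customPlugin block; the dedot field and its sub-fields below are purely hypothetical and do not exist in the current API:

# Hypothetical API sketch only; the fluent-operator has no dedot filter field today.
apiVersion: fluentd.fluent.io/v1alpha1
kind: ClusterFilter
metadata:
  name: de-dot
  labels:
    filter.fluentd.fluent.io/enabled: "true"
spec:
  filters:
    - dedot:                  # hypothetical field
        deDotSeparator: "_"   # hypothetical field
        deDotNested: true     # hypothetical field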

@kaiohenricunha
Contributor Author

@stephweiss did you get it solved?

@stephweiss

> @stephweiss did you get it solved?

Yes, thank you for asking. My config was wrong. It worked exactly the way it is described here.
