Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support multi-key fields from SemConv #2333

Open
gregkalapos opened this issue Apr 9, 2024 · 5 comments
Open

Support multi-key fields from SemConv #2333

gregkalapos opened this issue Apr 9, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@gregkalapos
Copy link

Summary

(Not sure if multi-key fields is the right term; if there is a better short name describing it, let's update the title.)

OTel SemConv defines fields which can have multiple keys - examples are:

There are 2 aspects of this:

  1. When the OTel SemConv <-> ECS merge happens, how do such fields get into ECS?
  2. How should the mapping look like for such fields for Elasticsearch?

We discussed this with @felixbarny shortly, regarding point 2:

  • We could use flattened field type
  • We could set enabled to false.
  • In APM we have a field called labels with similar dynamic keys, currently with this mapping:
        {
          "labels": {
            "path_match": "labels.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        }

Issue with above is that this leads to field explosion.

@gregkalapos gregkalapos added the enhancement New feature or request label Apr 9, 2024
@gregkalapos
Copy link
Author

Other similar case is http.request.headers and http.response.headers already existing in the traces-apm mapping:

          "http.request.headers": {
            "path_match": "http.request.headers.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }
        },
        {
          "http.response.headers": {
            "path_match": "http.response.headers.*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "keyword"
            }
          }

@felixbarny
Copy link
Member

Other similar case is http.request.headers and http.response.headers already existing in the traces-apm mapping:

That's interesting. I wasn't aware that we were adding all HTTP headers to the mapping. @axw did you hear of situations where this lead to mapping explosions? It seems a bit dangerous to me as anyone can just create a bunch of requests with unique HTTP headers and force the backend into a field explosion.

This would also work in case subobjects is set to false

Click to expand example
DELETE felixbarny-test?ignore_unavailable=true
PUT felixbarny-test
{
  "mappings": {
    "properties": {
      "attributes": {
        "subobjects": false,
        "type": "object",
        "properties": {
          "http.request.header": {
            "type": "flattened"
          }
        }
      }
    }
  }
}

POST felixbarny-test/_doc?refresh
{
  "attributes": {
    "http.request.header.foo": "bar",
    "http.request.header.bar": "baz"
  }
}

GET felixbarny-test/_mapping

POST felixbarny-test/_search


# DELETE felixbarny-test?ignore_unavailable=true 200 OK
{
  "acknowledged": true
}
# PUT felixbarny-test 200 OK
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "felixbarny-test"
}
# POST felixbarny-test/_doc?refresh 201 Created
{
  "_index": "felixbarny-test",
  "_id": "7C2P8I4BRVt7x5oqUVes",
  "_version": 1,
  "result": "created",
  "forced_refresh": true,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}
# GET felixbarny-test/_mapping 200 OK
{
  "felixbarny-test": {
    "mappings": {
      "properties": {
        "attributes": {
          "subobjects": false,
          "properties": {
            "http.request.header": {
              "type": "flattened"
            }
          }
        }
      }
    }
  }
}
# POST felixbarny-test/_search 200 OK
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "felixbarny-test",
        "_id": "7C2P8I4BRVt7x5oqUVes",
        "_score": 1,
        "_source": {
          "attributes": {
            "http.request.header.foo": "bar",
            "http.request.header.bar": "baz"
          }
        }
      }
    ]
  }
}

This doesn't work when subobjects is set to false, because we can't add an object mapper for http.request.header with enabled: false to the mapping in a context where objects are disabled.

Click to expand example
DELETE felixbarny-test?ignore_unavailable=true
PUT felixbarny-test
{
  "mappings": {
    "properties": {
      "attributes": {
        "subobjects": false,
        "type": "object",
        "properties": {
          "http.request.header": {
            "type": "object",
            "enabled": false
          }
        }
      }
    }
  }
}

# DELETE felixbarny-test?ignore_unavailable=true 200 OK
{
  "acknowledged": true
}
# PUT felixbarny-test 400 Bad Request
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping: Object mapper [http.request.header] was found in a context where subobjects is set to false. Auto-flattening [http.request.header] failed because the value of [enabled] is [false]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: Object mapper [http.request.header] was found in a context where subobjects is set to false. Auto-flattening [http.request.header] failed because the value of [enabled] is [false]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Object mapper [http.request.header] was found in a context where subobjects is set to false. Auto-flattening [http.request.header] failed because the value of [enabled] is [false]"
    }
  },
  "status": 400
}

@axw
Copy link
Member

axw commented Apr 22, 2024

That's interesting. I wasn't aware that we were adding all HTTP headers to the mapping. @axw did you hear of situations where this lead to mapping explosions?

I haven't.

It seems a bit dangerous to me as anyone can just create a bunch of requests with unique HTTP headers and force the backend into a field explosion.

That's a good point, I don't think anyone considered this.

@trisch-me
Copy link
Contributor

@felixbarny should I bring into next semconv meeting your thoughts about field explosion? Or if you want you can create issue here

@felixbarny
Copy link
Member

I'm not sure if this is an issue with semantic conventions per-se. It depends on how backends can deal with namespaces that don't have a bounded number of fields. Some may be more resilient than others when it comes to the total number of fields. For example, some vendors may only store fields of certain namespaces for retrieval but don't maintain in-memory metadata about them. What we can do when it comes to mapping http.request.headers.* is to store this as a flattened field type, which avoids creating a field in the mapping for each field, so that there's no risk of a mapping explosion. That field type does come with certain tradeoffs but they seem reasonable in this case.

Still, I think it would be interesting to get some insight into how other backends intend to deal with these multi-key fields. So if you could bring that up in the next semconv meeting, that would be highly appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants