Allow composite runtime fields to add top level fields #87690

Open
ruflin opened this issue Jun 15, 2022 · 7 comments
Labels
>enhancement :Search/Mapping Index mappings, including merging and defining field types Team:Search Meta label for search team

Comments

@ruflin
Member

ruflin commented Jun 15, 2022

Composite runtime fields are especially useful with grok / dissect, where multiple fields are extracted at once. AFAIK there is currently a limitation that all of these runtime fields must share the same prefix, which means they cannot be mapped to ECS properly. Below is a simplified example that demonstrates the problem.

DELETE _data_stream/logs-example-default
PUT _data_stream/logs-example-default

PUT logs-example-default/_mappings
{
  "runtime": {
    "example": {
      "type": "composite",
      "script": """
        Map fields=dissect('%{source.ip} [%{timestamp}] %{http.request.method}').extract(params["_source"]["message"]);
        DateTimeFormatter dtf = DateTimeFormatter.ofPattern("dd/MMM/yyyy:H:m:s Z");
        ZonedDateTime zdt = ZonedDateTime.parse(fields["timestamp"],dtf);
        long datetime = zdt.toInstant().toEpochMilli();
        
        fields["timestamp"] = datetime;
        emit(fields);
             
      """,
      "fields": {
        "timestamp": {
          "type": "date"
        },
        "source.ip": {
          "type": "ip"
        },
        "http.request.method": {
          "type": "keyword"
        }
      }
    }
  }
}

POST logs-example-default/_doc/
{
  "@timestamp": "2099-11-15T13:12:00",
  "message": "67.43.156.13 [07/Dec/2016:10:34:43 +0100] GET"
}

GET logs-example-default/_search
{
  "fields" : ["*"]
}

The data should be in the source.ip field, but because of the limitation it ends up in example.source.ip. I tried adding an alias from source.ip to example.source.ip to at least get the query to work, but I would argue this is not a great solution either, as it would prevent documents from having actual data in the source.ip field itself.

@nik9000 nik9000 added the :Search/Mapping Index mappings, including merging and defining field types label Jun 15, 2022
@elasticmachine elasticmachine added the Team:Search Meta label for search team label Jun 15, 2022
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@nik9000
Member

nik9000 commented Jun 15, 2022

You should be able to use a field alias to "float" it out. But I've marked this as discuss so folks can talk and see what they'd like to do.

@javanna javanna removed their assignment Jun 15, 2022
@javanna
Member

javanna commented Jun 15, 2022

heya @ruflin could you expand on why having a common prefix is a limitation and why you'd need to send these fields to the top-level? Do they replace some existing fields?

Wouldn't the field alias/additional runtime field solution be equivalent to a built-in solution when it comes to preventing a field with the same name from holding data? This may stem from the fact that field aliases are defined under properties, so maybe I would try out a runtime field, which does not prevent you from indexing data into a field with the same name, although that indexed field would be shadowed and not accessible at search time.

@ruflin
Member Author

ruflin commented Jun 17, 2022

When we ingest data, we try to map it to ECS. The above example is a simplified version of nginx logs. Currently we do it all via ingest pipelines, but in many cases we don't have to index all the data and would like to do it with runtime fields instead. The expected outcome is that if we extract some fields, these should still be in ECS, for example source.ip and http.request.method.

I couldn't fully follow your second comment. But my ideal outcome would be, that I could have documents that have source.ip as an indexed field inside and other documents on the same data stream where it is a runtime field. That is an additional goal, though, for after I can do the correct mapping to ECS fields. Even better would be if I could convert my composite runtime field to an indexed field, like I can do for other runtime fields.

@javanna
Member

javanna commented Jun 17, 2022

> But my ideal outcome would be, that I could have documents that have source.ip as an indexed field inside and other documents on the same data stream where it is a runtime field.

I see, but then each index would have the field either as a runtime field or as an indexed field? This reminds me of the discussion happening in #86536. So far we have envisioned such changes taking effect at the next rollover, hence you would not have an index with a mixed approach. In that case a field alias should work? What I was hinting at with the second part of my comment is that field aliases could be re-implemented as runtime fields. Effectively you can already implement a field alias through a runtime field, but you need to define a script for it, which is not fantastic for the user experience. If a field alias is defined under runtime, an indexed field with the same name can still be mapped under properties, although it is shadowed. Though I was questioning whether this is a concern at all, assuming that each index should have only one variant of the field in question.

Converting a composite runtime field to indexed is on the roadmap, see #77625.

@ruflin
Member Author

ruflin commented Jun 20, 2022

Great to see composite runtime fields on the roadmap and #86536 is interesting indeed.

Taking all of the above into account, going back to the initial question, and setting aside the discussion of whether the runtime or the mapped field is the default at query time, I would still like to be able to set source.ip directly in the composite runtime field. Does my explanation around ECS help explain why this is needed?

@javanna
Member

javanna commented Jun 23, 2022

I had a chat with @ruflin and I now have a better understanding of the problem. Field aliases can only point to indexed fields, mapped under properties, and not to runtime fields. The current workaround is to create a runtime field with a script that emits the value of example.source.ip. One follow-up could be that field aliases should really be implemented as runtime fields (see #87969). Even better, one may wonder why there is a need to declare a second field at all to expose the grok sub-field at the top level. We discussed this last point quite a bit when we were designing the composite runtime field, but it does not hurt to look back and discuss it again.
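
A minimal sketch of the workaround described above, assuming the composite field from the original example is already mapped as example. The size() guard and emitting only the first value are assumptions made for illustration, not something settled in this thread:

```
PUT logs-example-default/_mappings
{
  "runtime": {
    "source.ip": {
      "type": "ip",
      "script": {
        "source": """
          // Re-emit the composite sub-field under the top-level ECS name.
          // Assumption: skip documents where the sub-field has no value.
          if (doc['example.source.ip'].size() > 0) {
            emit(doc['example.source.ip'].value);
          }
        """
      }
    }
  }
}
```

With this in place, a search requesting "fields": ["source.ip"] should return the value under the ECS name, at the cost of declaring one extra runtime field per extracted sub-field. As noted earlier in the thread, an indexed source.ip in the same index would be shadowed by this runtime field at search time.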

4 participants