Update Entity Fields RFC for 9.2 Stack Release #2513

MikePaquette · 2025-08-15T15:35:50Z

1. What does this PR do?

Adds newly proposed entity.* fields to the Entity Fields RFC
Adds currently used entity.* fields to this RFC
Proposes a nested location in the naming hierarchy for entity.* fields populated by ECS producers
Proposes a root namespace location for entity.* fields when used by ECS consumers and entity-related data stores.

2. Which ECS fields are affected/introduced?

eyalkraft · 2025-08-17T11:34:32Z

Hi Mike, Thanks for the proposal!

Is there a reason for having the new entity.schema_version instead of using the standard ecs.version?

MikePaquette · 2025-08-18T10:27:56Z

> Is there a reason for having the new entity.schema_version instead of using the standard ecs.version?

Good question. I don't know when this was added, or how it is used, but I found it in the existing mappings in 8.19.1.

"entity": {
  "properties": {
    "definition_id": {
      "type": "keyword",
      "ignore_above": 1024
    },
    "definition_version": {
      "type": "keyword",
      "ignore_above": 1024
    },
    "display_name": {
      "type": "text",
      "fields": {
        "keyword": {
          "type": "keyword",
          "ignore_above": 1024
        }
      }
    },
    "id": {
      "type": "keyword",
      "ignore_above": 1024
    },
    "identity_fields": {
      "type": "keyword"
    },
    "last_seen_timestamp": {
      "type": "date"
    },
    "name": {
      "type": "keyword"
    },
    "schema_version": {
      "type": "keyword",
      "ignore_above": 1024
    },
    "source": {
      "type": "keyword"
    },
    "type": {
      "type": "keyword",
      "ignore_above": 1024
    }
  }
}
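For anyone who wants to check which `entity.*` fields an existing mapping already defines, here is a minimal sketch. It assumes the mapping has been loaded as a plain Python dict shaped like the excerpt above. The helper name `list_fields` and the trimmed `mapping_excerpt` are illustrative only and are not part of any Elastic tooling. Multi-fields (the `fields` key under `display_name`) are deliberately ignored.

```python
def list_fields(properties: dict, prefix: str = "") -> dict:
    """Walk a mapping 'properties' dict and return {dotted.path: type}."""
    fields = {}
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if "properties" in spec:
            # Object field: recurse into its children.
            fields.update(list_fields(spec["properties"], prefix=f"{path}."))
        else:
            # Leaf field: record its declared type (multi-fields are ignored).
            fields[path] = spec.get("type", "object")
    return fields


# Trimmed, hypothetical excerpt of the mapping quoted above.
mapping_excerpt = {
    "entity": {
        "properties": {
            "id": {"type": "keyword", "ignore_above": 1024},
            "last_seen_timestamp": {"type": "date"},
            "schema_version": {"type": "keyword", "ignore_above": 1024},
        }
    }
}

fields = list_fields(mapping_excerpt)
for path, field_type in sorted(fields.items()):
    print(f"{path}: {field_type}")
```

Running this against the full mapping would be one way to confirm whether `entity.schema_version` is defined there, although it says nothing about whether the field is actually populated.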

Add example for entity.attributes.Managed

mjwolf · 2025-08-29T23:20:30Z

rfcs/text/0049-entity-fields.md

 | Field | Type | Description |
 |-------|------|-------------|
+| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|
+| entity.definition_id | keyword | Used by Elastic solutions (e.g., Security, Observability) to denote the version of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|


entity.definition_id is repeated here

mjwolf · 2025-08-29T23:26:51Z

rfcs/text/0049-entity-fields.md


 | Field | Type | Description |
 |-------|------|-------------|
+| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|


I'm not too sure about this field being reserved. There are no other fields in ECS that are reserved for use by Elastic, and I don't really know if it makes sense to have them. Since the intention of ECS is a common schema that's shared with others, I don't know if this would make sense to include.

Are there alternatives to adding this, such as using a custom field that's not defined in ECS?

Yes, I agree, it's an internal-only field, so there is no need to have it defined in the ECS spec. We already intend to publish the entity store schema in our docs. Any problem with using entity.Definition_id as a custom field (nesting a custom leaf field under the ECS-defined root object entity.*)?

@MikePaquette as we speak we are storing this information under entity.Metadata.EngineType. What do you think about keeping it under metadata?

mjwolf · 2025-08-29T23:29:38Z

rfcs/text/0049-entity-fields.md

 | entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. |
-| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. |
-| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. |
+| entity.attributes.* | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |


Suggested change

| entity.attributes.* | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |

| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attrbitues.Os_current` , `entity.attibutes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |

The existing flattened fields in ECS don't use .* in the name. I think it makes sense to remove it here too, to be consistent.

Thanks @mjwolf what do you mean by "flattened" here? As proposed, the object will contain keyword or boolean leaf fields, and not the flattened field datatype. No problem with removing the .* though, thanks.

mjwolf · 2025-08-29T23:31:55Z

rfcs/text/0049-entity-fields.md

+
+For ECS producers, such as Beats, Elastic Agent integrations, ingest pipelines, and other methods for shipping data to Elastic, the `entity.*` fields are expected to be nested as follows:
+- If the entity type is one of host, user, service, cloud, orchestrator), then the entity fields should be nested under the respecitve root field set, for example `host.entity.*` , `user.entity.*`, etc.
+- If the entity type is not one of the above, then that `entity.*` fields should be nested under a new root-level object, called `generic`, as `generic.entity.*`


Do you think generic will have any other fields apart from generic.entity.*?

Not foreseen. Can anyone else think of a reason why we'd use the root field set generic.* for another purpose?

I can't think of any other reason
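To make the producer-side nesting rule from the diff above concrete, here is a minimal sketch of how a producer might choose the parent object. The names `KNOWN_ROOTS`, `producer_entity_prefix`, and `nest_entity` are hypothetical. The thread does not say how a concrete `entity.type` value maps to one of the listed classes, so the sketch takes the class as an explicit argument.

```python
# Entity classes that, per the proposal, already have an ECS root field set.
KNOWN_ROOTS = ("host", "user", "service", "cloud", "orchestrator")


def producer_entity_prefix(entity_class: str) -> str:
    """Return the prefix under which a producer would nest entity.* fields."""
    root = entity_class if entity_class in KNOWN_ROOTS else "generic"
    return f"{root}.entity"


def nest_entity(entity_class: str, entity_fields: dict) -> dict:
    """Wrap entity fields under <root>.entity, falling back to generic.entity."""
    root = entity_class if entity_class in KNOWN_ROOTS else "generic"
    return {root: {"entity": dict(entity_fields)}}


print(producer_entity_prefix("host"))
print(nest_entity("storage_bucket", {"id": "bucket-1"}))
```

Under this reading, `generic` only ever holds the `entity` object, which matches the answers given in this thread.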

mjwolf · 2025-08-29T23:38:01Z

rfcs/text/0049-entity-fields.md

+
+### ECS Consumers or Data Stores
+
+For ECS consumers, such as the Elastic Security Solution entity store indices, the `entity.*` fields should be used directly at the root of the events.


Could you expand on what this means? I don't think I understand it. What do you mean by "used"? Why can a data store only use entity.* if the producers can write to other top-level field sets?

I think this is also an implementation specific detail that doesn't need to be part of ECS, so maybe it can be removed.

I agree we can remove this from the ECS spec, but the idea is that the entity store schema will include entity.* at the root level in each document. This will allow us and users to query the entity store for entity IDs (and other attributes) without burdening them with knowing under which entity class {host, user, service, generic, etc.} it might be stored.
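As an illustration of the root-level placement described in the reply above, the following sketch copies a nested `<class>.entity` object to a root-level `entity` object and builds a class-agnostic lookup. The function names and the `ENTITY_CLASSES` tuple are assumptions made for this example, and the real entity store may resolve classes differently. The query body uses the standard Elasticsearch `term` query.

```python
# Classes named in the reply above; the original list ends with "etc.".
ENTITY_CLASSES = ("host", "user", "service", "generic")


def to_store_document(event: dict) -> dict:
    """Copy the first <class>.entity object found to a root-level entity."""
    for entity_class in ENTITY_CLASSES:
        class_fields = event.get(entity_class)
        if isinstance(class_fields, dict) and isinstance(class_fields.get("entity"), dict):
            return {"entity": dict(class_fields["entity"])}
    return {}


def entity_id_query(entity_id: str) -> dict:
    """Build a query body that finds an entity without knowing its class."""
    return {"query": {"term": {"entity.id": entity_id}}}


event = {"user": {"entity": {"id": "romulo", "type": "aws-user"}}}
print(to_store_document(event))
print(entity_id_query("romulo"))
```

With every stored document shaped this way, a single query on `entity.id` covers hosts, users, services, and generic entities alike.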

rfcs/text/0049-entity-fields.md

typo Co-authored-by: Uri Weisman <68195305+uri-weisman@users.noreply.github.com>

fix typos Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

romulets · 2025-09-09T08:22:00Z

rfcs/text/0049-entity-fields.md


 | Field | Type | Description |
 |-------|------|-------------|
+| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|


@MikePaquette as we speak we are storing this information under entity.Metadata.EngineType. What do you think about keeping it under metadata?

romulets · 2025-09-09T08:23:36Z

rfcs/text/0049-entity-fields.md

 |-------|------|-------------|
+| entity.definition_id | keyword | Used Elastic solutions (e.g., Security, Observability) to denote the ID of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|
+| entity.definition_id | keyword | Used by Elastic solutions (e.g., Security, Observability) to denote the version of the entity definition which is used to extract entity details from ingested logs, events, intelligence, and other data types. Use of this value is reserved, and ECS producers, including data ingestion pipelines, must not populate this field|
+| entity.schema_version | keyword | Denotes the version of the entity schema,as published in Elastic Security documentation, to which this entity information conforms. Usually conforms to the Elastic Stack version.


Is this a new guideline? Or is it something that already happens? I'm not aware of an entity schema published in Elastic Security Docs that usually conforms to elastic stack version

No, I found this in the existing 8.19.1 index mappings and was not aware of it. I included it here in case it was already in use, but it does not appear to be used, so we can remove it from this RFC.

romulets · 2025-09-09T08:29:46Z

rfcs/text/0049-entity-fields.md

 | entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. |
-| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. |
-| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. |
+| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |


The current pattern of snake case with capital first letter isn't friendly for most coding tools and linters, since it's not a standard pattern.

Could we adopt a pattern such as PascalCase? Reading the documentation on capitalization for non-ECS fields, I see precedent for it, since it's mentioned in both the HAProxy and NGINX examples.

@mjwolf do you have a position on this topic?

ECS uses snake case for multiword fields, so I think it makes the most sense to keep using that. OTel semantic conventions also use snake case.

I think for these examples in documentation it should keep using snake case. For the actual implementations, we don't require it, so other cases are allowed (as long as it starts with a capital).

For ECS, this is irrelevant because we should not provide examples using custom fields.

romulets · 2025-09-09T08:30:28Z

rfcs/text/0049-entity-fields.md

 | entity.reference | keyword | A URI, URL, or other direct reference to access or locate the entity in its source system. This could be an API endpoint, web console URL, or other addressable location. Format may vary by entity type and source system. |
-| entity.attributes.* | object | Normalized entity attributes using capitalized field names (e.g., `entity.attributes.StorageClass`, `entity.attributes.MfaEnabled`). Use this field set when you need specific data types, advanced search capabilities, or normalized values across different providers/sources. The capitalization pattern indicates these are entity-specific fields that won't be enumerated in the ECS schema. |
-| entity.raw.* | flattened | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. |
+| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |


I understand what you mean by "static or semi-static attributes" as opposed to lifecycle and behaviour fields, but maybe there is a better way of describing what we mean by that? By the static or semi-static definition, first_seen and issued_at would also fit under it, wouldn't they?

As I see it, attributes should be used to describe non-temporal entity properties that cannot be expressed in other parts of the entity field set.

I see your point, perhaps we can add "non-temporal" to the definition of the entity.attributes.* fields? Would that help?

I think so!

romulets · 2025-09-09T08:37:59Z

rfcs/text/0049-entity-fields.md

+| entity.attributes | object | A set of static or semi-static attributes of the entity. Usually boolean or keyword field data types. Examples include: `entity.attributes.Storage_class`, `entity.attributes.Mfa_enabled` , `entity.attributes.Privileged` , `entity.attributes.Granted_permissions` , `entity.attributes.Known_redirect` , `entity.attributes.Asset` , `entity.attributes.Managed` ,`entity.attributes.Os_current` , `entity.attributes.Os_patch_current` , `entity.attributes.Oauth_consent_restriction`). Use this field set when you need to track static or semi-static characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern for Examples indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |
+| entity.lifecycle.* | object | A set of temporal characteristics of the entity. Usually date field data type. Examples include: `entity.lifecycle.First_seen`, `entity.lifecycle.Last_activity` , `entity.lifecycle.Issued_at` , `entity.lifecycle.Last_password_change` ,etc. ). Use this field set when you need to track temporal characterstics of an entity for advanced searching and correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |
+| entity.behavior.* | object | A set of ephemeral characteristics of the entity, derived from observed behaviors during a specific time period. Behaviors are usually captured in event logs under fields such as `event.action` and other fields, but this field set captures "attributified" behavior indicators, using semantics like "this behavior was seen one or more times during this time period." Sytems using this field set may need to force a "reset" of these behavioral indicators at the end of their current period. Usually boolean field data type. Examples include: `entity.behavior.Used_usb_device`, `entity.behavior.Brute_force_victim` , `entity.behavior.New_country_login` ,etc. ). Use this field set when you need to capture and track ephemeral characterstics of an entity for advanced searching, correlation of normalized values across different providers/sources and entity types. Note the initial capitalization pattern indicates that any such fields are custom entity-specific fields that won't be enumerated in the ECS schema, and won't collide with any fields that may be defined by ECS in the future. |
+| entity.raw.* | object | Original, unmodified fields from the source system stored in a flattened format that maintains basic searchability. While `entity.attributes` should be used for normalized fields requiring advanced queries, this field preserves all source metadata with basic search capabilities. Supports existence queries, exact value matches, and simple aggregations. |


Should be of type flattened instead of object, no?

I think so for entity.raw. We've agreed to remove the .* from the objects, and this would be the same.
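To illustrate the distinction being discussed, here is a hedged sketch of what a mapping fragment could look like if `entity.raw` is typed as `flattened` while `entity.attributes` stays an `object`. It is written as a Python dict so it can be passed to a client or serialized to JSON. The `dynamic` setting is an assumption of this sketch and not something the RFC specifies.

```python
import json

# Hypothetical mapping fragment, not the mapping proposed in the RFC.
entity_mapping_sketch = {
    "properties": {
        "entity": {
            "properties": {
                # Object: each custom leaf (e.g. Mfa_enabled) gets its own
                # mapped field with its own type (boolean, keyword, ...).
                "attributes": {"type": "object", "dynamic": True},
                # Flattened: the whole source object is mapped as one field,
                # and leaf values are indexed as keywords.
                "raw": {"type": "flattened"},
            }
        }
    }
}

print(json.dumps(entity_mapping_sketch, indent=2))
```

The `flattened` type fits the description of `entity.raw` in the diff (existence queries, exact matches, simple aggregations), whereas typed leaf fields under `entity.attributes` need the `object` form.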

romulets · 2025-09-09T08:44:24Z

rfcs/text/0049-entity-fields.md

+
+For ECS producers, such as Beats, Elastic Agent integrations, ingest pipelines, and other methods for shipping data to Elastic, the `entity.*` fields are expected to be nested as follows:
+- If the entity type is one of host, user, service, cloud, orchestrator), then the entity fields should be nested under the respecitve root field set, for example `host.entity.*` , `user.entity.*`, etc.
+- If the entity type is not one of the above, then that `entity.*` fields should be nested under a new root-level object, called `generic`, as `generic.entity.*`


I can't think of any other reason

rfcs/text/0049-entity-fields.md

clarification. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

two typos. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

extraneous ")" Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

## Summary

Add Upsert Entity API which reflects changes made via the API directly in the final entities index.

#### What is implemented

- Update documents
  - Allowed fields:
    - `entity.attributes.*`
    - `entity.lifecycle.*`
    - `entity.behavior.*`
- Force update documents

#### Added ES Assets:

- Component Template `security_${type}_default-updates@platform`
- Index Template `entities_v1_updates_security_${type}_default_index_template`
- Index `.entities.v1.updates.security_${type}_default`

#### What is not implemented

- Create
- ILM Policy to delete update documents

#### How to test

Ingest entities and run in the dev console:

```
PUT kbn:/api/entity_store/entities/generic
{
  "entity": {
    "id": "<ID>",
    "attributes": {
      "StorageClass": "hot"
    }
  }
}
```

### How it works

Before explaining the API itself, a refresher on the entity store.

<details>
<summary> Entity Store Diagram </summary>

```mermaid
flowchart TB
    subgraph Main Flow
        A[(.logs*)] ~~~~ B[Transform]
        B ---> |Fetches raw entity data| A
        B ---> | Sends Aggregated Data | G{Ingest Pipeline}
        G --> | Combines new and old data and stores it| C[(.entity.v1.latest*)]
    end
    G -.-> | Fetches data older than transform retention policy| D[(.enrich-index-entities)]
    subgraph Retention Policy Flow
        direction LR
        E((Kibana Task)) -->|trigger every hour| F[Enrich Policy Entities]
        F -.->| Fetches most up-to-date entities| C
        F --->| Stores data | D
    end
```

</details>

The entity store is based on a Transform with a look-back period of X hours (currently 3h). That means data older than the look-back period won't be retained. To solve that, an Enrich Policy is in place that takes hourly snapshots of the current state of the entity store and makes them available, via the ingest pipeline, to enrich entity updates and make sure that data older than the look-back period is still present.

Awesome. This adds complexity to this feature. The goal is to add an API that, once called, reflects data changes immediately in the latest index.

A few things were considered:

- ❌ Add a new document to an update index to be picked up by the transform.
  - That doesn't satisfy the requirement because changes will be made available only after a transform finishes its run.
- ❌ Perform update by query in the latest index.
  - That works great if the entity in the latest index doesn't get any other update via the transform, which of course we can't guarantee.

So the solution we arrived at was to both perform an update by query in the latest index and publish an update document to be picked up by the transform. This way we get the best of both worlds.

- First, update by query on `.entities.v1.latest.security_$TYPE_default` (update made via painless)
- Then, index a new document on `.entities.v1.updates.security_$TYPE_default` to be picked up by the transform.

```mermaid
flowchart LR
    A[User] -->|PUT /api/entity_store/entities/$TYPE| B[Kibana]
    B --> |update by query| C[(.entities.v1.latest.security_$TYPE_default)]
    B --> |create new doc| D[(.entities.v1.updates.security_$TYPE_default)]
```

We considered adding a priority mechanism to the update index to make sure that documents published to it would be picked up. First, we found out that we don't need to make sure a document is seen by the transform. By definition, transforms process every document - they don't have any mechanism to drop documents in case processing is taking too long. Second, we can't do it because the aggregations we run already sort to find the latest values, and sorting on multiple fields is not possible.

### Fields and Schema

Prior to this PR, non-generic entities (`user`, `host`, and `service`) had no exposure to concepts defined in the proposed `entity.*` ECS Schema. We had to address this to be able to make changes to `entity.attributes`, `entity.lifecycle` and `entity.behavior` fields.

[The current direction](elastic/ecs#2513) is that `entity.*` fields will be nested under `user`, `host`, `service` and `generic` for data input, and the latest index, with the final entities, would have a root `entity.*` field set. In other words, there is a difference between the entity data input location and the entity data output location.

The document

```json
{ "user": { "entity": { "id" : "romulo", "type": "aws-user" } } }
```

will be represented in the latest index as

```json
{ "entity": { "id" : "romulo", "type": "aws-user" } }
```

Because of the current direction of the discussion, we decided to move that way already. Therefore this PR contains changes to the entity definitions themselves, adding entity fields that use `{TYPE}.entity.*` as the data source and `entity.*` as the destination (`x-pack/solutions/security/plugins/security_solution/server/lib/entity_analytics/entity_store/entity_definitions/entity_descriptions/common.ts`).

That also posed another question: what will the input look like? Will it accept entity "input" or entity "output" format? I decided to stay close to the "output" format, so the API accepts `entity.*` JSON fields, which are applied to the entity store. The reason behind it is simplicity of the API. I believe that having an inconsistent placement for `entity` in the API isn't a great experience, therefore always accepting

```json
{ "entity": { "id" : "romulo", "type": "aws-user" } }
```

is better imo. **That's contradictory to the input via logs however**. Curious to hear people's opinion.

There is another problem that further deviates the API from any ECS definition (input or output). For fields under `entity.attributes`, `entity.lifecycle` and `entity.behavior` we decided to define them on ECS. And because they are "custom fields", product would like them to have a `Capital_snake_case` format, which is not a traditional format, and developing with TS in such a case is not really allowed at the moment. To curb that, the API exposes those fields as `snake_case` and converts them to `Capital_snake_case` before storing. That was the best way I found while still having field definitions in the OpenAPI spec.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Mark Hopkin <mark.hopkin@elastic.co>
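The casing conversion and the dual write described in this commit message can be sketched roughly as follows. This is not the Kibana implementation: the function names, the `CUSTOM_FIELD_SETS` tuple, and the shape of the returned "plan" are assumptions made for illustration, and no Elasticsearch client is called. Only the two index name patterns come from the text above.

```python
# Field sets whose leaf names are treated as custom fields (assumption).
CUSTOM_FIELD_SETS = ("attributes", "lifecycle", "behavior")


def to_capital_snake_case(name: str) -> str:
    """Upper-case only the first character: storage_class -> Storage_class."""
    return name[:1].upper() + name[1:]


def convert_entity_fields(entity: dict) -> dict:
    """Rename leaf keys under the custom field sets; leave other keys as-is."""
    converted = {}
    for key, value in entity.items():
        if key in CUSTOM_FIELD_SETS and isinstance(value, dict):
            converted[key] = {to_capital_snake_case(k): v for k, v in value.items()}
        else:
            converted[key] = value
    return converted


def plan_entity_update(entity_type: str, body: dict) -> list:
    """Describe the two writes: update the latest index, then index an update doc."""
    entity = convert_entity_fields(body["entity"])
    return [
        {
            "action": "update_by_query",
            "index": f".entities.v1.latest.security_{entity_type}_default",
            "entity": entity,
        },
        {
            "action": "index",
            "index": f".entities.v1.updates.security_{entity_type}_default",
            "entity": entity,
        },
    ]


request_body = {"entity": {"id": "abc", "attributes": {"storage_class": "hot"}}}
for step in plan_entity_update("generic", request_body):
    print(step["action"], step["index"], step["entity"])
```

Keeping the conversion in one place means API clients only ever see `snake_case` names, while stored documents carry the capitalized custom-field form.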

timestamp field note removed

updated definitions

added allowed values for entity.type

## Summary Add Upsert Entity API which reflects changes made via the API directly in the final entities index. #### What is implemented - Update documents - Allowed fields: - `entity.attributes.*` - `entity.lifecyle.*` - `entity.behavior.*` - Force update documents #### Added ES Assets: - Component Template `security_${type}_default-updates@platform` - Index Template `entities_v1_updates_security_${type}_default_index_template` - Index `.entities.v1.updates.security_${type}_default` #### What is not implemented - Create - ILM Policy to delete update documents #### How to test Ingest entities and run in the dev console: ``` PUT kbn:/api/entity_store/entities/generic { "entity": { "id": "<ID>", "attributes": { "StorageClass": "hot" } } } ``` ### How it works Before explaining the API itself, a refresher on the entity store <details> <summary> Entity Store Diagram </summary> ```mermaid flowchart TB subgraph Main Flow A[(.logs*)] ~~~~ B[Transform] B ---> |Fetches raw entity data| A B ---> | Sends Aggregated Data | G{Ingest Pipeline} G --> | Combines new and old data and stores it| C[(.entity.v1.latest*)] end G -.-> | Fetches data older than transform retention policy| D[(.enrich-index-entities)] subgraph Retention Policy Flow direction LR E((Kibana Task)) -->|trigger every hour| F[Enrich Policy Entities] F -.->| Fetches most upto date entities| C F --->| Stores data | D end ``` </details> Entity store works based on a Transform which has a look back period of X hours (current 3h). That means data older than look period won't be retained. To solve that an Enrich Policy is set in place that takes hourly snapshots of the current state of the entity store and makes it available to, via ingest pipeline, enrich entity updates and make sure that we have data older than look back period present. Awesome. This adds complexity to this feature. The goal is add an api that once called reflects data changes immediately in the latest index. 
A few things were considered: - ❌ Add a new document to an update index to be picked up by the transform. - That doesn't satisfy the requirement because changes will be made available only after a transform finishes its run - ❌ Perform update by query in the latest index. - That works great if the entity in the latest index doesn't get any other update via the transform - what we can't guarantee of course. So the arrived solution was to both perform update by query in the latest index and publish an update document to be picked up by the transform, this way we get the best of both worlds. - So first Update by query on `.entities.v1.latest.security_$TYPE_default` (update made via painless) - Indexes a new document on `.entities.v1.updates.security_$TYPE_default` to be picked up by the transform. ```mermaid flowchart LR A[User] -->|PUT /api/entity_store/entities/$TYPE| B[Kibana] B --> |update by query| C[(.entities.v1.latest.security_$TYPE_default)] B --> |create new doc| D[(.entities.v1.updates.security_$TYPE_default)] ``` We have considered adding a priority mechanism to the update index so we would make sure that documents published to it would be picked up. First we found out that we don't need to make sure a document is seen by the transform. By its definition, transforms process every document - it doesn't have any mechanism to drop documents in case processing is taking too long. Second, we can't do it because the aggregations we run on already sort to find latest values, and sort on multiple fields is not possible. ### Fields and Schema Prior to this PR non generic entities (`user`, `host`, and `service`) had no exposure to concepts defined in the proposed `entity.*` ECS Schema. We had to address this to be able to make changes to `entity.attributes`, `entity.lifecyle` and `entity.behavior` fields. 
[The current direction](elastic/ecs#2513) is that `entity.*` fields will be nested under `user`, `host`, `service` and `generic` for data input, while the latest index, which holds the final entities, will have a root `entity.*` field set. In other words, there is a difference between the entity data input location and the entity data output location. The document

```json
{
  "user": {
    "entity": {
      "id": "romulo",
      "type": "aws-user"
    }
  }
}
```

will be represented in the latest index as

```json
{
  "entity": {
    "id": "romulo",
    "type": "aws-user"
  }
}
```

Because of the current direction of the discussion, we decided to move that way already. Therefore this PR contains changes to the entity definitions themselves, adding entity fields that use `{TYPE}.entity.*` as the data source and `entity.*` as the destination (`x-pack/solutions/security/plugins/security_solution/server/lib/entity_analytics/entity_store/entity_definitions/entity_descriptions/common.ts`).

That also posed another question: what will the input look like? Will the API accept the entity "input" format or the entity "output" format? I decided to stay close to the "output" format, so the API accepts `entity.*` JSON fields and applies them to the entity store. The reason behind it is simplicity of the API. I believe that having an inconsistent placement for `entity` in the API isn't a great experience, therefore always accepting

```json
{
  "entity": {
    "id": "romulo",
    "type": "aws-user"
  }
}
```

is better imo. **That's contradictory to the input via logs, however**. Curious to hear people's opinions.

There is another problem that further deviates the API from any ECS definition (input or output). For fields under `entity.attributes`, `entity.lifecycle` and `entity.behavior` we decided to define them on ECS. And because they are "custom fields", product would like them to have a `Capital_snake_case` format, which is not traditional, and developing with TS in such a case is not really allowed at the moment.
To curb that, the API exposes those fields as `snake_case` and converts them to `Capital_snake_case` before storing. That was the best way I found while still having the field definitions in the OpenAPI spec.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Co-authored-by: Mark Hopkin <mark.hopkin@elastic.co>
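The field placement and casing rules described under "Fields and Schema" in the commit message above can be illustrated with a small TypeScript sketch. It assumes that `Capital_snake_case` means "upper-case only the first character of a `snake_case` key", and the function names (`hoistEntity`, `toCapitalSnakeCase`, `toStoredEntity`) are hypothetical; this is not the Kibana implementation.

```typescript
type Json = Record<string, unknown>;

// Groups whose keys are stored as Capital_snake_case, per the description.
const CUSTOM_GROUPS = ["attributes", "lifecycle", "behavior"] as const;

function isPlainObject(value: unknown): value is Json {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Assumed rule: snake_case -> Capital_snake_case upper-cases the first character only.
function toCapitalSnakeCase(key: string): string {
  return key.length === 0 ? key : key.charAt(0).toUpperCase() + key.slice(1);
}

// Input location `{type}.entity.*` -> output location root `entity.*`.
function hoistEntity(type: string, doc: Json): Json {
  const nested = doc[type];
  if (!isPlainObject(nested)) return {};
  const entity = nested["entity"];
  return isPlainObject(entity) ? { entity } : {};
}

// Converts the keys of the custom groups from the API casing to the stored casing.
function toStoredEntity(entity: Json): Json {
  const stored: Json = { ...entity };
  for (const group of CUSTOM_GROUPS) {
    const fields = entity[group];
    if (isPlainObject(fields)) {
      const converted: Json = {};
      for (const [key, value] of Object.entries(fields)) {
        converted[toCapitalSnakeCase(key)] = value;
      }
      stored[group] = converted;
    }
  }
  return stored;
}
```

For example, under these assumptions `toCapitalSnakeCase("managed")` yields `Managed`, and `hoistEntity("user", doc)` turns the nested `user.entity` document shown in the description into its root `entity` form.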

Update for 9.1 Stack Release

8aafc31

MikePaquette requested a review from a team as a code owner August 15, 2025 15:35

MikePaquette added 2 commits August 28, 2025 14:10

Merge branch 'main' into MikePaquette-patch-1

3fc39aa

Update 0049-entity-fields.md

4f8b4f2

Add example for entity.attributes.Managed

mjwolf reviewed Aug 29, 2025

uri-weisman reviewed Aug 31, 2025

rfcs/text/0049-entity-fields.md (outdated, resolved)

romulets reviewed Sep 2, 2025

rfcs/text/0049-entity-fields.md (outdated, resolved)

MikePaquette and others added 2 commits September 4, 2025 07:51

Update rfcs/text/0049-entity-fields.md

f0571ef

typo Co-authored-by: Uri Weisman <68195305+uri-weisman@users.noreply.github.com>

Update rfcs/text/0049-entity-fields.md

531d844

fix typos Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

uri-weisman mentioned this pull request Sep 7, 2025

[Entity Store] Extend Entities schema elastic/kibana#234247

Open

6 tasks

romulets reviewed Sep 9, 2025

MikePaquette and others added 3 commits September 9, 2025 07:35

Update rfcs/text/0049-entity-fields.md

4b079ec

clarification. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

Update rfcs/text/0049-entity-fields.md

1121d88

two typos. Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

Update rfcs/text/0049-entity-fields.md

9f3f0b6

extraneous ")" Co-authored-by: Rômulo Farias <romulodefarias@gmail.com>

romulets mentioned this pull request Sep 15, 2025

[Entity Store] Add Upsert Entity API elastic/kibana#234454

Merged

MikePaquette and others added 6 commits September 24, 2025 14:51

Update 0049-entity-fields.md

e304bd4

Update 0049-entity-fields.md

148ee1f

timestamp field note removed

Update 0049-entity-fields.md

306e766

updated definitions

Update 0049-entity-fields.md

b879c66

Update 0049-entity-fields.md

8323342

added allowed values for entity.type

Update entity fields RFC with new date and corrections

d5159f1

mjwolf approved these changes Sep 24, 2025

Merge branch 'main' into MikePaquette-patch-1

5043851

mjwolf merged commit 7df7f75 into main Sep 24, 2025
8 checks passed


		### ECS Consumers or Data Stores

		For ECS consumers, such as the Elastic Security Solution entity store indices, the `entity.*` fields should be used directly at the root of the events.
