[Change Proposal] Adding support for Enrichment Policy in Packages #292

P1llus · 2022-03-08T10:55:44Z

Summary

Enrichment policy support would let Elastic create packages/integrations whose sole purpose is enrichment, which can be a great addition for end users if set up correctly.

In this issue I will explain one of these use-cases, specifically CVE vulnerability enrichment.

A new integration package that ingests CPE/CVE vulnerability data from NIST would produce a new data stream containing product names, versions and their related CVE vulnerabilities.

If the package also has an enrichment policy attached, end users can either create custom ingest pipelines that use the enrich processor, or the policy can be used for enrichment in other integrations, in this case OSQuery.

When OSQuery retrieves software and version information from hosts using premade queries, we can enrich the host information with the available CVEs.
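
To make the custom-pipeline option a bit more concrete, here is a minimal Python sketch that builds the body of an ingest pipeline with a single enrich processor. The policy name `cve-by-software-version` and the target field `vulnerability` are made up for illustration and are not defined by any existing package; `policy_name`, `field`, `target_field`, `max_matches` and `ignore_missing` are options of the Elasticsearch enrich processor.

```python
import json


def build_enrich_pipeline(policy_name, match_field, target_field):
    """Return an ingest pipeline body containing one enrich processor."""
    return {
        "description": "Attach CVE data to software inventory events (illustrative)",
        "processors": [
            {
                "enrich": {
                    "policy_name": policy_name,    # enrich policy to look up
                    "field": match_field,          # field of the incoming document to match on
                    "target_field": target_field,  # where the matched enrich data is written
                    "max_matches": 1,
                    "ignore_missing": True,        # skip documents that lack the match field
                }
            }
        ],
    }


# Hypothetical names, chosen only for this example.
pipeline = build_enrich_pipeline(
    "cve-by-software-version", "software.version", "vulnerability"
)
print(json.dumps(pipeline, indent=2))
```

The resulting dictionary would be sent as the body of `PUT /_ingest/pipeline/<name>`.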

How Enrichment Policies Work

As explained in the documentation, an enrich processor has a few requirements:

Read index privileges for any indices used and the enrich_user [built-in role](https://www.elastic.co/guide/en/elasticsearch/reference/current/built-in-roles.html)
The source index/data stream needs to exist and already contain data, so the policy should be executed on some interval X after the package has been installed, not immediately.

After this, the following steps need to be performed:

Creating a new Enrich Policy

PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "PACKAGE_DATASTREAM",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
  }
}

Once the policy has been created, you need to execute the policy.

PUT /_enrich/policy/my-policy/_execute

This takes all the data from the indices configured in the policy above and copies it to a read-only index whose name starts with .enrich-*

The execute call needs to run on an interval: the resulting .enrich-* index is not updated automatically, so some sort of scheduled execution of the API is needed.
A policy cannot be changed, so when the policy changes or the package is removed, the policy could be deleted. This could be an optional step, though, since people might still be using the existing enrich indices.
Cleaning up enrich indices during deletion of packages?
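
The lifecycle described above could be sketched roughly like this in Python. It only illustrates the ordering (delete, recreate, execute) and is not a proposal for how Fleet or Kibana should implement it. `plan_policy_sync` is a hypothetical helper that returns the API calls to make rather than sending them.

```python
def plan_policy_sync(name, installed, desired):
    """Return the ordered (method, path, body) calls needed to sync one policy.

    installed: policy definition currently in the cluster, or None if absent.
    desired:   policy definition shipped with the package.
    """
    path = f"/_enrich/policy/{name}"
    calls = []
    if installed != desired:
        if installed is not None:
            # Policies are immutable, so the old one is removed before recreating it.
            calls.append(("DELETE", path, None))
        calls.append(("PUT", path, desired))
    # Always re-execute so the enrich index reflects the current source data.
    calls.append(("PUT", f"{path}/_execute", None))
    return calls


policy = {
    "match": {
        "indices": "PACKAGE_DATASTREAM",
        "match_field": "email",
        "enrich_fields": ["first_name", "last_name", "city", "zip", "state"],
    }
}

# First install: create the policy, then execute it.
for call in plan_policy_sync("my-policy", None, policy):
    print(call[0], call[1])
```

On a scheduled run where nothing changed, the plan contains only the `_execute` call. Note that Elasticsearch refuses to delete a policy that is still referenced by an ingest pipeline, so a real implementation would have to handle that case before the `DELETE`.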

Implementation in package-spec
This implementation is similar to the Elasticsearch Transform request: #23

Each package should have a file in:
<packageRoot>/data_stream/<data_stream>/elasticsearch/enrich/enrich.yml
The format of the file could be:

- policy:
    refresh_interval: 1h
    match:
      indices: PACKAGE_DATASTREAM
      match_field: software.version
      enrich_fields:
        - category
        - kind
        - type

It would be useful, though maybe not required, for the file to support configurable fields, similar to our *.yml.hbs files.
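
As a rough sketch of how an installer could consume such a file, the Python below splits one already-parsed entry into the body for the enrich policy API and the refresh interval in seconds. `refresh_interval` is not part of the Elasticsearch enrich policy API, so it has to be stripped before the request is sent. Treating `PACKAGE_DATASTREAM` as a placeholder to substitute, the helper names, the supported interval units and the data stream name `logs-cve.vulnerability-default` are all assumptions made for this example.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def interval_to_seconds(value):
    """Convert an interval such as '1h' or '30m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unsupported interval: {value!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def split_policy_entry(entry, data_stream):
    """Split one parsed enrich.yml entry into (api_body, refresh_seconds)."""
    policy = dict(entry["policy"])
    # refresh_interval is scheduling metadata, not part of the policy itself.
    refresh = interval_to_seconds(policy.pop("refresh_interval"))
    body = {}
    for policy_type, settings in policy.items():
        settings = dict(settings)
        if settings["indices"] == "PACKAGE_DATASTREAM":
            # Assumed substitution of the placeholder with the real data stream.
            settings["indices"] = data_stream
        body[policy_type] = settings
    return body, refresh


# The entry as it would look after YAML parsing.
entry = {
    "policy": {
        "refresh_interval": "1h",
        "match": {
            "indices": "PACKAGE_DATASTREAM",
            "match_field": "software.version",
            "enrich_fields": ["category", "kind", "type"],
        },
    }
}

body, refresh = split_policy_entry(entry, "logs-cve.vulnerability-default")
print(refresh, body)
```

Keeping the interval separate from the policy body means the scheduler and the Elasticsearch request can evolve independently.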

Some things to consider, but none required
This is a list of current "unknown behavior" that might have to be considered:

What happens if we create the enrich policy and run it before the data stream has any data?
What happens if an ingest pipeline references an enrich policy that does not exist? Does ignore_failure bypass this?
What should and should not be removed if a package is either updated or deleted? Maybe a UI component to ask for removal of related enrich policy and indices?
How does this interact with data streams in general compared to normal indices, and do we need to consider a different ILM policy for cases where we might not want the data stream to roll over as frequently, or at all?


nicpenning · 2023-07-31T11:44:20Z

I would like to see this functionality as well. I have a second use case.

Ingesting MAC address vendor information would allow the enrichment of DHCP and other events to provide some context to what devices have connected to the network. This can be accomplished by implementing an enrichment policy as described above.

kcreddy · 2024-08-20T11:50:00Z

I have another use case where I want to create ES|QL visualisations with enrichment, joining data from two data streams on event.id. This would also require an enrich policy.

P1llus added the discuss Issue needs discussion label Mar 8, 2022

joshdover mentioned this issue Mar 8, 2022

[Discuss] Installing package assets that require elevated privileges #293

Open

jlind23 added the Team:Ecosystem Label for the Packages Ecosystem team label Mar 24, 2022

jsoriano mentioned this issue Aug 26, 2022

Add fields to integrations by default elastic/elastic-package#949

Closed

jsoriano mentioned this issue Jul 31, 2023

Enrich Data and Policies bundled in Packages ? elastic/elastic-package#1375

Closed
