[Change Proposal] Adding support for Enrichment Policy in Packages #292

Open
P1llus opened this issue Mar 8, 2022 · 2 comments
Labels: discuss, Team:Ecosystem

Comments

@P1llus
Member

P1llus commented Mar 8, 2022

Summary

Enrich policy support would allow Elastic to create packages/integrations whose sole purpose is enrichment, which could be a great addition for end users if set up correctly.

In this issue I will explain one of these use-cases, specifically CVE vulnerability enrichment.

A new integration package that ingests CPE/CVE vulnerability data from NIST would produce a new data stream containing product names, versions, and their related CVE vulnerabilities.

If the package also has an enrich policy attached, end users can either create custom ingest pipelines that use the enrich processor, or the policy can be used for enrichment in other integrations, in this case OSQuery.

When OSQuery retrieves information from hosts using premade queries for installed software and its versions, we can enrich the host information with the known CVEs for that software.
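To make the custom-pipeline case more concrete, here is a minimal sketch that builds an ingest pipeline body containing a single enrich processor. The policy name `cve-policy` and the field names `software.version` and `vulnerability` are assumptions made up for illustration; `policy_name`, `field`, `target_field`, `max_matches` and `ignore_missing` are options of the enrich processor.

```python
import json


def build_enrich_pipeline(policy_name, match_field, target_field, max_matches=1):
    """Return an ingest pipeline body with a single enrich processor."""
    return {
        "description": f"Enrich documents using the {policy_name} enrich policy",
        "processors": [
            {
                "enrich": {
                    "policy_name": policy_name,    # enrich policy to look up
                    "field": match_field,          # field in the incoming document
                    "target_field": target_field,  # where matched data is written
                    "max_matches": max_matches,
                    "ignore_missing": True,        # skip documents without the field
                }
            }
        ],
    }


# Hypothetical names, for illustration only.
pipeline = build_enrich_pipeline(
    "cve-policy", "software.version", "vulnerability", max_matches=10
)
print(json.dumps(pipeline, indent=2))
```

The resulting dictionary could be sent as the body of a `PUT /_ingest/pipeline/<name>` request.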

How Enrichment Policies Work

As explained in the documentation, an enrich processor has a few requirements:

  1. Read index privileges for any indices used, and the enrich_user [built-in role](https://www.elastic.co/guide/en/elasticsearch/reference/current/built-in-roles.html)

  2. The source index/data stream needs to exist and already contain data, so the policy should be executed some interval X after the package has been installed, not immediately.
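The second requirement can be sketched as a small readiness check. This is only an illustration under assumptions: the function name, the `initial_delay` parameter, and the idea of feeding it a document count (for example from the `_count` API) are mine, not part of any existing Fleet or package-spec behavior.

```python
from datetime import datetime, timedelta


def ready_to_execute(doc_count, installed_at, now, initial_delay):
    """Return True once the source has data and the initial delay has elapsed."""
    return doc_count > 0 and (now - installed_at) >= initial_delay


installed = datetime(2022, 3, 8, 12, 0)
delay = timedelta(hours=1)

# No data yet, even though the delay has passed.
print(ready_to_execute(0, installed, installed + timedelta(hours=2), delay))  # False
# Data exists, but the package was installed only 10 minutes ago.
print(ready_to_execute(500, installed, installed + timedelta(minutes=10), delay))  # False
# Data exists and the delay has passed.
print(ready_to_execute(500, installed, installed + timedelta(hours=2), delay))  # True
```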

After this, the following steps need to be performed:

  1. Create a new enrich policy:
PUT /_enrich/policy/my-policy
{
  "match": {
    "indices": "PACKAGE_DATASTREAM",
    "match_field": "email",
    "enrich_fields": ["first_name", "last_name", "city", "zip", "state"]
  }
}
  2. Once the policy has been created, you need to execute the policy.

PUT /_enrich/policy/my-policy/_execute

This takes all the data from the indices configured in the policy above and copies it to a read-only index whose name starts with .enrich-*

  3. The execute call needs to run on an interval, because the resulting .enrich-* index is not updated automatically when the source data changes. Some form of scheduled execution of the API at a configured interval is therefore needed.

  4. A policy cannot be changed once it has been created, so when the policy changes or the package is removed, the existing policy could be deleted (and recreated, in the case of a change). This could be an optional step, since people might still be using the existing enrich indices.

  5. Should enrich indices be cleaned up when a package is deleted?
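The steps above can be sketched as a plan of REST calls. This is a rough outline only: the helper names are hypothetical, nothing here talks to a cluster, and the functions just return `(method, path, body)` tuples for the create, execute and delete enrich policy APIs.

```python
def policy_path(name):
    """Path of the enrich policy API for a given policy name."""
    return f"/_enrich/policy/{name}"


def install_requests(name, policy_body):
    """Create the policy, then execute it to build the enrich index."""
    return [
        ("PUT", policy_path(name), policy_body),
        ("PUT", policy_path(name) + "/_execute", None),
    ]


def refresh_requests(name):
    """Re-execute the policy; meant to be called on a schedule."""
    return [("PUT", policy_path(name) + "/_execute", None)]


def replace_requests(name, new_policy_body):
    """Policies are immutable, so a change is a delete followed by a reinstall.

    Note: a policy that is still referenced by an in-use ingest pipeline
    cannot be deleted, so a real implementation would have to handle that.
    """
    return [("DELETE", policy_path(name), None)] + install_requests(name, new_policy_body)


# Hypothetical policy body, mirroring the example above.
body = {
    "match": {
        "indices": "PACKAGE_DATASTREAM",
        "match_field": "email",
        "enrich_fields": ["first_name", "last_name"],
    }
}
for method, path, _ in replace_requests("my-policy", body):
    print(method, path)
```

Returning a plan instead of issuing requests keeps the ordering explicit: delete, create, execute.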

Implementation in package-spec
This implementation is similar to the Elasticsearch Transform request: #23

Each package should have a file in:
<packageRoot>/data_stream/<data_stream>/elasticsearch/enrich/enrich.yml
The format of the file could be:

- policy:
    refresh_interval: 1h
    match:
      indices: PACKAGE_DATASTREAM
      match_field: software.version
      enrich_fields:
        - category
        - kind
        - type
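As a sketch of how such a file could be consumed, the snippet below splits one already-parsed entry into the scheduling part and the body for the create enrich policy API. It assumes that `refresh_interval` is a package-spec-only field (as proposed here, it is not part of the Elasticsearch policy body) and that intervals use a simple `<number><s|m|h|d>` form; the function names are hypothetical.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_interval(text):
    """Convert an interval such as '1h' or '30m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text)
    if match is None:
        raise ValueError(f"unsupported interval: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def split_policy(entry):
    """Split one parsed enrich.yml entry into (refresh_seconds, api_body)."""
    policy = dict(entry["policy"])  # shallow copy, the input is left untouched
    interval = policy.pop("refresh_interval", None)
    refresh_seconds = parse_interval(interval) if interval is not None else None
    return refresh_seconds, policy


# The example file above, as it would look after YAML parsing.
entry = {
    "policy": {
        "refresh_interval": "1h",
        "match": {
            "indices": "PACKAGE_DATASTREAM",
            "match_field": "software.version",
            "enrich_fields": ["category", "kind", "type"],
        },
    }
}
seconds, body = split_policy(entry)
print(seconds)  # 3600
print(body)
```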

It would be useful, but maybe not required, for the file to have configurable fields, similar to our *.yml.hbs files.
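To illustrate what configurable fields could look like, here is a minimal sketch that fills `{{name}}` placeholders in a policy template. It only mimics simple variable substitution and is not a Handlebars implementation; the variable names `match_field` and `refresh_interval` are assumptions.

```python
import re


def render(template, variables):
    """Replace {{name}} placeholders with values from `variables`."""

    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"missing template variable: {key}")
        return str(variables[key])

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", substitute, template)


# Hypothetical template content for an enrich.yml.hbs-style file.
template = (
    "- policy:\n"
    "    refresh_interval: {{refresh_interval}}\n"
    "    match:\n"
    "      match_field: {{match_field}}\n"
)
print(render(template, {"refresh_interval": "1h", "match_field": "software.version"}))
```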

Some things to consider, but none required
This is a list of current "unknown behavior" that might have to be considered:

  1. What happens if we create the enrich policy and run it before the data stream has any data?
  2. What happens if an ingest pipeline references an enrich policy that does not exist? Does ignore_failure bypass this?
  3. What should and should not be removed if a package is either updated or deleted? Maybe a UI component to ask for removal of related enrich policy and indices?
  4. How does this interact with data streams in general, compared to normal indices, and do we need to consider a different ILM policy for cases where we might not want the data stream to roll over as frequently, or at all?
@nicpenning

I would like to see this functionality as well. I have a second use case.

Ingesting MAC address vendor information would allow the enrichment of DHCP and other events to provide some context to what devices have connected to the network. This can be accomplished by implementing an enrichment policy as described above.

@kcreddy
Contributor

kcreddy commented Aug 20, 2024

I have another use case where I want to create ES|QL visualisations with enrichment, joining data from two data streams on event.id. This would also require an enrich policy.
