Improve flexibility over index.required_pipeline #49247

Closed
roncohen opened this issue Nov 18, 2019 · 3 comments · Fixed by #49470
Labels
:Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP >enhancement >feature

Comments

roncohen commented Nov 18, 2019

Describe the feature:

Following up on #46847, there are a couple of cases where we want to ensure that a specific pipeline runs on every document ingested into an index. For example, you may want to set the event.ingested timestamp or ensure that the name of the API Key used is present in the document.

At the same time, we want to keep the flexibility users currently have to process incoming data with a pipeline of their choosing. We have index.required_pipeline, but it doesn't offer the flexibility we'd like.

@skearns64 suggested:

Sounds like we need an "append" pipeline, or an option for required to be "run first" or "run last"

If "append pipeline" means that Elasticsearch will automatically run the "append pipeline" on every indexed document after the pipeline specified in the request has run, that option would solve the use cases I'm familiar with.
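A minimal sketch of the "append pipeline" ordering described above, assuming pipelines can be modeled as plain Python functions over a document dict. The names `run_pipelines`, `add_ingested_timestamp`, and `add_team_label` are hypothetical illustrations, not Elasticsearch APIs; in Elasticsearch itself this would be driven by index settings and evaluated on the ingest node.

```python
from datetime import datetime, timezone
from typing import Callable, Optional

# Hypothetical model: a "pipeline" is a function that takes a document and returns a new one.
Pipeline = Callable[[dict], dict]


def add_ingested_timestamp(doc: dict) -> dict:
    # Hypothetical operator-controlled pipeline that stamps event.ingested.
    event = dict(doc.get("event", {}))
    event["ingested"] = datetime.now(timezone.utc).isoformat()
    return {**doc, "event": event}


def add_team_label(doc: dict) -> dict:
    # Hypothetical user-chosen pipeline, sent with the index request.
    return {**doc, "labels": {"team": "web"}}


def run_pipelines(doc: dict,
                  request_pipeline: Optional[Pipeline],
                  append_pipeline: Optional[Pipeline]) -> dict:
    # The user's pipeline (if any) runs first, so users keep their current flexibility.
    if request_pipeline is not None:
        doc = request_pipeline(doc)
    # The append pipeline always runs last, so whatever it sets
    # cannot be skipped or overwritten by the user's pipeline.
    if append_pipeline is not None:
        doc = append_pipeline(doc)
    return doc


result = run_pipelines({"message": "hi"}, add_team_label, add_ingested_timestamp)
print(result["labels"], "ingested" in result["event"])
```

Running the appended pipeline last is what makes it a guarantee rather than a default: even a request pipeline that writes its own `event.ingested` value is overridden.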

I've not heard a compelling use case for "run first", but they could exist.

Some questions that come to mind:

  • Does "append pipeline" let users specify a list of pipelines to append, or only a single pipeline? You can achieve the same functionality by combining pipelines, but I can imagine it would be convenient to be able to specify a list.
  • How will it interact with index.default_pipeline and index.required_pipeline?
  • I don't know that index.required_pipeline has any use case that index.append_pipeline does not solve, but that could be due to lack of context on my part.
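One possible answer to the first two questions, sketched under stated assumptions: `index.append_pipeline` is the hypothetical setting under discussion (not an existing Elasticsearch setting), it accepts either one pipeline name or a list, and the request's pipeline parameter keeps overriding `index.default_pipeline` with `_none` disabling it, mirroring how the default pipeline behaves today. `index.required_pipeline` is left out on the premise of the third question. The pipeline names are made up.

```python
from typing import List, Optional


def resolve_pipeline_names(settings: dict, request_pipeline: Optional[str] = None) -> List[str]:
    # Hypothetical semantics: the request's pipeline takes precedence over
    # index.default_pipeline; "_none" disables the user-selected pipeline.
    selected = request_pipeline if request_pipeline is not None else settings.get("index.default_pipeline")
    names: List[str] = []
    if selected and selected != "_none":
        names.append(selected)
    # index.append_pipeline is a hypothetical setting accepting one name or a list.
    appended = settings.get("index.append_pipeline", [])
    if isinstance(appended, str):
        appended = [appended]
    # Appended pipelines always run last, in the order listed.
    names.extend(appended)
    return names


settings = {
    "index.default_pipeline": "apache-access",
    "index.append_pipeline": ["set-event-ingested", "add-api-key-name"],
}
print(resolve_pipeline_names(settings))            # default pipeline, then appended ones
print(resolve_pipeline_names(settings, "custom"))  # request pipeline replaces the default
print(resolve_pipeline_names(settings, "_none"))   # only the appended pipelines run
```

Accepting a list is mostly a convenience: the same effect is achievable with a single appended pipeline that calls the others, at the cost of one more pipeline to maintain.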

cc @ruflin @webmat @clintongormley @jasontedor @bytebilly

(first Elasticsearch issue! 🎉 )

pgomulka added the :Data Management/Ingest Node Execution or management of Ingest Pipelines including GeoIP label Nov 18, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Ingest)

jasontedor self-assigned this Nov 18, 2019

webmat commented Nov 18, 2019

One very important aspect of this added flexibility is letting users add their own pipelines around stack-provided pipelines without having to modify those pipelines.

I've been feeling that need for a while, and one way I've been thinking about this is to provide "before hooks" or "after hooks", where users can insert their own pipelines anywhere they need.

When users are forced to modify pipelines provided by products in the stack -- like Beats modules -- they're signing up to re-apply their changes every time they upgrade the product. Or worse, they won't remember, and whatever they improved will be lost on upgrade.

Approaching this in a generic fashion, such as before/after hooks, would let users work around not only stack-provided pipelines but also their own team structure and areas of responsibility.

Consider this example:

  • A stack-provided ingest pipeline, like the one from the Filebeat Apache httpd module, is used for multiple web apps
  • The team managing a central pipeline hooks after the module's pipeline to perform additional work relevant to all deployments.
    • E.g. adjusting older Beats module outputs to newer versions of ECS
  • The team managing one of the Apache deployments has its own adjustments it wants to make before the default processing kicks in.
    • E.g. in a prior life I would append 20-ish key=value pairs after a default Apache log line

With this in mind, I think it would be great to offer the ability to hook before/after both via the API call and via the index setting.
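The before/after hook idea and the team example above can be sketched as function composition, assuming pipelines are plain Python functions over a document dict. Everything here is hypothetical: `with_hooks` is not an Elasticsearch API, `apache_module` is a toy stand-in for the stack-provided module pipeline (it emits a legacy-style field), and the hooks mimic the two teams in the example.

```python
from typing import Callable, List

# Hypothetical model: a "pipeline" is a function from document to document.
Pipeline = Callable[[dict], dict]


def with_hooks(core: Pipeline, before: List[Pipeline], after: List[Pipeline]) -> Pipeline:
    # Wrap a stack-provided pipeline with user hooks; `core` itself is never modified.
    def run(doc: dict) -> dict:
        for hook in before:
            doc = hook(doc)
        doc = core(doc)
        for hook in after:
            doc = hook(doc)
        return doc
    return run


def split_trailing_kv(doc: dict) -> dict:
    # Deployment team's before hook: peel trailing key=value pairs off the
    # message so the module pipeline sees the line shape it expects.
    tokens = doc["message"].split(" ")
    kv = {}
    while tokens and "=" in tokens[-1]:
        key, _, value = tokens.pop().partition("=")
        kv[key] = value
    return {**doc, "message": " ".join(tokens), "labels": kv}


def apache_module(doc: dict) -> dict:
    # Toy stand-in for the stack-provided pipeline: reads the status code
    # from the last token and stores it under a legacy, pre-ECS field name.
    status = int(doc["message"].split(" ")[-1])
    return {**doc, "apache": {"access": {"response_code": status}}}


def central_ecs_fixup(doc: dict) -> dict:
    # Central team's after hook: move the legacy field to its ECS name.
    code = doc.get("apache", {}).get("access", {}).get("response_code")
    if code is None:
        return doc
    out = {k: v for k, v in doc.items() if k != "apache"}
    out["http"] = {"response": {"status_code": code}}
    return out


pipeline = with_hooks(apache_module, before=[split_trailing_kv], after=[central_ecs_fixup])
print(pipeline({"message": "GET /index.html 200 app=shop region=eu"}))
```

Because `apache_module` is only referenced, never edited, upgrading it replaces the core step while both teams' hooks stay in place. Whether the `before`/`after` lists come from the request, an index setting, or both is left open in this sketch.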

webmat commented Nov 18, 2019

Modifying a stack pipeline is possible but nasty, as you can see here (scroll to "Ingest Node Pipeline").


5 participants