Add the ability to require an ingest pipeline #46847

jasontedor · 2019-09-18T23:30:41Z

This commit adds the ability to require an ingest pipeline on an index. Today we can have a default pipeline, but that could be overridden by a request pipeline parameter. This commit introduces a new index setting index.required_pipeline that acts similarly to index.default_pipeline, except that it cannot be overridden by a request pipeline parameter. Additionally, a default pipeline and a required pipeline cannot both be set. The required pipeline can be set to _none to ensure that no pipeline ever runs for index requests on that index.
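The rules in this description can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code changed in TransportBulkAction: the class and method names are hypothetical, `null` stands for "setting or parameter not provided", and rejecting a request that names a pipeline on an index with a required pipeline is an assumed way of enforcing "cannot be overridden".

```java
/** Hypothetical sketch of the rules described above; not the code in TransportBulkAction. */
final class RequiredPipelineSketch {
    /** Setting value that means "run no pipeline". */
    static final String NONE = "_none";

    /** True when the value names an actual pipeline (not unset, not "_none"). */
    private static boolean namesPipeline(String value) {
        return value != null && !NONE.equals(value);
    }

    /** An index may not name both a default pipeline and a required pipeline. */
    static void validateIndexSettings(String defaultPipeline, String requiredPipeline) {
        if (namesPipeline(defaultPipeline) && namesPipeline(requiredPipeline)) {
            throw new IllegalArgumentException(
                "index has a default pipeline [" + defaultPipeline
                    + "] and a required pipeline [" + requiredPipeline + "]");
        }
    }

    /** Returns the name of the pipeline to run for one index request, or "_none". */
    static String resolve(String requestPipeline, String defaultPipeline, String requiredPipeline) {
        validateIndexSettings(defaultPipeline, requiredPipeline);
        if (requiredPipeline != null) {
            // Assumption: an explicit request pipeline is rejected rather than silently ignored.
            if (requestPipeline != null) {
                throw new IllegalArgumentException(
                    "request pipeline [" + requestPipeline
                        + "] can not override required pipeline [" + requiredPipeline + "]");
            }
            return requiredPipeline; // may be "_none", which forces "no pipeline"
        }
        if (requestPipeline != null) {
            return requestPipeline; // a request pipeline overrides the default pipeline
        }
        return defaultPipeline != null ? defaultPipeline : NONE;
    }

    public static void main(String[] args) {
        System.out.println(resolve("from-request", "parse-logs", null));
        System.out.println(resolve(null, null, "audit"));
    }
}
```

In this sketch, setting the required pipeline to _none differs from leaving it unset: an unset value falls through to the request or default pipeline, while an explicit _none short-circuits to "no pipeline".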

elasticmachine · 2019-09-18T23:30:43Z

Pinging @elastic/es-core-features

jakelandis · 2019-09-19T00:31:59Z

@jasontedor could you add a near clone of https://github.com/elastic/elasticsearch/blob/master/modules/ingest-common/src/test/resources/rest-api-spec/test/ingest/200_default_pipeline.yml to help test required pipelines across the various ways to issue an index request?

martijnvg

Looks good. I left a few comments.

server/src/test/java/org/elasticsearch/index/RequiredPipelineIT.java

server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java

jasontedor · 2019-09-19T15:22:05Z

@elasticmachine run elasticsearch-ci/2

server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java

jakelandis

LGTM - did some manual testing with multi-node forwarding and all works as expected.

martijnvg

LGTM


cwurm · 2019-10-08T13:19:52Z

Being able to force data to run through a pipeline before indexing is really useful. However, I worry about making the required pipeline the only one that executes when set.

As a concrete example, we've been talking about using the ingest timestamp instead of the event timestamp for running detection rules (searches/queries, but also ML jobs) in SIEM. This would make sure that we always consider all newly arrived data and are never fooled, e.g. by an attacker manipulating system time to send data pretending to be from the distant past or future.

For that, a required ingest pipeline with a set processor setting event.ingested to {{_ingest.timestamp}} would work. We could have all Beats set up such a pipeline on all the Beats indices. However, we would also still need the ability to have another pipeline execute before the required one, e.g. the data-parsing pipelines of any enabled Filebeat modules.

The required_pipeline could always execute last, but allow any other pipeline (default_pipeline or specified in the request) to run before it. What do you think?
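To make the proposed ordering concrete, here is a minimal Java sketch. It models the suggestion in this comment, not the behavior of the merged change, and everything in it is hypothetical: pipelines are represented as plain functions over a flat map, `parseMessage` stands in for a Filebeat module's parsing pipeline, and `stampIngestTime` stands in for a set processor writing the ingest timestamp to event.ingested.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical model of "required pipeline always runs last"; not Elasticsearch code. */
final class RequiredRunsLastSketch {

    /** Runs the optional first pipeline (request or default), then the required pipeline. */
    static Map<String, Object> ingest(Map<String, Object> source,
                                      UnaryOperator<Map<String, Object>> firstPipeline,
                                      UnaryOperator<Map<String, Object>> requiredPipeline) {
        Map<String, Object> doc = new HashMap<>(source);
        if (firstPipeline != null) {
            doc = firstPipeline.apply(doc);
        }
        return requiredPipeline.apply(doc);
    }

    /** Stand-in for a set processor that copies the ingest timestamp into event.ingested. */
    static UnaryOperator<Map<String, Object>> stampIngestTime(String ingestTimestamp) {
        return doc -> {
            Map<String, Object> out = new HashMap<>(doc);
            out.put("event.ingested", ingestTimestamp); // flat key for simplicity
            return out;
        };
    }

    /** Stand-in for a data-parsing pipeline; it only records the message length. */
    static UnaryOperator<Map<String, Object>> parseMessage() {
        return doc -> {
            Map<String, Object> out = new HashMap<>(doc);
            out.put("message.length", String.valueOf(doc.get("message")).length());
            return out;
        };
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("@timestamp", "2001-01-01T00:00:00Z"); // event time claimed by the sender
        source.put("message", "login failed");
        System.out.println(ingest(source, parseMessage(), stampIngestTime("2019-10-08T13:19:52Z")));
    }
}
```

The event timestamp in @timestamp is left untouched, and the ingest timestamp is added regardless of which pipeline ran first (or whether one ran at all).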

/cc @tsg @MikePaquette @randomuserid

randomuserid · 2019-10-08T15:50:18Z

I am not sure I follow - is this a trade-off between regular field parsing vs. having an ingest timestamp and using some sort of alternative parsing scheme? For signal purposes, if we are going to use ingest timestamps, we would need to follow the general convention of giving each event both timestamps - ingest time and event time (the latter being the timestamp in the original event being ingested). We would also need the events to be parsed into ECS fields as they are now in order for signals to evaluate them.

Independent of the ingest time idea, we always need the original event timestamps in order to create forensic timelines as analysts work on cases. If we had to make a Sophie’s choice, the original timestamp is more critical, even if it means running long and expensive searches in order to ensure solving for correctness.

cwurm · 2019-10-08T15:53:41Z

@randomuserid In my example, the ingest timestamp would not replace the event timestamp - the event timestamp would still be in @timestamp, the ingest timestamp in event.ingested (field name tbd). Let's continue the discussion about this concrete use case off this PR.

randomuserid · 2019-10-08T16:38:03Z

OK, what are the applications for the ingest pipeline - maybe the ability to make a signal on an event tagged as important or special by an endpoint agent without consideration of time? As an alternative method of ensuring important events with weird timestamps are made into signals?

cwurm · 2019-10-08T20:34:03Z

@randomuserid It could be that, yeah. More straightforward maybe is adding an ingestion timestamp, or dropping a field (e.g. for privacy reasons), and probably many other things. Today, this kind of central control is often exercised in centrally managed Logstash pipelines, but required_pipeline would allow doing it in Elasticsearch as well.

The changes add more granularity for identifying the data ingestion user. The ingest pipeline can now be configured to record authentication realm and type. It can also record API key name and ID when one is in use. This improves traceability when data are being ingested from multiple agents and will become more relevant with the incoming support of required pipelines (#46847). Resolves: #49106

It's about *final* pipelines (not *required* pipelines) -- see elastic#46847 and elastic#49470 for the history here: 'required' pipelines were renamed to 'final' pipelines.

jasontedor added >enhancement, :Data Management/Ingest Node, v8.0.0, v7.5.0 labels Sep 18, 2019

Add some REST tests

c636e93

jasontedor added 2 commits September 18, 2019 22:28

Fix double processing

5ddab06

Add assertion

3d618a7

jasontedor requested review from jakelandis and martijnvg September 19, 2019 02:36

Fix NPE in tests

e044c1c

martijnvg reviewed Sep 19, 2019

Always mark as resolved

01416c1

jasontedor requested a review from martijnvg September 19, 2019 15:08

martijnvg reviewed Sep 19, 2019

server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java

jakelandis reviewed Sep 19, 2019

server/src/main/java/org/elasticsearch/action/bulk/TransportBulkAction.java

jakelandis approved these changes Sep 19, 2019

martijnvg approved these changes Sep 19, 2019

jasontedor merged commit 19b710a into elastic:master Sep 19, 2019

jasontedor deleted the required-pipeline branch September 19, 2019 20:38

jasontedor mentioned this pull request Sep 19, 2019

Move pipelines resolved assertion #46892

Merged

martijnvg mentioned this pull request Sep 24, 2019

Make ingest pipeline resolution logic unit testable #47026

Merged

martijnvg added a commit that referenced this pull request Sep 25, 2019

Make ingest pipeline resolution logic unit testable (#47026)

737376d

Extracted ingest pipeline resolution logic into a static method and added unit tests for pipeline resolution logic. Followup from #46847

martijnvg added a commit that referenced this pull request Sep 25, 2019

Make ingest pipeline resolution logic unit testable (#47026)

eef1ba3

Extracted ingest pipeline resolution logic into a static method and added unit tests for pipeline resolution logic. Followup from #46847

tvernum mentioned this pull request Nov 18, 2019

Expose API key name to the ingest pipeline #49106

Closed

roncohen mentioned this pull request Nov 18, 2019

Improve flexibility over index.required_pipeline #49247

Closed

Mpdreamz mentioned this pull request Nov 19, 2019

[meta] 7.5 release elastic/elasticsearch-net#4232

Closed

24 tasks

codebrain mentioned this pull request Dec 16, 2019

Add the ability to require an ingest pipeline. elastic/elasticsearch-net#4274

Merged

ywangd mentioned this pull request Jan 22, 2020

Expose API key name to the ingest pipeline #51305

Merged

ywangd mentioned this pull request Feb 10, 2020

Expose more authentication info to ingest pipeline (#51305) #52119

Merged

codebrain mentioned this pull request Apr 1, 2020

7.7.0 meta ticket (Part 2) elastic/elasticsearch-net#4533

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
