Adding ability to auto-install ingest pipelines #95782

eyalkoren · 2023-05-03T12:32:12Z

Adding the ability to auto-install ingest pipelines and refer to them from index templates through index.default_pipeline and index.final_pipeline.

This would enable required capabilities like #95551 and #95522.


elasticsearchmachine · 2023-05-03T12:34:11Z

Hi @eyalkoren, I've created a changelog YAML for you.

elasticsearchmachine · 2023-05-03T12:34:11Z

Pinging @elastic/es-data-management (Team:Data Management)

jbaiera

Small comment for the forbidden APIs check - It does look like the CI is failing on some templates not being ready, might be worth trying to reproduce locally?

...k/plugin/core/src/main/java/org/elasticsearch/xpack/core/template/IndexTemplateRegistry.java

jbaiera

Doh! Put the previous comment on the wrong file 😵‍💫

...gin/core/src/test/java/org/elasticsearch/xpack/core/template/IndexTemplateRegistryTests.java

eyalkoren · 2023-05-04T06:55:27Z

@elastic/enterprise-search this PR adds the same capability you added through #95198, only to core. It does pretty much the exact same thing, but I also added a validation step to component templates, so that those are not allowed to be registered if they are dependent on an ingest pipeline that is not yet in place. I assume this doesn't break any assumption you are relying on, however it did break AnalyticsTemplateRegistryTests, because behavioral_analytics-events-settings.json has a final_pipeline dependency. Please review my fix for that.
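To illustrate the validation step described above, here is a minimal sketch of a dependency check that refuses to register a template while a pipeline it refers to is missing. It is not the code in IndexTemplateRegistry: the class `PipelineDependencyCheck`, its method names, and the pipeline id used in the usage note are hypothetical, and template settings are modeled as a plain string map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not the actual registry code.
final class PipelineDependencyCheck {
    // Setting keys named in the PR description.
    static final String DEFAULT_PIPELINE = "index.default_pipeline";
    static final String FINAL_PIPELINE = "index.final_pipeline";

    private PipelineDependencyCheck() {}

    /** Returns the pipeline ids referenced by the template settings that are not installed yet. */
    static List<String> missingPipelines(Map<String, String> templateSettings, Set<String> installedPipelines) {
        List<String> missing = new ArrayList<>();
        for (String key : List.of(DEFAULT_PIPELINE, FINAL_PIPELINE)) {
            String pipelineId = templateSettings.get(key);
            if (pipelineId != null && installedPipelines.contains(pipelineId) == false) {
                missing.add(pipelineId);
            }
        }
        return missing;
    }

    /** A template may be registered only when every pipeline it depends on already exists. */
    static boolean canRegister(Map<String, String> templateSettings, Set<String> installedPipelines) {
        return missingPipelines(templateSettings, installedPipelines).isEmpty();
    }
}
```

With settings that set `index.final_pipeline` to a made-up id such as `analytics-final`, `canRegister` returns false until that id appears in the installed set, which is the kind of ordering constraint that affected AnalyticsTemplateRegistryTests.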


afoucret · 2023-05-04T07:57:50Z

@eyalkoren I appreciate that you picked up this change to make it part of the core. I was about to prepare a PR for this, now that I have more bandwidth.

I will review this later today, but I would like to highlight two recently merged PRs to make sure these features are part of the new implementation:

- Checking that all nodes in the cluster are >= 8.8.0 before installing… #95780: A check on the cluster node versions. It allows the pipelines to be installed only when all nodes in the cluster satisfy a minimum version. This was added to prevent failures when using a new backward-compatible feature in the pipelines during a rolling upgrade.
- [8.8] Check if an analytics event data stream exists before installing pipeline. (#95621) #95775: We added the ability to wait for a specific condition to be satisfied before installing the pipeline. Here, the condition is the existence of a data stream that uses it. This somewhat conflicts with the validation you are doing. It was originally added to prevent the geoip database from being downloaded when no analytics collection exists yet.
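The node-version check mentioned for #95780 can be sketched roughly as follows. This is an assumption-laden illustration, not the code from that PR: versions are modeled as `{major, minor, patch}` int arrays, and `MinimumVersionGate` and its methods are hypothetical names.

```java
import java.util.List;

// Hypothetical sketch; real node versions are not plain int arrays.
final class MinimumVersionGate {
    private MinimumVersionGate() {}

    /** Compares two {major, minor, patch} versions component by component. */
    static int compare(int[] a, int[] b) {
        for (int i = 0; i < 3; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    /** True only when every node version is at or above the minimum. */
    static boolean allNodesAtLeast(List<int[]> nodeVersions, int[] minimum) {
        for (int[] version : nodeVersions) {
            if (compare(version, minimum) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

During a rolling upgrade, a single node still on 8.7.x would make `allNodesAtLeast(..., {8, 8, 0})` return false, so pipeline installation would be deferred to a later cluster update.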

eyalkoren · 2023-05-04T08:34:37Z

@afoucret thanks for taking a look!
I know too little to judge this, so input from @jbaiera and @dakrone will be required here.

My take on the two:

- Checking that all nodes in the cluster are >= 8.8.0 before installing… #95780: this sounds like a generic thing, so it probably makes sense to add. I will gladly add it; I am only waiting for approval from the DM team.
- [8.8] Check if an analytics event data stream exists before installing pipeline. (#95621) #95775: this sounds like a more case-specific issue, and I am not sure the general logic is suitable for the core. I think it makes sense to install dependencies in order. What would happen if a template T refers to a pipeline P, both are added while handling the same cluster-change event, and T gets installed slightly before P (components are added asynchronously)? Could an incoming request cause the first backing index to use settings that don't include the pipeline, while the following indices do? Can you think of a different event to trigger the late pipeline addition, and use a pipeline processor with ignore_missing_pipeline: true in the pipeline you register before registering the template?

eyalkoren · 2023-05-04T13:10:52Z

The only failing test now is EnterpriseSearchRestIT, which probably cannot be fixed without properly addressing @afoucret's comment above.
We either need to change the recently added Enterprise Search check, or the verification added within this PR.

@jbaiera @dakrone @felixbarny - your thoughts on this will be appreciated.

jbaiera · 2023-05-05T16:50:39Z

I don't think #95780 will be required for the index registry work since the StackTemplateRegistry requires the items to be installed on the master node only, which is guaranteed to be the latest version in the cluster based on our upgrade guidance.

As for the pipeline jam - I'm not sure I understand how this works today. Is the data stream created via an explicit API call or when a document is first ingested? When a document arrives for a data stream that doesn't exist, the ingest service resolves the template for the index the document would go into and uses the pipelines defined on the template. But if the pipelines only exist once the data stream exists, there will be no pipeline definition for the document and the ingest will fail, so the data stream is never created. I'm assuming that the data stream is created via some other API call, or else I'm missing something. @afoucret can you clarify this?

Assuming everything does work just fine otherwise, then we could modify the getPipeline method to accept the current cluster state as an argument so that it can modify which pipelines to return based on which resources already exist. Since this check is done on every cluster update we could easily detect the first time the dependent data stream is present and conditionally add the pipeline to the list to be installed.
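The suggestion above, choosing which pipelines to return based on what already exists in the cluster state, could look roughly like the following sketch. The cluster state is reduced to a set of existing data stream names, and `ConditionalPipelineSelector`, its parameters, and the names in the usage note are all hypothetical; the real getPipeline method and cluster state types are not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of state-dependent pipeline selection.
final class ConditionalPipelineSelector {
    private ConditionalPipelineSelector() {}

    /**
     * @param alwaysInstalled     pipeline ids that have no precondition
     * @param gatedByDataStream   pipeline id mapped to the data stream that must exist first
     * @param existingDataStreams data stream names present in the current cluster state
     * @return the pipeline ids that should be installed for this cluster state
     */
    static List<String> pipelinesToInstall(
        List<String> alwaysInstalled,
        Map<String, String> gatedByDataStream,
        Set<String> existingDataStreams
    ) {
        List<String> result = new ArrayList<>(alwaysInstalled);
        for (Map.Entry<String, String> entry : gatedByDataStream.entrySet()) {
            // Add the gated pipeline only once its data stream is present.
            if (existingDataStreams.contains(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

Because the selection would run on every cluster update, a gated pipeline (say, a made-up `analytics-final` tied to an `analytics-events` data stream) starts appearing in the result on the first update in which that data stream exists.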

eyalkoren · 2023-05-07T06:08:58Z

> I don't think #95780 will be required for the index registry work since the StackTemplateRegistry requires the items to be installed on the master node only, which is guaranteed to be the latest version in the cluster based on our upgrade guidance.

Hmm, is that what we want though? Can this (applying the pipeline only on the master node) cause a scalability issue? Does this mean that the master is the only node with the ingest role, or that this pipeline is installed only on the master?
I am not knowledgeable enough about the finer details.


.../qa/rest/src/yamlRestTest/java/org/elasticsearch/xpack/entsearch/EnterpriseSearchRestIT.java

jbaiera

LGTM, thanks for the iterations!

eyalkoren added 2 commits May 3, 2023 15:23

Adding ability to auto-install ingest pipelines through index templates

8934c9e

Merge remote-tracking branch 'upstream/main' into ingest-pipeline-reg…

4aaa1ff

…istry

eyalkoren changed the title ~~Adding ability to auto-install ingest pipelines through index templates~~ Adding ability to auto-install ingest pipelines May 3, 2023

eyalkoren self-assigned this May 3, 2023

elasticsearchmachine added the needs:triage, external-contributor and v8.9.0 labels May 3, 2023

eyalkoren added the >feature label and removed the needs:triage, external-contributor and v8.9.0 labels May 3, 2023

elasticsearchmachine added the needs:triage label May 3, 2023

eyalkoren added the :Data Management/Data streams and v8.9.0 labels and removed the needs:triage label May 3, 2023

elasticsearchmachine added the Team:Data Management label May 3, 2023

eyalkoren added 3 commits May 3, 2023 15:34

Update docs/changelog/95782.yaml

8d09963

Update changelog summary

0240b07

Guarding from nulls

5354b7c

This was referenced May 3, 2023

[Logs+] Assign a default @timestamp if missing #95551

Closed

[Logs+] Add JSON parsing pipeline #95522

Closed

[Logs+] Make default logs-*-* pipeline customizable #95537

Closed

dakrone requested a review from jbaiera May 3, 2023 16:21

jbaiera reviewed May 3, 2023

...k/plugin/core/src/main/java/org/elasticsearch/xpack/core/template/IndexTemplateRegistry.java

jbaiera reviewed May 3, 2023

...gin/core/src/test/java/org/elasticsearch/xpack/core/template/IndexTemplateRegistryTests.java Outdated

eyalkoren added 2 commits May 4, 2023 09:08

Avoid using forbidden API

ddef98e

Fixing AnalyticsTemplateRegistryTests to pass index template validation

7ee1694

Merge remote-tracking branch 'upstream/main' into ingest-pipeline-reg…

e82e135

…istry

Fixing validation when IngestMetadata is null in cluster state

481f132

Merge remote-tracking branch 'upstream/main' into ingest-pipeline-reg…

3f89462

…istry

eyalkoren mentioned this pull request May 8, 2023

EnterpriseSearchRestIT failure #95917

Closed

Disabling EnterpriseSearchRestIT

0716d24

eyalkoren requested a review from jbaiera May 8, 2023 14:05

jbaiera reviewed May 8, 2023

.../qa/rest/src/yamlRestTest/java/org/elasticsearch/xpack/entsearch/EnterpriseSearchRestIT.java Outdated

Fix test disabling method

9d1fab6

eyalkoren requested a review from jbaiera May 9, 2023 05:02

jbaiera approved these changes May 9, 2023


eyalkoren merged commit 1dda989 into elastic:main May 10, 2023
12 checks passed

eyalkoren deleted the ingest-pipeline-registry branch May 10, 2023 03:47

joemcelroy mentioned this pull request May 15, 2023

[Behavioral Analytics] Analytics pipeline to the index template registry #96104

Merged

eyalkoren mentioned this pull request May 23, 2023

[Discussion] Naming convention for builtin pipelines #96267

Closed

eyalkoren mentioned this pull request Sep 12, 2023

x-pack/plugin/apm: introduce x-pack-apm plugin #97546

Merged
