Handle a default/request pipeline and a final pipeline with minimal additional overhead #93329

Merged: 16 commits merged into elastic:main from ingest-service-internal-rewrite on Jan 30, 2023

Conversation

@joegallo (Contributor) commented Jan 27, 2023

Closes #81244
Closes #92843
Closes #93118

Tightens up the document handling aspects of executePipelines and its callees. innerExecute becomes trivial, and nearly drops out (renamed to executePipeline where it remains just to adapt handler shapes).

At a high level, the execution goes from:

- pipeline 1 (default/request pipeline):
  - parse json
  - execute processors
  - generate json
- pipeline 2 (final pipeline):
  - parse json
  - execute processors
  - generate json

to

- parse json
- pipeline 1 (default/request pipeline):
  - execute processors
- pipeline 2 (final pipeline):
  - execute processors
- generate json
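
To make that concrete, here's a minimal, hypothetical sketch of the difference -- plain Java with stand-in types, not the actual IngestService/IngestDocument code, and with parseJson/generateJson standing in for the real XContent round-trip:

import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class PipelineExecutionSketch {

    // before: each pipeline parsed the source, ran its processors, and re-serialized
    static byte[] executeBefore(byte[] source, List<List<UnaryOperator<Map<String, Object>>>> pipelines) {
        for (List<UnaryOperator<Map<String, Object>>> processors : pipelines) {
            Map<String, Object> doc = parseJson(source);     // parse json
            for (UnaryOperator<Map<String, Object>> processor : processors) {
                doc = processor.apply(doc);                  // execute processors
            }
            source = generateJson(doc);                      // generate json
        }
        return source;
    }

    // after: parse once, run the default/request pipeline and then the final pipeline
    // against the same in-memory document, and serialize once at the end
    static byte[] executeAfter(byte[] source, List<List<UnaryOperator<Map<String, Object>>>> pipelines) {
        Map<String, Object> doc = parseJson(source);         // parse json (once)
        for (List<UnaryOperator<Map<String, Object>>> processors : pipelines) {
            for (UnaryOperator<Map<String, Object>> processor : processors) {
                doc = processor.apply(doc);                  // execute processors
            }
        }
        return generateJson(doc);                            // generate json (once)
    }

    // placeholders for the JSON round-trip the real code performs via XContent
    static Map<String, Object> parseJson(byte[] source) { return Map.of(); }
    static byte[] generateJson(Map<String, Object> doc) { return new byte[0]; }
}

The interesting part is just the loop structure: the parse/serialize pair moves outside the per-pipeline loop, so a document that runs through both a request pipeline and a final pipeline is parsed and serialized once instead of twice.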

The difference in the flame graph is pretty clear.

Before: [flame graph screenshot, 2023-01-25]

After: [flame graph screenshot, 2023-01-27]

And the performance is much better, as one would expect, with the total time spent in any ingest code for the nightly security benchmark dropping from 4,994,128 ms to 3,568,490 ms -- a decrease of 29%.

This is the direct follow-up to #93213, but I've been working up to it for a while -- #93119 and #93120 added the tests that are now made passing by this PR, while #92203, #92308, and #92455 laid some of the groundwork for the eventual document listener cleanup.

It used to make sense for this to live in the ingest service, because we avoided allocating an ingest document if the pipeline was empty. Now we already have the document regardless, so this can just live in IngestDocument anyway.

In any case, this would be a rare and unusual thing to have happen at all. I don't want to drop the logic completely, but I'm also not worried about the performance implications of where it lives.
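
As a rough sketch of what that move looks like -- simplified stand-in types, not the real org.elasticsearch.ingest classes, and the actual method does more than this -- the empty-pipeline shortcut now sits at the top of the document's own executePipeline:

import java.util.List;
import java.util.function.BiConsumer;

class IngestDocumentSketch {

    interface Processor {
        void execute(IngestDocumentSketch doc, BiConsumer<IngestDocumentSketch, Exception> handler);
    }

    static class Pipeline {
        private final List<Processor> processors;
        Pipeline(List<Processor> processors) { this.processors = processors; }
        List<Processor> getProcessors() { return processors; }
    }

    void executePipeline(Pipeline pipeline, BiConsumer<IngestDocumentSketch, Exception> handler) {
        // shortcut if the pipeline is empty: with no processors there is no work to do,
        // so hand the unchanged document straight back to the caller
        if (pipeline.getProcessors().isEmpty()) {
            handler.accept(this, null);
            return;
        }
        // otherwise the pipeline's processors would run here (elided in this sketch)
    }
}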
@joegallo added the release highlight, :Data Management/Ingest Node, >refactoring, Team:Data Management, and v8.7.0 labels on Jan 27, 2023
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-data-management (Team:Data Management)

@joegallo joegallo added the >bug label Jan 27, 2023
// shortcut if the pipeline is empty
if (pipeline.getProcessors().isEmpty()) {
    handler.accept(this, null);
    return;
}

Member:
Is this going to potentially confuse people who are looking at ingest metrics? I would think it would be a pretty rare case -- how much does this optimization save us? Is it worth the potential future "why are my metrics wrong?" support tickets?

joegallo (Contributor Author):
What would be wrong about the metrics?

joegallo (Contributor Author) commented on Jan 27, 2023:

Also note that this is the same logic as before; it's just in a slightly different place. (See c2fbb08 for the deets.)

Member:
Hmm I don't know what would be wrong about the metrics -- I thought I had traced where this would impact them last week, but now I have no idea what I was seeing.

joegallo (Contributor Author):
No worries, it's certainly a valid question to ask and we are indeed in a maze of twisty passages all alike. 😄

masseyke (Member) left a comment:
Looks good to me!

@joegallo joegallo merged commit 3e3b271 into elastic:main Jan 30, 2023
@joegallo joegallo deleted the ingest-service-internal-rewrite branch January 30, 2023 16:48
onFinished.onResponse(null);
// update the index request's source and (potentially) cache the timestamp for TSDB
updateIndexRequestSource(indexRequest, ingestDocument);
cacheRawTimestamp(indexRequest, ingestDocument);

Member:
👍 nice work

Labels: >bug, :Data Management/Ingest Node, release highlight, Team:Data Management, v8.7.0