
HSEARCH-3084 Initialize and close index managers / backends in parallel #2109

Merged
merged 15 commits into hibernate:master on Oct 3, 2019

Conversation

@yrodiere (Member) commented on Oct 1, 2019

https://hibernate.atlassian.net/browse/HSEARCH-3084

This brings two main changes:

  1. On startup, the initialization work (creating/validating ES indexes, creating filesystem directories for Lucene indexes) is now executed in parallel for all indexes (see the sketch after this list).
  2. On shutdown, we now stop accepting works for all indexes, then we wait for ongoing works to finish executing. Before this PR, we were stopping and waiting for each index in turn, which would lead to less-than-ideal behavior when multiple indexes share a single work queue (as is the case for Elasticsearch when mass indexing).
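
To make change 1 more concrete, here is a minimal sketch of what parallel initialization looks like conceptually, assuming a hypothetical `start()` method on each index manager that returns a `CompletableFuture<?>`; the actual Hibernate Search types and signatures differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: trigger every index manager's initialization, then wait for all of them.
// The IndexManagerImplementor name and start() signature are assumptions, not the real API.
public final class ParallelStartSketch {

    public interface IndexManagerImplementor {
        CompletableFuture<?> start(); // e.g. create/validate the ES index or the Lucene directory
    }

    public static void startAll(List<? extends IndexManagerImplementor> indexManagers) {
        // Kick off all initializations first so they can run concurrently...
        List<CompletableFuture<?>> futures = new ArrayList<>();
        for (IndexManagerImplementor indexManager : indexManagers) {
            futures.add(indexManager.start());
        }
        // ...then wait until every single one has completed (or failed).
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }
}
```

Whether this actually saves time then depends on how much the server parallelizes the requests, which is exactly the limitation described below.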

I was hoping that item 1 would improve startup performance with Elasticsearch when there are many indexes, since we would create all indexes in parallel instead of one after another, and each creation takes about 200ms, not counting network latency.
It seems I was wrong: when I test with a single-instance ES cluster, there is absolutely no difference, and when I test with a 5-instance cluster (3 masters, 2 replicas) and 8 indexes, it only takes ~25% less time. I'd have expected something like 80% less.

The problem seems to be that Elasticsearch nodes use some sort of global lock when they create indexes, so even if I send many index creations in parallel, they are executed one after another...

I would be inclined to merge this PR anyway, for these reasons:

  1. I only tested on the loopback interface (ES running locally and accessed through 127.0.0.1). ES requests other than index creations (get index health, get index settings, ...) are sent, executed, and the response received in about 5ms. Non-loopback network interfaces will definitely show more latency, and in these cases I think sending requests in parallel could improve performance slightly.
  2. I think the new design makes more sense, especially on shutdown: the start/pre-stop/stop methods will be useful when we decide to allow starting/stopping index managers at runtime (see the shutdown sketch after this list).
  3. The BatchingExecutor class is now much simpler.
  4. We're a bit closer to being able to use the Lucene backend in Quarkus, since a few filesystem operations are now delayed until the second phase of bootstrap (when Quarkus starts its native image) instead of being executed during the first phase (when Quarkus compiles the application).
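
To illustrate point 2, here is a rough sketch of the two-phase shutdown, assuming hypothetical `preStop()`/`stop()` methods; the real method names and types in Hibernate Search may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: first every backend/index manager stops accepting new works (pre-stop),
// then we wait for the ongoing works of *all* of them at once, and only then release resources.
// The Stoppable interface and its method names are assumptions for this sketch.
public final class TwoPhaseShutdownSketch {

    public interface Stoppable {
        CompletableFuture<?> preStop(); // stop accepting works; completes when ongoing works are done
        void stop();                    // release resources (index writers, connections, ...)
    }

    public static void stopAll(List<? extends Stoppable> components) {
        // Phase 1: stop accepting works everywhere, collecting the "ongoing works done" futures.
        List<CompletableFuture<?>> pending = new ArrayList<>();
        for (Stoppable component : components) {
            pending.add(component.preStop());
        }
        // Wait for ongoing works across all components together, not one component at a time.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        // Phase 2: actually release resources.
        for (Stoppable component : components) {
            component.stop();
        }
    }
}
```

The important difference from the previous behavior is that a shared work queue (e.g. a single Elasticsearch work queue used by several indexes during mass indexing) is only waited for once, after every component has stopped accepting new works.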

@fax4ever fax4ever self-requested a review October 2, 2019 08:26
@fax4ever fax4ever self-assigned this Oct 2, 2019
@fax4ever (Contributor) left a comment

Great work!
You've also improved the existing code a lot!
I just have a few marginal doubts about it.

Commit notes from this PR:

- Necessary because of ES schema validation in particular.
- …agers
- Make sure to always include the shard ID, in particular.
- There's no reason to go through an interface exposed in a separate package: the consumer and the implementation of the createWriteOrchestrator() method are located in the same package.
- 1. Get rid of the Phaser and use much simpler code. 2. Expose a CompletableFuture<?> to wait for full completion (will be useful in the next commits). See the sketch after these notes.
- Mainly, this means that on shutdown, we'll stop accepting works for all backends/index managers immediately, and *then* we'll wait for ongoing works to complete. The improvement over the previous behavior will probably be negligible, but what's more important is that the path towards an API for explicitly starting/pre-stopping/stopping index managers is now clearer.
- It's not a lot, but it's better than nothing.
- … a previous workset failed
- The problem is unrelated to this PR, but was detected thanks to the tests I added to make sure the new implementation works correctly.
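
As a purely illustrative sketch of the idea from the note above (exposing a CompletableFuture<?> for completion instead of coordinating through a Phaser), one possible shape is the following; the class and method names are assumptions, not the actual BatchingExecutor code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: a tiny completion tracker that a batching executor could use to let
// callers wait for "all currently submitted works are done" without a Phaser.
final class CompletionTrackerSketch {

    // A completed future means "idle: no work pending right now".
    private final AtomicReference<CompletableFuture<Void>> completion =
            new AtomicReference<>(CompletableFuture.<Void>completedFuture(null));

    // Called when a work is submitted: if we were idle, switch to a fresh, incomplete future.
    void onWorkSubmitted() {
        completion.updateAndGet(current -> current.isDone() ? new CompletableFuture<Void>() : current);
    }

    // Called when the queue has been fully processed: release everyone waiting on completion().
    void onAllWorksProcessed() {
        completion.get().complete(null);
    }

    // Exposed to callers (e.g. the pre-stop phase) so they can wait for ongoing works to finish.
    CompletableFuture<?> completion() {
        return completion.get();
    }
}
```

A real executor also has to handle the race between new submissions and the queue becoming empty, so treat this only as an outline of the idea.
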
@yrodiere (Member, Author) commented on Oct 2, 2019

Rebased and addressed your comment, thanks!
Waiting for the build to end, then I'll merge.

@yrodiere yrodiere merged commit 3b91ab8 into hibernate:master Oct 3, 2019
@yrodiere (Member, Author) commented on Oct 3, 2019

Merged, thanks!

@yrodiere yrodiere deleted the HSEARCH-3084 branch October 25, 2019 17:45