
HSEARCH-3084 Initialize and close index managers / backends in parallel #2109

Merged
merged 15 commits into hibernate:master on Oct 3, 2019

Conversation

@yrodiere (Member) commented on Oct 1, 2019

https://hibernate.atlassian.net/browse/HSEARCH-3084

This brings two main changes:

  1. On startup, the initialization work (creating/validating ES indexes, creating filesystem directories for Lucene indexes) is now executed in parallel for all indexes (see the sketch after this list).
  2. On shutdown, we now stop accepting works for all indexes, then we wait for ongoing works to finish executing. Before this PR, we were stopping and waiting for each index in turn, which would lead to less-than-ideal behavior when multiple indexes share a single work queue (as is the case for Elasticsearch when mass indexing).
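
To make change 1 more concrete, here is a minimal sketch of what parallel initialization looks like conceptually, assuming a hypothetical `start()` method on each index manager that returns a `CompletableFuture<?>`; the actual Hibernate Search types and signatures differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: trigger every index manager's initialization, then wait for all of them.
// The IndexManagerImplementor name and start() signature are assumptions, not the real API.
public final class ParallelStartSketch {

    public interface IndexManagerImplementor {
        CompletableFuture<?> start(); // e.g. create/validate the ES index or the Lucene directory
    }

    public static void startAll(List<? extends IndexManagerImplementor> indexManagers) {
        // Kick off all initializations first so they can run concurrently...
        List<CompletableFuture<?>> futures = new ArrayList<>();
        for (IndexManagerImplementor indexManager : indexManagers) {
            futures.add(indexManager.start());
        }
        // ...then wait until every single one has completed (or failed).
        CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }
}
```

Whether this actually saves time then depends on how much the server parallelizes the requests, which is exactly the limitation described below.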

I was hoping that item 1 would improve startup performance with Elasticsearch when there are many indexes, since we would create all indexes in parallel instead of one after another, and each creation takes about 200ms, not counting network latency.
It seems I was wrong: when I test with a single-instance ES cluster, there is absolutely no difference, and when I test with a 5-instance cluster (3 masters, 2 replicas) and 8 indexes, it only takes ~25% less time. I'd have expected something like 80% less.

The problem seems to be that Elasticsearch nodes use some sort of global lock when they create indexes, so even if I send many index creations in parallel, they are executed one after another...

I would be inclined to merge this PR anyway, for these reasons:

  1. I only tested on the loopback interface (ES running locally and accessed through 127.0.0.1). ES requests other than index creations (get index health, get index settings, ...) are sent, executed, and the response received in about 5ms. Non-loopback network interfaces will definitely show more latency, and in these cases I think sending requests in parallel could improve performance slightly.
  2. I think the new design makes more sense, especially on shutdown: the start/pre-stop/stop methods will be useful when we decide to allow starting/stopping index managers at runtime (see the shutdown sketch after this list).
  3. The BatchingExecutor class is now much simpler.
  4. We're a bit closer to being able to use the Lucene backend in Quarkus, since a few filesystem operations are now delayed until the second phase of bootstrap (when Quarkus starts its native image) instead of being executed during the first phase (when Quarkus compiles the application).
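
To illustrate point 2, here is a rough sketch of the two-phase shutdown, assuming hypothetical `preStop()`/`stop()` methods; the real method names and types in Hibernate Search may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Illustrative only: first every backend/index manager stops accepting new works (pre-stop),
// then we wait for the ongoing works of *all* of them at once, and only then release resources.
// The Stoppable interface and its method names are assumptions for this sketch.
public final class TwoPhaseShutdownSketch {

    public interface Stoppable {
        CompletableFuture<?> preStop(); // stop accepting works; completes when ongoing works are done
        void stop();                    // release resources (index writers, connections, ...)
    }

    public static void stopAll(List<? extends Stoppable> components) {
        // Phase 1: stop accepting works everywhere, collecting the "ongoing works done" futures.
        List<CompletableFuture<?>> pending = new ArrayList<>();
        for (Stoppable component : components) {
            pending.add(component.preStop());
        }
        // Wait for ongoing works across all components together, not one component at a time.
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
        // Phase 2: actually release resources.
        for (Stoppable component : components) {
            component.stop();
        }
    }
}
```

The important difference from the previous behavior is that a shared work queue (e.g. a single Elasticsearch work queue used by several indexes during mass indexing) is only waited for once, after every component has stopped accepting new works.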

@fax4ever fax4ever self-requested a review October 2, 2019 08:26
@fax4ever fax4ever self-assigned this Oct 2, 2019
@fax4ever (Contributor) left a comment

Great work!
You've also improved the existing code a lot!
I just have a few marginal doubts about it.

Commit notes from this PR:

- Necessary because of ES schema validation in particular.
- …agers
- Make sure to always include the shard ID, in particular.
- There's no reason to go through an interface exposed in a separate package: the consumer and the implementation of the createWriteOrchestrator() method are located in the same package.
- 1. Get rid of the Phaser and use much simpler code. 2. Expose a CompletableFuture<?> to wait for full completion (will be useful in the next commits). See the sketch after these notes.
- Mainly, this means that on shutdown, we'll stop accepting works for all backends/index managers immediately, and *then* we'll wait for ongoing works to complete. The improvement over the previous behavior will probably be negligible, but what's more important is that the path towards an API for explicitly starting/pre-stopping/stopping index managers is now clearer.
- It's not a lot, but it's better than nothing.
- … a previous workset failed
- The problem is unrelated to this PR, but was detected thanks to the tests I added to make sure the new implementation works correctly.
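
As a purely illustrative sketch of the idea from the note above (exposing a CompletableFuture<?> for completion instead of coordinating through a Phaser), one possible shape is the following; the class and method names are assumptions, not the actual BatchingExecutor code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative only: a tiny completion tracker that a batching executor could use to let
// callers wait for "all currently submitted works are done" without a Phaser.
final class CompletionTrackerSketch {

    // A completed future means "idle: no work pending right now".
    private final AtomicReference<CompletableFuture<Void>> completion =
            new AtomicReference<>(CompletableFuture.<Void>completedFuture(null));

    // Called when a work is submitted: if we were idle, switch to a fresh, incomplete future.
    void onWorkSubmitted() {
        completion.updateAndGet(current -> current.isDone() ? new CompletableFuture<Void>() : current);
    }

    // Called when the queue has been fully processed: release everyone waiting on completion().
    void onAllWorksProcessed() {
        completion.get().complete(null);
    }

    // Exposed to callers (e.g. the pre-stop phase) so they can wait for ongoing works to finish.
    CompletableFuture<?> completion() {
        return completion.get();
    }
}
```

A real executor also has to handle the race between new submissions and the queue becoming empty, so treat this only as an outline of the idea.
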
@yrodiere (Member, Author) commented on Oct 2, 2019

Rebased and addressed your comment, thanks!
Waiting for the build to end, then I'll merge.

@yrodiere yrodiere merged commit 3b91ab8 into hibernate:master Oct 3, 2019
@yrodiere (Member, Author) commented on Oct 3, 2019

Merged, thanks!

@yrodiere yrodiere deleted the HSEARCH-3084 branch October 25, 2019 17:45