HSEARCH-2837 Clarify errors when interrupted during submission of work to the ES client #1501

Closed
wants to merge 19 commits

Conversation

yrodiere
Member

@yrodiere yrodiere commented Aug 4, 2017

This PR is based on #1500 , which should be merged first.

Relevant ticket: https://hibernate.atlassian.net/browse/HSEARCH-2837

@Sanne As mentioned on HipChat, the interruptions you experienced were probably generated by JMH, not by Hibernate Search: the threads that got interrupted were not part of a pool owned by Hibernate Search, so Hibernate Search had no reason to even try interrupting them.

Thus I think that in your case, throwing an error is the right choice: we need to inform the client that their attempt to submit a new changeset failed, since it failed for a reason outside of our control.

=> As discussed together, I removed the cause in the SearchException, which should make the stack trace a bit smaller.

We do have issues with overly verbose error messages in threads owned by Hibernate Search, but that's when processing changesets, not when submitting them (https://hibernate.atlassian.net/browse/HSEARCH-2832). However, errors during processing are usually passed to the ErrorHandler, so I'm not sure bypassing the ErrorHandler and logging a warning would be the best solution. Please comment on HSEARCH-2832 if you have an opinion about that issue: I personally don't know what to do about it...

@yrodiere yrodiere added the Waiting for other pull request Based on another PR that should be merged first label Aug 4, 2017
We already return an Optional, so callers already have to deal with
missing values. Better return a missing value than throw an
AssertionFailure...

This will make error handling for bulk works easier, since we won't have
to care about null responses anymore (in particular, this may happen in
unit tests when mocks are incorrectly configured).
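
To illustrate the pattern this commit describes, here is a minimal sketch with illustrative names (not the actual Hibernate Search types), assuming a hypothetical extractor working on bulk response items:

import java.util.Optional;

class ResponseExtractorSketch {
	// Illustrative only: rather than throwing an AssertionFailure when a response
	// item is missing (e.g. because a mock was misconfigured in a unit test),
	// we report the absence through the Optional the caller already handles.
	Optional<String> extractResult(String[] bulkResponseItems, int index) {
		if ( bulkResponseItems == null || index < 0 || index >= bulkResponseItems.length ) {
			return Optional.empty();
		}
		return Optional.ofNullable( bulkResponseItems[index] );
	}
}
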
…ing exceptions

This will make it much easier to handle exceptions in CompletableFutures
in the next commits.

Granted, this makes the exception traces longer, but to be fair it only
*adds* to the traces, so users won't have to scroll more unless they
want to know more.
This is the first step toward making asynchronous requests more
"reactive".
…WorkProcessor

But still execute works sequentially for now.
…equenceBuilder

Those two classes encapsulate the logic of bulking and building a
sequence of works, making it easier to orchestrate works in many
different ways (see the following commits).

Also, compared to the previous way of executing works, this fixes the
following issues:

 1. In async mode, a failure will now only affect the changeset of the
    failing work, subsequent changesets will execute normally.
    And (that's the hard part) bulks can still span multiple changesets:
    each changeset will only be affected by failures from its own
    bulked works.
 2. The stack traces of failures in bulked works are now much more
    similar to failures in non-bulked works.
 3. That's just a side-effect, but bulked works can now return a result,
    though for now the result is ignored. This mainly means that
    if one day we need to inspect the result of bulked works
    (for statistics, in particular), it will be that much easier.
 4. We now have thorough unit tests for work bulking and sequencing.
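
To illustrate the general sequencing idea, here is a minimal sketch based on CompletableFuture chaining; the names are illustrative and this is not the actual sequence builder API. If each changeset gets its own sequence, a failing work only short-circuits the remaining works of that sequence:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class WorkSequenceSketch {
	// Illustrative only: each work starts once the previous one has completed;
	// if a work fails, the remaining works of this sequence are skipped and the
	// returned future completes exceptionally.
	CompletableFuture<Void> buildSequence(List<Supplier<CompletableFuture<Void>>> works) {
		CompletableFuture<Void> sequence = CompletableFuture.<Void>completedFuture( null );
		for ( Supplier<CompletableFuture<Void>> work : works ) {
			sequence = sequence.thenCompose( ignored -> work.get() );
		}
		return sequence;
	}
}
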
… inter-work dependency when running asynchronously

This will allow more flexibility in work orchestration in the following
commits.
If, between the end of the processing loop and the call to
processingScheduled.set( false ) at the end of processing, another
thread somehow managed to submit a changeset and call awaitCompletion(),
then this thread ended up not waiting for its changeset, but only for
the previous ones.

This commit fixes the issue by avoiding the use of multiple instances of
CountDownLatch, and instead relying on Phaser so that we can safely
change what waiting threads are waiting for (i.e. we can just say
"oh sorry, you were waiting for the previous runnable, but another one
needs to be run before I let you go"). Also part of the solution is
systematically checking whether a new processing runnable must be
scheduled before arriving at the phaser.
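
A very simplified sketch of the Phaser idea, not the actual fix: unlike a fresh CountDownLatch per run, a single Phaser lets the processor keep waiters blocked until a later phase when another run still has to execute. The names are illustrative:

import java.util.concurrent.Phaser;

class CompletionTrackerSketch {
	// Illustrative only, and much simpler than the actual fix: the processor is
	// the only registered party, so the phase advances exactly when it arrives.
	private final Phaser phaser = new Phaser( 1 );

	// Called by the background processor when a processing run finishes.
	void onProcessingRunFinished() {
		phaser.arrive();
	}

	// Called by threads that submitted a changeset and want to wait for it.
	void awaitCompletion() {
		int phase = phaser.getPhase();
		// Waits until the phase observed above advances, i.e. until the next
		// processing run finishes.
		phaser.awaitAdvance( phase );
	}
}
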
This could lead to better performance with large Elasticsearch
connection pools when works affect multiple indexes.
Those works are executed out of order anyway, and the only way for
the client to be sure they've been executed is to perform a flush (which
is followed by a refresh), so there's no point trying to refresh for
every single work.
For stream works, we only submit single-work changesets, which means
the decision on whether to bulk the work or not will always happen
immediately after each work, when we only have one work to bulk.
Thus if we set the minimum to a value higher than 1, we would always
decide not to start a bulk (because there would always be only one
work to bulk), which would result in terrible performance.
…ked together

We still don't fix the issue of works being executed out of order,
because that's not our concern in this commit.
Ultimately we may want to have one shared, serial orchestrator per index
manager.
The downside is we may not be able to bulk as much as we used to, but
there are a few advantages too:

1. We're finally able to force executing synchronous works in order
(by using one serial orchestrator per index). Note that this may impact
performance negatively, but at least we'll avoid some errors.
2. We can finally disable the 'refresh' in bulk API calls when
'refreshAfterWrite' is disabled for the index. Previously we couldn't,
because this parameter can take a different value for each index
manager.
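
A rough sketch of the "one serial orchestrator per index manager" idea mentioned above; the names are illustrative and this is not the PR's implementation:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class PerIndexOrchestratorsSketch {
	// Illustrative only: works targeting the same index are chained onto that
	// index's tail future, so they execute in submission order; works targeting
	// different indexes are chained independently and may run concurrently.
	private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

	CompletableFuture<Void> submit(String indexName, Supplier<CompletableFuture<Void>> work) {
		return tails.compute( indexName, (key, tail) -> {
			CompletableFuture<Void> previous =
					tail != null ? tail : CompletableFuture.<Void>completedFuture( null );
			// A failure of a previous work should not block later submissions.
			return previous.exceptionally( throwable -> null ).thenCompose( ignored -> work.get() );
		} );
	}
}
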
If we don't, we run the risk of OutOfMemoryErrors when a huge stream
of works is pushed continuously to the index manager (for instance,
when mass indexing).
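
A minimal sketch of the back-pressure idea, with an arbitrary capacity and illustrative names:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class StreamWorkQueueSketch {
	// Illustrative only (the capacity is arbitrary): with a bounded queue, a
	// producer that pushes works faster than they are processed, such as the
	// mass indexer, blocks on put() instead of piling works up in memory.
	private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>( 1000 );

	void submit(Runnable work) throws InterruptedException {
		queue.put( work );
	}
}
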
…strator

There's no need for such a delay:

 * if works are submitted more slowly than they are processed, then
there's no need to try doing more bulking (especially if it means adding
an artificial delay)
 * if works are submitted faster than they are processed, then the queue
should progressively fill up, we'll start doing bulking, and we'll end
up ignoring the delay anyway.
…k to the ES client

 * Include the orchestrator name in the error message
 * Do not use the interrupted exception as a cause in the
SearchException, for the sake of brevity
 * And while we're at it, create the exception using the JBoss logger.
1. Throw an exception when trying to submit a changeset during shutdown
2. Throw an exception when trying to submit a changeset after shutdown
@Sanne Sanne added Ready for review and removed Waiting for other pull request Based on another PR that should be merged first labels Aug 9, 2017

@Message(id = ES_BACKEND_MESSAGES_START_ID + 91,
value = "The thread was interrupted while a changeset was being submitted to '%1$s'."
+ " The changeset has been ignored." )
Member


I'd like to replace "ignored" with "discarded". Would you agree?

Member Author


Sure!
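
For context, a JBoss Logging declaration of such a message typically looks like the sketch below. The interface name, method name and start-ID value are assumptions, not copied from the PR, and the actual code returns a SearchException rather than a plain RuntimeException; only the message text matches the snippet quoted above, with "ignored" replaced by "discarded" as agreed:

import org.jboss.logging.annotations.Message;
import org.jboss.logging.annotations.MessageLogger;

// Illustrative sketch, not the PR's Log interface.
@MessageLogger(projectCode = "HSEARCH")
interface LogSketch {
	int ES_BACKEND_MESSAGES_START_ID = 400_000; // illustrative value

	@Message(id = ES_BACKEND_MESSAGES_START_ID + 91,
			value = "The thread was interrupted while a changeset was being submitted to '%1$s'."
					+ " The changeset has been discarded.")
	RuntimeException interruptedWhileSubmittingChangeset(String orchestratorName);
}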

@@ -49,6 +51,8 @@
private final BlockingQueue<Changeset> changesetQueue;
private final List<Changeset> changesetBuffer;
private final AtomicBoolean processingScheduled;
private boolean open;
Member

@Sanne Sanne Aug 9, 2017


I'll add a comment for the open field to point out it's guarded by the shutdownLock

Member Author


Sure!
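
A minimal sketch of the guarded-field pattern discussed in this thread, with illustrative names (the actual lock type in the PR may differ):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class OrchestratorShutdownSketch {
	// Illustrative only: the field is read under the read lock on every
	// submission and written under the write lock on shutdown, so submissions
	// racing with shutdown either complete or fail with a clear exception.
	private final ReadWriteLock shutdownLock = new ReentrantReadWriteLock();
	private boolean open = true; // Guarded by shutdownLock

	void submit(Runnable changeset) {
		shutdownLock.readLock().lock();
		try {
			if ( !open ) {
				throw new IllegalStateException( "Attempt to submit a changeset after shutdown" );
			}
			// ... enqueue the changeset ...
		}
		finally {
			shutdownLock.readLock().unlock();
		}
	}

	void shutdown() {
		shutdownLock.writeLock().lock();
		try {
			open = false;
		}
		finally {
			shutdownLock.writeLock().unlock();
		}
	}
}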

@Sanne
Member

Sanne commented Aug 9, 2017

merged. Thanks!

@Sanne Sanne closed this Aug 9, 2017
@yrodiere yrodiere deleted the HSEARCH-2837 branch January 12, 2018 09:34