HSEARCH-2837 Clarify errors when interrupted during submission of work to the ES client #1501
Conversation
We already return an Optional, so callers already have to deal with missing values. Better return a missing value than throw an AssertionFailure... This will make error handling for bulk works easier, since we won't have to care about null responses anymore (it may happen in unit tests when mocks are incorrectly configured in particular).
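The idea can be sketched as follows (the class and method names here are hypothetical, not the actual Hibernate Search API): a missing response becomes an empty `Optional` rather than an `AssertionFailure`, so callers handle absence through the code path they already have.

```java
import java.util.Optional;

// Hypothetical sketch: an extractor that returns Optional.empty() instead of
// throwing an AssertionFailure when the response is missing (which can happen
// in unit tests when mocks are incorrectly configured).
public class ResultExtractor {
    public static Optional<String> extractStatus(Object response) {
        if (response == null) {
            // Before: threw an AssertionFailure here.
            // After: a missing value, which callers must already handle.
            return Optional.empty();
        }
        return Optional.of(response.toString());
    }
}
```

This keeps error handling for bulk works uniform: a null response and a legitimately absent result look the same to the caller.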
…ing exceptions This will make it much easier to handle exceptions in CompletableFutures in the next commits. Granted, this makes the exception traces longer, but to be fair it only *adds* to the traces, so users won't have to scroll more unless they want to know more.
This is the first step toward making asynchronous requests more "reactive".
…WorkProcessor But still execute works sequentially for now.
…equenceBuilder Those two classes encapsulate the logic of bulking and building a sequence of works, making it easier to orchestrate works in many different ways (see the following commits). Also, compared to the previous way of executing works, this fixes the following issues:
1. In async mode, a failure now only affects the changeset of the failing work; subsequent changesets execute normally. And (that's the hard part) bulks can still span multiple changesets: each changeset is only affected by failures from its own bulked works.
2. The stack traces of failures in bulked works are now much more similar to those of failures in non-bulked works.
3. As a side effect, bulked works can now return a result, though for now the result is ignored. This mainly means that if we ever need to inspect the result of bulked works (for statistics, in particular), it will be that much easier.
4. We now have thorough unit tests for work bulking and sequencing.
… inter-work dependency when running asynchronously This will allow more flexibility in work orchestration in the following commits.
If, between the end of the processing loop and the call to processingScheduled.set( false ) at the end of processing, another thread somehow managed to submit a changeset and call awaitCompletion(), that thread ended up not waiting for its own changeset, but only for the previous ones. This commit fixes the issue by avoiding the use of multiple instances of CountDownLatch, and instead relying on a Phaser, so that we can safely change what waiting threads are waiting for (i.e. we can just say "oh sorry, you were waiting for the previous runnable, but another one needs to be run before I let you go"). The other part of the solution is systematically checking whether a new processing runnable must be scheduled before arriving at the phaser.
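A minimal sketch of the Phaser-based approach (class and method names are invented for illustration, not the actual Hibernate Search code): unlike a CountDownLatch, a Phaser can advance through multiple phases, so a waiter can be held through a newly scheduled processing run instead of being released after the previous one.

```java
import java.util.concurrent.Phaser;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a Phaser-based completion barrier.
public class CompletionBarrier {
    private final Phaser phaser = new Phaser(1); // the processor is the one registered party
    private final AtomicBoolean processingScheduled = new AtomicBoolean(false);

    // Called when a changeset is submitted and processing must run.
    public void scheduleProcessing() {
        processingScheduled.set(true);
    }

    // Called by the processing thread when one processing run finishes.
    public void processingFinished() {
        processingScheduled.set(false);
        phaser.arrive(); // advance the phase, waking current waiters
    }

    // Called by client threads: wait until no more processing is scheduled.
    public void awaitCompletion() throws InterruptedException {
        int phase = phaser.getPhase();
        // Systematically re-check whether new processing was scheduled;
        // if so, wait for the *next* phase too instead of returning early.
        while (processingScheduled.get()) {
            phase = phaser.awaitAdvanceInterruptibly(phase);
        }
    }
}
```

The key difference from a per-run CountDownLatch is the loop in awaitCompletion(): if another run gets scheduled while we wait, we simply wait for one more phase.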
This could lead to better performance with large Elasticsearch connection pools when works affect multiple indexes.
Those works are executed out of order anyway, and the only way for the client to be sure they've been executed is to perform a flush (which is followed by a refresh), so there's no point trying to refresh for every single work.
For stream works, we only submit single-work changesets, which means the decision on whether to bulk the work or not will always happen immediately after each work, when we only have one work to bulk. Thus if we set the minimum to a value higher than 1, we would always decide not to start a bulk (because there would always be only one work to bulk), which would result in terrible performance.
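The decision described above can be reduced to a one-line predicate (a hypothetical simplification, not the actual implementation): with single-work changesets there is always exactly one pending work at decision time, so any minimum above 1 means a bulk is never started.

```java
// Hypothetical sketch of the bulking decision for stream works.
public class BulkDecision {
    // With single-work changesets, pendingBulkableWorks is always 1 at
    // decision time, so minBulkSize must be 1 for bulking to ever start.
    public static boolean shouldStartBulk(int pendingBulkableWorks, int minBulkSize) {
        return pendingBulkableWorks >= minBulkSize;
    }
}
```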
…ked together We still don't fix the issue of works being executed out of order, because that's not our concern in this commit. Ultimately we may want to have one shared, serial orchestrator per index manager.
The downside is we may not be able to bulk as much as we used to, but there are a few advantages too:
1. We're finally able to force executing synchronous works in order (by using one serial orchestrator per index). Note that this may impact performance negatively, but at least we'll avoid some errors.
2. We can finally disable the 'refresh' in bulk API calls when 'refreshAfterWrite' is disabled for the index. Previously we couldn't, because this parameter can take a different value for each index manager.
If we don't, we run the risk of OutOfMemoryErrors when a huge stream of works is pushed continuously to the index manager (for instance, when mass indexing).
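The fix amounts to using a bounded queue for submissions (a minimal sketch with invented names, not the actual orchestrator code): a full queue blocks the producer, applying back-pressure instead of accumulating an unbounded backlog.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded changeset queue. put() blocks the producer
// when the queue is full, so a fast producer (e.g. mass indexing) cannot
// grow the backlog without bound and trigger an OutOfMemoryError.
public class BoundedChangesetQueue<T> {
    private final BlockingQueue<T> queue;

    public BoundedChangesetQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    public void submit(T changeset) throws InterruptedException {
        queue.put(changeset); // blocks when full: back-pressure on the producer
    }

    public T take() throws InterruptedException {
        return queue.take();
    }

    public int size() {
        return queue.size();
    }
}
```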
…strator There's no need for such a delay:
* if works are submitted more slowly than they are processed, then there's no need to try doing more bulking (especially if it means adding an artificial delay);
* if works are submitted faster than they are processed, then the queue should progressively fill up, we'll start doing bulking, and we'll end up ignoring the delay anyway.
…n-stream orchestrators
…k to the ES client
* Include the orchestrator name in the error message
* Do not use the interrupted exception as a cause in the SearchException, for the sake of brevity
* And while we're at it, create the exception using the JBoss logger.
1. Throw an exception when trying to submit a changeset during shutdown
2. Throw an exception when trying to submit a changeset after shutdown
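Both cases above come down to a flag checked under a lock (a hypothetical sketch; the class, lock, and exception type here are invented stand-ins, not the actual Hibernate Search code): submissions during or after shutdown fail fast instead of being silently dropped.

```java
// Hypothetical sketch of the shutdown guard.
public class GuardedOrchestrator {
    private final Object shutdownLock = new Object();
    // Guarded by shutdownLock.
    private boolean open = true;

    public void submit(Runnable changeset) {
        synchronized (shutdownLock) {
            if (!open) {
                throw new IllegalStateException(
                        "Cannot submit a changeset: this orchestrator is shut down.");
            }
            changeset.run(); // stand-in for enqueueing the changeset
        }
    }

    public void shutdown() {
        synchronized (shutdownLock) {
            open = false;
        }
    }
}
```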
@Message(id = ES_BACKEND_MESSAGES_START_ID + 91,
		value = "The thread was interrupted while a changeset was being submitted to '%1$s'."
				+ " The changeset has been ignored." )
I'd like to replace "ignored" with "discarded". would you agree?
Sure!
@@ -49,6 +51,8 @@
	private final BlockingQueue<Changeset> changesetQueue;
	private final List<Changeset> changesetBuffer;
	private final AtomicBoolean processingScheduled;
	private boolean open;
I'll add a comment for the open field to point out it's guarded by the shutdownLock
Sure!
merged. Thanks!
This PR is based on #1500 , which should be merged first.
Relevant ticket: https://hibernate.atlassian.net/browse/HSEARCH-2837
@Sanne As mentioned on HipChat, the interruptions you experienced were probably generated by JMH, not by Hibernate Search: the threads that got interrupted were not part of a pool owned by Hibernate Search, so Hibernate Search had no reason to even try interrupting them. Thus I think that in your case, throwing an error is the right choice: we need to inform the client that their attempt at submitting a new changeset failed, since it failed for a reason that is outside of our control. => As discussed together, I removed the cause in the SearchException, which should make the stack trace a bit smaller.
We do have issues with overly verbose error messages in threads owned by Hibernate Search, but that's when processing changesets, not when submitting them (https://hibernate.atlassian.net/browse/HSEARCH-2832). However, errors during processing are usually passed to the ErrorHandler, so I'm not sure bypassing the ErrorHandler and logging a warning would be the best solution. Please comment on HSEARCH-2832 if you have an opinion about that issue: I personally don't know what to do about it...