HSEARCH-2837 Clarify errors when interrupted during submission of work to the ES client #1501

Closed
wants to merge 19 commits

Conversation

yrodiere
Member

@yrodiere yrodiere commented Aug 4, 2017

This PR is based on #1500 , which should be merged first.

Relevant ticket: https://hibernate.atlassian.net/browse/HSEARCH-2837

@Sanne As mentioned on HipChat, the interruptions you experienced were probably generated by JMH, not by Hibernate Search: the threads that got interrupted were not part of a pool owned by Hibernate Search, so Hibernate Search had no reason to even try interrupting them.

Thus I think that in your case, throwing an error is the right choice: we need to inform the client that their attempt to submit a new changeset failed, since it failed for a reason outside of our control.

=> As discussed together, I removed the cause in the SearchException, which should make the stack trace a bit smaller.

We do have issues with overly verbose error messages in threads owned by Hibernate Search, but that's when processing changesets, not when submitting them (https://hibernate.atlassian.net/browse/HSEARCH-2832). However, errors during processing are usually passed to the ErrorHandler, so I'm not sure bypassing the ErrorHandler and logging a warning would be the best solution. Please comment on HSEARCH-2832 if you have an opinion about that issue: I personally don't know what to do about it...

@yrodiere yrodiere added the Waiting for other pull request Based on another PR that should be merged first label Aug 4, 2017
We already return an Optional, so callers already have to deal with
missing values. Better return a missing value than throw an
AssertionFailure...

This will make error handling for bulk works easier, since we won't have
to care about null responses anymore (in particular, this may happen in
unit tests when mocks are incorrectly configured).
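
To illustrate the pattern this commit describes, here is a minimal sketch with illustrative names (not the actual Hibernate Search types), assuming a hypothetical extractor working on bulk response items:

import java.util.Optional;

class ResponseExtractorSketch {
	// Illustrative only: rather than throwing an AssertionFailure when a response
	// item is missing (e.g. because a mock was misconfigured in a unit test),
	// we report the absence through the Optional the caller already handles.
	Optional<String> extractResult(String[] bulkResponseItems, int index) {
		if ( bulkResponseItems == null || index < 0 || index >= bulkResponseItems.length ) {
			return Optional.empty();
		}
		return Optional.ofNullable( bulkResponseItems[index] );
	}
}
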
…ing exceptions

This will make it much easier to handle exceptions in CompletableFutures
in the next commits.

Granted, this makes the exception traces longer, but to be fair it only
*adds* to the traces, so users won't have to scroll more unless they
want to know more.
This is the first step toward making asynchronous requests more
"reactive".
…WorkProcessor

But still execute works sequentially for now.
…equenceBuilder

Those two classes encapsulate the logic of bulking and building a
sequence of works, making it easier to orchestrate works in many
different ways (see the following commits).

Also, compared to the previous way of executing works, this fixes the
following issues:

 1. In async mode, a failure will now only affect the changeset of the
    failing work, subsequent changesets will execute normally.
    And (that's the hard part) bulks can still span multiple changesets:
    each changeset will only be affected by failures from its own
    bulked works.
 2. The stack traces of failures in bulked works are now much more
    similar to failures in non-bulked works.
 3. That's just a side-effect, but bulked works can now return a result,
    though for now the result is ignored. This mainly means that
    if one day we need to inspect the result of bulked works
    (for statistics, in particular), it will be that much easier.
 4. We now have thorough unit tests for work bulking and sequencing.
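
To illustrate the general sequencing idea, here is a minimal sketch based on CompletableFuture chaining; the names are illustrative and this is not the actual sequence builder API. If each changeset gets its own sequence, a failing work only short-circuits the remaining works of that sequence:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

class WorkSequenceSketch {
	// Illustrative only: each work starts once the previous one has completed;
	// if a work fails, the remaining works of this sequence are skipped and the
	// returned future completes exceptionally.
	CompletableFuture<Void> buildSequence(List<Supplier<CompletableFuture<Void>>> works) {
		CompletableFuture<Void> sequence = CompletableFuture.<Void>completedFuture( null );
		for ( Supplier<CompletableFuture<Void>> work : works ) {
			sequence = sequence.thenCompose( ignored -> work.get() );
		}
		return sequence;
	}
}
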
… inter-work dependency when running asynchronously

This will allow more flexibility in work orchestration in the following
commits.
If, between the end of the processing loop and the call to
processingScheduled.set( false ) at the end of processing, another
thread somehow managed to submit a changeset and call awaitCompletion(),
then this thread ended up not waiting for its changeset, but only for
the previous ones.

This commit fixes the issue by avoiding the use of multiple instances of
CountDownLatch, and instead relying on Phaser so that we can safely
change what waiting threads are waiting for (i.e. we can just say
"oh sorry, you were waiting for the previous runnable, but another one
needs to be run before I let you go"). Also part of the solution is
systematically checking whether a new processing runnable must be
scheduled before arriving at the phaser.
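
A very simplified sketch of the Phaser idea, not the actual fix: unlike a fresh CountDownLatch per run, a single Phaser lets the processor keep waiters blocked until a later phase when another run still has to execute. The names are illustrative:

import java.util.concurrent.Phaser;

class CompletionTrackerSketch {
	// Illustrative only, and much simpler than the actual fix: the processor is
	// the only registered party, so the phase advances exactly when it arrives.
	private final Phaser phaser = new Phaser( 1 );

	// Called by the background processor when a processing run finishes.
	void onProcessingRunFinished() {
		phaser.arrive();
	}

	// Called by threads that submitted a changeset and want to wait for it.
	void awaitCompletion() {
		int phase = phaser.getPhase();
		// Waits until the phase observed above advances, i.e. until the next
		// processing run finishes.
		phaser.awaitAdvance( phase );
	}
}
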
This could lead to better performance with large Elasticsearch
connection pools when works affect multiple indexes.
Those works are executed out of order anyway, and the only way for
the client to be sure they've been executed is to perform a flush (which
is followed by a refresh), so there's no point trying to refresh for
every single work.
For stream works, we only submit single-work changesets, which means
the decision on whether to bulk the work or not will always happen
immediately after each work, when we only have one work to bulk.
Thus if we set the minimum to a value higher than 1, we would always
decide not to start a bulk (because there would always be only one
work to bulk), which would result in terrible performance.
…ked together

We still don't fix the issue of works being executed out of order,
because that's not our concern in this commit.
Ultimately we may want to have one shared, serial orchestrator per index
manager.
The downside is we may not be able to bulk as much as we used to, but
there are a few advantages too:

1. We're finally able to force executing synchronous works in order
(by using one serial orchestrator per index). Note that this may impact
performance negatively, but at least we'll avoid some errors.
2. We can finally disable the 'refresh' in bulk API calls when
'refreshAfterWrite' is disabled for the index. Previously we couldn't,
because this parameter can take a different value for each index
manager.
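
A rough sketch of the "one serial orchestrator per index manager" idea mentioned above; the names are illustrative and this is not the PR's implementation:

import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class PerIndexOrchestratorsSketch {
	// Illustrative only: works targeting the same index are chained onto that
	// index's tail future, so they execute in submission order; works targeting
	// different indexes are chained independently and may run concurrently.
	private final Map<String, CompletableFuture<Void>> tails = new ConcurrentHashMap<>();

	CompletableFuture<Void> submit(String indexName, Supplier<CompletableFuture<Void>> work) {
		return tails.compute( indexName, (key, tail) -> {
			CompletableFuture<Void> previous =
					tail != null ? tail : CompletableFuture.<Void>completedFuture( null );
			// A failure of a previous work should not block later submissions.
			return previous.exceptionally( throwable -> null ).thenCompose( ignored -> work.get() );
		} );
	}
}
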
If we don't, we run the risk of OutOfMemoryErrors when a huge stream
of works is pushed continuously to the index manager (for instance,
when mass indexing).
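
A minimal sketch of the back-pressure idea, with an arbitrary capacity and illustrative names:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class StreamWorkQueueSketch {
	// Illustrative only (the capacity is arbitrary): with a bounded queue, a
	// producer that pushes works faster than they are processed, such as the
	// mass indexer, blocks on put() instead of piling works up in memory.
	private final BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>( 1000 );

	void submit(Runnable work) throws InterruptedException {
		queue.put( work );
	}
}
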
…strator

There's no need for such a delay:

 * if works are submitted more slowly than they are processed, then
there's no need to try doing more bulking (especially if it means adding
an artificial delay)
 * if works are submitted faster than they are processed, then the queue
should progressively fill up, we'll start doing bulking, and we'll end
up ignoring the delay anyway.
…k to the ES client

 * Include the orchestrator name in the error message
 * Do not use the interrupted exception as a cause in the
SearchException, for the sake of brevity
 * And while we're at it, create the exception using the JBoss logger.
1. Throw an exception when trying to submit a changeset during shutdown
2. Throw an exception when trying to submit a changeset after shutdown
@Sanne Sanne added Ready for review and removed Waiting for other pull request Based on another PR that should be merged first labels Aug 9, 2017

@Message(id = ES_BACKEND_MESSAGES_START_ID + 91,
value = "The thread was interrupted while a changeset was being submitted to '%1$s'."
+ " The changeset has been ignored." )
Member


I'd like to replace "ignored" with "discarded". Would you agree?

Member Author


Sure!
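
For context, a JBoss Logging declaration of such a message typically looks like the sketch below. The interface name, method name and start-ID value are assumptions, not copied from the PR, and the actual code returns a SearchException rather than a plain RuntimeException; only the message text matches the snippet quoted above, with "ignored" replaced by "discarded" as agreed:

import org.jboss.logging.annotations.Message;
import org.jboss.logging.annotations.MessageLogger;

// Illustrative sketch, not the PR's Log interface.
@MessageLogger(projectCode = "HSEARCH")
interface LogSketch {
	int ES_BACKEND_MESSAGES_START_ID = 400_000; // illustrative value

	@Message(id = ES_BACKEND_MESSAGES_START_ID + 91,
			value = "The thread was interrupted while a changeset was being submitted to '%1$s'."
					+ " The changeset has been discarded.")
	RuntimeException interruptedWhileSubmittingChangeset(String orchestratorName);
}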

@@ -49,6 +51,8 @@
private final BlockingQueue<Changeset> changesetQueue;
private final List<Changeset> changesetBuffer;
private final AtomicBoolean processingScheduled;
private boolean open;
Member

@Sanne Sanne Aug 9, 2017


I'll add a comment for the open field to point out it's guarded by the shutdownLock

Member Author


Sure!
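
A minimal sketch of the guarded-field pattern discussed in this thread, with illustrative names (the actual lock type in the PR may differ):

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class OrchestratorShutdownSketch {
	// Illustrative only: the field is read under the read lock on every
	// submission and written under the write lock on shutdown, so submissions
	// racing with shutdown either complete or fail with a clear exception.
	private final ReadWriteLock shutdownLock = new ReentrantReadWriteLock();
	private boolean open = true; // Guarded by shutdownLock

	void submit(Runnable changeset) {
		shutdownLock.readLock().lock();
		try {
			if ( !open ) {
				throw new IllegalStateException( "Attempt to submit a changeset after shutdown" );
			}
			// ... enqueue the changeset ...
		}
		finally {
			shutdownLock.readLock().unlock();
		}
	}

	void shutdown() {
		shutdownLock.writeLock().lock();
		try {
			open = false;
		}
		finally {
			shutdownLock.writeLock().unlock();
		}
	}
}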

@Sanne
Member

Sanne commented Aug 9, 2017

merged. Thanks!

@Sanne Sanne closed this Aug 9, 2017
@yrodiere yrodiere deleted the HSEARCH-2837 branch January 12, 2018 09:34