Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rollup] improve handling of failures on first search #35269

Merged
merged 3 commits into from Nov 8, 2018

Conversation

hendrikmuhs
Copy link
Contributor

@hendrikmuhs hendrikmuhs commented Nov 5, 2018

Improve error handling in the Indexer if an exception occurs during saving state and/or during the very 1st retrieval (query execution)

Notes

  • Tagging this as Rollup, because roolup is the only user of this code at the moment.
  • I am not aware of any reported bugs in the wild, the 2 problems have been found as part reading the code and re-using it. So severity seems low, but I still classify this as bug

/cc @jimczi @polyfractal

@hendrikmuhs hendrikmuhs added >bug v7.0.0 :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.6.0 labels Nov 5, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
} catch (Exception e) {
onFailure(exc);
onFailure(new RuntimeException("Failed to save State", e));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the odd part of this PR. We have 2 failures, the one this method has been called with and the other one which happened during save.

Any better idea than calling the callback twice?

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hendrikmuhs ! I left one comment

@@ -252,7 +257,12 @@ public synchronized boolean maybeTriggerAsyncJob(long now) {
protected abstract void onAbort();

private void finishWithFailure(Exception exc) {
doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
try {
doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doSaveState is async so there is no need to try/catch this call. We ignore exceptions in doSaveState, this is ok IMO since saving the state here is just to ensure that we'll start from the last commit point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true for the rollup implementation (at the moment), but not from a generic Async-2-PhaseIndexer perspective if the implementation of doSaveState(...) is buggy. In such a case error handling is trapped in 2 exception handlers:

  1. any error codition, exception is thrown
  2. finishWithFailure called the 1st time, throws another exception from doSaveState
  3. exception is handled and calls finishWithFailure a 2nd time
  4. finishAndSetState throws Indexer job encountered an illegal state [STARTED] as step 2 was incomplete
  5. no handler for this exception

Just want to make sure the problem is understood (the unit test should also show it). I admit it's somewhat made up a bit, assuming a buggy doSaveState.

Alternatively I could just accept the fact that doSaveState simply has to be implemented so it never throws, for which case I would be fine with a small doc fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should save the state at the end of the job. In fact the indexer should never save any state, this should be done outside of the task. I have a branch that does that but it's not ready yet. In the mean time I think it's fine to just ignore the exceptions thrown by doSaveState and just call onFailure with the real one (the one thrown by the search/bulk).

@hendrikmuhs
Copy link
Contributor Author

@jimczi I removed the scenario of the broken doSaveState, so you can re-factor/fix this issue as part of your PR.

Can you ping me when your PR is ready? It will affect me on my feature branch.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @hendrikmuhs !

Can you ping me when your PR is ready? It will affect me on my feature branch.

Sure, I'll do

@hendrikmuhs hendrikmuhs changed the title [Rollup] improve handling of failures on save and first search [Rollup] improve handling of failures on first search Nov 6, 2018
@hendrikmuhs hendrikmuhs merged commit 433469d into elastic:master Nov 8, 2018
hendrikmuhs pushed a commit that referenced this pull request Nov 8, 2018
Improve error handling in the Indexer if an exception occurs during the very 1st retrieval (query execution)
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.6.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants