[Rollup] improve handling of failures on first search #35269
Pinging @elastic/es-search-aggs
doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
} catch (Exception e) {
onFailure(exc);
onFailure(new RuntimeException("Failed to save State", e));
That's the odd part of this PR. We have 2 failures: the one this method was called with, and the one that happened during the save.
Any better idea than calling the callback twice?
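One option for the question above, sketched under assumptions: Java's `Throwable.addSuppressed` can attach the save failure to the original exception, so the callback runs once and both failures stay visible. The class and method names here (`SuppressedFailureSketch`, `combine`) are hypothetical and not part of the PR.

```java
public class SuppressedFailureSketch {

    // Hypothetical helper: keeps the original failure as the primary exception
    // and records the failure that happened while saving state as suppressed.
    static Exception combine(Exception original, Exception duringSave) {
        original.addSuppressed(duringSave);
        return original;
    }

    public static void main(String[] args) {
        Exception searchFailure = new RuntimeException("search failed");
        Exception saveFailure = new RuntimeException("Failed to save State");

        Exception reported = combine(searchFailure, saveFailure);

        // The primary message is the original failure; the save failure is attached.
        System.out.println(reported.getMessage());
        System.out.println(reported.getSuppressed()[0].getMessage());
    }
}
```

With this shape a single `onFailure(reported)` call would carry both causes, at the cost of mutating the original exception.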
Thanks @hendrikmuhs! I left one comment.
@@ -252,7 +257,12 @@ public synchronized boolean maybeTriggerAsyncJob(long now) {
protected abstract void onAbort();

private void finishWithFailure(Exception exc) {
doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
try {
doSaveState(finishAndSetState(), position.get(), () -> onFailure(exc));
doSaveState is async, so there is no need to try/catch this call. We ignore exceptions in doSaveState; this is ok IMO since saving the state here is just to ensure that we'll start from the last commit point.
That's true for the rollup implementation (at the moment), but not from a generic Async-2-PhaseIndexer perspective if the implementation of doSaveState(...) is buggy. In such a case error handling is trapped in 2 exception handlers:
1. any error condition, an exception is thrown
2. finishWithFailure is called the 1st time and throws another exception from doSaveState
3. that exception is handled and calls finishWithFailure a 2nd time
4. finishAndSetState throws "Indexer job encountered an illegal state [STARTED]" as step 2 was incomplete
5. there is no handler for this exception

Just want to make sure the problem is understood (the unit test should also show it). I admit it's somewhat contrived, since it assumes a buggy doSaveState.
Alternatively I could just accept the fact that doSaveState simply has to be implemented so it never throws, in which case I would be fine with a small doc fix.
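The sequence described above can be reproduced with a small stand-alone model. This is a sketch, not the actual indexer: the class `DoubleFailureSketch`, the two-value `State` enum, and the deliberately buggy `doSaveState` are all assumptions made for illustration, and only the INDEXING to STARTED transition is modeled.

```java
import java.util.concurrent.atomic.AtomicReference;

public class DoubleFailureSketch {
    enum State { STARTED, INDEXING }

    private final AtomicReference<State> state = new AtomicReference<>(State.INDEXING);

    // Simplified stand-in: INDEXING becomes STARTED, anything else is illegal.
    private State finishAndSetState() {
        return state.updateAndGet(prev -> {
            if (prev == State.INDEXING) {
                return State.STARTED;
            }
            throw new IllegalStateException("Indexer job encountered an illegal state [" + prev + "]");
        });
    }

    // Hypothetical buggy implementation: throws instead of invoking the callback.
    private void doSaveState(State s, Runnable next) {
        throw new RuntimeException("buggy doSaveState");
    }

    private void finishWithFailure(Exception exc) {
        // finishAndSetState() runs first, so the state has already changed
        // by the time doSaveState throws.
        doSaveState(finishAndSetState(), () -> { });
    }

    // Returns the message of the exception that ends the sequence.
    public String run() {
        try {
            try {
                throw new RuntimeException("search failed"); // step 1
            } catch (Exception e) {
                finishWithFailure(e);                         // step 2
            }
        } catch (Exception e) {
            try {
                finishWithFailure(e);                         // step 3
            } catch (IllegalStateException ise) {
                return ise.getMessage();                      // step 4
            }
        }
        return "no exception";
    }

    public static void main(String[] args) {
        System.out.println(new DoubleFailureSketch().run());
    }
}
```

In this model the second call fails inside `finishAndSetState` because the first call already moved the state to STARTED before `doSaveState` threw.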
I don't think we should save the state at the end of the job. In fact the indexer should never save any state; this should be done outside of the task. I have a branch that does that but it's not ready yet. In the meantime I think it's fine to just ignore the exceptions thrown by doSaveState and just call onFailure with the real one (the one thrown by the search/bulk).
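One possible reading of that suggestion, as a minimal sketch: swallow whatever doSaveState throws and report only the original exception. The class `IgnoreSaveFailureSketch`, the simplified single-argument `doSaveState`, and the `reported` list are assumptions for illustration and are not taken from the PR.

```java
import java.util.ArrayList;
import java.util.List;

public class IgnoreSaveFailureSketch {
    // Collects what onFailure receives so the behavior can be inspected.
    final List<Exception> reported = new ArrayList<>();

    void onFailure(Exception exc) {
        reported.add(exc);
    }

    // Hypothetical buggy implementation: throws before running the callback.
    void doSaveState(Runnable next) {
        throw new RuntimeException("save failed");
    }

    void finishWithFailure(Exception exc) {
        try {
            doSaveState(() -> onFailure(exc));
        } catch (Exception ignored) {
            // The save failure is dropped on purpose; only the original
            // search/bulk failure is reported.
            onFailure(exc);
        }
    }

    public static void main(String[] args) {
        IgnoreSaveFailureSketch indexer = new IgnoreSaveFailureSketch();
        indexer.finishWithFailure(new RuntimeException("search failed"));
        System.out.println(indexer.reported.get(0).getMessage());
    }
}
```

Note that this shape would call onFailure twice if an implementation of doSaveState ran the callback and then threw, so it only covers the case where the callback was never reached.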
@jimczi I removed the scenario of the broken Can you ping me when your PR is ready? It will affect me on my feature branch.
Thanks @hendrikmuhs!
> Can you ping me when your PR is ready? It will affect me on my feature branch.

Sure, will do.
e18103e to 9c51db9
Improve error handling in the Indexer if an exception occurs during the very 1st retrieval (query execution)
Improve error handling in the Indexer if an exception occurs during saving state and/or during the very 1st retrieval (query execution)
bug
/cc @jimczi @polyfractal