Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ActionListener.map exception handling #50886

Merged

Conversation

henningandersen
Copy link
Contributor

@henningandersen henningandersen commented Jan 11, 2020

map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.

This affects many APIs across the code base. This is the first of a series
of PRs changing exception handling to adhere to the principle that an
ActionListener implementation is responsible for its own exception handling.

The demo program test here illustrates the current inconsistencies in how exceptions in listeners are handled.

map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
@henningandersen henningandersen added >bug WIP :Core/Infra/Core Core issues without another label >breaking-java :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v8.0.0 v7.6.0 labels Jan 11, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/Core)

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

Copy link
Contributor

@pugnascotia pugnascotia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

How do you feel about adding some JavaDoc to the new test cases, to explain what the expectations are?

delegate.onFailure(e);
return;
}
delegate.onResponse(mapped);
Copy link
Member

@original-brownbear original-brownbear Jan 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the point in this change, but note that I added the .map shortcut back when I added it to dry up a bunch of ActionListener.wrap(..., listener::onFailure) spots.
I think we'd basically have to audit every spot that we use .map in now and make sure that the listener/delegate will actually handle it's own onResponse failures (from a quick read over the spots we use map in this may already hold true).

Maybe we should assert this and do something like:

try {
   delegate.onResponse(mapped);
} catch (Exception e) {
   assert false: e;
   throw e;
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the assert. Local CI seems happy about it.

@henningandersen
Copy link
Contributor Author

Thanks @pugnascotia , I added javadoc here: 198651b

The DFS action relied on map notifying onFailure (sort of, at least this
way it is bwc). But there seems to be no reason it cannot simply use
the ChannelActionListener, so change it into using that.
@henningandersen
Copy link
Contributor Author

Failure already reported here.

@henningandersen
Copy link
Contributor Author

Removed WIP on this, since I think this can go in alone.

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not gone through all callers of this method to check whether this is safe, but left a few small comments here.

try {
delegate.onResponse(mapped);
} catch (RuntimeException e) {
assert false : new AssertionError("map: listener.onResponse failed", e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why assert here but not when calling delegate.onFailure(e);?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, added that in ca31964 and tests seems unaffected.

});
(request, channel, task) ->
searchService.executeDfsPhase(request, (SearchShardTask) task,
new ChannelActionListener<>(channel, DFS_ACTION_NAME, request))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ChannelActionListener also has this weird double-sending logic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but the code used to propagate the exception out. This seemed to be a left-over from when the ChannelActionListener was introduced. The old map exception handling would ensure onFailure were called on exception. This means this change is effectively a no-op now (with the caveat that onFailure will be called on exceptions like for all other ChannelActionListener usages).

We should notice that DirectTransportChannel will not bubble out exceptions from invoking the TransportResponseHandler. So it seems likely that the primary exceptions bubbled out are related to communicating the response over a wire, in which case invoking onFailure might be desirable (for instance an NPE on serialization).

So I would like to keep using ChannelActionListener here and then deal with ChannelActionListener in a follow-up. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

Copy link
Member

@original-brownbear original-brownbear left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thanks Henning :)

@henningandersen
Copy link
Contributor Author

@elasticmachine test this please

@henningandersen
Copy link
Contributor Author

Build failure looks unrelated, reported it here: #51347 .
@elasticmachine run elasticsearch-ci/2

@henningandersen
Copy link
Contributor Author

One more test round:
@elasticmachine test this please

@henningandersen
Copy link
Contributor Author

henningandersen commented Jan 29, 2020

I have not gone through all callers of this method to check whether this is safe

I went through all calls in production (not test) code. Most are "clearly" OK, falling into one of these categories:

  1. Clearly not able to throw except in severe bug cases like NPE.
  2. Inner listener will not throw since it is either ActionListener.wrap og ChannelActionListener - though there is an addendum to that statement below.
  3. Guarded by assertions enabled check (i.e. not active in production).
  4. A choice is made to use map or not, clearly the intention was not to get the exception behavior (e.g. TransportBulkAction.BulkRequestModifier.wrapActionListenerIfNeeded).
  5. Ultimately caught and onFailure called (transport service, security, ActionRunnable, StepListener/ListenableFuture).
  6. Used with NotifyOnceListener inside only.

Following I did not follow to the end:

PersistentTasksService: this affects all methods here. I checked a subset of the usages only:

  • startPersistentTask:
    • CCR/resume follow ends in CCR ResponseHandler which does counting that will disregard an additional onFailure call anyway.
    • Rollup and transform uses wrap to create the listener
  • sendCompletionRequest: only has trace logging listeners.
  • updatePersistentTaskState: has a variety of listeners sent to it. Many use wrap, but some are complex and hard to chase to the end (so I did not).

The small addendum mentioned previously is that I did not consider the effect of having for instance map(map(wrap(onResponse, onFailure))) where onResponse and onFailure are functions that both throw. In that case, each additional map wrapping would previously invoke onFailure once more, such that onFailure in this case would be called 3 times.

@henningandersen henningandersen merged commit 92ee0f7 into elastic:master Jan 29, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jan 29, 2020
ActionListener.map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
henningandersen added a commit that referenced this pull request Jan 29, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jan 29, 2020
Datafeeds being closed while starting could result in and NPE. This was
handled as any other failure, masking out the NPE. However, this
conflicts with the changes in elastic#50886.

Related to elastic#50886 and elastic#51302
henningandersen added a commit that referenced this pull request Jan 30, 2020
Datafeeds being closed while starting could result in and NPE. This was
handled as any other failure, masking out the NPE. However, this
conflicts with the changes in #50886.

Related to #50886 and #51302
henningandersen added a commit that referenced this pull request Jan 30, 2020
Datafeeds being closed while starting could result in and NPE. This was
handled as any other failure, masking out the NPE. However, this
conflicts with the changes in #50886.

Related to #50886 and #51302
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jan 30, 2020
ActionListener.map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
henningandersen added a commit that referenced this pull request Jan 30, 2020
ActionListener.map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
@henningandersen
Copy link
Contributor Author

Notice that this PR was merged, then reverted. After #51646 was merged, the commit was cherry-picked (identical, no changes) to reintroduce it.

henningandersen added a commit that referenced this pull request Jan 30, 2020
ActionListener.map would call listener.onFailure for exceptions from
listener.onResponse, but this means we could double trigger some
listeners which is generally unexpected. Instead, we should assume that
a listener's onResponse (and onFailure) implementation is responsible
for its own exception handling.
henningandersen added a commit to henningandersen/elasticsearch that referenced this pull request Jan 31, 2020
ActionListener.completeWith would catch exceptions from
listener.onResponse and deliver them to lister.onFailure, essentially
double notifying the listener. Instead we now assert that listeners do
not throw when using ActionListener.completeWith.

Relates elastic#50886
henningandersen added a commit that referenced this pull request Feb 3, 2020
ActionListener.completeWith would catch exceptions from
listener.onResponse and deliver them to lister.onFailure, essentially
double notifying the listener. Instead we now assert that listeners do
not throw when using ActionListener.completeWith.

Relates #50886
henningandersen added a commit that referenced this pull request Feb 3, 2020
ActionListener.completeWith would catch exceptions from
listener.onResponse and deliver them to lister.onFailure, essentially
double notifying the listener. Instead we now assert that listeners do
not throw when using ActionListener.completeWith.

Relates #50886
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>breaking-java >bug :Core/Infra/Core Core issues without another label :Distributed/Distributed A catch all label for anything in the Distributed Area. If you aren't sure, use this one. v7.7.0 v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants