Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profiling plugin may attempt to send response twice #93691

Closed
danielmitterdorfer opened this issue Feb 10, 2023 · 1 comment · Fixed by #93692
Closed

Profiling plugin may attempt to send response twice #93691

danielmitterdorfer opened this issue Feb 10, 2023 · 1 comment · Fixed by #93692
Assignees
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.7.0

Comments

@danielmitterdorfer
Copy link
Member

Elasticsearch Version

8.7.0-SNAPSHOT

Installed Plugins

No response

Java Version

bundled

OS Version

N/A

Problem Description

During benchmarking I noticed the following stack trace in the ES logs:

[2023-02-10T09:12:28,400][WARN ][r.suppressed             ] [elasticsearch-0] path: /_profiling/stacktraces, params: {}
java.lang.IllegalStateException: Channel is already closed
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.close(RestController.java:772) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.rest.RestController$ResourceHandlingHttpChannel.sendResponse(RestController.java:759) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.rest.action.RestResponseListener.processResponse(RestResponseListener.java:26) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.rest.action.RestActionListener.onResponse(RestActionListener.java:38) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.rest.action.RestCancellableNodeClient$1.onResponse(RestCancellableNodeClient.java:87) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.rest.action.RestCancellableNodeClient$1.onResponse(RestCancellableNodeClient.java:81) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:209) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:165) ~[?:?]
        at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:250) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.xpack.profiler.TransportGetProfilingAction$DetailsHandler.mayFinish(TransportGetProfilingAction.java:427) ~[?:?]
        at org.elasticsearch.xpack.profiler.TransportGetProfilingAction$DetailsHandler.onExecutableDetailsResponse(TransportGetProfilingAction.java:419) ~[?:?]
        at org.elasticsearch.xpack.profiler.TransportGetProfilingAction$5.onResponse(TransportGetProfilingAction.java:315) ~[?:?]
        at org.elasticsearch.xpack.profiler.TransportGetProfilingAction$5.onResponse(TransportGetProfilingAction.java:312) ~[?:?]
        at org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:209) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:165) ~[?:?]
        at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:250) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.get.TransportMultiGetAction$1.finishHim(TransportMultiGetAction.java:155) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.get.TransportMultiGetAction$1.onResponse(TransportMultiGetAction.java:138) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.get.TransportMultiGetAction$1.onResponse(TransportMultiGetAction.java:130) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.client.internal.node.NodeClient$SafelyWrappedActionListener.onResponse(NodeClient.java:160) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:209) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.tasks.TaskManager$1.onResponse(TaskManager.java:203) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ContextPreservingActionListener.onResponse(ContextPreservingActionListener.java:31) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.xpack.security.action.filter.SecurityActionFilter.lambda$applyInternal$2(SecurityActionFilter.java:165) ~[?:?]
        at org.elasticsearch.action.ActionListener$DelegatingFailureActionListener.onResponse(ActionListener.java:250) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.ActionListenerResponseHandler.handleResponse(ActionListenerResponseHandler.java:43) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleResponse(TransportService.java:1367) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.processResponse(TransportService.java:1466) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.transport.TransportService$DirectResponseChannel.sendResponse(TransportService.java:1437) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.transport.TaskTransportChannel.sendResponse(TaskTransportChannel.java:44) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ChannelActionListener.lambda$onResponse$0(ChannelActionListener.java:38) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.ActionListener.run(ActionListener.java:567) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:38) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.support.ChannelActionListener.onResponse(ChannelActionListener.java:20) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:50) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.ActionRunnable$2.accept(ActionRunnable.java:47) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.action.ActionRunnable$3.doRun(ActionRunnable.java:72) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:958) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26) ~[elasticsearch-8.8.0-SNAPSHOT.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1589) ~[?:?]

This indicates that we attempt to send the response twice, see:

close();
delegate.sendResponse(response);
success = true;

and

// attempt to close once atomically
if (closed.compareAndSet(false, true) == false) {
throw new IllegalStateException("Channel is already closed");
}

This is caused by a race condition in the profiling plugin. It guards with two atomics against duplicate responses but if the method is called after both atomics have just been decremented to zero, we will send the response twice:

public void mayFinish() {
if (expectedStackFrameSlices.get() == 0 && expectedExecutableSlices.get() == 0) {
builder.setExecutables(executables);
builder.setStackFrames(stackFrames);
log.debug("retrieveStackTraceDetails took [" + (System.nanoTime() - start) / 1_000_000.0d + " ms].");
submitListener.onResponse(builder.build());
}
}

Steps to Reproduce

Issue repeated calls against /_profiling/stacktrace. As this is a concurrency issue, it may take a few iterations to manifest.

Logs (if relevant)

No response

@danielmitterdorfer danielmitterdorfer added >bug :Search/Search Search-related issues that do not fall into other categories v8.7.0 labels Feb 10, 2023
@danielmitterdorfer danielmitterdorfer self-assigned this Feb 10, 2023
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-search (Team:Search)

@elasticsearchmachine elasticsearchmachine added the Team:Search Meta label for search team label Feb 10, 2023
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Feb 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes elastic#93691
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Feb 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes elastic#93691
danielmitterdorfer added a commit that referenced this issue Feb 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes #93691
danielmitterdorfer added a commit to danielmitterdorfer/elasticsearch that referenced this issue Feb 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes elastic#93691
elasticsearchmachine pushed a commit that referenced this issue Feb 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes #93691
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this issue Feb 16, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes elastic#93691
saarikabhasi pushed a commit to saarikabhasi/elasticsearch that referenced this issue Apr 10, 2023
For performance reasons the profiling plugin issues multiple requests
concurrently. It uses internal handler classes to keep track of state.
When all responses of dependent requests have arrived, it assembles the
response and sends it to the client. The final state handler used two
atomic counters to track outstanding requests. This has led to a race
condition: When the two code paths finished at the same time, the plugin
attempted to send the response twice.

With this commit we simplify state handling by only using a single
atomic counter for both code paths. We decrement the counter only in a
single place and will send the response iff all dependent requests have
finished.

Closes elastic#93691
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team v8.7.0
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants