New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CI] SuggestStatsIT testSimpleStats failing #104651
Comments
Pinging @elastic/es-distributed (Team:Distributed) |
Setting it to low-risk since the failure happened only once on CI and is not reproducible locally. |
Hmm I disagree, I think this is a genuine memory leak somewhere in the REST layer so it seems like it could affect many (any) other REST APIs. The stack trace suggests to me that the leak is actually somewhere in |
@DaveCTurner How did you figure out that the leak is in |
Yeah, I looked at the other tests that ran earlier in the same JVM and
|
It looks like this has been around a while, I see the same issue in an 8.10.4 cluster in production:
Still a bad bug but I don't think we can call it a blocker if it's already been released. |
In the failure in the OP I see also this in the logs for
I think that'd explain it, we already shut down the network event loop so we can't do any more writes or even complete any more channel promises. |
Today a `HttpResponse` is always released via a `ChannelPromise` which means the release happens on a network thread. However, it's possible we try and send a `HttpResponse` after the node has got far enough through shutdown that it doesn't have any running network threads left, which means the response just leaks. This is no big deal in production, it becomes irrelevant when the process exits, but in tests we start and stop many nodes within the same process so mustn't leak anything. At this point in shutdown, all HTTP channels are now closed, so it's sufficient to check whether the channel is open first, and to fail the listener on the calling thread if not. That's what this commit does. Closes elastic#104651
Given that this test failure was a leak caused by a slightly unexpected situation at shutdown, it's not really a big deal in production, so now I agree that this is |
Today a `HttpResponse` is always released via a `ChannelPromise` which means the release happens on a network thread. However, it's possible we try and send a `HttpResponse` after the node has got far enough through shutdown that it doesn't have any running network threads left, which means the response just leaks. This is no big deal in production, it becomes irrelevant when the process exits, but in tests we start and stop many nodes within the same process so mustn't leak anything. At this point in shutdown, all HTTP channels are now closed, so it's sufficient to check whether the channel is open first, and to fail the listener on the calling thread if not. That's what this commit does. Closes #104651
Today a `HttpResponse` is always released via a `ChannelPromise` which means the release happens on a network thread. However, it's possible we try and send a `HttpResponse` after the node has got far enough through shutdown that it doesn't have any running network threads left, which means the response just leaks. This is no big deal in production, it becomes irrelevant when the process exits, but in tests we start and stop many nodes within the same process so mustn't leak anything. At this point in shutdown, all HTTP channels are now closed, so it's sufficient to check whether the channel is open first, and to fail the listener on the calling thread if not. That's what this commit does. Closes elastic#104651
* Fix leaked HTTP response sent after close (#105293) Today a `HttpResponse` is always released via a `ChannelPromise` which means the release happens on a network thread. However, it's possible we try and send a `HttpResponse` after the node has got far enough through shutdown that it doesn't have any running network threads left, which means the response just leaks. This is no big deal in production, it becomes irrelevant when the process exits, but in tests we start and stop many nodes within the same process so mustn't leak anything. At this point in shutdown, all HTTP channels are now closed, so it's sufficient to check whether the channel is open first, and to fail the listener on the calling thread if not. That's what this commit does. Closes #104651 * Fix backport
Looks like a genuine leak to me, perhaps we only just started to see this because of #104588?
Build scan:
https://gradle-enterprise.elastic.co/s/xszrggp3wtbyk/tests/:server:internalClusterTest/org.elasticsearch.index.suggest.stats.SuggestStatsIT/testSimpleStats
Reproduction line:
Applicable branches:
main
Reproduces locally?:
Didn't try
Failure history:
Failure dashboard for
org.elasticsearch.index.suggest.stats.SuggestStatsIT#testSimpleStats
Failure excerpt:
The text was updated successfully, but these errors were encountered: