LLRC should detect I/O Reactor failure and log the halting exception #49124
Comments
Pinging @elastic/es-core-features (:Core/Features/Java Low Level REST Client) |
Detecting the stopped state, and retrieving the audit log is simple enough if we have a reference to the active IOReactor instance, but getting that instance looks easier said than done. The IOReactor object is instantiated by the client builder, and is buried under the connection manager with no reasonable way to inject, intercept, or access the instance. We can attempt to replicate the connection manager construction logic and retain a handle to the IOReactor instance for visibility into its state. However, this can be broken if a user provides their own connection manager implementation to the client via a custom Another option is to make this detection best effort using reflection operations to break the glass and attempt to dig down into the connection manager for a client and obtain the IOReactor instance that way. This is probably the messiest solution here and I'd like to avoid it if at all possible. |
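As a rough illustration of the reflection-based "break the glass" option, here is a minimal sketch using only the standard library. FakeClient, FakeConnectionManager and FakeReactor are hypothetical stand-ins, not HttpAsyncClient types, and the depth-limited field walk is only an assumption about how such a lookup could be written. Real client internals can change between versions, which is a large part of why this approach is fragile.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical stand-ins for the real client internals (not HttpAsyncClient classes).
class FakeReactor {
    volatile String status = "ACTIVE";
}

class FakeConnectionManager {
    private final FakeReactor ioReactor = new FakeReactor();
}

class FakeClient {
    private final FakeConnectionManager connmgr = new FakeConnectionManager();
}

class ReflectiveLookup {
    /**
     * Best-effort search for a value of the wanted type, starting at root and
     * following non-static fields up to maxDepth levels deep.
     * Returns an empty Optional instead of throwing when nothing is found.
     */
    static <T> Optional<T> find(Object root, Class<T> wanted, int maxDepth) {
        if (root == null || maxDepth < 0) {
            return Optional.empty();
        }
        if (wanted.isInstance(root)) {
            return Optional.of(wanted.cast(root));
        }
        for (Class<?> c = root.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers()) || f.getType().isPrimitive()) {
                    continue;
                }
                try {
                    f.setAccessible(true);
                    Optional<T> hit = find(f.get(root), wanted, maxDepth - 1);
                    if (hit.isPresent()) {
                        return hit;
                    }
                } catch (RuntimeException | IllegalAccessException e) {
                    // Field could not be opened (for example module restrictions): skip it.
                }
            }
        }
        return Optional.empty();
    }
}
```

A caller would log a warning when the Optional comes back empty, rather than failing client construction.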
After discussing in the team sync today, we decided that we will replicate the connection manager construction logic in order to obtain the IOReactor. If user code provides a connection manager, we will make a best-effort attempt at retrieving the IOReactor from it, logging a warning if we are unable to. We also discussed getting a reproduction of some scenarios where the IOReactor can be shut down due to exceptions. |
After doing some experiments and digging through code, I think it makes sense to leave a summary of where this effort is:
First: It does not look like we will be able to instrument the IOReactor in the client, at least not without some very messy reflection code. The client builder keeps all of its builder state in private fields, and each builder method is final. This keeps us from extending it and capturing inputs in order to replace the connection manager instance in a safe way. Without the ability to extend the builder, it becomes very easy to break SSL connections in the client, because the connection manager that we would be substituting depends on the SSL context provided to the builder, which is inaccessible to us.
Second: Browsing the changes in the HTTP Components library for version 5, it seems that the IOReactor code now logs the exceptions by default. This means that once version 5 is released, any instrumentation we add can essentially be thrown out unless we can think of some other use for it.
We should keep an eye on the HTTP Components library and upgrade to version 5 when it becomes available. |
Have you had a look at this: https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/index.html |
@jbaiera Do you have a timeline of when the Elasticsearch library will use the recently released HTTP Components 5.0? |
@stanpalatnik No timeline as of yet, though I'll add a personal todo item to start taking a look at it! |
We have experienced this problem recently and I do have a stack trace for the first error. Conditions:
Hope it might help to identify the problem. It may be in
|
I think I had given the reason above. As a workaround, you can use reflection to fix the client setup issue. |
HttpClient 5.0 has mostly resolved this issue and will no longer crash in most cases. You can therefore either upgrade to HttpClient 5.0 or follow the 4.x guidance and set up an IOReactorExceptionHandler, which also fixes the issue in most cases: https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601 For existing users, reflection can be used to set up the IOReactorExceptionHandler via setHttpClientConfigCallback on the RestClient builder. |
@ruilr, could you please share an example of how to use reflection to set up the IOReactorExceptionHandler? I'm new to this. |
Have you solved this problem? |
@ruilr can you provide an example of how to set up the IOReactorExceptionHandler with setHttpClientConfigCallback? |
+1 |
They fixed this in HTTP Core v4.4.13, and the ES java client is using v4.4.12, so to fix it I just excluded the apache httpcomponents from the ES java client and imported the newer version on my own:
After that everything seems to be working fine =] |
Let me clarify: there is no "fix" for this issue. The Elasticsearch Rest Client does not set an exception handler, so the IOReactor hangs. This is by design in Http Async Client; it is a bug in the Elasticsearch Rest Client, not in Http Async Client. I am posting this explanation on behalf of my company, but publishing any fix code would require a lengthy approval process, because the code is used inside the company and is considered company property. That is why I submitted the root cause and provided the reference link. The Elasticsearch team has the obligation to provide a fix in a new release, and should also provide a patch or workaround for legacy releases, such as the reflection approach I suggested. As I said in my first post, this by-design "issue" has been resolved for most cases in the new 5.x, but not all; further discussion can be found at: |
Thanks for the clarification @ruilr - much appreciated. |
From time to time we were getting the error "I/O reactor terminated abnormally[ERROR]", as well as the "Request cannot be executed; I/O reactor status: STOPPED" error. We saw that it was related to https://issues.apache.org/jira/browse/HTTPCORE-607 and https://issues.apache.org/jira/browse/HTTPCORE-609, so after upgrading to the library versions that I posted, we have been running the system since June without any incident whatsoever. |
I won't spend time explaining this again; please read the manual: https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601 This is a known design "issue" or choice of Http Async Client: the Elasticsearch Rest Client needs to set an exception handler so that processing can proceed, or it will hang, but evidently the developers did not read the manual carefully. It is not a good design, which is why the mechanism was changed: 5.x has a default handler that proceeds on Exceptions, but it will still hang on Errors. The details can be found at the mail link I pasted. Again, this is not a known bug; it is a combination of a weak design choice and misuse by users. Don't waste time upgrading to a newer 4.x version. 4.x already gives you a way to avoid the hang, you are just not using it. There is nothing to fix on the Http Async Client side, except that it is still risky in some extreme cases, which is also discussed in the mail I pasted. If you want this issue fixed, ask the Elasticsearch team to do what they should do: fix the misuse by setting an exception handler. That is not enough on its own, but it would handle most cases. Again, if you want a more detailed understanding of this issue, please read the mail link I have given. |
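To make the handler contract discussed above concrete, here is a toy model in plain Java. ToyReactor is a hypothetical class, not an HttpCore type. It only mimics the rule described in the linked tutorial: when a handler returns true the loop keeps going, and when there is no handler (or it returns false) the reactor stops for good. Errors are deliberately not caught, loosely mirroring the remark that Errors can still bring the reactor down.

```java
import java.util.List;
import java.util.function.Predicate;

// Toy model only: none of these types are HttpCore classes.
class ToyReactor {
    enum Status { ACTIVE, STOPPED }

    // null means "no exception handler installed"
    private final Predicate<RuntimeException> handler;
    private Status status = Status.ACTIVE;
    private int processed = 0;

    ToyReactor(Predicate<RuntimeException> handler) {
        this.handler = handler;
    }

    /** Runs events in order and stops at the first exception the handler does not accept. */
    void dispatch(List<Runnable> events) {
        for (Runnable event : events) {
            if (status == Status.STOPPED) {
                return; // a stopped reactor never processes anything again
            }
            try {
                event.run();
                processed++;
            } catch (RuntimeException e) {
                // true from the handler means "handled, keep going"
                boolean handled = handler != null && handler.test(e);
                if (!handled) {
                    status = Status.STOPPED;
                }
            }
        }
    }

    Status status() {
        return status;
    }

    int processed() {
        return processed;
    }
}
```

With no handler, one failing event leaves every later event unprocessed; with a handler that always returns true, the failing event is skipped and the rest still run.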
@matrixcine @reddroid555 @wahahasssss Here is my configuration with the Elasticsearch RestClient builder to set up this exception handler. I used the example that @ruilr linked to here https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601. I borrowed the exception handler implementation from an existing example; there are a lot of examples on GitHub to review if you search for new IOReactorExceptionHandler().
    import java.io.IOException;
    import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
    import org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager;
    import org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor;
    import org.apache.http.nio.reactor.IOReactorException;
    import org.apache.http.nio.reactor.IOReactorExceptionHandler;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestClientBuilder;

    RestClientBuilder builder = RestClient.builder(...);
    CustomHttpClientConfigCallback configurationCallback = new CustomHttpClientConfigCallback();
    builder.setHttpClientConfigCallback(configurationCallback);

    // "logger" is assumed to be a logger field available in the enclosing class.
    public static class CustomHttpClientConfigCallback implements RestClientBuilder.HttpClientConfigCallback {
        @Override
        public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {
            // Add custom exception handler.
            // - https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601
            // - This always handles the exception and just logs a warning.
            try {
                DefaultConnectingIOReactor ioReactor = new DefaultConnectingIOReactor();
                ioReactor.setExceptionHandler(new IOReactorExceptionHandler() {
                    @Override
                    public boolean handle(IOException e) {
                        logger.warn("System may be unstable: IOReactor encountered a checked exception : " + e.getMessage(), e);
                        return true; // Return true to note this exception as handled, it will not be re-thrown
                    }

                    @Override
                    public boolean handle(RuntimeException e) {
                        logger.warn("System may be unstable: IOReactor encountered a runtime exception : " + e.getMessage(), e);
                        return true; // Return true to note this exception as handled, it will not be re-thrown
                    }
                });
                // Caution: this replaces the connection manager the builder would otherwise create
                // (see the SSL context caveat raised earlier in this thread).
                httpClientBuilder.setConnectionManager(new PoolingNHttpClientConnectionManager(ioReactor));
            } catch (IOReactorException e) {
                throw new RuntimeException(e);
            }
            return httpClientBuilder;
        }
    }
It compiles, but that is about all I know. From what I've learned here from @ruilr, I think this should do the trick. There is likely some nuanced balance of which exceptions to handle. I'm going brute force at the moment and handling all exceptions. Thanks! This thread has been very helpful. |
We are facing the same issue. It was working absolutely fine with a singleton RestHighLevelClient object. We started seeing this issue after we changed our code base to use 6 Java objects to talk to 6 different clusters based on the index.
Once this issue occurs on an instance, subsequent calls to that instance also fail with the same error. In other words, the instance becomes unhealthy because of the issue, and the only way out is to terminate the instance.
As a short-term fix, we catch the exception using a Spring controller advice exception handler (Solution 3 in the link below) and re-create the closed RestHighLevelClient object. That is how we handle the exception; however, the issue is still there to some extent, limited to the request that hits it. https://www.baeldung.com/exception-handling-for-rest-with-spring
Thanks @robotdan for providing the above solution, but somehow it didn't work for me. Maybe I was missing something. Based on my experience with this, I believe this exception should be handled by the Elastic API; the client application should not have to bother about it. (It was working absolutely fine with a singleton object, but not with 6 Java objects.)
@jbaiera - I see this issue is in OPEN state. Do we know whether there is any fix available for this, or any WIP effort going on? Any ETA, please?
Looking at this thread's conversation, I really like the way everybody is helping each other. Thank you very much. Let me know if you have any questions. |
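The re-create-on-failure workaround described in this comment could be sketched roughly as follows. RecreatingClientHolder and StubSearchClient are hypothetical names, not Elasticsearch or Apache APIs. The sketch also assumes that the failure surfaces as a RuntimeException whose message, or the message of one of its causes, contains the "I/O reactor status: STOPPED" text quoted in this thread; that is worth verifying against the client version in use.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical helper, not part of any Elasticsearch or Apache API.
class RecreatingClientHolder<C extends AutoCloseable> {
    private final Supplier<C> factory;
    private C client;
    private int rebuilds = 0;

    RecreatingClientHolder(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the call; if the client reports a stopped reactor, rebuilds it and retries once. */
    synchronized <R> R execute(Function<C, R> call) {
        try {
            return call.apply(client);
        } catch (RuntimeException e) {
            if (!isReactorStopped(e)) {
                throw e;
            }
            rebuild();
            return call.apply(client); // single retry; a second failure propagates
        }
    }

    // Assumption: the stopped state is recognisable by the message quoted in this thread.
    private static boolean isReactorStopped(Throwable e) {
        for (Throwable t = e; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains("I/O reactor status: STOPPED")) {
                return true;
            }
        }
        return false;
    }

    private void rebuild() {
        try {
            client.close();
        } catch (Exception ignored) {
            // The old client is already broken; closing it is best effort.
        }
        client = factory.get();
        rebuilds++;
    }

    synchronized int rebuilds() {
        return rebuilds;
    }
}

// Stand-in used only to exercise the holder; a real application would hold its REST client here.
class StubSearchClient implements AutoCloseable {
    boolean stopped;

    String search(String query) {
        if (stopped) {
            throw new IllegalStateException("Request cannot be executed; I/O reactor status: STOPPED");
        }
        return "hits for " + query;
    }

    @Override
    public void close() {
    }
}
```

This only hides the symptom: the request that first hits the stopped reactor still pays for a rebuild, and whatever exception halted the reactor remains undiagnosed.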
We are facing the same issue. Using as below Application:
APIs: addresses string search, by many other parameters using search, get, msearch, mget Elastic APIs
Thanks @robotdan for providing the above solution, but somehow it didn't work for me either. Maybe I was missing something.
Currently, to avoid the Request cannot be executed; I/O reactor status: STOPPED issue, we execute fall back logic whenever this exception occurs.
Fall back logic
Reference
Please let us know if there is a better solution for this issue.
Note: We are seeing this issue for the below configuration as well. |
Hi, I'm quoting @robotdan below, as well as @luneo7 above, in order to share with you my finding right now...
[EDIT:] Last but not least, can we know since when this behaviour has appeared? Is it specific to v7.x of Elastic, or is it much older? Would it mean that httpcore evolved badly (from our point of view)? Best regards, and cheers to all, TvS |
Trying to set the ` private static void setIoReactorExceptionHandler(HttpAsyncClientBuilder clientBuilder) {
|
the link is 404 now |
New link is https://hc.apache.org/httpcomponents-core-4.4.x/current/tutorial/html/nio.html#d5e601 |
I think this is the case even in 4.1.4, which is the version we're using today:
    this.reactorThread = threadFactory.newThread(new Runnable() {
        @Override
        public void run() {
            try {
                final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
                connmgr.execute(ioEventDispatch);
            } catch (final Exception ex) {
                log.error("I/O reactor terminated abnormally", ex);
            } finally {
                status.set(Status.STOPPED);
            }
        }
    });
|
What is the current state of this issue? |
Please ignore my comment above. The problem has been solved. The root cause was on the side of my application, which tried to log in with the same account multiple times, which in turn led to multiple unnecessary save() and retrieveById() calls on the respective indices. To diagnose a similar problem, it helps to closely examine the stack trace and, if the error occurs in several different unit tests, look for any commonalities there, such as the same method being called on the index. HTH. |
May I ask which version has fixed this issue? |
Version 8 seems to have resolved this. |
Mark. This is still unresolved for now; even setting an IOReactorExceptionHandler does not help. |
I/O reactor status: STOPPED. How do I solve it? |
Could you please share the PR or commit for version 8 which seems to be fixing it. |
Has anyone solved this? |
LLRC makes use of the Apache HTTPComponents Async Client. The client is powered by an internal I/O Reactor that reacts to and dispatches IO events as they happen. In some cases, this I/O Reactor can encounter exceptions from higher up in the call stack or even from the Java NIO library. If these exceptions are not handled, the I/O Reactor shuts down and leaves the RestClient instance stuck.
It can be difficult to discern what causes the reactor to shut down, as the offending exceptions are not always written to the log. Despite exceptions not being logged to a logger, they are collected in an internal audit log on the IOReactor implementation.
We should find a way to:
This would simplify the process of troubleshooting why the rest client has died and allow for a better exception to be thrown rather than the arcane "I/O reactor status: STOPPED" message.
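To illustrate what a "better exception" might look like, here is a toy sketch in plain Java. AuditedReactor and AuditEntry are hypothetical classes, not the HttpCore reactor or its audit log types. The sketch only shows the idea of recording the halting exception and attaching it as the cause when a later request is rejected.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model only: these are not the HttpCore classes.
final class AuditEntry {
    final Instant when;
    final Throwable cause;

    AuditEntry(Instant when, Throwable cause) {
        this.when = when;
        this.cause = cause;
    }
}

class AuditedReactor {
    private final List<AuditEntry> auditLog = new ArrayList<>();
    private boolean stopped = false;

    /** Called by the (imaginary) event loop when an exception forces it to halt. */
    synchronized void haltWith(Throwable cause) {
        auditLog.add(new AuditEntry(Instant.now(), cause));
        stopped = true;
    }

    /** Read-only snapshot of everything recorded so far. */
    synchronized List<AuditEntry> auditLog() {
        return Collections.unmodifiableList(new ArrayList<>(auditLog));
    }

    /** Fails fast with the halting exception attached, instead of a bare status message. */
    synchronized void ensureRunning() {
        if (!stopped) {
            return;
        }
        Throwable last = auditLog.isEmpty() ? null : auditLog.get(auditLog.size() - 1).cause;
        String detail = last == null ? "" : "; halted by " + last;
        IllegalStateException failure =
                new IllegalStateException("I/O reactor status: STOPPED" + detail, last);
        // Earlier audit entries, if any, are attached as suppressed exceptions.
        for (int i = 0; i < auditLog.size() - 1; i++) {
            failure.addSuppressed(auditLog.get(i).cause);
        }
        throw failure;
    }
}
```

With the cause attached, the stack trace of a rejected request points directly at the exception that stopped the reactor, even if that exception was never written to a logger.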