LLRC should detect I/O Reactor failure and log the halting exception #49124

Open
jbaiera opened this issue Nov 14, 2019 · 36 comments
Labels
:Clients/Java Low Level REST Client, >enhancement, Team:Data Management

Comments

@jbaiera
Member

jbaiera commented Nov 14, 2019

LLRC makes use of the Apache HTTPComponents Async Client. The client is powered by an internal I/O Reactor that reacts to and dispatches I/O events as they happen. In some cases, this I/O Reactor can encounter exceptions from higher up in the call stack or even from the Java NIO library. If these exceptions are not handled, the I/O Reactor shuts down and leaves the RestClient instance stuck.

It can be difficult to discern what causes the reactor to shut down, as the offending exceptions are not always written to the log. Although these exceptions are not written to a logger, they are collected in an internal audit log on the IOReactor implementation.

We should find a way to:

  1. Detect that the I/O Reactor has stopped.
  2. Log the contents of the I/O Reactor's audit log to the client's logger.

This would simplify the process of troubleshooting why the rest client has died and allow for a better exception to be thrown rather than the arcane I/O reactor status: STOPPED message.

@jbaiera added the >enhancement and :Clients/Java Low Level REST Client labels Nov 14, 2019
@elasticmachine
Collaborator

Pinging @elastic/es-core-features (:Core/Features/Java Low Level REST Client)

@jbaiera
Member Author

jbaiera commented Nov 14, 2019

Detecting the stopped state and retrieving the audit log are simple enough if we have a reference to the active IOReactor instance, but getting that instance looks easier said than done. The IOReactor object is instantiated by the client builder, and is buried under the connection manager with no reasonable way to inject, intercept, or access the instance.

We can attempt to replicate the connection manager construction logic and retain a handle to the IOReactor instance for visibility into its state. However, this can be broken if a user provides their own connection manager implementation to the client via a custom HttpClientConfigCallback. Additionally, should the default connection manager creation logic change, we would need to update our custom build logic.

Another option is to make this detection best effort using reflection operations to break the glass and attempt to dig down into the connection manager for a client and obtain the IOReactor instance that way. This is probably the messiest solution here and I'd like to avoid it if at all possible.
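
For readers unfamiliar with the reflection approach described above, the following is a minimal, self-contained sketch of the general idea: search an object's private fields for a value of a wanted type. `FieldProbe`, `Reactor`, and `Manager` are hypothetical stand-ins invented for illustration, not classes from the REST client or HttpComponents. The sketch also assumes the wanted instance is held directly in a field of the target object, which may not be true of a real connection manager.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class FieldProbe {

    private FieldProbe() {}

    // Hypothetical stand-ins for a connection manager that hides its reactor.
    interface Reactor {
        String status();
    }

    static class Manager {
        private final Reactor reactor = () -> "STOPPED";
    }

    /**
     * Best effort: returns the value of the first instance field on target
     * (declared on its class or any superclass) that is an instance of type.
     * Fields that cannot be made accessible are skipped.
     */
    static <T> Optional<T> findFieldOfType(Object target, Class<T> type) {
        for (Class<?> c = target.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers())) {
                    continue;
                }
                try {
                    field.setAccessible(true);
                    Object value = field.get(target);
                    if (type.isInstance(value)) {
                        return Optional.of(type.cast(value));
                    }
                } catch (IllegalAccessException | RuntimeException e) {
                    // Inaccessible field (module boundary, security policy): skip it.
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String status = findFieldOfType(new Manager(), Reactor.class)
                .map(Reactor::status)
                .orElse("unknown");
        System.out.println("reactor status: " + status);
    }
}
```

Searching by type rather than by field name avoids hard-coding a private field name, but it remains fragile: any change to the target's internals can make the lookup silently return empty, which is why a caller would likely want to log a warning in that case.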

@jbaiera
Member Author

jbaiera commented Nov 21, 2019

After discussing in the team sync today, we decided that we will replicate the connection manager construction logic in order to obtain the IOReactor. If user code provides a connection manager, we will make a best-effort attempt at retrieving the IOReactor from the connection manager, logging a warning if we are unable to. We also discussed getting a reproduction of some scenarios where the IOReactor can be shut down due to exceptions.

@jbaiera
Member Author

jbaiera commented Dec 12, 2019

After doing some experiments and digging through code, I think it makes sense to leave a summary of where this effort is:

First: It does not look like we will be able to instrument the IOReactor in the client, at least not without some very messy reflection code. The client builder keeps all of its state in private fields, and each builder method is declared final. This keeps us from extending it and capturing inputs to replace the connection manager instance in a safe way. Without this ability to extend the builder, it becomes very easy to break SSL connections in the client, as the connection manager that we would be substituting depends on the SSL context provided to the builder, which is inaccessible to us.

Second: Browsing the changes in the HTTP Components library for version 5, it seems that the IOReactor code now logs the exceptions by default. This means that once version 5 is released, any instrumentation we add can essentially be thrown out unless we can think of some other use for it. We should keep an eye on the HTTP Components library and should upgrade to version 5 when it becomes available.

@xzer

xzer commented Mar 27, 2020

@stanpalatnik

@jbaiera Do you have a timeline of when the Elasticsearch library will use the recently released HTTP Components 5.0?

@jbaiera
Member Author

jbaiera commented Mar 30, 2020

@stanpalatnik No timeline as of yet, though I'll add a personal todo item to start taking a look at it!

@vlad-aleksandrov

We experienced this problem recently, and I do have a stack trace for the first error.

Conditions:

  • The latest Elasticsearch client, 7.6.2
  • Apache HTTP client: httpasyncclient-4.1.4.jar httpclient-4.5.1.jar httpcore-4.4.jar httpcore-nio-4.4.12.jar
  • High frequency of long-running requests (a few seconds per query).
  • Cluster allocation set to none for a rolling restart.
  • The Elasticsearch process was stopped manually on one node.

Hope it might help to identify the problem. It may be in org.apache.http.ConnectionClosedException: Connection closed unexpectedly HTTP connection state...

ERROR [hystrix-find-3]{0.0.0.0}[EsOperationFailureListener.onFailure(21)] Elasticsearch request failed on [host=http://xxx.xxx.xxx.xxx:9200]
java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
	at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:251)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:235)
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1514)
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1484)
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1454)
	at org.elasticsearch.client.RestHighLevelClient.search(RestHighLevelClient.java:970)
	at (...private code...)
	at sun.reflect.GeneratedMethodAccessor274.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at com.netflix.hystrix.contrib.javanica.command.MethodExecutionAction.execute(MethodExecutionAction.java:116)
	at com.netflix.hystrix.contrib.javanica.command.MethodExecutionAction.executeWithArgs(MethodExecutionAction.java:93)
	at com.netflix.hystrix.contrib.javanica.command.MethodExecutionAction.execute(MethodExecutionAction.java:78)
	at com.netflix.hystrix.contrib.javanica.command.GenericCommand$1.execute(GenericCommand.java:48)
	at com.netflix.hystrix.contrib.javanica.command.AbstractHystrixCommand.process(AbstractHystrixCommand.java:145)
	at com.netflix.hystrix.contrib.javanica.command.GenericCommand.run(GenericCommand.java:45)
	at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:302)
	at com.netflix.hystrix.HystrixCommand$2.call(HystrixCommand.java:298)
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.Observable.unsafeSubscribe(Observable.java:10151)
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
	at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
	at rx.Observable.unsafeSubscribe(Observable.java:10151)
	at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:41)
	at rx.internal.operators.OnSubscribeDoOnEach.call(OnSubscribeDoOnEach.java:30)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:48)
	at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
	at rx.Observable.unsafeSubscribe(Observable.java:10151)
	at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
	at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
	at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
	Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
		at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
		... 59 more
		Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
			at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
			at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
			... 58 more
			Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
				at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
				at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
				... 57 more
				Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
					at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
					at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
					... 56 more
					Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
						at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
						at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
						... 55 more
						Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
							at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
							at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
							... 54 more
							Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
								at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
								at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
								... 53 more
								Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
									at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
									at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
									... 52 more
									Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
										at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
										at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
										... 51 more
										Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
											at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
											at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
											... 50 more
											Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
												at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
												at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
												... 49 more
												Suppressed: java.lang.RuntimeException: Request cannot be executed; I/O reactor status: STOPPED
													at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
													at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
													... 48 more
													Suppressed: java.lang.RuntimeException: I/O reactor has been shut down
														at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:831)
														at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
														... 47 more
														Suppressed: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
															at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
															at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
															... 46 more
															Suppressed: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
																at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:813)
																at org.elasticsearch.client.RestClient.performRequest(RestClient.java:248)
																... 45 more
															Caused by: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
																at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:146)
																at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71)
																at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39)
																at org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100)
																at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:277)
																at org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:449)
																at org.apache.http.impl.nio.reactor.AbstractIOReactor.hardShutdown(AbstractIOReactor.java:590)
																at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:305)
																at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
																at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
																... 1 more
														Caused by: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
															at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:146)
															at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71)
															at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39)
															at org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100)
															at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:277)
															at org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:449)
															at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:283)
															at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
															at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591)
															... 1 more
													Caused by: java.lang.IllegalStateException: I/O reactor has been shut down
														at org.apache.http.util.Asserts.check(Asserts.java:34)
														at org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor.connect(DefaultConnectingIOReactor.java:231)
														at org.apache.http.nio.pool.AbstractNIOConnPool.processPendingRequest(AbstractNIOConnPool.java:481)
														at org.apache.http.nio.pool.AbstractNIOConnPool.lease(AbstractNIOConnPool.java:280)
														at org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager.requestConnection(PoolingNHttpClientConnectionManager.java:295)
														at org.apache.http.impl.nio.client.AbstractClientExchangeHandler.requestConnection(AbstractClientExchangeHandler.java:377)
														at org.apache.http.impl.nio.client.DefaultClientExchangeHandlerImpl.start(DefaultClientExchangeHandlerImpl.java:129)
														at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:141)
														at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
														... 47 more
												Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
													at org.apache.http.util.Asserts.check(Asserts.java:46)
													at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
													at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
													at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
													... 48 more
											Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
												at org.apache.http.util.Asserts.check(Asserts.java:46)
												at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
												at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
												at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
												... 49 more
										Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
											at org.apache.http.util.Asserts.check(Asserts.java:46)
											at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
											at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
											at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
											... 50 more
									Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
										at org.apache.http.util.Asserts.check(Asserts.java:46)
										at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
										at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
										at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
										... 51 more
								Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
									at org.apache.http.util.Asserts.check(Asserts.java:46)
									at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
									at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
									at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
									... 52 more
							Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
								at org.apache.http.util.Asserts.check(Asserts.java:46)
								at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
								at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
								at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
								... 53 more
						Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
							at org.apache.http.util.Asserts.check(Asserts.java:46)
							at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
							at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
							at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
							... 54 more
					Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
						at org.apache.http.util.Asserts.check(Asserts.java:46)
						at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
						at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
						at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
						... 55 more
				Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
					at org.apache.http.util.Asserts.check(Asserts.java:46)
					at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
					at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
					at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
					... 56 more
			Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
				at org.apache.http.util.Asserts.check(Asserts.java:46)
				at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
				at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
				at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
				... 57 more
		Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
			at org.apache.http.util.Asserts.check(Asserts.java:46)
			at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
			at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
			at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
			... 58 more
	Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
		at org.apache.http.util.Asserts.check(Asserts.java:46)
		at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
		at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
		at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
		... 59 more
Caused by: java.lang.IllegalStateException: Request cannot be executed; I/O reactor status: STOPPED
	at org.apache.http.util.Asserts.check(Asserts.java:46)
	at org.apache.http.impl.nio.client.CloseableHttpAsyncClientBase.ensureRunning(CloseableHttpAsyncClientBase.java:90)
	at org.apache.http.impl.nio.client.InternalHttpAsyncClient.execute(InternalHttpAsyncClient.java:123)
	at org.elasticsearch.client.RestClient.performRequest(RestClient.java:244)
	... 60 more
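
A trace like the one above is hard to read because the same failure is wrapped and re-suppressed many times. As a rough aid, the sketch below walks a throwable's cause and suppressed links and returns each distinct "class: message" pair once, so the earlier failure (here, the closed connection) is not buried under the repeated STOPPED errors. `ThrowableSummary` is a hypothetical helper written for this illustration; it is not part of the Elasticsearch client or HttpComponents, and the exceptions built in `main` are simplified stand-ins for the ones in the trace.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only.
final class ThrowableSummary {

    private ThrowableSummary() {}

    /**
     * Returns every distinct "ClassName: message" reachable from top (which must
     * not be null) via getSuppressed() and getCause(), in first-seen order.
     * Suppressed exceptions are visited before the cause, mirroring the order
     * in which a printed stack trace lists them.
     */
    static List<String> distinctFailures(Throwable top) {
        Set<String> distinct = new LinkedHashSet<>();
        // Identity-based set guards against cycles in the exception graph.
        Set<Throwable> visited = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Throwable> pending = new ArrayDeque<>();
        pending.push(top);
        while (!pending.isEmpty()) {
            Throwable current = pending.pop();
            if (!visited.add(current)) {
                continue;
            }
            distinct.add(current.getClass().getName() + ": " + current.getMessage());
            Throwable cause = current.getCause();
            if (cause != null) {
                pending.push(cause); // pushed first, so it is visited last
            }
            Throwable[] suppressed = current.getSuppressed();
            for (int i = suppressed.length - 1; i >= 0; i--) {
                pending.push(suppressed[i]);
            }
        }
        return new ArrayList<>(distinct);
    }

    public static void main(String[] args) {
        String stopped = "Request cannot be executed; I/O reactor status: STOPPED";
        RuntimeException top = new RuntimeException(stopped, new IllegalStateException(stopped));
        top.addSuppressed(new RuntimeException("Connection closed unexpectedly",
                new IOException("Connection closed unexpectedly")));
        distinctFailures(top).forEach(System.out::println);
    }
}
```

Deduplicating on the class name and message is a deliberate simplification: two different failures with identical text would be collapsed into one entry.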

@xzer

xzer commented Apr 24, 2020

I think I gave the reason above. As a workaround, you can use reflection to fix the client setup issue.

@rjernst added the Team:Data Management label May 4, 2020
@ruilr

ruilr commented Jun 19, 2020

HttpClient 5.0 has mostly resolved this issue and will no longer crash in most cases.

So you can either upgrade to HttpClient 5.0 or follow the 4.x guidance and set up an IOReactorExceptionHandler, which will also fix this issue in most cases.

https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601

For existing users, reflection can be used to set up an IOReactorExceptionHandler by utilizing setHttpClientConfigCallback on the RestClient builder.

@matrixcine

@ruilr, do you have an example of how to use reflection to set up the IOReactorExceptionHandler? I'm new to this.

@wahahasssss

@matrixcine, do you have an example of how to use reflection to set up the IOReactorExceptionHandler? I'm new to this.

Have you solved this problem?

@reddroid555

@ruilr can you provide an example of how to set up an IOReactorExceptionHandler with setHttpClientConfigCallback?

@wahahasssss

@ruilr can you provide an example of how to set up an IOReactorExceptionHandler with setHttpClientConfigCallback?

+1

@luneo7

luneo7 commented Sep 14, 2020

They fixed this in HTTP Core v4.4.13, and the ES java client is using v4.4.12, so to fix it I just excluded the apache httpcomponents from the ES java client and imported the newer version on my own:

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>${elasticsearch.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpclient</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpcore</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpcore-nio</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.12</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore-nio</artifactId>
            <version>4.4.13</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
            <version>4.4.13</version>
        </dependency>

After that everything seems to be working fine =]

@robotdan

robotdan commented Sep 16, 2020

@luneo7 Thanks for posting your fix. Do you know for sure the fix is in 4.4.13?

Apache HTTP Core Release notes https://archive.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt

Here are the bugs in 4.4.13 https://issues.apache.org/jira/browse/HTTPCORE-612?jql=project%20%3D%20HTTPCORE%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%204.4.13%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

The only issue I found that looks to be similar was in 4.4.12 https://issues.apache.org/jira/browse/HTTPCORE-629

-- Edit --

My mistake: issue HTTPCORE-629, linked above, affects 4.4.12; it is not fixed in 4.4.12. It is listed as a duplicate of https://issues.apache.org/jira/browse/HTTPASYNC-160 which is listed as fixed in version 4.1.5 and 5.x of the HTTP Client component.

From what I can tell, the latest released version of the Async Client is version 4.1.4. https://search.maven.org/artifact/org.apache.httpcomponents/httpasyncclient/4.1.4/jar
https://downloads.apache.org/httpcomponents/httpasyncclient/RELEASE_NOTES-4.1.x.txt

@ruilr

ruilr commented Sep 16, 2020

Let me clarify: there is no "fix" for this issue. The Elasticsearch REST client does not set an exception handler, so the IOReactor stops. This is by design in HTTP Async Client; it is a bug in the Elasticsearch REST client, not in HTTP Async Client.

I am posting this explanation on behalf of my company. Publishing our fix would require a lengthy approval process, because the code is used internally and is considered company property. That is why I described the root cause and provided the reference link instead.

The Elasticsearch team has an obligation to provide a fix in a new release, and should also provide a patch or workaround for legacy releases, such as the reflection approach I suggested.

As I said in my first post, this by-design "issue" has been resolved for most cases, but not all, in the new 5.x. Further discussion can be found at:

http://mail-archives.apache.org/mod_mbox/hc-dev/202006.mbox/%3c9ef305d3a9587370ded5611b1dc55c25c29e3fc4.camel@apache.org%3e

@robotdan

Thanks for the clarification @ruilr - much appreciated.

@luneo7

luneo7 commented Sep 16, 2020

From time to time we were seeing the error "I/O reactor terminated abnormally[ERROR]" as well as "Request cannot be executed; I/O reactor status: STOPPED", and we saw that it was related to https://issues.apache.org/jira/browse/HTTPCORE-607 and https://issues.apache.org/jira/browse/HTTPCORE-609. After upgrading to the library versions that I posted, we have been running the system since June without any incident whatsoever.

@ruilr

ruilr commented Sep 16, 2020

I won't spend time explaining this again; please read the manual:

https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601

This is a known design "issue", or choice, of the http async client. The Elasticsearch Rest Client needs to set an exception handler so that processing can continue; otherwise the reactor stops. Evidently the developers did not read the manual carefully.

It is not a good design, which is why they changed the mechanism: 5.x has a default handler that continues on Exceptions, but it will still stop on Errors. The details can be found at the mail link I pasted.

Again, this is not a known bug; it is a combination of a weak design choice and misuse by users. Don't waste time upgrading to a newer 4.x version: the library already gives you a way to avoid the reactor stopping, and you are just not using it. There is nothing to fix on the http async client side, except that it is still risky in some extreme cases, which is also discussed in the mail I pasted.

If you want this issue fixed, ask the Elasticsearch team to do what they should do: fix the misuse by setting an exception handler. That is not enough on its own, but it would handle most cases.

Again, if you want a more detailed understanding of this issue, please read the mail link I have given.

http://mail-archives.apache.org/mod_mbox/hc-dev/202006.mbox/%3c9ef305d3a9587370ded5611b1dc55c25c29e3fc4.camel@apache.org%3e

@robotdan

@matrixcine @reddroid555 @wahahasssss

Here is my configuration with the Elasticsearch RestClient builder to set up this exception handler. I used the example that @ruilr linked to here https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601.

I took the exception handler implementation from here; there are a lot of examples on GitHub to review if you search for new IOReactorExceptionHandler().

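// Imports needed by this snippet (Apache HttpComponents 4.x and the Elasticsearch low-level REST client).
import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.http.impl.nio.client.HttpAsyncClientBuilder;
import org.apache.http.impl.nio.conn.PoolingNHttpClientConnectionManager;
import org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor;
import org.apache.http.nio.reactor.IOReactorException;
import org.apache.http.nio.reactor.IOReactorExceptionHandler;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;

RestClientBuilder builder = RestClient.builder(...);

CustomHttpClientConfigCallback configurationCallback = new CustomHttpClientConfigCallback();
builder.setHttpClientConfigCallback(configurationCallback);

public static class CustomHttpClientConfigCallback implements RestClientBuilder.HttpClientConfigCallback {
  // The logger was not declared in the original snippet; a commons-logging Log is assumed here.
  private static final Log logger = LogFactory.getLog(CustomHttpClientConfigCallback.class);
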
  @Override
  public HttpAsyncClientBuilder customizeHttpClient(HttpAsyncClientBuilder httpClientBuilder) {

    // Add custom exception handler.
    // - https://hc.apache.org/httpcomponents-core-ga/tutorial/html/nio.html#d5e601
    // - This always handles the exception and just logs a warning.
    try {
      DefaultConnectingIOReactor ioReactor = new DefaultConnectingIOReactor();
      ioReactor.setExceptionHandler(new IOReactorExceptionHandler() {
        @Override
        public boolean handle(IOException e) {
          logger.warn("System may be unstable: IOReactor encountered a checked exception : " + e.getMessage(), e);
          return true; // Return true to note this exception as handled, it will not be re-thrown
        }

        @Override
        public boolean handle(RuntimeException e) {
          logger.warn("System may be unstable: IOReactor encountered a runtime exception : " + e.getMessage(), e);
          return true; // Return true to note this exception as handled, it will not be re-thrown
        }
      });

      httpClientBuilder.setConnectionManager(new PoolingNHttpClientConnectionManager(ioReactor));
    } catch (IOReactorException e) {
      throw new RuntimeException(e);
    }

    return httpClientBuilder;
  }
}

It compiles, and that is about all I know, but from what I've learned here from @ruilr I think this should do the trick. There is likely some nuanced balance of what exceptions to handle. I'm going brute force at the moment and handling all exceptions.

Thanks! This thread has been very helpful.

@nikunjmulani

nikunjmulani commented Oct 13, 2020

We are facing the same issue. It worked absolutely fine with a singleton RestHighLevelClient object. We started seeing this issue after we changed our code base to use 6 Java objects to talk to 6 different clusters, chosen by index. Once the issue occurs on an instance, subsequent calls to that instance also fail with the same error; in other words, the instance becomes unhealthy because of the issue. The only way out is to terminate the instance.

As a short-term fix, we catch the exception using a Spring controller advice exception handler (Solution 3 in the link below) and re-create the closed RestHighLevelClient object. That is how we handle the exception; however, the issue still remains to some extent, limited to the request on which it occurs.

https://www.baeldung.com/exception-handling-for-rest-with-spring
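
The recreate-on-failure workaround described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `RecreatingClientHolder` and its methods are hypothetical names, not part of the Elasticsearch client or Apache HttpComponents, and detecting the failure by matching the message text reported in this thread is an assumption rather than a documented contract.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical helper, not part of the Elasticsearch client or Apache HttpComponents. */
final class RecreatingClientHolder<C extends AutoCloseable> {
    // Message text as reported in this thread; matching on it is an assumption.
    private static final String STOPPED_MARKER = "I/O reactor status: STOPPED";

    private final Supplier<C> factory;
    private C client;

    RecreatingClientHolder(Supplier<C> factory) {
        this.factory = factory;
        this.client = factory.get();
    }

    /** Runs the call; if it fails because the reactor stopped, rebuilds the client and retries once. */
    synchronized <R> R call(Function<C, R> action) {
        try {
            return action.apply(client);
        } catch (RuntimeException e) {
            if (!isReactorStopped(e)) {
                throw e;
            }
            recreate();
            return action.apply(client);
        }
    }

    /** Walks the cause chain, since the client may wrap the original exception. */
    static boolean isReactorStopped(Throwable failure) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            String message = t.getMessage();
            if (message != null && message.contains(STOPPED_MARKER)) {
                return true;
            }
        }
        return false;
    }

    private void recreate() {
        try {
            client.close();
        } catch (Exception closeFailure) {
            // The old client is already unusable, so a failure to close it is ignored here.
        }
        client = factory.get();
    }
}
```

With the real client, the factory would build a new RestHighLevelClient and the action would need to wrap checked IOExceptions (for example in UncheckedIOException); both are left out of the sketch. It only masks the symptom: the request that first hits the stopped reactor still pays for the rebuild, and the underlying cause is not addressed.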

Thanks @robotdan for providing the above solution, but somehow it didn't work for me. Maybe I was missing something.

Based on my experience with this, I believe this exception should be handled by the Elastic API; the client application should not have to deal with it. (It worked absolutely fine with a singleton object, but not with 6 Java objects.)

@jbaiera - I see this issue is still in the OPEN state. Do we know whether there is any fix available for this, or any work in progress on it? Is there an ETA, please?

Looking through this thread, I really like the way everybody is helping each other. Thank you very much.

Let me know if you have any questions.

@harshahst

harshahst commented Oct 14, 2020

We are facing the same issue.

Our setup is as follows.

Application:

  1. Spring Boot 2.1.5.RELEASE
  2. Elasticsearch libraries 7.7.0 - org.elasticsearch:elasticsearch, org.elasticsearch.client:elasticsearch-rest-client, org.elasticsearch.client:elasticsearch-rest-high-level-client
  3. Elasticsearch 7.7.0 clusters

APIs: address string search, and search by many other parameters, using the search, get, msearch, and mget Elasticsearch APIs

  1. We create the RestHighLevelClient bean during startup of the Spring Boot app and use the same singleton bean for each search/get Elasticsearch API call.
  2. Address typeahead API: for each keystroke we trigger an Elasticsearch search API call to provide matching addresses. Here we are seeing too many "Request cannot be executed; I/O reactor status: STOPPED" errors.
  3. Other APIs, such as the geo_distance query based on lat, long, etc., also hit this error.

Thanks @robotdan for providing the above solution, but somehow it didn't work for me either. Maybe I was missing something.

Currently, to work around the "Request cannot be executed; I/O reactor status: STOPPED" issue, we execute fallback logic whenever this exception occurs.

Fallback logic
We create another, temporary RestHighLevelClient, process the request, and then close that client. However, the issue happens many times, so the fallback logic is executed many times.

Reference
apache/skywalking#5099

Please let us know if there is a better solution for this issue.

Note: We are seeing this issue with the configuration below as well.
Spring Boot 2.1.5.RELEASE, Elasticsearch 5.6.0 libraries, Elasticsearch 2.4.1 clusters.

@tstebut

tstebut commented Jan 31, 2021

They fixed this in HTTP Core v4.4.13, and the ES java client is using v4.4.12, so to fix it I just excluded the apache httpcomponents from the ES java client and imported the newer version on my own:

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
            <version>${elasticsearch.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpclient</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpcore</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>org.apache.httpcomponents</groupId>
                    <artifactId>httpcore-nio</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.12</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore-nio</artifactId>
            <version>4.4.13</version>
        </dependency>

        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpcore</artifactId>
            <version>4.4.13</version>
        </dependency>

After that everything seems to be working fine =]
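
One way to check at runtime that a dependency override like the one above actually took effect is to print where a class from the library was loaded from. The sketch below is not from this thread: `ClasspathVersion` is a hypothetical diagnostic helper, and it assumes the jar manifest carries an Implementation-Version entry, which may not hold for every artifact.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper; not part of any library discussed here. */
final class ClasspathVersion {
    private ClasspathVersion() {
    }

    /** Describes where a class was loaded from and the version recorded in its jar manifest, if any. */
    static String describe(Class<?> type) {
        Package pkg = type.getPackage();
        String version = (pkg == null) ? null : pkg.getImplementationVersion();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "unknown location"
                : source.getLocation().toString();
        return type.getName() + " version=" + (version == null ? "unknown" : version) + " from " + location;
    }

    /** Looks the class up by name, so this compiles without the library on the classpath. */
    static String describe(String className) {
        try {
            return describe(Class.forName(className));
        } catch (ClassNotFoundException e) {
            return className + " not found on classpath";
        }
    }
}
```

For example, `ClasspathVersion.describe("org.apache.http.impl.nio.reactor.DefaultConnectingIOReactor")` should report which httpcore-nio jar was picked up, provided that library is on the classpath.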

Hi, I'm quoting @robotdan below, as well as @luneo7 above, in order to share my findings so far.
It is indeed a by-design issue.
Just look at https://issues.apache.org/jira/browse/HTTPCORE-654, which describes it accurately. More recently, against 4.4.13, Danmeng Tu suggested that this behaviour is a bug. You can confirm our conclusion by reading what Mr Oleg Kalnichevski promptly replied before closing the Jira ticket.

@luneo7 Thanks for posting your fix. Do you know for sure the fix is in 4.4.13?

Apache HTTP Core Release notes https://archive.apache.org/dist/httpcomponents/httpcore/RELEASE_NOTES-4.4.x.txt

Here are the bugs in 4.4.13 https://issues.apache.org/jira/browse/HTTPCORE-612?jql=project%20%3D%20HTTPCORE%20AND%20status%20in%20(Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%204.4.13%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

The only issue I found that looks to be similar was in 4.4.12 https://issues.apache.org/jira/browse/HTTPCORE-629

-- Edit --

My mistake: issue HTTPCORE-629, linked above, affects 4.4.12; it is not fixed in 4.4.12. It is listed as a duplicate of https://issues.apache.org/jira/browse/HTTPASYNC-160, which is listed as fixed in version 4.1.5 and 5.x of the HTTP Client component.

From what I can tell, the latest released version of the Async Client is version 4.1.4. https://search.maven.org/artifact/org.apache.httpcomponents/httpasyncclient/4.1.4/jar
https://downloads.apache.org/httpcomponents/httpasyncclient/RELEASE_NOTES-4.1.x.txt

[EDIT:] Last but not least, do we know since when this behaviour has been present? Is it specific to v7.x of Elasticsearch, or is it much older? Would it mean that httpcore evolved badly (from our point of view)?

Best regards, and cheers to all,

TvS

@codekarma

I am trying to set the IOReactorExceptionHandler, but the problem is that the connection manager is not instantiated until the build() method is called on HttpAsyncClientBuilder, which happens when the rest client is instantiated. This means we cannot set the IOReactorExceptionHandler in the HttpClientConfigCallback unless we override the connection manager, because the connection manager is still null when the callback is invoked. Maybe @ruilr or @robotdan can provide some guidance?

// Assumes org.apache.commons.lang3.reflect.FieldUtils, an SLF4J-style `log`,
// and com.palantir.logsafe.SafeArg are available on the classpath.
private static void setIoReactorExceptionHandler(HttpAsyncClientBuilder clientBuilder) {
    // Explicit lambda so the call resolves to the NHttpClientConnectionManager overload below.
    reflectGetConnectionManager(clientBuilder)
            .ifPresent(connectionManager -> setIoReactorExceptionHandler(connectionManager));
}

private static void setIoReactorExceptionHandler(NHttpClientConnectionManager connectionManager) {
    if (connectionManager instanceof PoolingNHttpClientConnectionManager) {
        PoolingNHttpClientConnectionManager poolingNHttpClientConnectionManager =
                (PoolingNHttpClientConnectionManager) connectionManager;
        reflectGetIoReactor(poolingNHttpClientConnectionManager)
                .ifPresent(ioReactor -> ioReactor.setExceptionHandler(getIoReactorExceptionHandler()));
    }
}

private static Optional<AbstractMultiworkerIOReactor> reflectGetIoReactor(
        PoolingNHttpClientConnectionManager connectionManager) {
    Field ioReactorField = FieldUtils.getField(PoolingNHttpClientConnectionManager.class, "ioReactor", true);
    try {
        AbstractMultiworkerIOReactor ioReactor =
                (AbstractMultiworkerIOReactor) ioReactorField.get(connectionManager);
        // ofNullable: the field may legitimately hold null.
        return Optional.ofNullable(ioReactor);
    } catch (ClassCastException classCastException) {
        log.warn("Unknown ioReactor class", classCastException);
    } catch (IllegalAccessException exception) {
        log.warn("Could not access field {}", SafeArg.of("fieldName", "ioReactor"), exception);
    }
    return Optional.empty();
}

private static Optional<NHttpClientConnectionManager> reflectGetConnectionManager(
        HttpAsyncClientBuilder clientBuilder) {
    Field connManagerField = FieldUtils.getField(clientBuilder.getClass(), "connManager", true);
    try {
        NHttpClientConnectionManager connectionManager =
                (NHttpClientConnectionManager) connManagerField.get(clientBuilder);
        // ofNullable: the connection manager is still null when the callback runs (the problem described above).
        return Optional.ofNullable(connectionManager);
    } catch (ClassCastException classCastException) {
        log.warn("Unknown connectionManager class", classCastException);
    } catch (IllegalAccessException exception) {
        log.warn("Could not access field {}", SafeArg.of("fieldName", "connManager"), exception);
    }
    return Optional.empty();
}

private static IOReactorExceptionHandler getIoReactorExceptionHandler() {
    return new IOReactorExceptionHandler() {
        @Override
        public boolean handle(IOException ioException) {
            log.warn("System may be unstable: IOReactor encountered a checked exception : ", ioException);
            return true; // Return true to note this exception as handled, it will not be re-thrown
        }

        @Override
        public boolean handle(RuntimeException runtimeException) {
            log.warn("System may be unstable: IOReactor encountered a runtime exception : ", runtimeException);
            return true; // Return true to note this exception as handled, it will not be re-thrown
        }
    };
}

@ayonel

ayonel commented Apr 26, 2021

Have you had a look at this:

https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/index.html

The link is 404 now.

@MingtaoYuTR

Have you had a look at this:
https://hc.apache.org/httpcomponents-core-ga/httpcore-nio/apidocs/index.html

The link is 404 now.

The new link is https://hc.apache.org/httpcomponents-core-4.4.x/current/tutorial/html/nio.html#d5e601
In case the URL changes again: it is "3.4. I/O reactor exception handling" under "Chapter 3. Asynchronous I/O based on NIO".

@DaveCTurner
Contributor

Second: Browsing the changes in the HTTP Components library for version 5, it seems that the IOReactor code now logs the exceptions by default.

I think this is the case even in 4.1.4 which is the version we're using today:

            this.reactorThread = threadFactory.newThread(new Runnable() {

                @Override
                public void run() {
                    try {
                        final IOEventDispatch ioEventDispatch = new InternalIODispatch(handler);
                        connmgr.execute(ioEventDispatch);
                    } catch (final Exception ex) {
                        log.error("I/O reactor terminated abnormally", ex);
                    } finally {
                        status.set(Status.STOPPED);
                    }
                }

            });
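
To make the control flow of the quoted thread body easier to follow, here is a small stand-alone model of it. It is a sketch only: `ReactorThreadModel`, its `Status` enum, and the in-memory `log` list are hypothetical stand-ins, not HttpAsyncClient classes. It illustrates that, with this try/catch/finally shape, an Exception is logged before the status becomes STOPPED, whereas an Error bypasses the catch block, so the status still becomes STOPPED but this block logs nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Simplified model of the quoted thread body; not the library's code. */
final class ReactorThreadModel {
    enum Status { ACTIVE, STOPPED }

    final AtomicReference<Status> status = new AtomicReference<>(Status.ACTIVE);
    final List<String> log = new ArrayList<>();

    /** Mirrors the quoted shape: only Exception is logged, and the status always ends as STOPPED. */
    void run(Runnable eventLoop) {
        try {
            eventLoop.run();
        } catch (Exception ex) {
            log.add("I/O reactor terminated abnormally: " + ex);
        } finally {
            status.set(Status.STOPPED);
        }
    }
}
```

Running the model with a task that throws an IllegalStateException leaves one log entry and status STOPPED. A task that throws an Error also leaves status STOPPED, but with no entry from this block, and the Error propagates to the caller. That may be one reason the cause is sometimes missing from the logs.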

@mwunderlich

What is the current state of this issue?
I have been struggling with this very problem for a while. It occurs mostly when unit tests are being executed in our CI/CD system (TeamCity). Sometimes it also happens locally, but it is not reproducible in a deterministic manner in either environment (locally or on TC).

@mwunderlich

Please ignore my comment above. Problem has been solved.

The root cause was on the side of my application, which tried to log in with the same account multiple times, which in turn led to multiple unnecessary save() and retrieveById() statements on the respective indices.

To diagnose a similar problem, it helps to closely examine the stacktrace and - if the error occurs in several different unit tests - look for any commonalities there, such as the same method being called on the index. HTH.

@zz56566

zz56566 commented May 6, 2022

May I ask which version has fixed this issue?

@kazoompa

kazoompa commented May 6, 2022

Version 8 seems to have resolved this.

@hust419

hust419 commented Nov 11, 2022

Mark: this is still unresolved; even setting an IOReactorExceptionHandler does not help.

@meicool

meicool commented Jan 10, 2023

I/O reactor status: STOPPED. How do I solve it?

@aman-tandon-30

Version 8 seems to have resolved this.

Could you please share the PR or commit for version 8 that seems to fix it?

@moshuowen

Has anyone solved this?
