/actuator/health never returns when AMQ is down? #25188

Closed
marclallen opened this issue Feb 11, 2021 · 7 comments
Labels
for: external-project For an external project and not something we can fix status: invalid An issue that we don't feel is valid

Comments

@marclallen

I'm so sorry if this isn't the place.

I'm not entirely sure if I've even identified the issue, but here's what I have. I'm not including files, etc. unless someone wants them, as I fully expect someone to point me to a thread/issue/whatnot that I failed to find.

I was adding the Spring Boot Admin client to a local project I had. I couldn't do other work because my corporate VPN was down. I spent hours on it because I couldn't get the /actuator/health endpoint to respond. Finally, my VPN came up, and BOOM! So did the endpoint.

I then dropped the VPN and, again, the endpoint stopped responding.

As the Spring Boot app was essentially idle when running, the only thing it needed the VPN for was AMQ (ActiveMQ).

So, is there a known issue with this? I'm running 2.3.4. Or is there something else that might block the health endpoint from returning?

Thanks!

@spring-projects-issues spring-projects-issues added the status: waiting-for-triage An issue we've not yet triaged label Feb 11, 2021
@wilkinsona
Member

wilkinsona commented Feb 12, 2021

We had a problem in this area in the past (see #10809), but we haven't seen it since that issue was fixed. Without knowing more about your VPN, how it affects your network stack, whether you were using a hostname or IP address to connect to AMQ, etc., I don't think we'll be able to figure this one out.

Can you please try to share a small sample that reproduces the problem without involving your VPN. For example, an app configured with a host that won't resolve or an IP address that is unreachable. If that's not possible, please reproduce the problem with your app and then take two thread dumps 10 or so seconds apart and share them with us.
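For anyone following along: the usual way to capture these dumps is the JDK's `jstack <pid>` (or `jcmd <pid> Thread.print`), run twice about 10 seconds apart. As a hedged alternative, here is a minimal in-process sketch built on `java.lang.management.ThreadMXBean`. The class and method names (`ThreadDumps`, `dump`, `twoDumps`) are hypothetical and not part of Spring Boot. For each thread it prints the state, the monitor the thread is waiting for, and which thread owns that monitor. Those details show which threads are stuck in the same place in both dumps.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper: formats all live threads, similar in spirit to jstack output. */
final class ThreadDumps {

    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // true, true: also collect locked monitors and ownable synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState());
            if (info.getLockName() != null) {
                out.append(" waiting on ").append(info.getLockName());
                if (info.getLockOwnerName() != null) {
                    out.append(" held by \"").append(info.getLockOwnerName()).append('"');
                }
            }
            out.append('\n');
            // ThreadInfo.toString() truncates stacks to 8 frames, so print every frame here
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Takes two dumps separated by a pause, so threads that never move stand out. */
    static String[] twoDumps(long pauseMillis) throws InterruptedException {
        String first = dump();
        Thread.sleep(pauseMillis);
        return new String[] { first, dump() };
    }

    public static void main(String[] args) throws InterruptedException {
        String[] dumps = twoDumps(10_000); // roughly 10 seconds apart
        System.out.println(dumps[0]);
        System.out.println("----- second dump -----");
        System.out.println(dumps[1]);
    }
}
```

In a real application, `dump()` would be triggered from inside the hung app, for example from a debug endpoint or a scheduled task. It does not reproduce the issue on its own.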

@wilkinsona wilkinsona added the status: waiting-for-feedback We need additional information before we can continue label Feb 12, 2021
@spring-projects-issues
Collaborator

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

@spring-projects-issues spring-projects-issues added the status: feedback-reminder We've sent a reminder that we need additional information before we can continue label Feb 19, 2021
@marclallen
Author

Oh, my! I must have missed the original reply. I am so very sorry.

Yes. I’ll try to get that information for you.

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Feb 19, 2021
@snicoll snicoll added status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue and removed status: feedback-provided Feedback has been provided labels Feb 19, 2021
@marclallen
Author

marclallen commented Feb 19, 2021

OK. Here is an IntelliJ/Gradle project based on the SpringAdmin project (because I had it handy).

I simply added the AMQ starter package and set up a failover property with a dummy IP address.
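For reference, an ActiveMQ failover broker URL has the form `failover:(uri1,...,uriN)?options`. With the ActiveMQ starter, it is typically supplied through the `spring.activemq.broker-url` property. The helper below only assembles that string and makes no broker calls. `FailoverUrl` is a hypothetical name. `10.255.255.1` is an assumed unreachable address standing in for the dummy IP in the attached sample, whose actual value isn't shown here. `maxReconnectAttempts` is the failover option suggested later in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that builds failover:(...)?key=value&... broker URLs. */
final class FailoverUrl {

    static String of(List<String> brokerUris, Map<String, String> options) {
        if (brokerUris.isEmpty()) {
            throw new IllegalArgumentException("at least one broker URI is required");
        }
        StringBuilder url = new StringBuilder("failover:(")
                .append(String.join(",", brokerUris))
                .append(')');
        if (!options.isEmpty()) {
            // Options are appended in the map's iteration order
            StringJoiner query = new StringJoiner("&", "?", "");
            options.forEach((key, value) -> query.add(key + "=" + value));
            url.append(query);
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Assumed unreachable address, standing in for the sample's dummy IP
        List<String> brokers = List.of("tcp://10.255.255.1:61616");
        System.out.println(of(brokers, Map.of()));
        // failover:(tcp://10.255.255.1:61616)

        Map<String, String> options = new LinkedHashMap<>();
        options.put("maxReconnectAttempts", "1");
        System.out.println(of(brokers, options));
        // failover:(tcp://10.255.255.1:61616)?maxReconnectAttempts=1
    }
}
```

The first URL matches the sample's setup, where the failover transport retries forever by default. The second is the bounded variant discussed further down.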

The actuator base URL is:

http://localhost:8081/actuator

Before the AMQ change was added, the /health endpoint is fine. When I run it with the AMQ library loaded, it never responds.
SpringAdminWithAMQIssue.zip
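To confirm that the endpoint truly "never returns" rather than just being slow, it can help to probe it with an explicit deadline instead of a browser. Below is a minimal sketch using the JDK 11+ `java.net.http.HttpClient`. `HealthProbe` is a hypothetical helper, and the URL assumes the port and base path given above. `HttpRequest.Builder.timeout` bounds the wait for the response, so a hung health check surfaces as an `HttpTimeoutException` instead of an indefinite wait.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.time.Duration;

/** Hypothetical helper: fetches a URL but gives up after the given timeout. */
final class HealthProbe {

    static String probe(URI uri, Duration timeout) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(timeout) // bounds the wait for the response, not only the connect
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return "HTTP " + response.statusCode() + ": " + response.body();
        } catch (HttpTimeoutException e) {
            // Also covers HttpConnectTimeoutException, which is a subclass
            return "no response within " + timeout;
        }
    }

    public static void main(String[] args) throws Exception {
        URI health = URI.create("http://localhost:8081/actuator/health");
        System.out.println(probe(health, Duration.ofSeconds(10)));
    }
}
```

If nothing is listening on the port, `send` throws a `ConnectException` instead, which distinguishes "app not up" from "health check hung".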

@spring-projects-issues spring-projects-issues added status: feedback-provided Feedback has been provided and removed status: waiting-for-feedback We need additional information before we can continue status: feedback-reminder We've sent a reminder that we need additional information before we can continue labels Feb 19, 2021
@marclallen
Author

Oh... and this project is running 2.4.2.

@wilkinsona
Member

Thanks for the sample. This looks like an ActiveMQ limitation.

When we call start() on the connection, the thread is blocked waiting to acquire a monitor:

"boundedElastic-4" #36 daemon prio=5 os_prio=31 cpu=2.10ms elapsed=60.51s tid=0x00007f81b0023800 nid=0x8903 waiting for monitor entry  [0x0000700002a36000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.apache.activemq.transport.failover.FailoverTransport.oneway(FailoverTransport.java:578)
	- waiting to lock <0x000000061c6ffa30> (a java.lang.Object)
	at org.apache.activemq.transport.MutexTransport.oneway(MutexTransport.java:68)
	at org.apache.activemq.transport.ResponseCorrelator.asyncRequest(ResponseCorrelator.java:81)
	at org.apache.activemq.transport.ResponseCorrelator.request(ResponseCorrelator.java:86)
	at org.apache.activemq.ActiveMQConnection.syncSendPacket(ActiveMQConnection.java:1392)
	at org.apache.activemq.ActiveMQConnection.ensureConnectionInfoSent(ActiveMQConnection.java:1486)
	- locked <0x000000061c702920> (a java.lang.Object)
	at org.apache.activemq.ActiveMQConnection.start(ActiveMQConnection.java:527)
	at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.localStart(SingleConnectionFactory.java:672)
	- locked <0x000000061f9e93d8> (a java.lang.Object)
	at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:610)
	at com.sun.proxy.$Proxy87.start(Unknown Source)
	at org.springframework.boot.actuate.jms.JmsHealthIndicator$MonitoredConnection.start(JmsHealthIndicator.java:81)
	at org.springframework.boot.actuate.jms.JmsHealthIndicator.doHealthCheck(JmsHealthIndicator.java:53)
	at org.springframework.boot.actuate.health.AbstractHealthIndicator.health(AbstractHealthIndicator.java:82)
	at org.springframework.boot.actuate.health.HealthIndicatorReactiveAdapter$$Lambda$842/0x00000008005c1440.call(Unknown Source)
	at reactor.core.publisher.MonoCallable.call(MonoCallable.java:91)
	at reactor.core.publisher.FluxSubscribeOnCallable$CallableSubscribeOnSubscription.run(FluxSubscribeOnCallable.java:227)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68)
	at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28)
	at java.util.concurrent.FutureTask.run(java.base@11.0.7/FutureTask.java:264)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.7/ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7/ThreadPoolExecutor.java:628)

This monitor is held by another thread where the failover transport is trying to establish a connection:

"ActiveMQ Task-1" #46 daemon prio=5 os_prio=31 cpu=0.62ms elapsed=7.58s tid=0x00007f81c081c000 nid=0xac07 runnable  [0x0000700002c3c000]
   java.lang.Thread.State: RUNNABLE
	at java.net.PlainSocketImpl.socketConnect(java.base@11.0.7/Native Method)
	at java.net.AbstractPlainSocketImpl.doConnect(java.base@11.0.7/AbstractPlainSocketImpl.java:399)
	- locked <0x000000061b9004c0> (a java.net.SocksSocketImpl)
	at java.net.AbstractPlainSocketImpl.connectToAddress(java.base@11.0.7/AbstractPlainSocketImpl.java:242)
	at java.net.AbstractPlainSocketImpl.connect(java.base@11.0.7/AbstractPlainSocketImpl.java:224)
	at java.net.SocksSocketImpl.connect(java.base@11.0.7/SocksSocketImpl.java:403)
	at java.net.Socket.connect(java.base@11.0.7/Socket.java:609)
	at org.apache.activemq.transport.tcp.TcpTransport.connect(TcpTransport.java:525)
	at org.apache.activemq.transport.tcp.TcpTransport.doStart(TcpTransport.java:488)
	at org.apache.activemq.util.ServiceSupport.start(ServiceSupport.java:55)
	at org.apache.activemq.transport.AbstractInactivityMonitor.start(AbstractInactivityMonitor.java:169)
	at org.apache.activemq.transport.InactivityMonitor.start(InactivityMonitor.java:52)
	at org.apache.activemq.transport.TransportFilter.start(TransportFilter.java:64)
	at org.apache.activemq.transport.WireFormatNegotiator.start(WireFormatNegotiator.java:72)
	at org.apache.activemq.transport.failover.FailoverTransport.doReconnect(FailoverTransport.java:1019)
	- locked <0x000000061c6ffa30> (a java.lang.Object)
	at org.apache.activemq.transport.failover.FailoverTransport$2.iterate(FailoverTransport.java:149)
	- locked <0x000000061c6ffa40> (a java.lang.Object)
	at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133)
	at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.7/ThreadPoolExecutor.java:1128)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.7/ThreadPoolExecutor.java:628)
	at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)

When the start() attempt takes longer than 5 seconds, we attempt to abort it by closing the connection. Unfortunately, this thread also gets blocked on a monitor:

"jms-health-indicator" #47 daemon prio=5 os_prio=31 cpu=1.87ms elapsed=7.58s tid=0x00007f81b0823000 nid=0x7f0b waiting for monitor entry  [0x0000700002f46000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.localStop(SingleConnectionFactory.java:681)
	- waiting to lock <0x000000061f9e93d8> (a java.lang.Object)
	at org.springframework.jms.connection.SingleConnectionFactory$SharedConnectionInvocationHandler.invoke(SingleConnectionFactory.java:616)
	at com.sun.proxy.$Proxy87.close(Unknown Source)
	at org.springframework.boot.actuate.jms.JmsHealthIndicator$MonitoredConnection.closeConnection(JmsHealthIndicator.java:87)
	at org.springframework.boot.actuate.jms.JmsHealthIndicator$MonitoredConnection.lambda$start$0(JmsHealthIndicator.java:74)
	at org.springframework.boot.actuate.jms.JmsHealthIndicator$MonitoredConnection$$Lambda$857/0x00000008005c6040.run(Unknown Source)
	at java.lang.Thread.run(java.base@11.0.7/Thread.java:834)

This monitor is held by the thread that's making the start() attempt. This means that there's no way for us to abort the start() attempt. It will only end when a connection has been established or the failover attempts have been exhausted. ActiveMQ's default behaviour is to make connection attempts forever, so the start() attempt never ends. You can improve the situation by tuning the failover transport. For example, setting maxReconnectAttempts=1 allows start() to return, and the health indicator will then report that the application is down.
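The interaction between the three thread dumps can be modelled with two plain monitors. This is a simplified, hypothetical model, not Spring or ActiveMQ code. `connectionMonitor` stands in for the lock that `SingleConnectionFactory` takes in `localStart()`/`localStop()`. `reconnectMutex` stands in for the lock that `FailoverTransport` holds in `doReconnect()`. A latch that is never released stands in for the socket connect that doesn't return. The model shows the `close` thread staying `BLOCKED` until the connect finishes, which is why the 5-second timeout cannot abort `start()`.

```java
import java.util.concurrent.CountDownLatch;

/** Hypothetical model of the three threads in the dumps; no JMS or ActiveMQ involved. */
final class MonitorModel {

    /** Returns the state of the "close" thread after it has been trying for waitMillis. */
    static Thread.State closeStateWhileReconnecting(long waitMillis) throws InterruptedException {
        Object connectionMonitor = new Object(); // models SingleConnectionFactory's monitor
        Object reconnectMutex = new Object();    // models FailoverTransport's reconnect mutex
        CountDownLatch reconnecting = new CountDownLatch(1);
        CountDownLatch connectReturns = new CountDownLatch(1); // models the socket connect ending

        // "ActiveMQ Task-1": holds the reconnect mutex while connecting to an unreachable broker
        Thread reconnect = new Thread(() -> {
            synchronized (reconnectMutex) {
                reconnecting.countDown();
                awaitQuietly(connectReturns);
            }
        }, "reconnect");

        // "boundedElastic-4": start() takes the connection monitor, then needs the reconnect mutex
        Thread start = new Thread(() -> {
            synchronized (connectionMonitor) {
                synchronized (reconnectMutex) {
                    // ConnectionInfo would be sent to the broker here
                }
            }
        }, "start");

        // "jms-health-indicator": after the timeout, close() needs the connection monitor
        Thread close = new Thread(() -> {
            synchronized (connectionMonitor) {
                // the connection would be closed here
            }
        }, "close");

        reconnect.start();
        reconnecting.await();
        start.start();
        while (start.getState() != Thread.State.BLOCKED) {
            Thread.sleep(5); // wait until start() is stuck behind the reconnect loop
        }
        close.start();
        close.join(waitMillis);                   // give close() time to finish; it can't
        Thread.State observed = close.getState(); // expected: BLOCKED

        // Only once the connect ends can start() finish and release the monitor for close()
        connectReturns.countDown();
        reconnect.join();
        start.join();
        close.join();
        return observed;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(closeStateWhileReconnecting(200)); // BLOCKED
    }
}
```

With maxReconnectAttempts set, the real "connect" eventually gives up, which corresponds to counting the latch down: start() returns and the close can proceed.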

Please raise this with the ActiveMQ team. Ideally, calling connection.close() on one thread would cause an ongoing attempt to start the connection on another thread to fail. If they're unwilling or unable to make that change, we can re-open this and take another look. We should be able to work around the problem in the health indicator, but it'll add quite a bit of complexity.
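One shape such a workaround could take is sketched below. It is hypothetical and not anything Spring Boot ships. The blocking check runs on a dedicated daemon thread, and the caller stops waiting for it after a deadline and reports DOWN. Unlike closing the connection, this doesn't need any lock that the stuck thread holds. The trade-off is that the stuck thread is abandoned, not freed. `Future.cancel(true)` interrupts it, but a thread blocked on a monitor or in a classic socket connect ignores interrupts, so repeated checks could leak threads. That is part of the complexity mentioned above. `BoundedCheck` and `Status` are illustrative names.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: bounds how long a caller waits for a blocking health check. */
final class BoundedCheck {

    enum Status { UP, DOWN }

    static Status check(Callable<?> blockingCheck, Duration timeout) {
        // A fresh daemon thread per check, so an abandoned check can't keep the JVM alive
        ExecutorService executor = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "bounded-health-check");
            thread.setDaemon(true);
            return thread;
        });
        Future<?> result = executor.submit(blockingCheck);
        try {
            result.get(timeout.toMillis(), TimeUnit.MILLISECONDS);
            return Status.UP;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts, but a thread BLOCKED on a monitor ignores it
            return Status.DOWN;
        } catch (ExecutionException e) {
            return Status.DOWN;  // the check itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.DOWN;
        } finally {
            executor.shutdown(); // don't wait for a check that may never finish
        }
    }

    public static void main(String[] args) {
        System.out.println(check(() -> "ok", Duration.ofSeconds(1))); // UP
        System.out.println(check(() -> {
            Thread.sleep(60_000); // stands in for a start() that never returns
            return null;
        }, Duration.ofMillis(200))); // DOWN
    }
}
```

The caller always gets an answer within the timeout, which would keep /actuator/health responsive. The cost is a possibly leaked thread per hung check.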

@wilkinsona wilkinsona added for: external-project For an external project and not something we can fix status: invalid An issue that we don't feel is valid and removed status: feedback-provided Feedback has been provided status: waiting-for-triage An issue we've not yet triaged labels Feb 19, 2021
@marclallen
Author

Ok, thanks.

It seems to affect far more than just the health monitor. Several other parts of my main project never really get started. I guess I'll need to set the reconnect attempts low and then... I don't know. I hope there's an event for connection failure that I can glom onto so I can restart it.

4 participants