Description
We recently upgraded from Spring Boot 2.2.7 to 2.2.8, running on PCF (Azure, Linux OS). We then ran into an issue where the app eventually crashed with "Too many open files". The "files" were actually open TCP sockets, over 1 million of them. Because quite a few other dependencies were upgraded at the same time, it is very hard to figure out where it goes wrong. In total 32 instances crashed, and we reproduced the problem in another environment quite easily.
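On Linux, one way to confirm that the open descriptors are sockets rather than regular files is to classify the entries under /proc/self/fd, whose symlinks point to targets such as socket:[inode] for sockets. The following is a minimal sketch, assuming a Linux host with procfs mounted; the class FdBreakdown is hypothetical and is not part of Spring Boot, Reactor Netty or Netty.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (Linux only): splits this process's open descriptors into
// sockets and everything else by reading the symlinks under /proc/self/fd.
public class FdBreakdown {

    /** Returns {sockets, other}, or null when /proc/self/fd cannot be read (e.g. on macOS). */
    public static long[] countSocketsAndOthers() {
        Path fdDir = Paths.get("/proc/self/fd");
        if (!Files.isDirectory(fdDir)) {
            return null;
        }
        long sockets = 0;
        long other = 0;
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                try {
                    // Socket descriptors link to targets such as "socket:[123456]".
                    String target = Files.readSymbolicLink(fd).toString();
                    if (target.startsWith("socket:")) {
                        sockets++;
                    } else {
                        other++; // regular files, pipes, epoll instances, and so on
                    }
                } catch (IOException e) {
                    // E.g. the descriptor was closed between listing and reading it; skip it.
                }
            }
        } catch (IOException e) {
            return null;
        }
        // The directory stream's own descriptor is counted under "other".
        return new long[] {sockets, other};
    }

    public static void main(String[] args) {
        long[] counts = countSocketsAndOthers();
        if (counts == null) {
            System.out.println("/proc/self/fd is not available on this OS");
        } else {
            System.out.printf("sockets=%d other=%d%n", counts[0], counts[1]);
        }
    }
}
```

Outside the JVM, the same breakdown can be read with standard tools such as lsof; the in-process version is mainly handy when only the application's own logs are accessible, as is often the case on PCF.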
After a couple of hours, the number of open sockets still had not changed.
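To watch the descriptor count over time from inside the application, the JDK exposes the process's open and maximum descriptor counts through com.sun.management.UnixOperatingSystemMXBean. A minimal sketch, assuming a HotSpot/OpenJDK-based JVM on a Unix-like OS; the class FdMonitor is hypothetical, and in a real service the sample would be logged periodically or exported as a metric rather than printed once.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: reads the JVM process's open and maximum file descriptor counts.
public class FdMonitor {

    /** Returns {open, max}, or null if this JVM/OS does not expose the counts. */
    public static long[] sample() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null; // e.g. Windows, or a JVM without the com.sun.management extension
    }

    public static void main(String[] args) {
        long[] s = sample();
        if (s == null) {
            System.out.println("File descriptor counts are not available on this JVM/OS");
        } else {
            // An "open" value that keeps growing and never drops back suggests a descriptor leak.
            System.out.printf("open=%d max=%d%n", s[0], s[1]);
        }
    }
}
```

If Spring Boot Actuator with Micrometer is on the classpath (the MetricsWebFilter checkpoint in the trace below suggests it is), the auto-configured process.files.open and process.files.max metrics should report the same values without any extra code.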
The application uses WebFlux, and therefore Netty. To check whether Netty was the cause, I downgraded Spring Boot back to 2.2.7 and upgraded only the Netty dependencies to 4.1.50. With that configuration everything worked fine: the number of sockets stayed around 30,000.
I can't reproduce this on my Mac, but under some load it reproduces easily on PCF (Linux), so I suspect it is OS-related.
Some stack traces:
io.netty.channel.DefaultChannelPipeline : An exceptionCaught() event was fired, and it reached at the tail of the pipeline. It usually means the last handler in the pipeline did not handle the exception.
io.netty.channel.unix.Errors$NativeIoException: accept(..) failed: Too many open files
sun.rmi.transport.tcp : RMI TCP Accept-5000: accept loop for ServerSocket[addr=0.0.0.0/0.0.0.0,localport=5000] throws java.net.SocketException: Too many open files (Accept failed)
at java.base/java.net.PlainSocketImpl.socketAccept(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.accept(Unknown Source)
at java.base/java.net.ServerSocket.implAccept(Unknown Source)
at java.base/java.net.ServerSocket.accept(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(Unknown Source)
at java.rmi/sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(Unknown Source)
at java.base/java.lang.Thread.run(Unknown Source)
a.w.r.e.AbstractErrorWebExceptionHandler : [4974671b-12062] 500 Server Error for HTTP POST "/some/path"
io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
Wrapped by: io.netty.channel.ChannelException: io.netty.channel.unix.Errors$NativeIoException: newSocketStream(..) failed: Too many open files
at io.netty.channel.unix.Socket.newSocketStream0(Socket.java:421)
at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:319)
at io.netty.channel.epoll.LinuxSocket.newSocketStream(LinuxSocket.java:323)
at io.netty.channel.epoll.EpollSocketChannel.<init>(EpollSocketChannel.java:45)
at reactor.netty.resources.DefaultLoopEpoll.getChannel(DefaultLoopEpoll.java:45)
at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:187)
at reactor.netty.resources.LoopResources.onChannel(LoopResources.java:169)
at reactor.netty.tcp.TcpResources.onChannel(TcpResources.java:215)
at reactor.netty.http.client.HttpClientConnect$HttpTcpClient.connect(HttpClientConnect.java:141)
at reactor.netty.tcp.TcpClientOperator.connect(TcpClientOperator.java:43)
Wrapped by: com.netflix.hystrix.exception.HystrixRuntimeException: payment-request-merchant-site.payment-request-merchant-site-v2 failed and fallback failed.
at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)
Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
|_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.authorization.AuthorizationWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.authorization.ExceptionTranslationWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.authentication.logout.LogoutWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.savedrequest.ServerRequestCacheWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.context.SecurityContextServerWebExchangeWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.context.ReactorContextWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.config.web.server.ServerHttpSecurity$ServerWebExchangeReactorContextWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.security.web.server.WebFilterChainProxy [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.cloud.sleuth.instrument.web.TraceWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
|_ checkpoint ⇢ HTTP POST "/some/path" [ExceptionHandlingWebHandler]
Stack trace:
at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:832)
at com.netflix.hystrix.AbstractCommand$22.call(AbstractCommand.java:807)
at rx.internal.operators.OperatorOnErrorResumeNextViaFunction$4.onError(OperatorOnErrorResumeNextViaFunction.java:140)
at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
at com.netflix.hystrix.AbstractCommand$DeprecatedOnFallbackHookApplication$1.onError(AbstractCommand.java:1472)
at com.netflix.hystrix.AbstractCommand$FallbackHookApplication$1.onError(AbstractCommand.java:1397)
at rx.internal.operators.OnSubscribeDoOnEach$DoOnEachSubscriber.onError(OnSubscribeDoOnEach.java:87)
at rx.internal.reactivestreams.SubscriberAdapter.onError(SubscriberAdapter.java:59)
at reactor.core.publisher.StrictSubscriber.onError(StrictSubscriber.java:106)