Receiving "INTERNAL: Panic! This is a bug!" messages #4544

Closed
dadadom opened this issue Jun 7, 2018 · 4 comments

Comments


dadadom commented Jun 7, 2018

We are experiencing TaskRejectedExceptions although the thread pool is not exhausted.

We send about one unary request every 1-2 seconds, and each request completes in under 100 ms.

The message we see (on the client) is:

Caused by: io.grpc.StatusRuntimeException: INTERNAL: Panic! This is a bug!
	at io.grpc.Status.asRuntimeException(Status.java:526)
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:467)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at me.dinowernli.grpc.prometheus.MonitoringClientCallListener.onClose(MonitoringClientCallListener.java:50)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:684)
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
	at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:391)
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:471)
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:553)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:474)
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:591)
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
	... 3 common frames omitted
Caused by: org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@47ecf012[Running, pool size = 20, active threads = 2, queued tasks = 8, completed tasks = 87236]] did not accept task: io.grpc.internal.DelayedClientTransport$5@5ab49817
	at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:296)
	at io.grpc.internal.DelayedClientTransport.reprocess(DelayedClientTransport.java:300)
	at io.grpc.internal.ManagedChannelImpl.updateSubchannelPicker(ManagedChannelImpl.java:712)
	at io.grpc.internal.ManagedChannelImpl.access$4500(ManagedChannelImpl.java:81)
	at io.grpc.internal.ManagedChannelImpl$LbHelperImpl$3.run(ManagedChannelImpl.java:1045)
	at io.grpc.internal.ChannelExecutor.drain(ChannelExecutor.java:73)
	at io.grpc.internal.InternalSubchannel$TransportListener.transportReady(InternalSubchannel.java:510)
	at io.grpc.netty.ClientTransportLifecycleManager.notifyReady(ClientTransportLifecycleManager.java:43)
	at io.grpc.netty.NettyClientHandler$FrameListener.onSettingsRead(NettyClientHandler.java:707)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$FrameReadListener.onSettingsRead(DefaultHttp2ConnectionDecoder.java:423)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder$PrefaceFrameListener.onSettingsRead(DefaultHttp2ConnectionDecoder.java:635)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger$1.onSettingsRead(Http2InboundFrameLogger.java:93)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readSettingsFrame(DefaultHttp2FrameReader.java:542)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:263)
	at io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160)
	at io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41)
	at io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118)
	at io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390)
	at io.netty.handler.codec.http2.Http2ConnectionHandler$PrefaceDecoder.decode(Http2ConnectionHandler.java:254)
	at io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450)
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	... 1 common frames omitted
Caused by: java.util.concurrent.RejectedExecutionException: Task io.grpc.internal.DelayedClientTransport$5@5ab49817 rejected from java.util.concurrent.ThreadPoolExecutor@47ecf012[Running, pool size = 20, active threads = 0, queued tasks = 9, completed tasks = 87236]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor.execute(ThreadPoolTaskExecutor.java:293)
	... 37 common frames omitted

The call is done like this:

MyServiceGrpc.newFutureStub(managedChannel).withDeadline(myDeadline).getMyGrpcServiceFutureStub().myUnaryRequest(request);

The executor set on the managed channel is configured as follows:

ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(5);
executor.setMaxPoolSize(20);
executor.setQueueCapacity(10);
executor.setDaemon(true);
executor.setThreadNamePrefix("myExecutor-");
executor.initialize();
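
For reference, a minimal sketch of how an executor like this can be handed to the channel; the host and port below are hypothetical placeholders, and "executor" refers to the ThreadPoolTaskExecutor configured above:

import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

// ThreadPoolTaskExecutor implements java.util.concurrent.Executor, so it can be
// passed to the builder directly; the channel then runs the client callback work
// (the listener chain visible in the stack trace above) on this executor.
ManagedChannel managedChannel = NettyChannelBuilder
        .forAddress("my-backend.example.com", 8443)   // hypothetical host/port
        .executor(executor)
        .build();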

It seems odd that the queue rejects new entries even though there are idle threads. Also, given that there is only one onNext() and one onCompleted() per request, why are so many tasks being scheduled on that executor? If I understand it correctly, the executor is used to process the on*() events, correct?

Caused by: org.springframework.core.task.TaskRejectedException: Executor [java.util.concurrent.ThreadPoolExecutor@47ecf012[Running, pool size = 20, active threads = 2, queued tasks = 8, completed tasks = 87236]] did not accept task:

I am not sure whether this is actually a gRPC issue or an executor issue, but gRPC is the only place where we see it.

What version of gRPC are you using?

grpc-java 1.12.0

What did you expect to see?

No Panic


dadadom commented Jun 7, 2018

One more thing:
The issues started when we ran a load test on the system. It was running without these errors before; we saw some during the load test (where they are somewhat expected), but the errors don't go away even though there is now close to no load on the system.
At the same time, the number of items in the executor queue only seems to spike briefly: we have a gauge metric that samples queue.size() every 15 seconds, and it is almost always zero. So the events appear to be very bursty, which makes me think this is a gRPC issue.

dapengzhang0 (Member) commented

@zhangkun83

carl-mastrangelo (Contributor) commented

Hi @dadadom,

I think this is expected behavior, and the solution is to raise your queue capacity. A capacity of 10 is incredibly low. If there were a spike in callbacks (say, 10 in a short period), you would get this error.
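
As a concrete illustration of that suggestion (the capacity value below is an arbitrary example, not a number recommended in this thread), the executor from the report could be given more queue headroom:

ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(5);
executor.setMaxPoolSize(20);
executor.setQueueCapacity(1000);   // was 10; large enough to absorb callback bursts
executor.setDaemon(true);
executor.setThreadNamePrefix("myExecutor-");
executor.initialize();

Note that with a bounded queue this large, the underlying ThreadPoolExecutor only grows past its core size once the queue is full, so in practice the pool will mostly stay at 5 threads.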

Executors don't promise that you won't get a RejectedExecutionException just because there is a free thread. The way ThreadPoolExecutors work is that a group of threads all try to pull work off the queue. But how do they know work is available? They have to wait on the queue (i.e. by calling take()) in order to process it. The problem is that if one thread adds work very quickly, the other threads may not wake up in time to consume it. The producer thread could add 11 items of work before any consumer has a chance to pull one off.
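
To make the arithmetic concrete, here is a small self-contained sketch (not taken from this thread) that provokes a rejection with the same pool shape as the executor in the report; the latch stands in for worker threads that have not yet pulled anything off the queue when the burst arrives:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BurstRejectionDemo {
    public static void main(String[] args) {
        // Same shape as the executor in this issue: core 5, max 20, queue of 10.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                5, 20, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(10));

        // The latch keeps workers from draining the queue, standing in for
        // consumers that have not yet woken up from take() when a burst arrives.
        CountDownLatch gate = new CountDownLatch(1);
        Runnable task = () -> {
            try {
                gate.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        // A burst of 31 submissions: 20 end up occupying threads, 10 fill the
        // queue, and the 31st is rejected even though every task is trivial.
        for (int i = 1; i <= 31; i++) {
            try {
                pool.execute(task);
            } catch (RejectedExecutionException e) {
                System.out.println("submission " + i + " rejected: " + e.getMessage());
            }
        }
        gate.countDown();
        pool.shutdown();
    }
}

With a core size of 5, a max size of 20, and a queue capacity of 10, any burst of more than 30 pending tasks is rejected, regardless of how quickly each individual task would finish.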

zhangkun83 (Contributor) commented

Panic is legitimate here. If the executor queue is full, the channel can't submit any work to it. It can't even fail the RPC, because the executor can't run ClientCall.Listener.onClose(). Logging alone wouldn't be sufficient, because the RPC would be dropped by the channel and left stuck in limbo.

lock bot locked this as resolved and limited conversation to collaborators on Sep 28, 2018