[1.0.2] Periodical temporary Linkerd outages when io.l5d.mesh is used #1346
Comments
Around the same time there were a few stack traces in `namerd02`.
I am able to reproduce this as well in linkerd 1.0.2, but not in 1.0.0. Sanitized linkerd logs look like this:
etc.
@DukeyToo Thanks for confirming that you're also seeing the issue. Can you provide a bit more info about your setup, as well as steps to reproduce if possible? If you don't mind sharing your linkerd and namerd configs, that would be very helpful. You can also send them to me directly.
@klingerf, configs below. We're running one linkerd per host, with a cluster of 3 namerd hosts, which also run zookeeper and some other services. The linkerd hosts come and go, but the namerd hosts typically stay up for a long time. Like @ashald, we did not see the issue immediately; it occurred after some indeterminate amount of time. Restarting all of the linkerd processes fixed it for a while, until the next occurrence. We reverted pretty quickly to 1.0.0, so it only occurred for us about 3 times total over 2 days. Linkerd yaml:
Namerd config:
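The actual configs did not survive in this copy of the thread. For context, here is a minimal sketch of the kind of setup described (one linkerd per host using the `io.l5d.mesh` interpreter against a namerd cluster backed by zookeeper); all hosts, ports, and paths below are illustrative, not the reporter's values:

```yaml
# linkerd (sketch)
routers:
- protocol: http
  interpreter:
    kind: io.l5d.mesh
    dst: /$/inet/namerd.example.com/4321   # the namerd cluster
    root: /default
  servers:
  - ip: 0.0.0.0
    port: 4140
```

```yaml
# namerd (sketch)
storage:
  kind: io.l5d.zk        # zookeeper runs on the namerd hosts
  zkAddrs:
  - host: 127.0.0.1
    port: 2181
  pathPrefix: /dtabs
namers:
- kind: io.l5d.consul
interfaces:
- kind: io.l5d.mesh
  ip: 0.0.0.0
  port: 4321
```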
Finally, @edio and I were able to reproduce the issue using the following steps:
1. Set vars
2. Prepare configs (namerd, linkerd)
3. Start namerd, linkerd, and a service
4. Prepare and run the test
Thanks @ashald! Here's the namerd debug log for that test:
So a new connection is initialized when the dyn binding error is encountered.
Ok, it looks like what's happening here is that when linkerd's client cache is full and it receives a request that requires building a new client, it evicts an existing client and tears it down. If the client has an open stream to namerd (as is the case with the io.l5d.mesh api), the stream is closed on the linkerd side. As far as I can tell, the stream is not closed on the namerd side, and namerd continues to send updates on the stream, which look like this:
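The eviction path described above can be sketched in Python (illustrative only; linkerd itself is Scala/Finagle, and the `Client`/`ClientCache` names here are invented for the example):

```python
# Sketch of a bounded client cache that tears down the client it evicts.
from collections import OrderedDict

class Client:
    def __init__(self, name):
        self.name = name
        self.stream_open = True   # long-lived stream to namerd (io.l5d.mesh)

    def close(self):
        # linkerd closes its side of the stream; if the remote (namerd) does
        # not also close, it keeps sending updates into a dead stream.
        self.stream_open = False

class ClientCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.clients = OrderedDict()

    def get(self, name):
        if name in self.clients:
            self.clients.move_to_end(name)
            return self.clients[name]
        if len(self.clients) >= self.capacity:
            # cache full: evict the least recently used client and tear it down
            _, evicted = self.clients.popitem(last=False)
            evicted.close()
        client = Client(name)
        self.clients[name] = client
        return client

cache = ClientCache(capacity=2)
a = cache.get("svc-a")
b = cache.get("svc-b")
c = cache.get("svc-c")   # cache is full, so svc-a is evicted and closed
print(a.stream_open)     # False: closed locally, but the remote may keep sending
```

The point of the sketch is the last line: the evicted client's stream is closed only on the local side, which is the state in which the stray namerd update arrives.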
Once one of these is sent to linkerd after the stream has been closed, the call to accept the frame on the linkerd side fails.
The way in which remote streams are closed changed as part of #1280, and that's evidently where this bug was introduced. I'm still investigating, but it seems there are two approaches to fixing it:
I think we should probably do both, but I'll keep poking around.
ZOMG, it looked obscure, but I never thought it might be so deep. o_O Thanks for looking into it!
**Problem**

If a linkerd h2 connection receives a data frame from the remote and is unable to accept it because the local frame queue has already been reset, it falls into an infinite recursive loop trying to accept the frame.

**Solution**

Instead of retrying on failure, send a reset frame back to the remote so that it does not continue to send frames.
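The problem/solution summary above can be sketched in Python (illustrative only; the real change is in linkerd's h2 stream transport, and `ResetQueue`/`recv_frame_*` are invented names):

```python
# Sketch of the bug and the fix: what to do when a frame arrives for a
# local queue that has already been reset.

class ResetQueue:
    def __init__(self):
        self.reset = False
        self.frames = []

    def offer(self, frame):
        if self.reset:
            return False          # queue already reset; frame cannot be accepted
        self.frames.append(frame)
        return True

def recv_frame_buggy(queue, frame):
    # the 1.0.2 behavior: retry on failure. With a reset queue, offer()
    # fails forever, so this recursion never terminates. (Not called here.)
    if not queue.offer(frame):
        return recv_frame_buggy(queue, frame)

def recv_frame_fixed(queue, frame, send_to_remote):
    # the fix: on failure, tell the remote to stop sending instead of retrying
    if not queue.offer(frame):
        send_to_remote("RST_STREAM")
        return False
    return True

sent = []
q = ResetQueue()
q.reset = True                    # the local queue was already reset
ok = recv_frame_fixed(q, "DATA", sent.append)
print(ok, sent)                   # False ['RST_STREAM']
```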
We believe it's a bug introduced in version 1.0.2; we cannot reproduce it with version 1.0.0.
Linkerd config
Namerd config
Every several hours a Linkerd instance "becomes broken": it can forward requests to names that were resolved through it before, but cannot forward requests to names that are as yet unknown to it (e.g., when a new service is deployed).
All requests to "new names" result in:
Looking at the logs (Consul log level set to ALL) we see that it gets updates from Consul about services being added or removed, so we assume that Linkerd knows about healthy Namerd instances.
After a while (we weren't able to figure out whether this interval is constant or not) we see a stack trace like this in the logs (root log level set to ALL):
where:
Once such an exception is logged, Linkerd "recovers" and is again able to resolve names not yet known to it.