CPU usage grows if clients don't reuse connections #392

Closed
stevej opened this Issue May 11, 2016 · 2 comments


@stevej
Member
stevej commented May 11, 2016 (edited)

Thread.currentThread is our number 1 CPU user under load

How to reproduce: send a few thousand qps with a load tester like slow_cooker and wait half an hour.

Here are two profiles from a 2000 qps load test against a 1 GB-heap l5d with a StaticNamer. CPU usage is 850%
(wow!); it started at around 250% at the beginning of the load test.

(workers refers to the numWorkers setting passed on the command line.)

stevej@proxy-test-16:~/linkerd$ google-pprof ./profile_32_workers_before_slowness
Using local file ./profile_32_workers_before_slowness.
Using local file ./profile_32_workers_before_slowness.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 4471 samples
     518  11.6%  11.6%      531  11.9% org.jboss.netty.buffer.DynamicChannelBuffer.ensureWritableBytes
     470  10.5%  22.1%      470  10.5% java.lang.Thread.currentThread
     294   6.6%  28.7%      759  17.0% org.jboss.netty.handler.codec.http.HttpMessageEncoder.encodeAscii
     269   6.0%  34.7%      276   6.2% org.jboss.netty.handler.codec.http.DefaultHttpHeaders.hash
     236   5.3%  40.0%      354   7.9% com.twitter.util.Promise.continue
     173   3.9%  43.8%      173   3.9% sun.misc.Unsafe.compareAndSwapObject
     168   3.8%  47.6%      197   4.4% scala.collection.immutable.List.foreach
     150   3.4%  51.0%      158   3.5% java.util.Arrays.copyOfRange
     123   2.8%  53.7%      124   2.8% org.jboss.netty.handler.codec.http.HttpHeaders.validateHeaderValue
     100   2.2%  55.9%      236   5.3% org.jboss.netty.handler.codec.http.DefaultHttpHeaders.get
(pprof) quit
stevej@proxy-test-16:~/linkerd$ google-pprof ./profile_32_workers_after_slowness
Using local file ./profile_32_workers_after_slowness.
Using local file ./profile_32_workers_after_slowness.
Welcome to pprof!  For help, type 'help'.
(pprof) top
Total: 49657 samples
   11463  23.1%  23.1%    11463  23.1% java.lang.Thread.currentThread
    4660   9.4%  32.5%     4660   9.4% sun.misc.Unsafe.compareAndSwapObject
    4273   8.6%  41.1%     4334   8.7% org.jboss.netty.buffer.DynamicChannelBuffer.ensureWritableBytes
    4215   8.5%  49.6%     4376   8.8% java.util.Arrays.copyOfRange
    2445   4.9%  54.5%     3540   7.1% org.jboss.netty.handler.codec.http.DefaultHttpHeaders.get
    1644   3.3%  57.8%     1688   3.4% org.jboss.netty.handler.codec.http.DefaultHttpHeaders.hash
    1514   3.0%  60.8%     1514   3.0% java.lang.String.hashCode
    1149   2.3%  63.2%     1152   2.3% org.jboss.netty.handler.codec.http.HttpMessageDecoder.findNonWhitespace
    1138   2.3%  65.5%     1138   2.3% java.lang.String.equals
     816   1.6%  67.1%      820   1.7% scala.collection.LinearSeqOptimized$class.length

And the relevant lines from /admin/threads:

stevej@proxy-test-16:~/linkerd$ grep -B 7 -A 20 -n currentThread threads
273-    },
274-    "27" : {
275-      "priority" : 5,
276-      "state" : "RUNNABLE",
277-      "daemon" : true,
278-      "thread" : "finagle/netty3-10",
279-      "stack" : [
280:        "java.lang.Thread.currentThread(Native Method)",
281-        "java.lang.ThreadLocal.get(ThreadLocal.java:160)",
282-        "com.twitter.util.Local$.com$twitter$util$Local$$get(Local.scala:48)",
283-        "com.twitter.util.Local.apply(Local.scala:132)",
284-        "com.twitter.app.Flag.localValue(Flag.scala:291)",
285-        "com.twitter.app.Flag.getValue(Flag.scala:325)",
286-        "com.twitter.app.GlobalFlag.getValue(Flag.scala:899)",
287-        "com.twitter.app.Flag.valueOrDefault(Flag.scala:345)",
288-        "com.twitter.app.Flag.apply(Flag.scala:376)",
289-        "com.twitter.finagle.tracing.Trace$.record(Trace.scala:309)",
290-        "com.twitter.finagle.tracing.Trace$.recordBinary(Trace.scala:356)",
291-        "com.twitter.finagle.buoyant.DstTracing$Path$Proxy.apply(DstTracing.scala:25)",
292-        "com.twitter.finagle.Filter$$anon$2.apply(Filter.scala:104)",
293-        "com.twitter.finagle.ServiceFactoryProxy.apply(Service.scala:208)",
294-        "com.twitter.finagle.buoyant.DstBindingFactory$Cached$$anon$1.apply(DstBindingFactory.scala:157)",
295-        "com.twitter.finagle.factory.IdlingFactory.apply(ServiceFactoryCache.scala:23)",
296-        "com.twitter.finagle.factory.ServiceFactoryCache.apply(ServiceFactoryCache.scala:95)",
297-        "com.twitter.finagle.buoyant.DstBindingFactory$Cached.apply(DstBindingFactory.scala:144)",
298-        "io.buoyant.router.RoutingFactory$RoutingService$$anonfun$apply$3.apply(RoutingFactory.scala:93)",
299-        "io.buoyant.router.RoutingFactory$RoutingService$$anonfun$apply$3.apply(RoutingFactory.scala:87)",
300-        "com.twitter.util.Future$$anonfun$flatMap$1.apply(Future.scala:986)",
--
705-    },
706-    "44" : {
707-      "priority" : 5,
708-      "state" : "RUNNABLE",
709-      "daemon" : true,
710-      "thread" : "finagle/netty3-27",
711-      "stack" : [
712:        "java.lang.Thread.currentThread(Native Method)",
713-        "java.lang.ThreadLocal.get(ThreadLocal.java:160)",
714-        "com.twitter.util.Local$.save(Local.scala:19)",
715-        "com.twitter.util.Promise$Transformer.apply(Promise.scala:120)",
716-        "com.twitter.util.Promise$Transformer.apply(Promise.scala:103)",
717-        "com.twitter.util.Promise$$anon$1.run(Promise.scala:381)",
718-        "com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:178)",
719-        "com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:136)",
720-        "com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:207)",
721-        "com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:92)",
722-        "com.twitter.util.Promise.runq(Promise.scala:350)",
723-        "com.twitter.util.Promise.updateIfEmpty(Promise.scala:721)",
724-        "com.twitter.util.Promise.update(Promise.scala:694)",
725-        "com.twitter.util.Promise.setValue(Promise.scala:670)",
726-        "com.twitter.concurrent.AsyncQueue.offer(AsyncQueue.scala:111)",
727-        "com.twitter.finagle.netty3.transport.ChannelTransport.handleUpstream(ChannelTransport.scala:55)",
728-        "org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)",
729-        "org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)",
730-        "org.jboss.netty.handler.codec.http.HttpContentDecoder.messageReceived(HttpContentDecoder.java:108)",
731-        "org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)",
732-        "org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)",
--
787-    },
788-    "23" : {
789-      "priority" : 5,
790-      "state" : "RUNNABLE",
791-      "daemon" : true,
792-      "thread" : "finagle/netty3-6",
793-      "stack" : [
794:        "java.lang.Thread.currentThread(Native Method)",
795-        "java.lang.ThreadLocal.get(ThreadLocal.java:160)",
796-        "com.twitter.util.Local$.com$twitter$util$Local$$get(Local.scala:48)",
797-        "com.twitter.util.Local.apply(Local.scala:132)",
798-        "com.twitter.finagle.context.Context$class.env(Context.scala:139)",
799-        "com.twitter.finagle.context.MarshalledContext.env(Context.scala:232)",
800-        "com.twitter.finagle.context.Context$class.getOrElse(Context.scala:164)",
801-        "com.twitter.finagle.context.MarshalledContext.getOrElse(Context.scala:232)",
802-        "com.twitter.finagle.tracing.Trace$.id(Trace.scala:113)",
803-        "com.twitter.finagle.tracing.Trace$.record(Trace.scala:312)",
804-        "com.twitter.finagle.tracing.Trace$.recordClientAddr(Trace.scala:344)",
805-        "com.twitter.finagle.dispatch.GenSerialClientDispatcher.com$twitter$finagle$dispatch$GenSerialClientDispatcher$$tryDispatch(ClientDispatcher.scala:67)",
806-        "com.twitter.finagle.dispatch.GenSerialClientDispatcher$$anonfun$apply$1.apply(ClientDispatcher.scala:82)",
807-        "com.twitter.finagle.dispatch.GenSerialClientDispatcher$$anonfun$apply$1.apply(ClientDispatcher.scala:80)",
808-        "com.twitter.util.ConstFuture$$anon$5.run(Future.scala:1351)",
809-        "com.twitter.concurrent.LocalScheduler$Activation.run(Scheduler.scala:178)",
810-        "com.twitter.concurrent.LocalScheduler$Activation.submit(Scheduler.scala:136)",
811-        "com.twitter.concurrent.LocalScheduler.submit(Scheduler.scala:207)",
812-        "com.twitter.concurrent.Scheduler$.submit(Scheduler.scala:92)",
813-        "com.twitter.util.Promise.runq(Promise.scala:350)",
814-        "com.twitter.util.Promise.updateIfEmpty(Promise.scala:721)",
@stevej stevej changed the title from Thread.currentThread is our number 1 CPU user under load to CPU usage grows if clients don't reuse connections May 18, 2016
@stevej stevej added the bug label May 18, 2016
@stevej stevej self-assigned this May 18, 2016
@stevej
Member
stevej commented May 18, 2016

When we upgraded slow_cooker (our load tester) to reuse connections, this issue went away, so we believe a Promise chain is being created due to connection churn. I'm currently tracking down whether this is in l5d code or inside Finagle.
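For intuition, here is a minimal, hypothetical sketch (plain Java, not linkerd or Finagle code) of the suspected mechanism: if each new connection registers a continuation on a long-lived, never-satisfied promise, then connection churn grows the callback queue without bound, while connection reuse keeps it flat.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a promise whose continuations queue up until it is
// satisfied. The names here are illustrative, not Finagle's API.
class LeakyPromise {
    private final List<Runnable> continuations = new ArrayList<>();

    // Each caller registers a callback; until the promise is satisfied,
    // every registration stays queued.
    void respond(Runnable k) {
        continuations.add(k);
    }

    int pendingContinuations() {
        return continuations.size();
    }
}

public class ConnectionChurnSketch {
    public static void main(String[] args) {
        int requests = 10_000;

        // No reuse: every request opens a fresh "connection" that hangs
        // one more continuation off the shared promise.
        LeakyPromise churned = new LeakyPromise();
        for (int i = 0; i < requests; i++) {
            churned.respond(() -> { /* per-connection cleanup */ });
        }

        // Reuse: one connection, one registration, regardless of how
        // many requests flow over it.
        LeakyPromise reused = new LeakyPromise();
        reused.respond(() -> { /* single shared cleanup */ });

        System.out.println("churned=" + churned.pendingContinuations()
            + " reused=" + reused.pendingContinuations());
    }
}
```

A growing continuation list like this would also explain the profile shape above: each dispatch walks more saved state, and the Local/ThreadLocal lookups done per continuation surface as Thread.currentThread samples.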

@stevej
Member
stevej commented May 19, 2016

This appears to be fixed by #402.

@stevej stevej closed this May 19, 2016