Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[2.4] Still a thread leak in development environment #4649

Open
lizepeng opened this issue Jun 11, 2015 · 42 comments

Comments

Projects
None yet
7 participants
@lizepeng
Copy link

commented Jun 11, 2015

I know there already is a issue #3799, and I thought it should be solved in 2.4.0.
But recently I was trying to upgrade my project to play 2.4.0,
it appears to one thread named "ForkJoinPool-n-worker-n" was leaked after every

  1. run
  2. refresh http://127.0.0.1:9000 in browser
  3. ctrl d

cycle.

By following these steps, you could reproduce it easily.

  1. use activator new project-name play-scala to create a new sample project.
  2. run "activator" command in terminal
  3. start "Visual VM", connect to the jvm you just started
  4. then open threads tab
  5. run "run" in terminal
  6. access "http://127.0.0.1:9000" in browser
  7. press "ctrl d" in terminal
  8. then you could see a thread named "ForkJoinPool-n-worker-n" increased in Visual VM's threads list.
@jroper

This comment has been minimized.

Copy link
Member

commented Jun 15, 2015

Yes it is a bug - one that is very hard to test for. I wonder if we can write a scripted test to verify that all newly started threads in dev mode are cleaned up?

@benmccann benmccann added this to the 2.4.1 milestone Jun 16, 2015

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 19, 2015

Here's a scripted test that implements thread leak detection for dev mode. It's still got some printlns in it and it needs a bit of tidying up. We check for leaks by making sure that a WeakReference to the Play ClassLoader is garbage collected.

https://github.com/richdougherty/playframework/commits/thread-leak-detection

I've been able to verify that the test detects thread leaks and fails. However, I haven't been able to verify that it passes when there are no thread leaks, because we need to get rid of several thread leaks first!

I did some digging into thread leaks. So far what I have is this:

  • Reload leaks, i.e. leaks within a single run
    • The Akka scheduler thread references the wrong ClassLoader. It seems to use the current thread's ClassLoader, not the ActorSystem ClassLoader. Since we now load a single ActorSystem for each run invocation, shared across reloads, the scheduler thread ends up holding a reference to the first Play application's ClassLoader. This is probably an Akka bug but we can most likely work around it in Play.
  • Run leaks, i.e. resources leaked after run finishes
    • There's a process reaper Thread hanging around. Its contextClassLoader refers to the Play application ClassLoader. This thread is created by the JRE Process class.
    • There's a reference from javax.xml.bind.DatatypeConverterImpl.datatypeFactory to a org.apache.xerces.jaxp.datatype.DatatypeFactoryImpl created with the Play application ClassLoader. I assume this is something to do with Play's custom XML settings. Maybe we need to clean up our settings after ourselves—perhaps have an XML plugin.
    • There's a reference from java.util.logging.LogManager.manager to a org.slf4j.bridge.SLF4JBridgeHandler created with the Play application ClassLoader. Again, maybe we need to clean up our logging settings.
    • There's a scala.tools.nsc.interactive.PresentationCompilerThread that references the Play ClassLoader via its contextClassLoader ThreadLocal.
@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 19, 2015

@richdougherty this is great!! there's going to be a lot of happy people when this is fixed

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 21, 2015

I was hoping this was a quick fix, but since there are 4 leaks I might not be able to get around to this for a while. :(

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 21, 2015

Even a partial fix is nice if a couple of them are easy :-) We had 6 in our app (the 4 from Play and 2 of our own) and it made a big difference when we fixed 2 of them. We used to have the app crash every 6 reloads or so. Now we make it to more like a dozen reloads, which is a very noticeable improvement

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 21, 2015

The biggest leaks are probably:

  • The process reaper thread.
  • The scala.tools.nsc.interactive.PresentationCompilerThread.

Both of these will leak on every run. The leaks which are references from JRE classes should only leak once (because there's only one JRE class to leak from). The scheduler thread leak that happens on each reload should only leak once per run and then be cleaned up once run finishes.

@jroper jroper removed this from the 2.4.1 milestone Jun 22, 2015

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2015

I've started to hunt these leaks. My results:

  • process reaper Threads: I can see these threads too but they stop after about 1 min. I can't see any leak here.
  • scala.tools.nsc.interactive.PresentationCompilerThread: I can not reproduce this because this thread doesn't appear.
  • javax.xml.bind.DatatypeConverterImpl.datatypeFactory: This is a tricky one. Workaround: run Class.forName("javax.xml.bind.DatatypeConverterImpl") BEFORE you start run in sbt. This Workaround may change XML behavior because DatatypeConverterImpl will use JRE internal xerces instead of play classloader xerces (not sure if it does or not).
  • org.slf4j.bridge.SLF4JBridgeHandler: org.slf4j.bridge.SLF4JBridgeHandler.uninstall() will remove the handler from java.util.logging
  • Akka scheduler thread & classloader: I can't reproduce this.

I'm not sure about this but I think the LeakDetector won't work. Reason: Java8 doesn't collect metaspace on every GC. It will GC metaspace when metaspace-usage reaches metaspace-limit. I had to use JAVA_OPTS="-XX:MaxMetaspaceSize=150m -Xmx1024m" ~/opt/activator-1.3.5-minimal/activator to reach metaspace-limit faster (and see class unloading).

My env:
Java: 1.8.0_45, Play: 2.4.1, Activator: 1.3.5, OS: Kubuntu 14.10 64 Bit

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2015

If the JAXB leak is an issue, we could look at using https://github.com/mjiderhamn/classloader-leak-prevention/blob/master/src/main/java/se/jiderhamn/classloader/leak/prevention/ClassLoaderLeakPreventor.java#L498. @richdougherty was suggesting that it'll only leak once instead of on every run, so I'm not sure it's worth doing anything about if that's the case.

I filed a bug against JAXB. Maybe we can make this suck less in Java 9 :-p https://java.net/jira/browse/JAXB-1076

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2015

I see there's a Logger.shutdown method. Maybe that would be a good place to call the uninstall() method that @planetenkiller mentioned?

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2015

Ok, took my own stab at reproducing this. The Akka issue is easily the worst of them and is the one we should focus on first. I'm not sure I'd run into any issues if that one gets fixed or at least it would be far less common. I was able to reproduce it with jstack as explained in #4469

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 27, 2015

I filed a bug with Akka akka/akka#17857. It'd be cleanest to just fix the root of the problem rather than try to workaround it

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 28, 2015

Thanks a lot for looking into this, I really appreciate it!

  • process reaper threads: If these stop automatically then maybe we don't need to worry about them. That's fine with me. :)
  • scala.tools.nsc.interactive.PresentationCompilerThread: I modified a Twirl template to force Play to reload. This would have required the Scala compiler to run. Did you try this?
  • javax.xml.bind.DatatypeConverterImpl.datatypeFactory: At least this leak is bounded to only one ClassLoader instance, so it's not the worst of the leaks. If we want to try and fix it, then I guess one solution is to use reflection to hack the private static field. Maybe load the class outside the Play ClassLoader. When Play starts, replace the field value with the right value for Play. When Play stops, set it back to what it was. Not very elegant…
  • org.slf4j.bridge.SLF4JBridgeHandler: org.slf4j.bridge.SLF4JBridgeHandler.uninstall() sounds like what we need. You could run this in the DevServerStart class. Maybe make it part of the stop hook in the ServerProvider.Context.
  • Akka scheduler thread & classloader: maybe it was just an accident of timing that I saw this. I ran jmap four times to get the heap dump: 1. before run, 2. after the first compilation had happened, 3. after a reload and second compilation and 4. after stopping run. If I remember correctly I would have seen the thread start during 2 and noticed the leak during 3. The thread would have stopped by 4, because when the server stops the ActorSystem would also be stopped. @benmccann, I'm not sure if this is the worst leak because it is bounded at one thread, and it gets cleaned up when the server stops when run finishes. Or did you see it leak more than this?
@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2015

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 28, 2015

@richdougherty i ran jstack once. then i repeatedly edited a file and reloaded. i then ran jstack a second time. at that point there were an extra half dozen akka threads. it seemed to be leaking one thread per reload

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 28, 2015

@planetenkiller: for the Akka scheduler thread, I simply used a fresh project (activator new, play-scala, sbt, run) then followed the 4 steps I outlined above. I didn't do any special calls to the system.scheduler.

The ActorSystem and its scheduler should both use this ClassLoader, which should be the ClassLoader that sbt launches the Play run command with.

By the way, as part of this change, shall we give the anonymous ClassLoaders some better names, so that debugging is easier? In other words, let's change these two classes so they're not anonymous.

@benmccann, one thread per reload doesn't sound good! When you stop the run task, do all the threads get cleaned up?

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jun 29, 2015

yeah, i didn't spend a lot of time looking at it, but it does look like the threads get cleaned up when you stop the run task. of course in practice my server often crashes with an out of memory error before i ever stop it

de-anonymizing the classloader sounds like a good suggestion

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 29, 2015

OK, since the Akka scheduler leak is unbounded, we definitely want to fix it. We could probably work around the problem by calling system.scheduler.scheduleOnce when we first start the ActorSystem, and make sure we call it with environment.classLoader set as the calling Thread's contextClassLoader.

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jun 29, 2015

I tried to locate code locations that create new ActorSystems. I found two:

The question: Is the second ActorSystem a bug or a feature?

By the way, as part of this change, shall we give the anonymous ClassLoaders some better names, so that debugging is easier?

+100

@jroper

This comment has been minimized.

Copy link
Member

commented Jun 30, 2015

The second ActorSystem is a feature. When Play runs in dev mode, it needs an actor system that outlives the application in order to continue serving http requests when the application is reloaded. The application itself can't use this actor system, since the application will may create its own actors, have its own configuration, and require its own classloader for the actor system, and the application will also expect all the actors it creates to be shutdown when the application reloads. Since Akka doesn't support this kind of "context" inside an actor system, we have to use a second actor system that is bound to the applications lifecycle, classloader and configuration.

In production, there is only one actor system, the server uses the one from the application.

@lizepeng

This comment has been minimized.

Copy link
Author

commented Jun 30, 2015

In my opinion, play.core.Execution.common is very suspicious. Since in dev mode, when we first access http://localhost:9000, play need a different thread to reload everything, but at that moment, Play._currentApp is still null. As a result, it seems like that play create a new ForkJoinPool which is play.core.Execution.common and won't release until ctrl-d.

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 30, 2015

@planetenkiller @jroper: Sorry I completely forgot about the second ActorSystem. That would explain why we're getting one with a different contextClassLoader. Does it look to you like the scheduler thread of the per-reload-ActorSystem is being properly cleaned up, or is it leaking?

@lizepeng, play.core.Execution.common is a bit of a hack. We're slowly trying to remove global state and we still need to provide a statically-available thread pool for some situations. In the future hopefully we can remove this thread pool entirely. What I hope is that common will be used for only a moment, until the application becomes available, then it will clean itself up. See description of worker cleanup.

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jun 30, 2015

I can reproduce the scala.tools.nsc.interactive.PresentationCompilerThread leak. It looks like a more complicated leak then the leaks from above (timing & threads & contextClassLoader).

from dev mailinglist:

I used YourKit and I can get you a license if you need one.

A YourKit license would be nice. jvisualvm is good but very limited compared to YourKit (or JProfiler & co.)

@richdougherty

This comment has been minimized.

Copy link
Member

commented Jun 30, 2015

A YourKit license would be nice.

I've requested a license. If you send me your email address then I'll forward the license on to you when I get it. My email address is rich@rd.gen.nz.

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jul 2, 2015

scala.tools.nsc.interactive.PresentationCompilerThread leak:

lizepeng was very close already. To see this leak: the first compile batch after run must not contain templates. The first template compile after the first compile will result in the leak.

What happens:

  • ApplicationProvider.get starts a Future code
  • ExecutionContext of the Future is the Application-ActorSystem-dispatcher code
  • sbt uses ThreadPoolExecutor, new worker of Executor inherits application contextClassLoader (because its a akka dispatch worker thread)
  • sbt submits new Task ThreadPoolExecutor, new worker of Executor inherits current workers contextClassLoader (==application contextClassLoader)
  • twirl uses scala compiler that creates a PresentationCompilerThread, inherits current workers contextClassLoader (==application contextClassLoader)

To get rid of this leak we have to change play.core.Execution.internalContext:

  • play.core.Execution.common does not work 100% because it will leak to (contextClassLoader will change to "run" ClassLoader. PresentationCompilerThread won't stop after the run task finishes => first "run" ClassLoader will leak)
  • we need a Thread(-Pool) that creates threads with a non run specific classloader
@richdougherty

This comment has been minimized.

Copy link
Member

commented Jul 2, 2015

Wow that was a tricky one to track down. Nice work.

I'm open to suggestions for changing play.core.Execution.internalContext, if you can think of something that will work. :)

Another option is to make sure that we've use a different contextClassLoader when we ask sbt to compile the code. For example, given call sbt within Application.get when we call BuildLink.reload. This invokes the code in Reloader.reload, which invokes sbt.

We could change the code in Reloader.reload to set a different contextClassLoader before it invokes sbt. We can probably use the Reloader's own classloader for the contextClassLoader, because I think Reloader runs in the main sbt classloader anyway.

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jul 4, 2015

That is a good idea. I will create a PR that contains all the fixes of this issue.

Any opinions about this (using reflection to replace value on run-start and run-stop):

javax.xml.bind.DatatypeConverterImpl.datatypeFactory: At least this leak is bounded to only one ClassLoader instance, so it's not the worst of the leaks. If we want to try and fix it, then I guess one solution is to use reflection to hack the private static field. Maybe load the class outside the Play ClassLoader. When Play starts, replace the field value with the right value for Play. When Play stops, set it back to what it was. Not very elegant…

jroper pushed a commit that referenced this issue Jul 15, 2015

jroper added a commit that referenced this issue Jul 15, 2015

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jul 15, 2015

Yay, looks like a couple of these are fixed now. Seems like the Akka one is the only one left to address now (which is the worst of them, so it will be important to fix)

@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2015

Which akka leak? The Akka scheduler thread references the wrongClassLoader` isn't a leak because the dev mode uses two actor systems (one per run, one per reload). The 'reload' actor system shouldn't leak because it is part of the shutdown logic (triggered on every reload).

Or did I miss something?

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jul 16, 2015

I'm not sure what the root cause is, but it's easy to reproduce that there's an Akka leak.

Run ps -ef | grep sbt to find the PID of your process. Then run jstack PID > jstack1 with that PID. Now repeatedly edit a file and reload. Run jstack PID > jstack2. Open the files jstack1 and jstack2 and search for akka. You'll see that akka is found in jstack2 many more times than it is in jstack1.

jCalamari pushed a commit to jCalamari/playframework that referenced this issue Jul 17, 2015

Piotr Fras
local variable
fix BodyParser scaladoc

Add support for unquoted Content-Disposition parameters

Update test dependencies

Upgrade async-http-client to version 1.9.29

guice stage - local variable

Update Heroku deployment documentation for 2.4

Change JsLookupResult#getOrElse to call by name

fix dev-mode memory leaks (playframework#4649)

Enable sql log statements

SQL statements can now be logged through jdbcdslog-exp
datasource proxy. This will proxy both connection pools
implementations and also custom implementations configured
by users.

Added SubTypesScanner to Reflections factory in Classpath

Added SubTypesScanner to Reflections factory in Classpath class because
getTypesAnnotatedWith method requires SubTypesScanner to be configured
in Reflections version 0.9.9.
Refactor duplicated creations of Reflections configuration to single method.
Added test case for utility class Classpath.

Make `invokeWithContext` static in Helpers.java

To fix issue playframework#4834
@planetenkiller

This comment has been minimized.

Copy link
Contributor

commented Jul 17, 2015

Can you create a gist that contains jstack1 / jstack2 (can you edit&reload again and add jstack3 too please)?

bjfletcher added a commit to bjfletcher/playframework that referenced this issue Jul 19, 2015

Merge branch 'master' of github.com:playframework/playframework into …
…range-api

* 'master' of github.com:playframework/playframework:
  + corrected view titles
  Make `invokeWithContext` static in Helpers.java
  Added SubTypesScanner to Reflections factory in Classpath
  Change JsLookupResult#getOrElse to call by name
  Update Heroku deployment documentation for 2.4
  fix dev-mode memory leaks (playframework#4649)
  Update test dependencies
  Add support for unquoted Content-Disposition parameters
  Upgrade async-http-client to version 1.9.29
  Enable sql log statements
@benmccann

This comment has been minimized.

Copy link
Contributor

commented Jul 20, 2015

trydofor added a commit to trydofor/playframework that referenced this issue Jul 29, 2015

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Aug 3, 2015

I got two GitHub notifications from ritakuhou (@lizepeng) that aren't appearing on this thread. Pasting them below for future reference:

I still think play.core.Execution.common is getting something wrong. After applying patch shown as below to play.core.Execution.scala, threads named "ForkJoinPool-n-worker-n" don't leak any more.

private[play] object Execution {

  def internalContext: ExecutionContext = {
    val appOrNull: Application = Play._currentApp
    appOrNull match {
      case null => common
      case app: Application => {
        //Since we won't need this ForJoinPool once Play is started.
        if (!forkJoinPool.isShutdown()) forkJoinPool.shutdown()
        app.actorSystem.dispatcher
      }
    }
  }

  object Implicits {

    implicit def internalContext = Execution.internalContext

  }

  private val forkJoinPool = new ForkJoinPool()

  /**
   * Use this as a fallback when the application is unavailable.
   * The ForkJoinPool implementation promises to create threads on-demand
   * and clean them up when not in use (standard is when idle for 2
   * seconds).
   */
  private val common = ExecutionContext.fromExecutor(forkJoinPool)
}

Then when I run some edit-reload cycles or run-refreshbrowser-ctrld cycles, everything seems to be fine.
Note that although the loaded classes and meatspace keep increasing, those parts of memory could be recycled by GC when hit some threshold later.

image

Leaking Thread/ThreadPool I found are:

  • play.core.Execution.common
  • play.api.libs.iteratee.Concurrent.timer
  • ( (In my env) akka.remoting.shutdown (see akka/akka#17729)

For the time being, I'm using scala reflection api to kill those threads, since they're either in private class, or private field.

@lizepeng

This comment has been minimized.

Copy link
Author

commented Aug 4, 2015

@benmccann Yes, that was me. :)

I thought I was right in my direction, but actually some parts are.
So I removed those posts and wanted to give a better comment after summarising.

The thing is,

  1. Patching play.core.Execution.scala (or hacking with reflection) only works fine in the simplest case, such as sample project play-scala.
  2. When it comes to some more complex projects, things are getting worse. At least, I have encountered two problems. play.api.libs.iteratee.Concurrent.timer and akka.remote.Remoting.shutdown.
    1. play.api.libs.iteratee.Concurrent, a scala object created a java.util.Timer inside and won't cancel() it.
    2. akka.remote.Remoting, who use scala.concurrent.ExecutionContext.global during shutdown, thus a thread named 'ForkJoinPool-n-worker-n' survived after every run-ctrld cycle.
  3. I also tried hacking those through reflection api, it turned out that it works for play.core.Execution.common since after shutting down the internal ForkJoinPool we still have app.actorSystem.dispatcher at hand to use. In other cases this might not work.

That's all I got till now.

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Aug 4, 2015

Good detective work! I've filed an issue for the Akka bug with them akka/akka#18115

@lizepeng

This comment has been minimized.

Copy link
Author

commented Aug 13, 2015

Currently, I am using these pieces of code for workaround, and it seems to work fine.

StopHooks
Compile Time Injection AppLoader

In brief,

  1. Should shutdown play.core.Execution.common pool and cancel() timer thread in play.api.libs.iteratee.Concurrent on each reload.
  2. Prepend the above hooks to applicationLifeCycle.

Hope this could help.

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Aug 13, 2015

The Akka fix was just merged. We should upgrade to 2.3.13 when it comes out

ClaraAllende pushed a commit to ClaraAllende/playframework that referenced this issue Aug 28, 2015

@mkurz

This comment has been minimized.

Copy link
Member

commented Sep 3, 2015

akka 2.3.13 is out now

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2015

I'm seeing this in testing a lot as well. I filed a new issue for it. Curious if you guys could help me track it down. #5144

@benmccann

This comment has been minimized.

Copy link
Contributor

commented Apr 14, 2019

Some folks are suggesting this may be related to combination of sbt and scala versions chosen: sbt/sbt#4166

@dwijnand dwijnand removed the status:backlog label May 9, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.