Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Concurrency issues with specs2 3.8.7+ #556

Closed
Shadowfiend opened this issue Mar 6, 2017 · 39 comments
Closed

Concurrency issues with specs2 3.8.7+ #556

Shadowfiend opened this issue Mar 6, 2017 · 39 comments

Comments

@Shadowfiend
Copy link

Shadowfiend commented Mar 6, 2017

We recently moved to specs2 3.8.8 and started seeing a weird situation where the classloader used in internal code that bootstraps a Clojure environment was behaving differently from before. We narrowed down the version of specs2 that started causing this to version 3.8.7. I implemented a workaround for this, but… We then started seeing these:

[error]    com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'scribe-retry' (SimpleConfig.java:152)
[error] com.typesafe.config.impl.SimpleConfig.findKeyOrNull(SimpleConfig.java:152)
[error] com.typesafe.config.impl.SimpleConfig.findOrNull(SimpleConfig.java:170)
[error] com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:184)
[error] com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:189)
[error] com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:258)
[error] com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:264)
[error] com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:37)
[error] com.elemica.spandex.common.Configuration$scribeRetry$.<init>(configuration.scala:66)
[error] com.elemica.spandex.common.Configuration.scribeRetry$lzycompute(configuration.scala:65)
[error] com.elemica.spandex.common.Configuration.scribeRetry(configuration.scala:65)
[error] com.elemica.spandex.common.ScribeClient$class.$init$(ScribeClient.scala:42)

Unfortunately, these don't happen in 3.8.6---only 3.8.7 and 3.8.8. So it seems like something in the 3.8.6->3.8.7 got weird with respect to classpaths and possibly class loaders, though I don't know what it would be. Notably, if I run the offending specs individually, they work fine. But if I run them in a chunk (we have multiple subprojects and run their tests using all common/test universe/test portal/test fabric/test persistence/test rica-lib/test spandex/test), not so much...

Switching on the project down to 3.8.6 makes these pass, up to 3.8.7 fails, down to 3.8.6 again passes.

I tried looking through the diff between the versions but nothing jumped out at me, so I'm down to my last resort… Any ideas?

@etorreborre
Copy link
Owner

The execution indeed changed between 3.8.6 and 3.8.7. I still need to understand exactly how to re-implement the previous behaviour (I don't think it is related to classloaders). The "new" behaviour will execute tests concurrently in batches. Can you please try https://oss.sonatype.org/content/repositories/releases/org/specs2/specs2-core_2.12/3.8.8-20170228162720-a7d4c14, pass the unbatched argument on the command line and tell me if you still have this issue?

Also this might tell you that you have possible concurrency issues within your test code and/or production code and you might want to fix those if that's the case (in that case the bug will have become a feature :-)).

@Shadowfiend
Copy link
Author

The issue here is that it's unlikely to be concurrency at the code level. The Clojure loading block, for example, is synchronized… But somehow there has been a change in classloader behavior (I hypothesize this because the way I fixed it was to replace the context classloader with the current class's classloader at the beginning of the synchronized block and restore it after). It also seems reproducible on different boxes (i.e., it fails at the same point on different machines), which is very odd for something that could be race condition-y.

I'll keep investigating, including trying that unbatched argument, and report back.

@etorreborre
Copy link
Owner

Thanks for investigating, sorry to dump this work on you. You rarely want to have to debug a test library :-(.

@Shadowfiend
Copy link
Author

Haha no it's fine. I'd also like to see if I can make a reproducible case, both for my own benefit and yours :)

@etorreborre
Copy link
Owner

I just did some tests and I realize that there's a bit more concurrency in 3.8.7 than in 3.8.6. In 3.8.6 the tests started in order and finished depending on their execution time:

started test 1
started test 2
started test 3
started test 4
finished test 4 after 3 millis
finished test 2 after 52 millis
finished test 1 after 58 millis
finished test 3 after 86 millis

whereas in 3.8.8 they don't even start in order

started test 3
started test 4
started test 2
started test 1
finished test 4 after 3 millis
finished test 2 after 52 millis
finished test 1 after 58 millis
finished test 3 after 86 millis

Maybe this added concurrency is causing the issue? I don't remember otherwise any classloading change.

@Shadowfiend
Copy link
Author

I still don't see why that would break things in a synchronized block, though… I'm also looking into ClasspathFilter, the intermediate classloader that sbt injects, in case there's some different behavior in there that's resulting from these changes---but I don't have anything specific to report yet :) Will keep you posted!

@Shadowfiend
Copy link
Author

Shadowfiend commented Mar 7, 2017

I'm actually realizing that most if not all of these issues are happening during singleton initialization… So that's interesting. Not sure what it means yet though!

EDIT: So that I'm not continually posting new comments, some more failures:

[error] CAUSED BY
[error]  java.lang.reflect.InvocationTargetException: null (NativeConstructorAccessorImpl.java:-2)
[error] sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[error] sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[error] sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[error] akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(ReflectiveDynamicAccess.scala:23)
[error] akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(ReflectiveDynamicAccess.scala:20)
[error] akka.actor.ReflectiveDynamicAccess.getClassFor(ReflectiveDynamicAccess.scala:20)
[error] akka.actor.ReflectiveDynamicAccess.createInstanceFor(ReflectiveDynamicAccess.scala:38)
[error] akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:597)
[error] akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
[error] akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
[error] com.elemica.scribe.FlipbookWorkConsumerSpec.<init>(FlipbookWorkConsumerSpec.scala:30)
[error] sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
[error] sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
[error] sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
[error] CAUSED BY
[error]  java.lang.ClassCastException: interface akka.event.LoggingFilter is not assignable from class akka.event.DefaultLoggingFilter (ReflectiveDynamicAccess.scala:23)
[error] akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(ReflectiveDynamicAccess.scala:23)
[error] akka.actor.ReflectiveDynamicAccess$$anonfun$getClassFor$1.apply(ReflectiveDynamicAccess.scala:20)
[error] akka.actor.ReflectiveDynamicAccess.getClassFor(ReflectiveDynamicAccess.scala:20)
[error] akka.actor.ReflectiveDynamicAccess.createInstanceFor(ReflectiveDynamicAccess.scala:38)
[error]    java.lang.ClassNotFoundException: com.elemica.transformations.CodeConversionContextChange (FastFullTypeHints.scala:24)
[error] com.elemica.common.FastFullTypeHints$$anonfun$classFor$1.apply(FastFullTypeHints.scala:24)
[error] com.elemica.common.FastFullTypeHints$$anonfun$classFor$1.apply(FastFullTypeHints.scala:23)
[error] com.elemica.common.FastFullTypeHints.classFor(FastFullTypeHints.scala:23)
[error] net.liftweb.json.TypeHints$CompositeTypeHints$$anonfun$hintFor$2.apply(Formats.scala:169)
[error] net.liftweb.json.TypeHints$CompositeTypeHints$$anonfun$hintFor$2.apply(Formats.scala:169)
[error] net.liftweb.json.TypeHints$CompositeTypeHints.hintFor(Formats.scala:169)
[error] net.liftweb.json.Extraction$.prependTypeHint$1(Extraction.scala:66)

This second one is for a line of code:

      val clazz = Thread.currentThread.getContextClassLoader.loadClass(hint)

Interestingly, moving away from the context class loader is precisely what fixed the Clojure situation as well.

The first one also looks like it could be related to classloading, though more tangentially.

Currently my only conclusion is: something funkay this way comes!

EDIT2: I've also tried bumping from sbt 0.13.11 to 0.13.13 to no avail.

EDIT3: also of note, the above ^^ are completely independent code paths, and are hit in different specs. Obviously the stuff happening deep in akka-land ain't our code :)

@Shadowfiend
Copy link
Author

Unbatched doesn't seem to help. Is there something I can print/check from inside a spec to make triple-plus sure that I've set unbatched correctly?

@Shadowfiend
Copy link
Author

Oh. Oh dear. Okay, I've found the issue. I added this to the Clojure initialization stuff:

    val previousClassLoader = Thread.currentThread().getContextClassLoader
    val parentClassLoader = ClojureRunner.getClass.getClassLoader

    println(previousClassLoader)
    println(parentClassLoader)

The context classloader is a ClasspathFilter, but its parent is a completely different classloader from a different project, compared to the parentClassLoader that is looked up.

This is actually very helpful; I think it's going to help me create a reproducible test case. More to come.

@etorreborre
Copy link
Owner

This looks very puzzling, I hope you will get to the bottom of it :-).

@Shadowfiend
Copy link
Author

Well, the ClasspathFilter classloader is one that sbt injects during test runs, so I suspect it's going to be quite a delight to track the root cause down… I think I'm about to learn more about how sbt and specs2 interact than I probably want to know 😆

@Shadowfiend
Copy link
Author

Okay, minimal test case lives at https://github.com/Shadowfiend/specs2-concurrent-boom-boom . 46 LOC. Would it make sense to refer this to the sbt folks/take it to a mailing list/something else?

@Shadowfiend
Copy link
Author

Allllll righty, I did some bisecting. In the process I found that the classloader behavior is indeed somewhat intermittent, though not strongly so. With a reasonable measure of certainty, it looks like commit ffa0a7b is the culprit. What in that commit caused it… Is hard to tell, and I'm out of blueberries for the day.

@Shadowfiend
Copy link
Author

@etorreborre Is it possible that the new execution strategy is using a different thread pool provided by sbt? That is, that it is using threads that aren't spawned directly from within specs2?

@etorreborre
Copy link
Owner

specs2 should be using its own thread pool. I can check later if anything changed around that.

@Shadowfiend
Copy link
Author

Another piece that would be useful in investigating this is where that thread pool comes from---i.e., where is it spawned and where do new threads get added to that pool? If you could give me a pointer to that, it could help, as there are quite a few abstraction layers to descend through and I'm having trouble digging through them. Otherwise I'll post more on here when I manage to figure it out :)

@etorreborre
Copy link
Owner

In SbtRunner the Env is created and contains an ExecutorService. This one is used to fork threads in Executor.

@Shadowfiend
Copy link
Author

So, after some investigation on the sbt group and flat out asking and being pointed to some known classloader issues they're looking at, I thought I'd try to see if I could take a quick stab at just setting the right context classloader within specs2 so it works for everything.

I added some context classloader setup in ResultExecution.execute and that seemed to make the issues go away. I can try to break down a spec for this and then open a PR, but wanted to see if there was a better place for me to do this wrapping. It looks like there might be some other places I should update as well, but perhaps the place to do the actual wrapping is wherever we compute the Result, rather than in the execution bit? My lack of familiarity with the specs2 code is a disadvantage again here, so let me know if there's a spot that seems like it could be best to add it.

For reference, the code after looks like:

  /** execute a Result and return a Result even if there are specs2 exceptions */
  def execute(result: =>Result) = {
    val oldContextClassLoader = Thread.currentThread().getContextClassLoader()
    try {
      Thread.currentThread().setContextClassLoader(result.getClass.getClassLoader())
      result
    } catch {
      ... same catch stuff ...
    } finally {
      Thread.currentThread().setContextClassLoader(oldContextClassLoader)
    }
  }

@etorreborre
Copy link
Owner

I am also not familiar with classloaders and I don't know if such a modification would be safe for all users, especially when not using sbt. One thing you can try "locally" is to mix-in a trait to your specs:

trait WithClassClassLoader extends AroundEach {
  def around[R : AsResult](r: =>R): Result = {
    val old = Thread.currentThread().getContextClassLoader()
    try {
      val result = r
      Thread.currentThread().setContextClassLoader(result.getClass.getClassLoader())
      AsResult(result)
    } finally Thread.currentThread().setContextClassLoader(oldContextClassLoader)
  }
}

I am actually not sure how this works. If your test code throws an Exception because it is not executed with the right class loader how will we reach the point where we set the right classloader? (note that in your version of execute you evaluate result twice).

@etorreborre
Copy link
Owner

Maybe this class loader is the one which should be set as the context class loader before running the tests?

@Shadowfiend
Copy link
Author

I'll give that classloader a shot and report back.

I am actually not sure how this works.

I noticed this after posting my comment yesterday… Is it possible Result is capturing the underlying exception, and since it's still in the right classloader running a getClass still gets us the right classloader?

I don't know if such a modification would be safe for all users, especially when not using sbt.

Well what I can say for sure is that after the commit I pointed out above, specs2 isn't safe for all users (particularly using sbt in a multi-project setup). Definitely the potential for a catch-22 here if we fix it and break other situations, though... I've posted a question on Stack Overflow to see if anyone has any ideas at the sbt level.

@etorreborre
Copy link
Owner

So what I gather from this thread is that the classloader used to execute the tests is not deterministic and is influenced by how we deal with concurrent execution. For example in the commit you point out we introduce a bit of scheduling to be able to time out tests.

I used your "boom-boom" project a bit and it seems that immutable specifications and just one module is enough to reproduce the issue. Also I tried to reproduce the same case inside specs2 which is also a multi-module build and I don't get this issue. In fact it seems that there is no difference between the class loaders, they are both AppClassLoader.

I need to play around more in order to understand what's going on. I'll try to do more of that this week.

@Shadowfiend
Copy link
Author

Your understanding seems the same as mine, yep.

Very weird that specs2 doesn't exhibit the same issue. Sorry I haven't been able to circle back here, a few other things have kept me distracted and will probably continue to keep me busy for a few days; when I get a chance, though, I'll be happy to do more spelunking as needed.

@nathankleyn
Copy link

nathankleyn commented Apr 17, 2017

Hey guys! At the risk of turning this into a "me too" thread, we've hit this issue within a large scala codebase trying to upgrade to Specs2 3.8.9. I went through the same rough process as @Shadowfiend and came to find that it's definitely introduced somewhere in 3.8.7 - that is, going back to 3.8.6 makes the issues go away completely.

I don't think it'd be particularly easy for me to construct a reproducible test case better than @Shadowfiend, but if there's anything I can do to help or provide more details I'm at your disposal! 👍

@etorreborre
Copy link
Owner

@nathankleyn that's definitely good to know, that means I must shift my priorities :-).

@etorreborre
Copy link
Owner

Can you please try 3.8.9-20170417165056-719dc83 (should be available in a few minutes)? I used the same technique than the one used for scalacheck.

@nathankleyn
Copy link

@etorreborre No worries, can do - and many thanks for the quick reply! Still waiting for it to appear in Maven Central right now, no sign just yet.

@etorreborre
Copy link
Owner

Something was interrupted in the publishing. I'm now trying to publish 3.8.9-20170417195349-7b7973e but sonatype is really slow tonight.

@etorreborre
Copy link
Owner

Ok done

@etorreborre
Copy link
Owner

I'm closing this issue as it should be ok now. Please reopen if necessary.

@nathankleyn
Copy link

nathankleyn commented Apr 18, 2017

Hey @etorreborre! I can confirm it seems to fix the issues on my end. Thanks so much for the fix! 👍

@Shadowfiend
Copy link
Author

Haven't confirmed it on our end yet but I'm hopeful. Appreciate the effort!

@Shadowfiend
Copy link
Author

Yep, confirmed that this fixed the classloader issues we were having! Thanks!

@alexduf
Copy link

alexduf commented Oct 25, 2017

I encountered the exact same issue when migrating to play 2.6.

When using 3.8.9-20170417195349-7b7973e it all works, but it looks like neither 3.8.9 nor 3.9.x has the fix. Was this published version just a one-off fix or was it intended to reach master?

@etorreborre
Copy link
Owner

@alexduf can you try with 4.0.1?

@etorreborre
Copy link
Owner

You can also try to pass useCustomClassLoader on the command line

sbt> testOnly *Spec -- useCustomClassLoader

I'm afraid that the real problem lies in sbt but I'm not sure.

@alexduf
Copy link

alexduf commented Oct 26, 2017

Hi @etorreborre, since I'm using Play I'm stuck with whatever version they use (3.8.9 as far as I know)
Nevertheless I tried overriding that with 4.0.1, but run into more classcast exception, this time on Mockito's side. At this point I'm not sure if we're still talking about a classloader problem or a transient dependency incompatibility.

I tried using the option with 3.8.9 to no avail, so I assumed it was added later?

I'll stay on 3.8.9-20170417195349-7b7973e until I finished migrating our project to play 2.6 and will come back to this one later.

@etorreborre
Copy link
Owner

On Mockito side this might be due to the Mockito version you are using, I think I did an upgrade to the latest in 4.x.

phillip-peach added a commit to LightspeedGMI/play-plugins that referenced this issue Jan 14, 2018
@alexduf
Copy link

alexduf commented Feb 20, 2018

If anyone stumbles on this bug with something similar, I've sorted my issue by using fork := true.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants