Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scalac classpath cache should invalidate modified jars #10295

Closed
smarter opened this issue Apr 29, 2017 · 21 comments
Closed

Scalac classpath cache should invalidate modified jars #10295

smarter opened this issue Apr 29, 2017 · 21 comments
Assignees
Milestone

Comments

@smarter
Copy link
Member

smarter commented Apr 29, 2017

Since Scala 2.12, classpath entries are cached unless -YdisableFlatCpCaching is passed to scalac, this can have very surprising results like breaking this sbt test that uses unmanagedJars: https://github.com/sbt/sbt/tree/0.13/sbt/src/sbt-test/source-dependencies/binary (you can copy this folder somewhere, set scalacOptions to 2.12.2 and verify that launching sbt, doing use/compile, removing dep/A.scala, and doing use/compile again compiles even though the dependency does not exist and the jar has been updated, adding scalacOptions += "-YdisableFlatCpCaching" fixes that.

I think the scalac cache in https://github.com/scala/scala/blob/2.12.x/src/compiler/scala/tools/nsc/classpath/ZipAndJarFileLookupFactory.scala should be more intelligent and invalidate modified jars.

eed3si9n added a commit to eed3si9n/zinc that referenced this issue May 19, 2017
Fixes sbt#282
Ref scala/bug#10295

exportJars := true exposes JAR file as subproject dependency.
Scala 2.12.2 fails to invalidate the source it's used. `-YdisableFlatCpCaching` apparently works aruond this.
@eed3si9n
Copy link
Member

+1

Check out sbt/zinc#303 / sbt/zinc#282.
sbt has a setting called exportJars that exposes subproject dependencies as JAR file. When combined with Scala 2.12.2 it no longer invalidates the user source file, so that's a surprising behavior.

ZipAndJarFileLookupFactory.scala should be more intelligent and invalidate modified jars.

I'd echo @smarter's comment here. Overall the idea of caching makes sense, but it probably needs to check the timestamp or SHA-1 or something to not break existing builds.

@jvican
Copy link
Member

jvican commented May 23, 2017

A huge thumbs up to fixing this issue. I want to implement this sbt/zinc#305 in Zinc at some point, and I prefer not to ask users of it to add scalacOptions += "-YdisableFlatCpCaching".

On an independent topic, is there any particular reason why this caching of jars is performed by Scalac? How big was the performance improvement?

@smarter
Copy link
Member Author

smarter commented May 23, 2017

On an independent topic, is there any particular reason why this caching of jars is performed by Scalac? How big was the performance improvement?

scala/scala#4060 gives some background, the main usecase is IDEs where each subproject will get its own compiler instance:

I built Scala IDE using my version of Scala and tested it with a very large codebase with about 100 projects. I turned off the closing of unused presentation compilers which I added lately. And after that I did certain custom operation which does a deep refresh of all projects (close projects, regenerate the workspace, open projects).

I've also observed the dotty test suite running faster after merging the 2.12 classpath implementation in dotty: scala/scala3#2191

@jvican
Copy link
Member

jvican commented May 23, 2017

Thanks for the clarification @smarter.

@smarter
Copy link
Member Author

smarter commented Jun 6, 2017

Ping @adriaanm @retronym , it'd be great if someone could have a look at this issue.

@adriaanm
Copy link
Contributor

adriaanm commented Jun 8, 2017

@lrytz, what do you think?

@retronym retronym added this to the 2.12.3 milestone Jun 8, 2017
@retronym
Copy link
Member

retronym commented Jun 8, 2017

This cache does appear to overreach, I think can we should tame it for 2.12.3. A minimal change would be to flip the default value of the disableCaching setting to true, and set it to false in the presentation compiler (assuming the user hasn't explicitly set it).

A more nuanced change would be to use file timestamps/size/content-hash, or notifications from the nio.file.WatchService to invalidate it.

It also represents a memory leak in its current form.

@retronym retronym self-assigned this Jun 9, 2017
@retronym
Copy link
Member

retronym commented Jun 9, 2017

@retronym
Copy link
Member

retronym commented Jun 9, 2017

The following test fails every time in Scala 2.12 with default settings, finding the stale entry of "p1.C" in the second call to test().

import java.nio.file._

object Test {
  def main(args: Array[String]): Unit = {
    val f = Files.createTempFile("test-", ".jar")
    Files.delete(f)
    createZip(f, Array(), "p1/C.class")
    test(f, "C") 
    Files.delete(f)
    createZip(f, Array(), "p1/D.class")
    test(f, "D")
  }

  def test(p: Path, expectedContets: String): Unit = {
    val g = new scala.tools.nsc.Global(new scala.tools.nsc.Settings())     
    g.settings.usejavacp.value = true
    g.settings.classpath.value = p.toString
    import g._
    new Run
    val decls = g.rootMirror.RootPackage.info.decl(TermName("p1")).moduleClass.info.decls

    assert(decls.lookup(TermName(expectedContets)) != NoSymbol, decls)
    assert(decls.lookup(TypeName(expectedContets)) != NoSymbol, decls)
    assert(decls.size == 2)
  }

  def createZip(zipLocation: Path, content: Array[Byte], internalPath: String): Unit = {
    val env = new java.util.HashMap[String, String]()
    env.put("create", String.valueOf(Files.notExists(zipLocation)))
    val fileUri = zipLocation.toUri
    val zipUri = new java.net.URI("jar:" + fileUri.getScheme, fileUri.getPath, null)
    val zipfs = FileSystems.newFileSystem(zipUri, env)
    try {
      try {
        val internalTargetPath = zipfs.getPath(internalPath)
        Files.createDirectories(internalTargetPath.getParent)
        Files.write(internalTargetPath, content)
      } finally {
        if (zipfs != null) zipfs.close()
      }
    } finally {
      zipfs.close()
    }
  }
}

However, it also fails about 50% of the time in Scala 2.11, due to the fact that we don't have a Global.close() method that closes any and all ZipFile-s that we're holding open. Internally, new ZipFile(f) reuses an internal native file descriptor and zip entry cache, without any invalidation on timestamps, etc.

#9632 provides a workaround for that problem with -Dscala.classpath.closeZip=true which opens and closes ZipFile-s on deflating each entry, rather than holding them open until they are GC-ed.

In practice, the way that SBT hammers the System.gc() all the time to workaround its own resource management issues with logging streams probably means that in 2.11 the exportJars use case would have worked 100% of the time.

On my branch, I've added a way to close the ZipFile. But is actually quite hard to know how to integrate this into the desirable Global.close method if the classpath's are going to be shared between Global-s.

@retronym
Copy link
Member

retronym commented Jun 9, 2017

Note that the implementation of j.u.ZipFile has been changed in JDK 9 to be pure Java, which might exhibit different behaviour from 8. http://hg.openjdk.java.net/jdk9/jdk9/jdk/rev/d85c42d008a9

@retronym
Copy link
Member

retronym commented Jun 9, 2017

Indeed, Java 9 has improved the situation: it considers the last modified time in its caching of JAR listings.

Modifying the test above with:

    g.settings.YdisableFlatCpCaching.value = true

And running on 2.12, it passes all the time with Java 9, and fails intermittently with Java 8.

@lrytz
Copy link
Member

lrytz commented Jun 9, 2017

To summarize and check my understanding:

  • Even when disabling our cache, if the JAR file changed, Java 8 might return an existing ZipFile. This is because we don't close the them eagerly, they only close on gc. In practice, when using sbt, you're unlikely to see that problem because sbt triggers/invokes gc often.
  • In Java 9 this is "fixed", i.e., a new ZipFile is created by a timestamp check if an existing is still open.
  • Introducing an eager close is hard because it's not obvious when to invoke it.

I haven't read up about nio.file.WatchService yet, i guess it might not work equally well on all platforms?

@retronym
Copy link
Member

retronym commented Jun 9, 2017

Java 8 might return an existing ZipFile.

Java 8 might return an new instance if ZipFile that shares internal state with the previous instance.

Introducing an eager close is hard because it's not obvious when to invoke it.

I've pushed a commit to my WIP branch that shows how we might implement close correctly in the face of sharing (it needs reference counting).

@smarter
Copy link
Member Author

smarter commented Jun 9, 2017

/cc @fommil who I think has some relevant knowledge

@fommil
Copy link

fommil commented Jun 9, 2017

LOL, yeah you could say that... I've kind of just given up to be honest, I felt like I was in a well shouting about a leak and nobody cared. I was running a custom version of scala/scala#5592 at some point (via https://github.com/fommil/sbt-scala-monkey) with a third backend that used https://github.com/fommil/class-monkey/ and therefore effectively kept .class files in memory ACROSS compilers. But I hit some weird cache invalidation cases because scalac assumes that jars are immutable, even in incremental mode. That is independent of the classpath backend used. It didn't affect my workflow too much, it frankly wasn't even in my top 10 concerns in that project. I'm no longer on that project, so it's dropped off my priority list. I'd like to say I've rejoined civilisation, but I'm writing python in the day job.

Since we're having a party, let's invite @romanowski and @mpociecha

@lrytz
Copy link
Member

lrytz commented Jun 9, 2017

implement close correctly in the face of sharing (it needs reference counting)

looks like a good approach to me!

@retronym
Copy link
Member

retronym commented Jun 12, 2017

My analysis of the internal caching in java.util.File was incorrect. Even prior to Java 9, the last modified timestamp is used as part of the cache key.

I'm also preparing a small patch to SBT to call Global.close after the compile task is completed. That should mean it releases locks on JARs in the classpath after the compile task(s) are done.

@fommil
Copy link

fommil commented Jun 12, 2017

I was doing that too, it's not enough... unless you've added lots more close logic.

Does that have any performance impact on the classloader reuse?

@retronym
Copy link
Member

retronym commented Jun 12, 2017

I tested with:

# project/build.properties
sbt.version=0.13.6-bin-2f21ca5-SNAPSHOT
// build.sbt
scalaVersion := "2.12.3-bin-e1fc605-SNAPSHOT"

libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.5"
// test.scala
object Test {
   scala.xml.Node
  { case class Foo(a: Int, b: Int, d: Int, c: Int) }
  { case class Foo(a: Int, b: Int, d: Int, c: Int) }
  // a few thousand more to slow down the compile to make the open/close visible to lsof
}
sbt> ~compile
% lsof -p <SBT PID> | grep scala-xml_2.12-1.0.5

The lsof output showed that JAR was opened during the compile task, but closed afterwards.

@retronym
Copy link
Member

I'm not proposing any classloader related changes. I expect these changes to be performance neutral.

@fommil
Copy link

fommil commented Jun 12, 2017

Awesome. This would have been hugely valuable to me for the last two years, hopefully it'll help somebody else, and it might even help reduce sbt OOMs.

I guess this just closes ArchiveFile handles? N that case, can my change be backed out?

There is still the problem of URLClassloader holding JarFile references. I never figured out what was using it, maybe macros? In any case, that's what class-monkey was working around.

@retronym retronym modified the milestones: 2.12.3, 2.12.4 Jul 8, 2017
retronym added a commit to retronym/scala that referenced this issue Sep 7, 2017
Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
retronym added a commit to retronym/scala that referenced this issue Sep 7, 2017
Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
retronym added a commit to retronym/scala that referenced this issue Sep 11, 2017
Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
allanrenucci added a commit to allanrenucci/dotty that referenced this issue Sep 13, 2017
Adapted from scalac commit bcf47b165ccfd8e1827188f70aeb24e2cecfb80f by
Jason Zaugg:

Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
allanrenucci added a commit to allanrenucci/dotty that referenced this issue Sep 13, 2017
Adapted from scalac commit bcf47b165ccfd8e1827188f70aeb24e2cecfb80f by
Jason Zaugg:

Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
allanrenucci added a commit to allanrenucci/dotty that referenced this issue Sep 13, 2017
Adapted from scalac commit bcf47b165ccfd8e1827188f70aeb24e2cecfb80f by
Jason Zaugg in scala/scala#6064:

Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
allanrenucci added a commit to dotty-staging/dotty that referenced this issue Sep 13, 2017
Adapted from scalac commit bcf47b165ccfd8e1827188f70aeb24e2cecfb80f by
Jason Zaugg in scala/scala#6064:

Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
allanrenucci added a commit to dotty-staging/dotty that referenced this issue Sep 15, 2017
Adapted from scalac commit bcf47b165ccfd8e1827188f70aeb24e2cecfb80f by
Jason Zaugg in scala/scala#6064:

Use the last modified timestamp and the file inode to help detect
when the file has been overwritten (as is common in SBT builds
with `exportJars := true`, or when using snapshot dependencies).

Fixes scala/bug#10295
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants