Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FileZipArchive not closing file handles #9632

Closed
scabug opened this issue Jan 27, 2016 · 27 comments
Closed

FileZipArchive not closing file handles #9632

scabug opened this issue Jan 27, 2016 · 27 comments

Comments

@scabug
Copy link

scabug commented Jan 27, 2016

The implementation of the ZipArchive abstract file backend

https://github.com/scala/scala/blob/2.12.x/src/reflect/scala/reflect/io/ZipArchive.scala

never releases the streams to jar files that it is opening as part of providing the classpath to the compiler and (more importantly) interactive compiler.

This is not so bad on Linux, it's just wasteful, but on Windows it can be really bad as the OS locks the file and does not allow it to be deleted.

The effect is that if the user's IDE (or in my case ENSIME) is using the user's jars as the classpath definition, then the build tool (in my case, sbt in exportJars mode) cannot delete / replace the jars. The result to the end user is that the build tool is a no-op.

The problem can be fixed by releasing references to the jar file.

I am currently looking for a workaround that we can implement within ENSIME to alleviate the problem now for our users.

@scabug
Copy link
Author

scabug commented Jan 27, 2016

Imported From: https://issues.scala-lang.org/browse/SI-9632?orig=1
Reporter: Samuel Halliday (fommil)
Assignee: Hamish Dickson (hamish.dickson-at-gmail.com)
Affected Versions: 2.11.7
See #5207

@scabug
Copy link
Author

scabug commented Jan 28, 2016

@retronym said:
IntelliJ copies JARs from the original location to an IntelliJ private cache folder and refers to them from there. This strategy might be workable in Ensime, too.

I agree the current IO layer used by the compiler is really shabby. I started a refactoring a while back to replace it all with java.nio, but it was too tangled to make quick progress and I shelved that work.

@scabug
Copy link
Author

scabug commented Jan 28, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said:
@jason mainly, since I can see you've changed this code before - I've started looking into this today, do you have any advice on writing a test for this? My current plan is to add something to ZipArchiveTest which creates a new temp file which I can preform FileZipArchive things on... however I'm not quite sure how yet to make sure a stream is actually released

Also, given what you've said about java.nio (something I agree with btw..) is it worth making a fix to this before that refactor has happened?

@scabug
Copy link
Author

scabug commented Jan 29, 2016

@retronym said (edited on Jan 29, 2016 12:29:33 AM UTC):
I just tried something new, which I think might avoid the need to copy files.

diff --git a/src/reflect/scala/reflect/io/ZipArchive.scala b/src/reflect/scala/reflect/io/ZipArchive.scala
index 42e22fd..e0d9024 100644
--- a/src/reflect/scala/reflect/io/ZipArchive.scala
+++ b/src/reflect/scala/reflect/io/ZipArchive.scala
@@ -15,6 +15,7 @@ import java.util.jar.Manifest
 import scala.collection.mutable
 import scala.collection.convert.WrapAsScala.asScalaIterator
 import scala.annotation.tailrec
+import scala.sys.BooleanProp
 
 /** An abstraction for zip files and streams.  Everything is written the way
  *  it is for performance: we come through here a lot on every run.  Be careful
@@ -123,30 +124,52 @@ final class FileZipArchive(file: JFile) extends ZipArchive(file) {
   lazy val (root, allDirs) = {
     val root = new DirEntry("/")
     val dirs = mutable.HashMap[String, DirEntry]("/" -> root)
-    val zipFile = try {
+    def openZipFile = try {
+      if (file.getName.contains("math"))
+        println("opening: " + file.getName)
       new ZipFile(file)
     } catch {
       case ioe: IOException => throw new IOException("Error accessing " + file.getPath, ioe)
     }
+    def closeZipFile(f: ZipFile): Unit = {
+      f.close()
+      if (file.getName.contains("math"))
+        println("closed: " + file.getName)
+    }
+    val zipFile = openZipFile
 
-    val enum    = zipFile.entries()
+    val closeZip = FileZipArchive.closeZip.value
+    try {
 
-    while (enum.hasMoreElements) {
-      val zipEntry = enum.nextElement
-      val dir = getDir(dirs, zipEntry)
-      if (zipEntry.isDirectory) dir
-      else {
-        class FileEntry() extends Entry(zipEntry.getName) {
-          override def getArchive   = zipFile
-          override def lastModified = zipEntry.getTime()
-          override def input        = getArchive getInputStream zipEntry
-          override def sizeOption   = Some(zipEntry.getSize().toInt)
+      val enum = zipFile.entries()
+      while (enum.hasMoreElements) {
+        val zipEntry = enum.nextElement
+        val dir = getDir(dirs, zipEntry)
+        if (zipEntry.isDirectory) dir
+        else {
+          class FileEntry() extends Entry(zipEntry.getName) {
+            override def getArchive: ZipFile = {
+              if (closeZip) openZipFile else zipFile
+            }
+            override def lastModified = zipEntry.getTime()
+            override def input = {
+              val zipFile = getArchive
+              val delegate = zipFile getInputStream zipEntry
+              new FilterInputStream(delegate) {
+                override def close(): Unit = {
+                  delegate.close()
+                  if (closeZip) closeZipFile(zipFile)
+                }
+              }
+            }
+            override def sizeOption = Some(zipEntry.getSize().toInt)
+          }
+          val f = new FileEntry()
+          dir.entries(f.name) = f
         }
-        val f = new FileEntry()
-        dir.entries(f.name) = f
       }
-    }
-    (root, dirs)
+      (root, dirs)
+    } finally if (closeZip) closeZipFile(zipFile)
   }
 
   def iterator: Iterator[Entry] = root.iterator
@@ -164,6 +187,9 @@ final class FileZipArchive(file: JFile) extends ZipArchive(file) {
     case _                 => false
   }
 }
+object FileZipArchive {
+  private val closeZip = BooleanProp.keyExists("scala.classpath.closezip")
+}
 /** ''Note:  This library is considered experimental and should not be used unless you know what you are doing.'' */
 final class URLZipArchive(val url: URL) extends ZipArchive(null) {
   def iterator: Iterator[Entry] = {

This was enough to auto-close in a direct "unit test" of this class:

⚡ qscala -nc -Dscala.classpath.closezip
Welcome to Scala 2.12.0-20160127-203009-3f0bf2517e (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71).
Type in expressions for evaluation. Or try :help.

scala> val f = new java.io.File("/Users/jason/.ivy2/cache/org.apache.commons/commons-math3/jars/commons-math3-3.0.jar")
f: java.io.File = /Users/jason/.ivy2/cache/org.apache.commons/commons-math3/jars/commons-math3-3.0.jar

scala> new scala.reflect.io.FileZipArchive(f)
opening: commons-math3-3.0.jar
closed: commons-math3-3.0.jar
res0: scala.reflect.io.FileZipArchive = /Users/jason/.ivy2/cache/org.apache.commons/commons-math3/jars/commons-math3-3.0.jar

scala> res0.allDirs.head._2.toList.head.toByteArray
opening: commons-math3-3.0.jar
closed: commons-math3-3.0.jar
res15: Array[Byte] = Array(-54, -2, -70, -66, 0, 0, 0, 49, 0, 36, 10, 0, 6, 0, 23, 10, 0, 24, 0, 25, 7, 0, 26, 10, 0, 3, 0, 29, 7, 0, 30, 7, 0, 31, 7, 0, 32, 1, 0, 6, 60, 105, 110, 105, 116, 62, 1, 0, 3, 40, 41, 86, 1, 0, 4, 67, 111, 100, 101, 1, 0, 15, 76, 105, 110, 101, 78, 117, 109, 98, 101, 114, 84, 97, 98, 108, 101, 1, 0, 18, 76, 111, 99, 97, 108, 86, 97, 114, 105, 97, 98, 108, 101, 84, 97, 98, 108, 101, 1, 0, 4, 116, 104, 105, 115, 1, 0, 50, 76, 111, 114, 103, 47, 97, 112, 97, 99, 104, 101, 47, 99, 111, 109, 109, 111, 110, 115, 47, 109, 97, 116, 104, 51, 47, 97, 110, 97, 108, 121, 115, 105, 115, 47, 102, 117, 110, 99, 116, 105, 111, 110, 47, 65, 116, 97, 110, 104, 59, 1, 0, 5, 118, 97, 108, 117, 101, 1, 0, 4, 40, 68, 41, 68, 1, 0, 1, 120, 1, 0, 1, 68, 1, 0, 10, 100, 101, 114, 105,...
scala> :quit

I confirmed with lsof that this the file was closed.

However, when I tried:

qscala -nc -Dscala.classpath.closezip -classpath ~/.ivy2/cache/org.apache.commons/commons-math3/jars/commons-math3-3.0.jar
Welcome to Scala 2.12.0-20160127-203009-3f0bf2517e (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_71).
Type in expressions for evaluation. Or try :help.

scala> opening: commons-math3-3.0.jar
closed: commons-math3-3.0.jar
# leave running

lsof reported that the file was still open. I took a quick look with YourKit-s file system probes, but I didn't find any record of closes or opens of that file, which probably means I was using the tool incorrectly.

Hopefully this gives you a starting point.

I think this ticket is a duplicate of #5207. I'm going to close that one as a duplicate of this one.

@scabug
Copy link
Author

scabug commented Jan 29, 2016

@retronym said:
[~hamish.dickson@gmail.com] I've added you to the "developers" group in JIRA so you can be assigned tickets :)

@scabug
Copy link
Author

scabug commented Jan 29, 2016

Samuel Halliday (fommil) said:
Excellent, thanks Jason, that's exactly the sort of thing I was looking for as the upstream fix. Do you think this could make it into 2.11.8?

I will still have to come up with a workaround for now, though. Do you think it would be possible for me to inject my own implementation of the zip readers into the interactive compiler's classpath?

This could all be futile... there is every possibility that ENSIME's java interactive compiler is doing the same thing, which will mean I'll definitely have to copy the files (which sounds like a massive and horrible hack).

Why, oh why, Windows, must you put us through this?

@scabug
Copy link
Author

scabug commented Jan 29, 2016

Samuel Halliday (fommil) said:
BTW, in YourKit, you need to look for InputStream, ZipFile, etc, to find references to open files. I don't think there is a "file handle" object on the heap that you can track.

@scabug
Copy link
Author

scabug commented Jan 29, 2016

@retronym said:
I was using the "events" tab of YourKit, which does show File open/read/write/close events, with stack traces at each of them. But I didn't spot events for the JAR I was testing against at all.

You can always put your own version of FileZipFileArchive.class ahead of scala-reflect.jar in the JVM classpath of Ensime if you want to "monkeypatch" in a new version.

@scabug
Copy link
Author

scabug commented Jan 29, 2016

@retronym said:
2.11.8 is coming very soon so I think its unlikely that we'll have this change fleshed out. Any help in getting my prototype above into shape will help, of course.

@scabug
Copy link
Author

scabug commented Jan 29, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said (edited on Jan 29, 2016 10:44:12 AM UTC):
@retronym great, thanks!! :)

Your code seems pretty sensible, but I'm a bit confused by lsof claiming the file is closed and then open...

@scabug
Copy link
Author

scabug commented Jan 31, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said (edited on Jan 31, 2016 3:10:35 PM UTC):
So I've spent a bit of time attempting to write automated tests for this. Using [~retronym]'s change as a starting point (the "make it work" flag is really useful for testing, but I'm guessing no one wants to keep it like that) it's easy enough to create tests that will fail on Windows.... but pass on unix.

Does anyone have any advice on how to check if a file is open in general? I know it's one of those annoying java problems, but expected to find something clever in the tests around io.Source (which I can't). Right now I'm assuming that's because they ran into exactly this problem and decided it had to be tested manually.

I'll keep working on this over the next day or so, but so you can see roughly what I'm trying to add to ZipArchiveTest

  @Test
  def closeFile: Unit = {
    System.setProperty("scala.classpath.closezip", "true")

    val f = createTempZipFile
    val fza = new FileZipArchive(createTempZipFile)

    fza.iterator

    System.clearProperty("scala.classpath.closezip")

    assertTrue(f.canWrite) // ie File is closed (on Windows)

    f.delete
  }

  @Test
  def leaveFileOpen: Unit = {
    val f = createTempZipFile
    val fza = new FileZipArchive(f)

    fza.iterator

    assertFalse(f.canWrite) // ie File is still open (on Windows anyway)
  }

  def createTempZipFile = {
    val f = JFile.createTempFile("test", ".zip")
    val zf = new ZipOutputStream(new FileOutputStream(f))
    zf.putNextEntry(new ZipEntry("data"))
    zf.close()
    f
  }

@scabug
Copy link
Author

scabug commented Feb 1, 2016

Samuel Halliday (fommil) said:
I'm trying this out as a monkey patch, for 2.10 and 2.11. Looking at the details of the proposed change, there are some references to the zipEntry that could persist after the zip file itself is closed, so this is dangerous territory.

@scabug
Copy link
Author

scabug commented Feb 1, 2016

Samuel Halliday (fommil) said:
ok, looks like ZipEntry is just a container. I was concerned that it held a reference.

@scabug
Copy link
Author

scabug commented Feb 1, 2016

Samuel Halliday (fommil) said:
The good news is that this seems to work, see my monkey patch to ensime-server here https://github.com/ensime/ensime-server/pull/1273

The bad news is that sbt appears to be holding references as well (probably the actual compiler, not the interactive compiler) so I might have to monkey patch there as well. I might get away with it if Windows allows me to delete files from within the same process that holds the reference.

@scabug
Copy link
Author

scabug commented Feb 2, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said:
[~fommil] That is good/bad news!

I know you need this fix now, so it doesn't help you today - but I have pretty much this change in a branch (along with a very simple test) here: https://github.com/hamishdickson/scala/tree/SI-9632-FileZipArchive-not-closing-file-handles I'll raise a pull request as soon as I'm 100% convinced about the tests

@scabug
Copy link
Author

scabug commented Feb 2, 2016

Samuel Halliday (fommil) said:
I'll have to take back what I said, I don't think this works after all... the sbt thing may have been a false lead. I'll need to investigate further and take some heap dumps.

@scabug
Copy link
Author

scabug commented Feb 2, 2016

Samuel Halliday (fommil) said:
(well it kind of works... it is unreliable)

@scabug
Copy link
Author

scabug commented Feb 2, 2016

Samuel Halliday (fommil) said:
I've been able to somewhat work around the problem in sbt by forcing a garbage collection. It's all very nasty.

A serious problem from this is that sbt could be failing to produce new jar files when exportJars is used and the user would be left (silently?) with stale jars :-/

@scabug
Copy link
Author

scabug commented Feb 2, 2016

@retronym said:
Take a look at sbt/sbt#1223 for some gory details.

@scabug
Copy link
Author

scabug commented Feb 2, 2016

Samuel Halliday (fommil) said:
ok, excellent find! I also raised sbt/sbt#2434 but it looks like forcegc might be a workaround for me. I'll try tonight and report back.

@scabug
Copy link
Author

scabug commented Feb 4, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said:
how did you get on [~fommil]?

@scabug
Copy link
Author

scabug commented Feb 5, 2016

Samuel Halliday (fommil) said:
the monkey patch seems to be working! But there is still something else in ENSIME keeping ZipFile references, and I think its in Apache VFS. It's caching the ZipFiles, and of course, creating a ZipFile instance create a file handler. Awesome.

@scabug
Copy link
Author

scabug commented Feb 7, 2016

Hamish Dickson (hamish.dickson-at-gmail.com) said:
Great :)

Well kind of great - sorry you're still having so many other issues

@scabug
Copy link
Author

scabug commented Mar 2, 2016

Samuel Halliday (fommil) said:
doing some cross referencing here, related to #9682 and #9683

@scabug
Copy link
Author

scabug commented Apr 21, 2016

Samuel Halliday (fommil) said:
I have a thought about how to better do this... why not just use a ClassLoader? This code is basically mimicking a classloader anyway. It would make more sense to just use a ClassLoader and then optimise the hell out of the classloader (which would also have the benefit of speeding up other parts of scalac that make use of classloading).

@scabug
Copy link
Author

scabug commented Mar 22, 2017

Samuel Halliday (fommil) said:
this is closed by scala/scala#5592

@scabug
Copy link
Author

scabug commented Mar 22, 2017

Samuel Halliday (fommil) said:
scala/scala#5592

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants