Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: run GC after Runner finished #145

Merged
merged 1 commit into from Jan 9, 2019
Merged

fix: run GC after Runner finished #145

merged 1 commit into from Jan 9, 2019

Conversation

grv87
Copy link
Contributor

@grv87 grv87 commented Jan 9, 2019

This patch, used together with forceGC=true, fixes #134.

@melix
Copy link
Owner

melix commented Jan 9, 2019

Thanks, could you verify that it actually solves #134 ?

@jnellis
Copy link

jnellis commented Jan 10, 2019

jmh.jar is still locking on windows.
This seems like this issue
gradle/gradle#937
which points to this issue
gradle/gradle#5998
which recommends this tool
https://file-leak-detector.kohsuke.org/
to figure out the root cause.

@grv87
Copy link
Contributor Author

grv87 commented Jan 12, 2019

I tried File Leak Detector for both Gradle and JMH, but it didn't give me any results except non-closed socket in JMH. Which I ascribed to ShutdownTimeoutThread.
Maybe this tool itself has side effect, and files become closed correctly.

@jnellis, waiting for 30 seconds, as I noted here, doesn't help too?
If not, could you share your build script?
All Gradle issues that I've seen, including two mentioned by you, point out that the leak is in the one of plugins. Maybe not even in JMH plugin. It makes sense to analyze the whole build.

@jnellis
Copy link

jnellis commented Jan 12, 2019

Thanks for running that leak detector, the jar file was giving me manifest error. Trying to change jmh.shutdownTimeout doesn't seem to have an affect. Running jmh from a stopped gradle daemon it does of course work. When the task runs, the gradle daemon starts five more java processes one of which is another gradle daemon and the rest are 'worker daemon's; upon completion of the task, two of these worker daemon processes are remaining.
gradle --status
shows one IDLE process whose PID matches one of these java processes and then it will list 1-4 more STOPPED processes.

This is the command line for the gradle daemon idle process:

C:\libs\javapath\bin\java.exe -XX:+HeapDumpOnOutOfMemoryError -Xmx1024m -Dfile.encoding=windows-1252 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\wrapper\dists\gradle-4.9-bin\e9cinqnqvph59rr7g70qubb4t\gradle-4.9\lib\gradle-launcher-4.9.jar org.gradle.launcher.daemon.bootstrap.GradleDaemon 4.9

These are the command lines for the two worker daemon processes that remain:

C:\libs\javapath\bin\java.exe -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager -Dfile.encoding=windows-1252 -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\caches\4.9\workerMain\gradle-worker.jar worker.org.gradle.process.internal.worker.GradleWorkerMain "'Gradle Worker Daemon 2'"

C:\libs\javapath\bin\java.exe -Djava.security.manager=worker.org.gradle.process.internal.worker.child.BootstrapSecurityManager -Dfile.encoding=windows-1252 -Djava.io.tmpdir=C:\Users\jnellis\dev\github\binpack\build\tmp\jmh -Duser.country=US -Duser.language=en -Duser.variant -cp C:\Users\jnellis\.gradle\caches\4.9\workerMain\gradle-worker.jar worker.org.gradle.process.internal.worker.GradleWorkerMain "'Gradle Worker Daemon 3'"

You can run the jmh task again. It's only when you try to clean that you'll get 'can not delete jmh.jar' message because its trying to delete that directory. So if you don't have a habit of not cleaning before tasks then you won't see this error. Up until this point, I've just killed the gradle daemon entirely to get moving again.

Now, what if I kill these two worker processes but not the idle gradle daemon process and try to run jmh again (but don't clean first).
C:\Users\jnellis\dev\github\binpack>gradlew --stacktrace jmh > stdout.txt 2> stderr.txt
stdout.txt
stderr.txt

At this point I decided to do a clean since I had killed those two processes and knew it would be able to delete that jmh.jar. NOTE: the original gradle daemon process is still sitting idle, I have not done gradlew --stop. Then I tried to run jmh again and it fails but in a
different way.

C:\Users\jnellis\dev\github\binpack>gradlew clean

BUILD SUCCESSFUL in 1s
1 actionable task: 1 executed
C:\Users\jnellis\dev\github\binpack>gradlew --stacktrace jmh > stdout2.txt 2> stderr2.txt

C:\Users\jnellis\dev\github\binpack>

stdout2.txt
stderr2.txt

From this you can conclude that these two Gradle Worker Daemon processes (2 & 3) respectively map to the plugin tasks jmhRunByteCodeGenerator (me.champeau.gradle.JmhBytecodeGeneratorRunnable) and jmh (me.champeau.gradle.IsolatedRunner).

So looking for commonalities of these two files I notice that both use javax.inject.@Inject along with JHMTask.java. Then I notice in JmhBytecodeGeneratorTask.java it uses import org.gradle.api.tasks.@CacheableTask. I don't know anything really about gradle plugin lifecycles and so haven't cloned the plugin to attempt building it but if you look at the javax.inject.@Inject javadocs it warns at the bottom about circular dependencies between fields. This seems a reasonable behavior between two related processes that can't shut down.

Let's revert this perhaps
bb22f5e#diff-0d315c75a161a3965b2c133585e20926

and maybe revert
b900883#diff-71f3f211dfb4fe11a09433c467596c97
as it seems to make benchmarks with many params run slower.

As for my build file, it's basically straight out of the plugin example.

@grv87
Copy link
Contributor Author

grv87 commented Jan 13, 2019

@jnellis, thanks for inspecting processes.
I confirm that I can still reproduce the issue in https://github.com/melix/jmh-gradle-example with Gradle 4.8 and 5.1.1. The worker of JMHTask locks the file.

So, the problem is that this task is using Worker API, and Worker API does not support volatile classpath.
@melix, I see several options:

  1. Open an issue in Gradle, and see what they'll answer.
    Personally, I see that current Gradle implementation (keeping worker daemon warm between runs) is perfect for workers with fixed classpath. And so I'm not sure they would consider this issue as a correct use case for Worker API.
  2. Use dynamic name / temporary directory for JMH jar.
  3. Remove Worker API.
    I'm not sure that this is a good solution. Even if we don't use its other features, Worker API still allows running different tasks in parallel.
    On the other hand, it could be bad idea to run anything in parallel with benchmarking.

@jnellis, I didn't get your reasoning that the problem is in annotations.
@Inject is required to use Worker API.
There are no circular dependencies, otherwise we would get runtime exception that object can't be created.
And removing @CacheableTask doesn't help.

@jnellis
Copy link

jnellis commented Jan 13, 2019

@grv87 Looks like that's it. It says Worker API was introduced in 4.1, that's where the locking behavior starts. For Windows you have to use gradle 4.0.2 and jmh plugin 0.4.3 right now for no issues. The WSL seems to work just fine though, you can see those same two Worker processes idle still, it just doesn't lock the jar file when trying to delete it during clean. Would these worker processes normally be exited if there wasn't this classpath issue?

I suppose a workaround then for windows is that you can use a higher gradle version (with appropriate jmh plugin version) and then only call the jmh task from WSL. So far I haven't any conflicts going between the two except heavy memory use.
Edit: If you close the WSL session without doing a gradlew stop the daemon and the worker process will hang around.

@melix
Copy link
Owner

melix commented Jan 13, 2019

There's no doubt to me that @CachableTask has nothing to do with locking. The fact we use the worker API now is maybe a problem, but the issue is most likely in JMH itself. Having to wait 30s is a strong indication of it.

@jnellis
Copy link

jnellis commented Jan 13, 2019

There's no amount of time that passes that it unlocks the jar on windows. It stays locked as long as those two worker daemon processes exist. Same on WSL(windows subsystem for linux; Ubuntu 16), except there is no lock to break(Linux don't care I guess), those worker process still sit idle waiting for IsolatedRunner and JmhBytecodeGeneratorRunnable to run again. There's gradle --status to list the actual daemon PID's but there's no command for listing Worker processes that are sitting idle. The docs only say that they will be killed when 'resources are low' or the gradle daemon itself stops (default 3hours I think I read somewhere.)

@grv87
Copy link
Contributor Author

grv87 commented Jan 13, 2019

@melix, I confirm that my previous statement on 30 seconds timeout was wrong. It was false positive, either because my system become limited on resources, and some idle processes decided to stop themselves, or this was a side effect from File Leak Detector.
Jar file is locked by Gradle worker daemon.

@jnellis
Copy link

jnellis commented Jan 16, 2019

@melix What neccessitates using IsolationMode.PROCESS in these two tasks? Since killing these two processes seems to release the file lock in windows maybe not having them run in their own process in the first place is a solution.

@melix
Copy link
Owner

melix commented Jan 17, 2019

Using this isolation mode makes sure that the task workers work in isolation from Gradle itself (that is to say in a different process). So using a different isolation mode would only make things worse.

To me the issue is JMH not closing a file. I didn't have time to investigate how to mitigate this, I will as soon as I can.

@veita
Copy link

veita commented Feb 10, 2019

@melix You should probably call close() on the amendedCL in JmhBytecodeGeneratorRunnable in order to close streams opened from file URLs.

Although there's no guaranty that this will fix the issue, it shouldn't make things worse. ;-)

@jnellis
Copy link

jnellis commented May 23, 2019

@veita This doesn't do anything from what I can tell. Locking still happens.

I fork the plugin and publish the latest version to mavenLocal and then link to that in a project to test and this is what I've tried.

  • Changing IsolationMode doesn't affect the locking. I thought that changing IsolationMode from PROCESS to CLASSLOADER would restrict the worker api from spawning a new process but for JMHTask it can't be done because changing the environment variables in CLASSLOADER mode is not allowed and passing env is required by JMHTask to pass a custom java.io.tmpdir location (your project build dir.) You can change the JmhBytecodeGeneratorTask to use CLASSLOADER and that solves one of the processes not being spawned but the locking behavior happens still.

@melix

To me the issue is JMH not closing a file.

No. You can short circuit the IsolationRunner class to do nothing and the benchmark.jar file as well as all associated dependencies in the user home ./gradle/cache are locked. It does seem to be directly related to gradle worker api and how its used in this scenario; creating an intermediate dependency then using it. Which seems like it would be a common idiom.

Furthermore, at least in WSL(ubuntu/oracle jdk8), zombie processes seem to be leftover. When a change in the source code or a project dependency invokes a new benchmark.jar being created; running that abandons the old daemon worker process that was still around and it spawns a new daemon worker process. So while its not blocking on WSL ubuntu it's also not allowing stale daemon processes to die off. The gradle docs say the default time deamon worker processes hang around until they are killed is three hours but that's not my experience; they will die only when gradle --stop or the parent process is killed. Someone on a linux machine should investigate for zombie processes after a benchmark.jar rebuild and run.

@grv87 said something smart in the #937 issue. Why keep a hot path just to run benchmarks when the benchmark is required to be regenerated anytime the benched source code is changed. So while maybe this isn't a bug with the gradle worker api, it's probably the wrong use case to use considering how jmh has to rebuild the benchmark each time.

To anyone that is having the same locking issues on windows and just has a simple project they would like to run jmh benchmarks until this is resolved I have made a crude drop in workaround

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not exiting JVM and keeping a lock on file after benchmarking is over, Windows 10
4 participants