Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Errors running multiple independent ScriptingContainers in parallel #6218

Closed
jsolomon8080 opened this issue May 13, 2020 · 15 comments · Fixed by #6273
Closed

Errors running multiple independent ScriptingContainers in parallel #6218

jsolomon8080 opened this issue May 13, 2020 · 15 comments · Fixed by #6273

Comments

@jsolomon8080
Copy link

jsolomon8080 commented May 13, 2020

jruby 9.2.11.1 using the jruby-complete-9.2.11.1.jar

% java -version
openjdk version "11.0.4" 2019-07-16
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.4+11)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.4+11, mixed mode)

% uname -a
Linux localhost 3.10.0-1062.12.1.el7.x86_64 #1 SMP Tue Feb 4 23:02:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
% lsb_release -a
LSB Version:    :core-4.1-amd64:core-4.1-noarch
Distributor ID: CentOS
Description:    CentOS Linux release 7.6.1810 (Core) 
Release:        7.6.1810
Codename:       Core

Our environment is ScriptingContainer only. We had a ruby on rails web app that was ported to pure java but we have ~100 ERB templates and we use embedded jruby to evaluate them. In production, we have basically very few instances and threads using a ScriptingContainer. However, our unit tests use gradle and many tests touch the ERB pipeline and start ScirptingContainers.

We are finally updating to java11 from java8 and that has forced us to upgrade from jruby-1.6.7 (from 2011!) to the latest jruby (jruby-9.2.11.1). We have been using the same embedded architecture for 3-4 years and never had this problem with jruby-1.6.7.

Our environment is Linux and Windows and we see the same error on both platforms. In fact, probably more on Windows than Linux. I'm showing this error with CentOS 7.6 and java11 but it also repros with other versions CentOS and Windows and on java8.

This multi-threaded issue isn't super common in our CI/CD environment but I crafted a test case to make it common. I start X ScriptingContainers on Y threads where X=500 and Y=40. Again, not something we'd ever do, but it shows the behavior easily. Each ScriptingContainer is only accessed by the single thread that created it. Here is the java test case:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.LocalVariableBehavior;
import org.jruby.embed.ScriptingContainer;
import org.junit.Test;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class RubyTests {
   @Test
    public void testParallelInitizations() throws Exception {
        final int howManyThreads = 40;
        final int howManyRubies = 500;

        ListeningExecutorService executor = MoreExecutors.listeningDecorator(
                Executors.newFixedThreadPool(howManyThreads));
        System.err.println("starting");
        try {
            Collection<Callable<Void>> threadFuncs = new ArrayList<>();

            for(int ii=0; ii< howManyRubies; ++ii) {
                threadFuncs.add(() -> {
                    ScriptingContainer s = new ScriptingContainer(LocalContextScope.SINGLETHREAD,
                            LocalVariableBehavior.PERSISTENT);
                    s.runScriptlet("require 'erb'");
                    s.terminate();
                    return null;
                });
            }
            @SuppressWarnings({ "unchecked", "rawtypes" })
            List<ListenableFuture<Void>> futures = (List) executor.invokeAll(threadFuncs);
            Futures.allAsList(futures).get();
        } finally {
            executor.shutdownNow();
        }
    }
}

I think it should be fine to do what is done here. All the errors are related to reading the contents of the jruby-complete jar. I believe if we used the jruby.jar and stuck the contents of jruby.home on disk and set -Djruby.home to point there, it would also solve the problem. I can additionally solve the problem by synchronizing the contents of each thread with a global static lock.

Attached is a log file showing 18 errors among the 500 rubies. We normally see a MUCH lower incidence of this. It's probably because we are not using that many in any JVM. Calling terminate() exacerbates the problem. I think that's because if you don't call terminate(), then GC will do it for you and that doesn't happen that often.

TEST-RubyTests.txt

Here are some example stack traces from the above file:

org.jruby.exceptions.LoadError: (LoadError) no such file to load -- erb
	at org.jruby.RubyKernel.require(org/jruby/RubyKernel.java:974)
	at RUBY.require(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_require.rb:54)
	at RUBY.<main>(<script>:1)

org.jruby.exceptions.NameError: (NameError) uninitialized constant Gem::Dependency
	at org.jruby.RubyModule.const_missing(org/jruby/RubyModule.java:3760)
	at RUBY.gem(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_gem.rb:48)
	at RUBY.<main>(uri:classloader:/jruby/kernel/gem_prelude.rb:8)
	at org.jruby.RubyKernel.load(org/jruby/RubyKernel.java:1009)
	at RUBY.<main>(file:/home/user/.gradle/caches/modules-2/files-2.1/org.jruby/jruby-complete/9.2.11.1/78aa284f9b011173dc2b72bc1c8593a3606aacf3/jruby-complete-9.2.11.1.jar!/jruby/preludes.rb:4)

org.jruby.embed.EvalFailedException: (LoadError) no such file to load -- erb
	at org.jruby.embed.internal.EmbedEvalUnitImpl.run(EmbedEvalUnitImpl.java:131)
	at org.jruby.embed.ScriptingContainer.runUnit(ScriptingContainer.java:1295)
	at org.jruby.embed.ScriptingContainer.runScriptlet(ScriptingContainer.java:1288)
	at com.megacorp.RubyTest.lambda$testParallelInitizations$0(RubyTests.java:208)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:69)
	at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:834)


org.jruby.exceptions.SystemCallError: (SystemCallError) Unknown error (SystemCallError) - uri:classloader://META-INF/jruby.home/lib/ruby/gems/shared/specifications/did_you_mean-1.2.0.gemspec
	at org.jruby.RubyIO.read(org/jruby/RubyIO.java:3061)
	at org.jruby.RubyIO.read(org/jruby/RubyIO.java:3049)
	at org.jruby.RubyIO.read(org/jruby/RubyIO.java:3781)
	at RUBY.load(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/specification.rb:1158)
	at RUBY.to_spec(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/stub_specification.rb:192)
	at org.jruby.RubyArray.map(org/jruby/RubyArray.java:2577)
	at RUBY.matching_specs(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/dependency.rb:280)
	at RUBY.to_specs(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/dependency.rb:303)
	at RUBY.to_spec(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/dependency.rb:323)
	at RUBY.gem(uri:classloader:/META-INF/jruby.home/lib/ruby/stdlib/rubygems/core_ext/kernel_gem.rb:65)
	at RUBY.&lt;main&gt;(uri:classloader:/jruby/kernel/gem_prelude.rb:8)
	at org.jruby.RubyKernel.load(org/jruby/RubyKernel.java:1009)
	at RUBY.&lt;main&gt;(file:/home/user/.gradle/caches/modules-2/files-2.1/org.jruby/jruby-complete/9.2.11.1/78aa284f9b011173dc2b72bc1c8593a3606aacf3/jruby-complete-9.2.11.1.jar!/jruby/preludes.rb:4)
@jsolomon8080
Copy link
Author

jsolomon8080 commented May 13, 2020

I think this is the same issue: embulk/embulk#978

@headius
Copy link
Member

headius commented Jun 2, 2020

Thank you for the report! This is first in line when I get back to work tomorrow.

@headius headius added this to the JRuby 9.3.0.0 milestone Jun 2, 2020
@headius
Copy link
Member

headius commented Jun 4, 2020

Ok, so it was two days...

I just got your example running with JRuby master, and at first I tried just using jruby.jar plus all the jruby.home contents on the filesystem. I only got a couple failures, but the results are very interesting.

First failure that shows up is this one, a frustratingly hard-to-reproduce error we see randomly in CI:

Jun 04, 2020 1:24:47 AM com.google.common.util.concurrent.AggregateFuture log
SEVERE: Got more than one input Future failure. Logging failures after the first
java.lang.NullPointerException: Inflater has been closed
	at java.util.zip.Inflater.ensureOpen(Inflater.java:389)
	at java.util.zip.Inflater.inflate(Inflater.java:257)
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
	at java.io.FilterInputStream.read(FilterInputStream.java:133)
	at java.io.FilterInputStream.read(FilterInputStream.java:107)
	at org.jruby.runtime.load.LoadServiceResourceInputStream.bufferEntireStream(LoadServiceResourceInputStream.java:53)
	at org.jruby.runtime.load.LoadServiceResourceInputStream.<init>(LoadServiceResourceInputStream.java:36)
	at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:233)
	at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:33)
	at org.jruby.runtime.load.LoadService.load(LoadService.java:343)
	at org.jruby.RubyKernel.loadCommon(RubyKernel.java:1108)
	at org.jruby.RubyKernel.load(RubyKernel.java:1078)

Seems like we have a reproduction (finally!), and also seems like it's a concurrency + classloader resource issue.

The second one happens in roughly the same place. It's clear these are related:

Exception in thread "main" java.util.concurrent.ExecutionException: java.lang.NullPointerException
	at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:564)
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:525)
	at com.google.common.util.concurrent.AbstractFuture$TrustedFuture.get(AbstractFuture.java:102)
	at RubyTests.main(RubyTests.java:38)
Caused by: java.lang.NullPointerException
	at sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:130)
	at sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:152)
	at java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:239)
	at org.jruby.util.URLResource.openInputStream(URLResource.java:151)
	at org.jruby.util.FileResource.inputStream(FileResource.java:93)
	at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:228)
	at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:33)
	at org.jruby.runtime.load.LoadService.load(LoadService.java:343)
	at org.jruby.RubyKernel.loadCommon(RubyKernel.java:1108)
	at org.jruby.RubyKernel.load(RubyKernel.java:1078)

@headius
Copy link
Member

headius commented Jun 4, 2020

Another error we sometimes see in CI:

Caused by: java.lang.IllegalStateException: zip file closed
	at java.base/java.util.zip.ZipFile.ensureOpen(ZipFile.java:915)
	at java.base/java.util.zip.ZipFile.getEntry(ZipFile.java:347)
	at java.base/java.util.zip.ZipFile$1.getEntry(ZipFile.java:1121)
	at java.base/java.util.jar.JarFile.getEntry0(JarFile.java:576)
	at java.base/java.util.jar.JarFile.getEntry(JarFile.java:506)
	at java.base/sun.net.www.protocol.jar.URLJarFile.getEntry(URLJarFile.java:131)
	at java.base/sun.net.www.protocol.jar.JarURLConnection.connect(JarURLConnection.java:137)
	at java.base/sun.net.www.protocol.jar.JarURLConnection.getInputStream(JarURLConnection.java:155)
	at java.base/java.net.URLClassLoader.getResourceAsStream(URLClassLoader.java:328)
	at org.jruby.util.URLResource.openInputStream(URLResource.java:151)
	at org.jruby.util.FileResource.inputStream(FileResource.java:93)
	at org.jruby.runtime.load.LibrarySearcher$ResourceLibrary.load(LibrarySearcher.java:228)
	at org.jruby.runtime.load.LibrarySearcher$FoundLibrary.load(LibrarySearcher.java:33)
	at org.jruby.runtime.load.LoadService.load(LoadService.java:343)

@headius
Copy link
Member

headius commented Jun 4, 2020

And I think we have a winner, at least for some of these issues: f93becd

It seems like the state changes mentioned in that commit are, in fact, not actually safe enough for concurrent access. Reverting the commit locally makes the simple jruby.jar case I ran above work without any errors.

At the very least this proves that there's something bad happening in JRubyClassLoader under concurrent load, and we'll need to fix that or revert the change above.

Now, in your example you're using separate instances of ScriptingContainer, so I am still trying to figure out why these weren't thread-isolated.

@headius
Copy link
Member

headius commented Jun 4, 2020

You know what, nevermind... I did just get it to fail the same way.

@headius
Copy link
Member

headius commented Jun 4, 2020

Ok, pretty sure this is the problem:

private static final JarCache jarCache = new JarCache();

This JarCache object is shared in a static field, but caches live JarEntry objects that point at previously-opened jar files. In order to not leak, those entries are weakly referenced... which means that they might be dereferenced and finalized, closing the jar file they point at. This is the cause of "zip file closed", "inflated closed" and probably most of the errors your example produces.

The sharing is to some extent by-design. This logic was added to JRuby to support cases like yours, where there's many instances all basically loading the same files. Without a cache like this, every instance of ScriptingContainer would re-open a "connection" to the main JRuby jar file, and those connections would each need to do their own caching and such.

So we're going to need to think about how to do this without tying the cached file entries to closeable resources. It's also possible that e.g. OpenJDK may be handling such caching internally, which would eliminate the need for us to do our own caching.

@headius
Copy link
Member

headius commented Jun 4, 2020

Ah and it gets better. In an effort to clean up after ourselves, JRubyClassLoader will explicitly remove jar entries that it has retrieved from this logic. Removal triggers a teardown call to JarIndex.release, which ultimately closes the jar file.

public boolean remove(String jarPath){
String cacheKey = jarPath;
synchronized (indexCache) {
JarIndex index = indexCache.get(cacheKey);
if (index == null){
return false;
}
else{
index.release();
indexCache.remove(cacheKey);
return true;
}
}
}

So it's not just a GC mix-up that could cause this; when an otherwise separate runtime running in the same classloader spaces shuts itself down, it will probably close resources your runtime has cached.

@headius
Copy link
Member

headius commented Jun 5, 2020

OK dumping some findings here.

The JarCache is a problem, but fixing it does not eliminate all errors. I will submit a patch I have for that as a PR. A large percentage of these errors appear to be coming from the JDK's own built-in JarFile cache, leading to the ZipFile exceptions I pasted above. At the moment it seems like both URL.openStream and URLClassLoader.getResourceAsStream are suspect. Explicitly requesting a URLConnection and calling setUseCaches(false) makes the errors go away. I will submit that change as a separate PR.

The modified reproduction is provided below:

import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

import org.jruby.embed.LocalContextScope;
import org.jruby.embed.LocalVariableBehavior;
import org.jruby.embed.ScriptingContainer;

import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class RubyTests {
    public static void main(String[] args) throws Throwable {
        final int howManyThreads = 40;
        final int howManyRubies = 500;

        ListeningExecutorService executor = MoreExecutors.listeningDecorator(
                Executors.newFixedThreadPool(howManyThreads));
        System.err.println("starting");
        try {
            Collection<Callable<Void>> threadFuncs = new ArrayList<>();

            for(int ii=0; ii< howManyRubies; ++ii) {
                threadFuncs.add(() -> {
                    ScriptingContainer s = new ScriptingContainer(LocalContextScope.SINGLETHREAD,
                            LocalVariableBehavior.PERSISTENT);
                    s.runScriptlet("require 'erb'");
                    s.terminate();
                    return null;
                });
            }
            @SuppressWarnings({ "unchecked", "rawtypes" })
            List<ListenableFuture<Void>> futures = (List) executor.invokeAll(threadFuncs);
            Futures.allAsList(futures).get();
        } finally {
            executor.shutdownNow();
        }
    }
}

I am running it with the following command line, using a jruby-complete jar file and the latest versions of the guava and failureaccess jars (from guava maven org):

JRUBY_OPTS=-d java -Djruby.cli.debug=true -cp jruby-complete-9.3.0.0-SNAPSHOT.jar:guava-29.0-jre.jar:failureaccess-1.0.1.jar:. RubyTests" }'

headius added a commit to headius/jruby that referenced this issue Jun 5, 2020
This cache appears to have thread-safety issues. It is currently
used as a global (aka static) cache of all JarEntry from a given
jar file, to reduce the costs of searching and loading resources.
However it breaks when used across runtimes due to the
JRubyClassLoader shutdown sequence removing and thereby closing
jar files without consideration for whether they may be in use by
another runtime on the same JVM.

As far as I can tell, the jars being cached and explicitly
removed in this way should not conflict across runtimes, since
they are unpacked into unique temporary files, but the bug in
JarCache is there nonetheless.

There may be a way to ensure that a given index, and the JarFile
reference it contains, are not and will not be in use by any other
thread, allowing the file to be closed. But doing so as part of
the shutdown of a runtime defeats a key goal of caching globally:
reducing the cost of accessing the same jar from other runtimes.

This patch makes two changes to the handling of these jar indices:

* The old code used a WeakHashMap from String to JarIndex. But
WHM weakly-references keys, not values, meaning that it would very
quickly be vacated. This may or may not have impacted the
likelihood of attempting to use a closed jar file.
* Instead of eagerly cleaning up the jar indices, they are simply
dereferenced and allowed to finalize at a later time. This ensures
any live references to those indices will not find them
prematurely closed.

This was found while investigating jar resource errors reported in
issue jruby#6218. This change is not sufficient to fix all causes of
those errors but may prevent errors from JarCache.
headius added a commit to headius/jruby that referenced this issue Jun 5, 2020
The caching done by OpenJDK's jar connection logic appears to be
sensitive to heavy concurrent use. When requesting that a URL open
a stream on itself, the default behavior is for it to use an
internal cache of previously opened jar files. A map from the URL
to the JarFile is maintained, where JarFile eventually bottoms out
in an open ZipFile.

Unfortunately URLClassLoader uses the same cache to reuse open jar
files, while also adding those files to its list of closables.
When the URLClassLoader itself is closed, all closables are
closed, which may end up closing a cached JarFile.

So if different URLClassLoader instances are used to access the
same jar files, it is likely that a shared JarFile instance will
be used and eventually closed, triggering errors.

This commit alters our use of URL.openStream to always instead
open a URLConnection and then disable caching before proceeding to
actually open the stream. This avoids the use of cached jar files,
ensuring we don't hit this bug.

A similar change will likely also need to be done for all places
where we do ClassLoader.getResourceAsStream, since if the
classloader is a URLClassLoader it will backend on this same cache
and once again eat the poison pill.

Part of, and possibly sufficient for, issue jruby#6218.
@headius
Copy link
Member

headius commented Jun 5, 2020

This bug has opened a big can of worms, congratulations!

It appears that most of this is not actually our bug. I describe in #6269 the actual problem, which appears to be a bug in all versions of OpenJDK. Basically, when you are using URLs and URLClassLoaders to access jar files, it's possible they will reuse the same cached JarFile instances. Unfortunately URLClassLoader will close any JarFile instances it accesses, poisoning any that might still be cached.

The workarounds are to explicitly disable the cache by opening the intermediate URLConnection ourselves (the fix in #6269) or to simply avoid closing the URLClassLoader altogether:

diff --git a/core/src/main/java/org/jruby/util/JRubyClassLoader.java b/core/src/main/java/org/jruby/util/JRubyClassLoader.java
index 18c1cc8453..8ca0e83508 100644
--- a/core/src/main/java/org/jruby/util/JRubyClassLoader.java
+++ b/core/src/main/java/org/jruby/util/JRubyClassLoader.java
@@ -170,10 +170,10 @@ public class JRubyClassLoader extends ClassDefiningJRubyClassLoader {
      */
     @Override
     public void close() {
-        try {
-            super.close();
-        }
-        catch (Exception ex) { LOG.debug(ex); }
+//        try {
+//            super.close();
+//        }
+//        catch (Exception ex) { LOG.debug(ex); }
 
         try {
             // A hack to allow unloading all JDBC Drivers loaded by this classloader.

Either of these fixes appear sufficient to eliminate all exceptions from your test case, even in the absence of #6268 (which I believe is still a valid fix, but not a direct cause of this issue's problems).

@headius
Copy link
Member

headius commented Jun 5, 2020

FWIW it's likely that 1.6.7 worked because it supported JDKs prior to Java 7, and as a result we would not have been closing the 7+ URLClassLoader.close method.

@headius
Copy link
Member

headius commented Jun 6, 2020

Issue filed with OpenJDK: https://bugs.openjdk.java.net/browse/JDK-8246714

I believe this issue is the root cause of every error we've encountered in this bug report. Working around it should fix them all, but this is a critical issue that ultimately must be fixed in OpenJDK itself.

headius added a commit to headius/jruby that referenced this issue Jun 8, 2020
This is another alternative fix for the issues reported in jruby#6218
and diagnosed as a JDK bug in the following OpenJDK issue:

https://bugs.openjdk.java.net/browse/JDK-8246714

Rather than avoid the caching or leave all jar files open, this
patch takes a conservative approach. Temporary jar files that we
unpack must be deleted and should be closed even if globally
cached, since they will not conflict with other jar cache
consumers. Normal jars on the filesystem will remain unclosed, to
avoid impacting other consumers of the global jar cache.

This does mean that once a jar is opened by JRuby via our
classloader, it will not be closed until the process terminates.
Because this is a new behavior that may not be desirable in some
situations, this patch also adds a new property:

jruby.ji.close.classloader=true will close the per-runtime
classloader and any jar file instances it has cached, as before.
The default is "false" and in that case only the temporary jars
will be closed.

This fixes jruby#6218
@headius
Copy link
Member

headius commented Jun 9, 2020

I am resolving this issue for now via #6273, which modifies our classloader to only close the jar files that it unpacked to a temp location. This avoids runtimes leaking temp jar files but does not damage the global cache by fully closing all jars that the classloader has encountered.

This does not mean we are safe from the OpenJDK bug, since jars we use might be shared with some other non-JRuby in the same JVM that does close its classloaders, but the exposure here is significantly reduced (and eliminated when the only classloader accessing those jars is the JRuby classloader).

headius added a commit to headius/jruby that referenced this issue Jun 30, 2020
This is another alternative fix for the issues reported in jruby#6218
and diagnosed as a JDK bug in the following OpenJDK issue:

https://bugs.openjdk.java.net/browse/JDK-8246714

Rather than avoid the caching or leave all jar files open, this
patch takes a conservative approach. Temporary jar files that we
unpack must be deleted and should be closed even if globally
cached, since they will not conflict with other jar cache
consumers. Normal jars on the filesystem will remain unclosed, to
avoid impacting other consumers of the global jar cache.

This does mean that once a jar is opened by JRuby via our
classloader, it will not be closed until the process terminates.
Because this is a new behavior that may not be desirable in some
situations, this patch also adds a new property:

jruby.ji.close.classloader=true will close the per-runtime
classloader and any jar file instances it has cached, as before.
The default is "false" and in that case only the temporary jars
will be closed.

This fixes jruby#6218
@headius
Copy link
Member

headius commented Jun 30, 2020

@jsolomon8080 We will be incorporating the fix from #6273 into JRuby 9.2.12 to help you out!

In order to reduce the behavior difference from JRuby 9.2.11, the option to close the entire classloader will be on. You can get the new behavior (which will be default in JRuby 9.3) by setting JVM property jruby.ji.close.classloader to false.

See #6307 for the cherry-pick PR.

@headius headius modified the milestones: JRuby 9.3.0.0, JRuby 9.2.12.0 Jun 30, 2020
headius added a commit to headius/jruby that referenced this issue Jun 30, 2020
In order to prevent this behavior from diverging too much from the
behavior in JRuby 9.2.11, we will default this flag to on for the
JRuby 9.2.12 release. To get the new behavior, which will be the
default in JRuby 9.3, set JVM property jruby.ji.close.classloader
to "false".

See jruby#6218.
headius added a commit to headius/jruby that referenced this issue Jul 1, 2021
This cache appears to have thread-safety issues. It is currently
used as a global (aka static) cache of all JarEntry from a given
jar file, to reduce the costs of searching and loading resources.
However it breaks when used across runtimes due to the
JRubyClassLoader shutdown sequence removing and thereby closing
jar files without consideration for whether they may be in use by
another runtime on the same JVM.

As far as I can tell, the jars being cached and explicitly
removed in this way should not conflict across runtimes, since
they are unpacked into unique temporary files, but the bug in
JarCache is there nonetheless.

There may be a way to ensure that a given index, and the JarFile
reference it contains, are not and will not be in use by any other
thread, allowing the file to be closed. But doing so as part of
the shutdown of a runtime defeats a key goal of caching globally:
reducing the cost of accessing the same jar from other runtimes.

This patch makes two changes to the handling of these jar indices:

* The old code used a WeakHashMap from String to JarIndex. But
WHM weakly-references keys, not values, meaning that it would very
quickly be vacated. This may or may not have impacted the
likelihood of attempting to use a closed jar file.
* Instead of eagerly cleaning up the jar indices, they are simply
dereferenced and allowed to finalize at a later time. This ensures
any live references to those indices will not find them
prematurely closed.

This was found while investigating jar resource errors reported in
issue jruby#6218. This change is not sufficient to fix all causes of
those errors but may prevent errors from JarCache.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants