Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8317678: Fix up hashCode() for ZipFile.Source.Key #16115

Closed
wants to merge 12 commits into from

Conversation

coffeys
Copy link
Contributor

@coffeys coffeys commented Oct 10, 2023

Fix up java.util.zip.ZipFile$Source hashCode() impl so that duplicate Source objects aren't created for the same zip file.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8317678: Fix up hashCode() for ZipFile.Source.Key (Bug - P3)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16115/head:pull/16115
$ git checkout pull/16115

Update a local copy of the PR:
$ git checkout pull/16115
$ git pull https://git.openjdk.org/jdk.git pull/16115/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16115

View PR using the GUI difftool:
$ git pr show -t 16115

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16115.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 10, 2023

👋 Welcome back coffeys! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot changed the title 8317678 8317678: Fix up hashCode() for ZipFile.Source.Key Oct 10, 2023
@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 10, 2023
@openjdk
Copy link

openjdk bot commented Oct 10, 2023

@coffeys The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Oct 10, 2023
@mlbridge
Copy link

mlbridge bot commented Oct 10, 2023

try {
return file.getCanonicalFile();
} catch (IOException e) {
throw new RuntimeException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mapping to RuntimeException doesn't look right here, instead need to use PrivilegedExceptionAction here and changing the Key constructor to throw IOException.

Comment on lines 1439 to 1444
Object fk = attrs.fileKey();
if (fk != null) {
return ((int)(t ^ (t >>> 32))) + fk.hashCode();
} else {
return ((int)(t ^ (t >>> 32))) + file.hashCode();
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider this trivial simplification:

Suggested change
Object fk = attrs.fileKey();
if (fk != null) {
return ((int)(t ^ (t >>> 32))) + fk.hashCode();
} else {
return ((int)(t ^ (t >>> 32))) + file.hashCode();
}
Object fk = attrs.fileKey();
return Long.hashCode(t)
+ (fk != null ? fk.hashCode() : file.hashCode());

@AlanBateman
Copy link
Contributor

This is really ZipFile.Source.Key hashCode/equals being inconsistent so I think we should rename the JBS/PR title to make that clearer.

@AlanBateman
Copy link
Contributor

Can you look at the equals method too, need to decide if the checking of lastModifiedTime should be removed.

this.utf8 = zc.isUTF8();
}

@SuppressWarnings("removal")
private File getCanonicalFile(File file) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
private File getCanonicalFile(File file) {
private static File getCanonicalFile(File file) {

To avoid this-escape.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 10, 2023

Thanks for the comments to date. I think the latest commit should satisfy requests made.

for Key.equals(Object) and Key.hashCode() - I've changed the impl so that attrs.lastModifiedTime() is only used in the case where attrs.fileKey() returns null. Is that in the direction you were suggesting Alan ?

AccessController.doPrivileged((PrivilegedExceptionAction<File>) file::getCanonicalFile) :
file.getCanonicalFile();
} catch (PrivilegedActionException e) {
throw new IOException(e);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IOException to throw is PAE's cause, meaning you need to unwrap.

private static File getCanonicalFile(File file) throws IOException {
try {
return System.getSecurityManager() != null ?
AccessController.doPrivileged((PrivilegedExceptionAction<File>) file::getCanonicalFile) :
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you'll have to cleanly separate the SM and no-SM case because of the exception handling, then you can use PrivilegedExceptionAction<File> pae = file::getCanonicalFile.

}
if (!attrs.lastModifiedTime().equals(key.attrs.lastModifiedTime())) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The question in the previous comment is why is the last modified time used here, I think you have to dig into why it is used by equals.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 13, 2023

lastModifiedTime() was used in the native code and brought across when native code was removed in JDK-8145260

I thought we might be able to drop it for the case where we have a unique file key but that won't cover the case where an existing zip file is opened and modified. Re-introduced the lastModifiedTime checks as a result. We need to be able to detect such scenarios, otherwise new ZipFile constructs would continue to use a stale <Key, Source> mapping (if such Keys still existed)

I've updated the test case to junit format and beefed it up with some extra scenarios to cover the above.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for driving this issue Sean.

Overall I think it looks good.

Please see my minor comment below which you can handle prior to pushing the PR

}

@Test
public static void test() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps give the test a better name and add a brief comment block at least to the test method as we have been trying to add more clarity to the new tests that we add

@openjdk
Copy link

openjdk bot commented Oct 13, 2023

@coffeys This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8317678: Fix up hashCode() for ZipFile.Source.Key

Reviewed-by: lancea, alanb, jpai

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 142 new commits pushed to the master branch:

  • 69c0ae2: 8318124: JFR: Rewrite instrumentation to use Class-File API
  • c1aeac7: 8318445: More broken bailout chains in C2
  • d888b26: 8318071: IgnoreUnrecognizedVMOptions flag still causes failure in ArchiveHeapTestClass
  • bea2d48: 8312475: org.jline.util.PumpReader signed byte problem
  • 9f767aa: 8318109: Writing JFR records while a CHT has taken its lock asserts in rank checking
  • bd22d23: 8318027: Support alternative name to jdk.internal.vm.compiler
  • c2efd77: 8295795: hsdis does not build with binutils 2.39+
  • 99de9bb: 8317807: JAVA_FLAGS removed from jtreg running in JDK-8317039
  • 704c6ea: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java
  • 7c0a828: 8318649: G1: Remove unimplemented HeapRegionRemSet::add_code_root_locked
  • ... and 132 more: https://git.openjdk.org/jdk/compare/6273ab97dc1a0d3c1f51ba94694d9594dd7593d4...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Oct 13, 2023

@AfterAll
public static void cleanup() throws IOException {
Files.deleteIfExists(Path.of(ZIPENTRY_NAME));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Sean, this looks like a typo to me. I think it should have been ZIPFILE_NAME instead of ZIPENTRY_NAME.

@jaikiran
Copy link
Member

I had a look at the linked JBS and the new jtreg test case in this PR. Is my understanding correct that the issue that we are trying to solve here is the case where the cache would end up having 2 separate entries for the same jar file, if one JarFile was constructed using an absolute path and the other JarFile (for the same underlying file on the filesystem) was constructed using a relative path?

The mention of fixing hashCode() makes me wonder if there is/was some other issue that was triggered which lead to creation of that JBS issue.

Coming to the proposed implementation in this PR, I see that we first rely on BasicFileAttributes.fileKey() and if it is available for a given File instance, we store that File instance as a component of the Key. Looking at the javadoc of BasicFileAttributes.fileKey() it states:

* Returns an object that uniquely identifies the given file, or {@code
* null} if a file key is not available. On some platforms or file systems
* it is possible to use an identifier, or a combination of identifiers to
* uniquely identify a file. Such identifiers are important for operations
* such as file tree traversal in file systems that support <a
* href="../package-summary.html#links">symbolic links</a> or file systems
* that allow a file to be an entry in more than one directory. On UNIX file
* systems, for example, the <em>device ID</em> and <em>inode</em> are
* commonly used for such purposes.

It's unclear to me (and I haven't yet read up on inode management in Unix) if 2 different symbolic links pointing to the same target (jar) file on the filesystem would have 2 different/unique fileKey(). If they do, then do you think that even with the current proposed solution of using fileKey(), we would still end up with 2 different entries in this cache, each with a different Key? Would that then bring us back to the original issue?

As for the use of File.getCanonicalFile() in the absence of a fileKey(), there used to be performance considerations with the use of File.getCanonicalFile() (and File.getCanonicalPath()) to the extent that there used to be a cache involved. Recent versions of Java have removed that cache because it was no longer necessary as detailed in https://bugs.openjdk.org/browse/JDK-8301001. I don't have historical knowledge behind the motivation of that cache (JDK-8301001 does provide some necessary history), however JDK-8301001 also says that one of the reasons why the cache is no longer needed is because:

The work in JDK 9 to remove canonicalization from FilePermission (JDK-8164705) mostly eliminated the need for this caching.

Would introducing this File.getCanonicalFile() in this JarFile area now reintroduce performance considerations (for startup?) on filesystems that don't return a fileKey()? I see that the implementation of File.getCanonicalFile() now ends up being a native call.

return AccessController.doPrivileged(
(PrivilegedExceptionAction<File>) file::getCanonicalFile);
} catch (PrivilegedActionException pae) {
throw new IOException(pae.getException());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The exception handling isn't right, you should be re-throwing pae.getException when it's an IOException, this should do what you want

        PrivilegedExceptionAction<File> pa = file::getCanonicalFile;
        try {
            return AccessController.doPrivileged(pa);
        } catch (PrivilegedActionException pae) {
            Throwable cause = pae.getCause();
            if (cause instanceof IOException ioe) {
                throw ioe;
            } else {
                throw new IOException(cause);
            }
        }

}
}
}

public int hashCode() {
long t = utf8 ? 0 : Long.MAX_VALUE;
t += attrs.lastModifiedTime().toMillis();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ZipFile cannot tolerate a zip file being changed while in use. I realise you don't want to change this part of the code but I think it is desperately needs a comment in both hashCode and equals to explain why the last modified time is included.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 17, 2023

thanks for the reviews and comments to date. I've pushed new changes which should address them.

Good catch on the ZIPENTRY_NAME typo @jaikiran
I've run some tests also and it looks like the fileKey() returns the same object/reference - even if symbolic links are used.

you raise an interesting point about getCanonicalFile() performance though. I've updated an existing benchmark[1] to check and performance for fileystems that don't support fileKey() (windows) has regressed[2]. Performance is much better on systems that support fileKey() (unix) -- I'll look at it further but perhaps I'll only target the optimizations for the filesystems that support filekey() for now.

[1]

diff --git a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
index ffbbc3d245f..34449c14e32 100644
--- a/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
+++ b/test/micro/org/openjdk/bench/java/util/zip/ZipFileOpen.java
@@ -29,6 +29,7 @@
 import java.io.FileOutputStream;
 import java.io.IOException;
 import java.nio.file.Files;
+import java.nio.file.Path;
 import java.util.concurrent.TimeUnit;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipFile;
@@ -50,6 +51,7 @@ public class ZipFileOpen {
     private int size;

     public File zipFile;
+    public File relativePathFile;

     @Setup(Level.Trial)
     public void beforeRun() throws IOException {
@@ -74,6 +76,8 @@ public void beforeRun() throws IOException {
             }
         }
         zipFile = tempFile;
+        relativePathFile = Path.of(System.getProperty("user.dir"))
+                                .relativize(zipFile.toPath()).toFile();
     }

     @Benchmark
@@ -90,4 +94,13 @@ public ZipFile openCloseZipFile() throws Exception {
         zf.close();
         return zf;
     }
+
+    @Benchmark
+    public void openCloseZipFilex2() throws Exception {
+        // Ensure that we only create ZipFile.Source.Key entry per unique zip file
+        ZipFile zf = new ZipFile(zipFile);
+        ZipFile zf2 = new ZipFile(relativePathFile);
+        zf.close();
+        zf2.close();
+    }
 }

[2]

jdk-tip-no-patch: (Windows)

Benchmark                       (size)  Mode  Cnt       Score        Error  Units
ZipFileOpen.openCloseZipFile       512  avgt   15  319543.232 ±  46112.481  ns/op
ZipFileOpen.openCloseZipFile      1024  avgt   15  463719.381 ± 139409.316  ns/op
ZipFileOpen.openCloseZipFilex2     512  avgt   15  750838.029 ± 190726.572  ns/op
ZipFileOpen.openCloseZipFilex2    1024  avgt   15  875174.035 ± 140971.036  ns/op

jdk-with-new-patch
Benchmark                       (size)  Mode  Cnt        Score        Error  Units
ZipFileOpen.openCloseZipFile       512  avgt   15   603053.335 ±  50514.066  ns/op
ZipFileOpen.openCloseZipFile      1024  avgt   15   710960.542 ±  46343.033  ns/op
ZipFileOpen.openCloseZipFilex2     512  avgt   15  1258566.172 ± 216230.090  ns/op
ZipFileOpen.openCloseZipFilex2    1024  avgt   15  1260380.961 ± 164216.628  ns/op

[3]

without patch: (Linux)

Benchmark                       (size)  Mode  Cnt       Score       Error  Units
ZipFileOpen.openCloseZipFile       512  avgt   15   75378.890 ?   642.370  ns/op
ZipFileOpen.openCloseZipFile      1024  avgt   15  150814.293 ? 29801.424  ns/op
ZipFileOpen.openCloseZipFilex2     512  avgt   15  168104.286 ? 41505.778  ns/op
ZipFileOpen.openCloseZipFilex2    1024  avgt   15  299282.488 ? 44496.777  ns/op
Finished running test 'micro:java.util.zip.ZipFileOpen'


with patch:

Benchmark                       (size)  Mode  Cnt       Score       Error  Units
ZipFileOpen.openCloseZipFile       512  avgt   15   75400.408 ?   408.981  ns/op
ZipFileOpen.openCloseZipFile      1024  avgt   15  146213.650 ? 15509.167  ns/op
ZipFileOpen.openCloseZipFilex2     512  avgt   15   83194.552 ? 16361.844  ns/op
ZipFileOpen.openCloseZipFilex2    1024  avgt   15  153956.028 ? 30550.595  ns/op

* The lastModifiedTime attribute is used to detect cases where
* the same file is opened more than once and where it has been
* modified in the mean-time
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change to the hashCode method is fine but I think the comments added to both hashCode + equals are still a bit confusing because they hint that the last modified time is used to detect if a zip file has been modified. What would you think about dropping these comments and instead put a class file comment on Key to say that it represents a key to a zip file. It key is based on the file key if available, or the canonical path if the file key is not available. The key is also based on the file's last modified time to allow for cases where a zip file is re-opened after it has been modified.

@jaikiran
Copy link
Member

Hello Sean, thank you for the updates. The fix now really boils down to fixing the hashCode() implementation of java.util.zip.ZipFile$Source$Key so that it honours the equals() and hashCode() contract which mandates that the hashCode() returned by instances which are equals() must be the same.

The change you have in the Key class now looks good to me.

Coming to the newly introduced ZipSourceCache.java test case, I went through it. What's being done in that test looks OK to me, but I felt that it's a bit too complex for what I think we should be testing here - asserting that hashCode() returned by Key instances are same if Key instances are equals().

I decided to attempt a different way to test this and this is what the test looks like:

/*
 * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
 *
 * This code is free software; you can redistribute it and/or modify it
 * under the terms of the GNU General Public License version 2 only, as
 * published by the Free Software Foundation.
 *
 * This code is distributed in the hope that it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
 * version 2 for more details (a copy is included in the LICENSE file that
 * accompanied this code).
 *
 * You should have received a copy of the GNU General Public License version
 * 2 along with this work; if not, write to the Free Software Foundation,
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA.
 *
 * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA
 * or visit www.oracle.com if you need additional information or have any
 * questions.
 */

/* @test
 * @bug 8317678
 * @modules java.base/java.util.zip:open
 * @summary Fix up hashCode() for ZipFile.Source.Key
 * @run junit/othervm ZipSourceCache
 */

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.lang.reflect.Constructor;
import java.lang.reflect.Method;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.Assertions;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

public class ZipSourceCache {
    private static final String TEST_FILE_NAME = "foobar.zip";

    @BeforeAll
    public static void beforeAll() throws Exception {
        // create the file to run our test
        try (final FileOutputStream fos = new FileOutputStream(TEST_FILE_NAME)) {
            // no need to write anything to the file, just the presence of the file
            // is enough
        }
    }

    @AfterAll
    public static void afterAll() throws Exception {
        Files.deleteIfExists(Path.of(TEST_FILE_NAME));
    }

    /*
     * Verifies that instances of java.util.zip.ZipFile$Source$Key which are "equal()"
     * return back the same "hashCode()"
     */
    @Test
    public void testKeyEqualsHashCode() throws Exception {
        // create an instance of java.util.zip.ZipCoder for UTF-8 charset
        final Class<?> zipCoderClass = Class.forName("java.util.zip.ZipCoder");
        final Method m = zipCoderClass.getDeclaredMethod("get", Charset.class);
        m.setAccessible(true);
        final Object utf8ZipCoder = m.invoke(null, StandardCharsets.UTF_8);


        // get the java.util.zip.ZipFile$Source$Key class
        final Class<?> keyClass = Class.forName("java.util.zip.ZipFile$Source$Key");
        final Constructor<?> ctr = keyClass.getConstructor(File.class, BasicFileAttributes.class,
                zipCoderClass);
        ctr.setAccessible(true);

        final File relFile = new File(TEST_FILE_NAME);
        final File absFile = new File(TEST_FILE_NAME).getAbsoluteFile();
        System.out.println("Running test on " + keyClass.getName() + " with relative File path "
                + relFile.getPath() + " and absolute File path " + absFile.getPath());
        // create an instance of java.util.zip.ZipFile$Source$Key passing it a relative file path
        // to "foobar.zip"
        final Object relKey = ctr.newInstance(relFile, getAttrs(relFile), utf8ZipCoder);
        System.out.println("relKey, hashCode = " + relKey.hashCode());

        // create an instance of java.util.zip.ZipFile$Source$Key passing it an absolute file path
        // to "foobar.zip"
        final Object absKey = ctr.newInstance(absFile, getAttrs(absFile), utf8ZipCoder);
        System.out.println("absKey, hashCode = " + absKey.hashCode());
        // these 2 Key instances, one created with a relative File path and other with a absolute
        // File path, to the same file, are expected to be "equal()"
        Assertions.assertTrue(absKey.equals(relKey), "Key instances were expected to be equal");
        // verify that the instances which are "equal()" return the same "hashCode()"
        Assertions.assertEquals(relKey.hashCode(), absKey.hashCode(), "Differing hashCode() for" +
                " Keys that were equal");
    }

    private static BasicFileAttributes getAttrs(final File file) throws IOException {
        final FileSystem fs = file.toPath().getFileSystem();
        return Files.readAttributes(fs.getPath(file.getPath()),
                BasicFileAttributes.class);
    }
}

All it does is asserts the hashCode() values of Key instances created out of a relative file and absolute file pointing to the same underlying file. This test consistently fails without your fix for the Key class and passes with your fix. I haven't run it on Windows, but I think it shouldn't be any different.

Do you think we could simplify the ZipSourceCache.java test in this PR to instead use this alternate approach or some form of it?

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The update to ZipFile.Key hashCode and the Key comments look good. I don't have time to study the test but I think Lance and Jai are looking at that.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 23, 2023

Thanks for the test code suggestion Jai. I expanded test case coverage during development of the fix. While your suggested test case captures the core issue at hand, I think it misses a few important points that I captured in the current test.

  • ZipFile behaviour in the scenario where a file is re-opened and edited while references to the old ZipFile exist. A "invalid LOC header" needs to be thrown in cases where an old zip file mapping is still in use. An old iteration of this patch was going to remove use of lastModifiedTime and probably would have caused issue for some apps. This test case protects us in the future from that.

  • the behaviour of incorrectly adding extra <Key, Source> mappings to our internal map has gone undetected for a few years. We've no test coverage in that area. The current test closely monitors numbers in that map now and allows for behavioural differences between Filesystems that support and don't support fileKey(). I think that protects us from future changes we might make in this area also.

I think your test will fail on windows. The rel and abs file Keys are still treated as different there (proposal to use getCanonicalPath seems too costly for performance)

@jaikiran
Copy link
Member

ZipFile behaviour in the scenario where a file is re-opened and edited
while references to the old ZipFile exist. A "invalid LOC header" needs to be
thrown in cases where an old zip file mapping is still in use.

This PR doesn't do any source changes to introduce this behaviour. It instead just adds a test to assert this existing behaviour. So that's a good thing and testing this behaviour sounds fine to me.

the behaviour of incorrectly adding extra <Key, Source> mappings to our
internal map has gone undetected for a few years.
We've no test coverage in that area. The current test closely monitors
numbers in that map now and allows for behavioural differences between
Filesystems that support and don't support fileKey(). I think that protects us
from future changes we might make in this area also.

What you propose here sounds fine. The only part that I think will need a bit more experiments is to run something like a test-repeat of 50 or more to make sure that the number of jars opened when this test is running doesn't get impacted by any unforeseen loading of classes or such (from the test framework for example) which could end up opening some more jar files which can then cause the shared static map to have a different count of jars than this test would expect. This is already a /othervm test so intereference is already reduced, but it would still be good to see if there's any scope of intermittent failures, by runinng a test-repeat job.

I think your test will fail on windows. The rel and abs file Keys are still treated as different there (proposal to use getCanonicalPath seems too costly for performance)

You are right, I missed that part where the equals() may return false on Windows. So that part can't actually be asserted. The hashCode() too then needs to be asserted only if the instances are equals().

While your suggested test case captures the core issue at hand, I think it misses a few important points that I captured in the current test.

Thank you for the additional details explaining the rationale behind this test. What you propose sounds fine to me then. I've a few minor nits about this test, which I'll add inline.

ZipFile absoluteZipFile;
HashMap internalMap;
int numSources;
try (ZipFile zipFile = new ZipFile(ZIPFILE_NAME)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think changing this to:

new ZipFile(relativeFile)

would be better and more clear about the intent? A few lines below, we use new ZipFile(absoluteFile) which I think is good because it makes it clear what is being done.

assertEquals(INVALID_LOC_EXCEPTION, ioe.getMessage());
z.close();
assertEquals(--numSources, internalMap.size());
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a else block which prints that this part of the testing was skipped might provide some visibility on which parts of this test were run, if/when debugging this test on CI environments.

// update the zip file, should expect a new Source Object
// ignore this part of test if file can't be updated (can't overwrite)
if (createZipFile("differentContent")) {
ZipFile z = new ZipFile(ZIPFILE_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as a few lines above, maybe we should use new ZipFile(relativeFile) here to make the intention clear?

@jaikiran
Copy link
Member

Finally, now that this PR only fixes a bug in the hashCode() and doesn't introduce any performance changing characteristics to the code, do you think the micro benchmark test update being proposed in this PR is needed? I don't have too much knowledge of jmh and it's a bit unclear to me if the introduction of this new @Benchmark method would interefere with the numbers of the existing @Benchmark method. I can spend some time on reading up jmh and then reviewing this micro benchmark if you think adding that benchmark is useful in context of this PR.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 23, 2023

Thanks for the review Jai. I've edited the test to use ZipFile z = new ZipFile(relativeFile) in both places.

Regards the JMH benchmark, note that there is a significant performance gain for some environments with this patch. 2X improvement for filesystems that support fileKey() and deal with opening of a zip file twice (or more) but with different variants of path name. The benchmark update was useful for highlighting this. The existing src = files.get(key); ZipFile code call is key to this improvement. Now that hashcode() is corrected, a new <Key, Source> pairing is avoided for the same zip file.

p.s. I've run test-repeat on mach5 for this new test a few times already. It was test repeat 20 or 30 IIRC. I'll bump it to 50 for last test round before integrating.

@coffeys
Copy link
Contributor Author

coffeys commented Oct 23, 2023

meant to add - note that createZipFile already prints an error to standard err in the scenario where the zip file can't be opened for modification.

@jaikiran
Copy link
Member

Thank you for the updates, Sean. The test ZipSourceCache looks good to me in its current form. I have no other review comments except for the jmh benchmark. I'll approve this PR in its current form.

As for the jmh benchmark, the existing benchmark method has some informative inline comment explaining what it does. Perhaps we should have something similar for this new benchmark method explaining why it is there? It currently says "// Ensure that we only create ZipFile.Source.Key entry per unique zip file" which doesn't seem accurate considering that this is a benchmark method and doesn't have any verifications of Key counts.

I'll read up a bit more about jmh tomorrow because I'm unsure how multiple @Benchmark methods within a single class are run and what all those annotations mean when it comes to state management and such. But you don't have to wait for my approval of that part. If anyone else approves this in the meantime, please go ahead with your integration.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the latest changes are fine Sean

@coffeys
Copy link
Contributor Author

coffeys commented Oct 23, 2023

Thanks for the reviews - test-repeat=50 passed on mach5 for product/debug platforms Jai. All looks good.

/integrate

@openjdk
Copy link

openjdk bot commented Oct 23, 2023

Going to push as commit 8d9a4b4.
Since your change was applied there have been 142 commits pushed to the master branch:

  • 69c0ae2: 8318124: JFR: Rewrite instrumentation to use Class-File API
  • c1aeac7: 8318445: More broken bailout chains in C2
  • d888b26: 8318071: IgnoreUnrecognizedVMOptions flag still causes failure in ArchiveHeapTestClass
  • bea2d48: 8312475: org.jline.util.PumpReader signed byte problem
  • 9f767aa: 8318109: Writing JFR records while a CHT has taken its lock asserts in rank checking
  • bd22d23: 8318027: Support alternative name to jdk.internal.vm.compiler
  • c2efd77: 8295795: hsdis does not build with binutils 2.39+
  • 99de9bb: 8317807: JAVA_FLAGS removed from jtreg running in JDK-8317039
  • 704c6ea: 8303525: Refactor/cleanup open/test/jdk/javax/rmi/ssl/SSLSocketParametersTest.java
  • 7c0a828: 8318649: G1: Remove unimplemented HeapRegionRemSet::add_code_root_locked
  • ... and 132 more: https://git.openjdk.org/jdk/compare/6273ab97dc1a0d3c1f51ba94694d9594dd7593d4...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Oct 23, 2023
@openjdk openjdk bot closed this Oct 23, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Oct 23, 2023
@openjdk
Copy link

openjdk bot commented Oct 23, 2023

@coffeys Pushed as commit 8d9a4b4.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@eirbjo
Copy link
Contributor

eirbjo commented Dec 13, 2023

I noticed the ZipSourceCache test fail on GHA on windows-x64:

FAILED     ZipSourceCache::testKeySourceMapping 'testKeySourceMapping()'
java.nio.file.FileSystemException: 1702471080605-bug8317678.zip: The process cannot access the file because it is being used by another process
	at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
	at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108)
	at java.base/sun.nio.fs.WindowsFileSystemProvider.implDelete(WindowsFileSystemProvider.java:273)
	at java.base/sun.nio.fs.AbstractFileSystemProvider.deleteIfExists(AbstractFileSystemProvider.java:109)
	at java.base/java.nio.file.Files.deleteIfExists(Files.java:1191)
	at ZipSourceCache.cleanup(ZipSourceCache.java:65)

On a re-run of the jobs, the test ran green again on the same source. So this seems timing related.

@jaikiran
Copy link
Member

Hello Eirik, I have opened https://bugs.openjdk.org/browse/JDK-8322078 to track this failure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
7 participants