Skip to content

Conversation

@jaikiran
Copy link
Member

@jaikiran jaikiran commented Mar 11, 2025

Can I please get a review of this change which proposes to fix an issue java.util.zip.ZipFile which would cause failures when multiple instances of ZipFile using non-UTF8 Charset were operating against the same underlying ZIP file? This addresses https://bugs.openjdk.org/browse/JDK-8347712.

ZIP file specification allows for ZIP entries to mark a UTF-8 flag to indicate that the entry name and comment are encoded using UTF8. A java.util.zip.ZipFile can be constructed by passing it a Charset. This Charset (which defaults to UTF-8) gets used for decoding entry names and comments for non-UTF8 entries.

The internal implementation of ZipFile uses a ZipCoder (backed by java.nio.charset.CharsetEncoder/CharsetDecoder instance) for the given Charset. Except for UTF8 ZipCoder, other ZipCoders are not thread safe.

The internal implementation of ZipFile maintains a cache of ZipFile$Source. A Source corresponds to the underlying ZIP file and during construction, uses a ZipCoder for parsing the ZIP entries and once constructed holds on to the parsed ZIP structure. Multiple instances of a ZipFile which all correspond to the same ZIP file on the filesystem, share a single instance of Source (after the Source has been constructed and cached). Although ZipFile instances aren't expected to be thread-safe, the fact that multiple different instances of ZipFile could be sharing the same instance of Source in concurrent threads, mandates that the Source must be thread-safe.

In Java 15, we did a performance optimization through https://bugs.openjdk.org/browse/JDK-8243469. As part of that change, we started holding on to the ZipCoder instance (corresponding to the Charset provided during ZipFile construction) in the Source. This stored ZipCoder was then used for ZipFile operations when working with the ZIP entries. As noted previously, any non-UTF8 ZipCoder is not thread-safe and as a result, any usages of ZipCoder in the Source makes Source not thread-safe too. That effectively violates the requirement that Source must be thread-safe to allow for its usage in multiple different ZipFile instances concurrently. This then causes ZipFile usages to fail in unexpected ways like the one shown in the linked https://bugs.openjdk.org/browse/JDK-8347712.

The commit in this PR addresses the issue by not maintaining ZipCoder as a instance field of Source. Instead the ZipCoder is now maintained in the ZipFile, thus allowing a separate ZipCoder instance per ZipFile. The Source when being constructed and when parsing the entries, will use a method local ZipCoder instance as and when needed.

The change that was done in Java 15 was a performance optimization. I have verified that the current proposed change in this PR doesn't degrade the performance and yet fixes the reported issue. A new jtreg test has been introduced to reproduce the issue and verify the fix. Local testing with this change passed. tier testing is currently in progress.


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 2 Reviewers)

Issues

  • JDK-8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset (Bug - P3)
  • JDK-8355975: ZipFile uses incorrect Charset if another instance for the same ZIP file was constructed with a different Charset (Bug - P3)

Reviewers

Contributors

  • Eirik Bjørsnøs <eirbjo@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/23986/head:pull/23986
$ git checkout pull/23986

Update a local copy of the PR:
$ git checkout pull/23986
$ git pull https://git.openjdk.org/jdk.git pull/23986/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 23986

View PR using the GUI difftool:
$ git pr show -t 23986

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/23986.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 11, 2025

👋 Welcome back jpai! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 11, 2025

@jaikiran This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8347712: IllegalStateException on multithreaded ZipFile access with non-UTF8 charset
8355975: ZipFile uses incorrect Charset if another instance for the same ZIP file was constructed with a different Charset

Co-authored-by: Eirik Bjørsnøs <eirbjo@openjdk.org>
Reviewed-by: eirbjo, lancea, redestad, alanb

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 229 new commits pushed to the master branch:

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 11, 2025
@openjdk
Copy link

openjdk bot commented Mar 11, 2025

@jaikiran The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Mar 11, 2025
@jaikiran
Copy link
Member Author

The change that was done in Java 15 was a performance optimization. I have verified that the current proposed change in this PR doesn't degrade the performance and yet fixes the reported issue.

The test/micro/org/openjdk/bench/java/util/zip/ZipFileGetEntry.java micro benchmark was introduced in Java 15, when that change was done. I have run that benchmark locally on macos before and after the fix proposed in this current PR and I don't see any noticeable difference (which is a good thing):

Before the change in the current PR:

Benchmark                             (size)  Mode  Cnt   Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15  32.636 ? 1.169  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15  36.082 ? 1.437  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  51.132 ? 1.241  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  53.466 ? 2.568  ns/op
ZipFileGetEntry.getEntryMiss             512  avgt   15   9.151 ? 0.560  ns/op
ZipFileGetEntry.getEntryMiss            1024  avgt   15   9.330 ? 0.538  ns/op
ZipFileGetEntry.getEntryMissUncached     512  avgt   15  18.173 ? 1.016  ns/op
ZipFileGetEntry.getEntryMissUncached    1024  avgt   15  21.209 ? 1.264  ns/op
After the change in the current PR:

Benchmark                             (size)  Mode  Cnt   Score   Error  Units
ZipFileGetEntry.getEntryHit              512  avgt   15  32.278 ? 0.565  ns/op
ZipFileGetEntry.getEntryHit             1024  avgt   15  35.998 ? 1.807  ns/op
ZipFileGetEntry.getEntryHitUncached      512  avgt   15  54.884 ? 3.452  ns/op
ZipFileGetEntry.getEntryHitUncached     1024  avgt   15  52.919 ? 1.621  ns/op
ZipFileGetEntry.getEntryMiss             512  avgt   15   8.898 ? 0.050  ns/op
ZipFileGetEntry.getEntryMiss            1024  avgt   15   9.184 ? 0.048  ns/op
ZipFileGetEntry.getEntryMissUncached     512  avgt   15  18.388 ? 0.708  ns/op
ZipFileGetEntry.getEntryMissUncached    1024  avgt   15  21.448 ? 0.848  ns/op

@mlbridge
Copy link

mlbridge bot commented Mar 11, 2025

@AlanBateman
Copy link
Contributor

AlanBateman commented Mar 11, 2025

This is going to require careful review. Can you ensure that this has at least 2 reviewers?

@jaikiran
Copy link
Member Author

/reviewers 2

@openjdk
Copy link

openjdk bot commented Mar 11, 2025

@jaikiran
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

@jaikiran
Copy link
Member Author

jaikiran commented Mar 11, 2025

This is going to require careful review. Can you ensure that this has at least 2 reviewers?

Agreed. I've officially marked this as requiring 2 reviewers. @LanceAndersen, @cl4es, I would specially like your reviews for this change, when you get a chance.

Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jaikiran I looked over this and added some comments. Some or most may be personal preference, take whatever you find useful. (Not a Reviewer)

Comment on lines 85 to 86
// the ZipCoder instance is derived from the Charset passed to the ZipFile constructor
// and will be used for decoding the non-UTF-8 entry names and the ZIP file comment.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a personal preference, and perhaps I'm too intimitely familiar with this code, but I feel this field comment is overly verbose.

Not sure how useful it is to document what the instance is derived from, it's not something we do for other fields and maintainers can always inspect the constructor to find out how it's being assigned.

The "non-UTF-8" part is correct, but I feel this information belongs closer to where that decision is made, not here.

The ZipCoder is used when decoding entry comments as well, so the current comment is not fully correct.

Perhaps something like "Used for decoding binary encoded names and comments into strings" would do?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it just needs to say the ZipCoder for entry names and comments when not using UTF-8.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I missed updating this comment in the previous iteration. I've now updated the PR to match Alan's suggestion.

private final String fileName; // name of the file
// the ZipCoder instance is derived from the Charset passed to the ZipFile constructor
// and will be used for decoding the non-UTF-8 entry names and the ZIP file comment.
private final ZipCoder entryNameCommentCoder;
Copy link
Contributor

@eirbjo eirbjo Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we do not have separate ZipCoders for decoding the different ZIP values, I'm not sure it's useful to put the field names ("name"/"comment") into the instance field name here. Especially if the comment already has this information.

Could we call this just zc or zipCoder?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now renamed the field to zipCoder.

private final String fileName; // name of the file
// the ZipCoder instance is derived from the Charset passed to the ZipFile constructor
// and will be used for decoding the non-UTF-8 entry names and the ZIP file comment.
private final ZipCoder entryNameCommentCoder;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Source ZipCoder field had a @Stable annotation. Any reason why this did not survive the move?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Eirik, the ZipCoder field here is final so it's doesn't require a @Stable annotation. I realize this field in Source too was final and yet it was annotated @Stable, but my understanding is that, that was redundant and unnecessary. The documentation of @Stable states that a @Stable for final fields is only relevant when the field is a array type, which it isn't in the case here.

* @param cen the CEN
* @param pos the entry position
*/
private static boolean useUTF8Coder(final byte[] cen, final int pos) {
Copy link
Contributor

@eirbjo eirbjo Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems mostly used to determine which ZipCoder to pick. Could we fold this into the method, calling it zipCoderForPos and make it just return the ZipCoder, like we had in Source?

If it needs to be static, then the non-UTF8/fallback ZipCoder could be passed as a parameter?

If a method to determine whether a CEN entry uses UTF-8 is still needed, then I think it would be more appropriately named isUTF8Entry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems mostly used to determine which ZipCoder to pick. Could we fold this into the method, calling it zipCoderForPos and make it just return the ZipCoder, like we had in Source?

I intentionally removed the zipCoderForPos() method and instead introduced this static method to avoid the case where the code ends up calling this method and then stores the returned ZipCoder (like was being done in Source). Instead with this new method, it now allows the call sites to determine if a UTF8 ZipCoder is needed or a non-UTF8 one and then decide where to get the non-UTF8 ZipCoder from. If the call site is a instance method in ZipFile, then it was use the ZipFile's zipCoder field and if the call site is within Source, when the Source is being instantiated then it constructs a non-UTF8 ZipCoder afresh. That level of detail is better dealt at the call site instead of a method like zipCoderForPos().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Jaikiran,

I'm not convinced dropping zipCoderForPos is a step forward:

  • zipCoderForPos included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR and could impact lookup performance. It would be strange to repeat this optimization at all call sites.
  • The only thing differing between callsites seems to be which ZipCoder to use as a "fallback" when it's not UTF-8. This ZipCoder instance can easilly be passed in as a parameter, and will de-duplicate logic at the call sites.
  • The only "cumbersome" call site seems to be initCEN, caused by the lazy initialization of the non-UTF-8 ZipCoder. But that lazy initialization seems questionable: First, ZipCoder is already lazy in the way it initializes encoders and decoders. An unused ZipCoder is just a thin wrapper around a Charset and should not incur significant cost until it gets used. Second, the ZipFile constructor actually constructs a ZipCoder instance, would it not be better to just pass this down to initCEN as a parameter? That would avoid the cost of initializing encoders and decoders twice for the common case of single-threaded ZipFile access, once for initCEN, then again on the first lookup.

Here's a patch for a quick implementation of the above on top of your PR. (Code speeks better than words some times :-)

This retains the CENFLG optimization, simplifies logic at call sites and prevents duplicated initialization of ZipCoder beteween initCEN and lookups:

Index: src/java.base/share/classes/java/util/zip/ZipFile.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
--- a/src/java.base/share/classes/java/util/zip/ZipFile.java	(revision 935b04ed00522aa92105292baa0693c55b1356ae)
+++ b/src/java.base/share/classes/java/util/zip/ZipFile.java	(date 1741800197915)
@@ -201,8 +201,8 @@
         this.fileName = file.getName();
         long t0 = System.nanoTime();
 
-        this.res = new CleanableResource(this, charset, file, mode);
         this.zipCoder = ZipCoder.get(charset);
+        this.res = new CleanableResource(this, charset, zipCoder, file, mode);
 
         PerfCounter.getZipFileOpenTime().addElapsedTimeFrom(t0);
         PerfCounter.getZipFileCount().increment();
@@ -368,13 +368,19 @@
     }
 
     /**
-     * {@return true if the ZIP entry corresponding to the given {@code pos} uses UTF-8
-     * for entry name and comment field encoding, false otherwise}
+     * {@return a ZipCoder for decoding name and comment fields for the given CEN entry position}
      * @param cen the CEN
      * @param pos the entry position
+     * @param fallback the ZipCoder to return when the entry is not flagged as UTF-8
      */
-    private static boolean useUTF8Coder(final byte[] cen, final int pos) {
-        return (CENFLG(cen, pos) & USE_UTF8) != 0;
+    private static ZipCoder zipCoderFor(final byte[] cen, final int pos, ZipCoder fallback) {
+        if (fallback.isUTF8()) {
+            return ZipCoder.UTF8;
+        }
+        if ((CENFLG(cen, pos) & USE_UTF8) != 0) {
+            return ZipCoder.UTF8;
+        }
+        return fallback;
     }
 
     private static class InflaterCleanupAction implements Runnable {
@@ -575,7 +581,7 @@
     private String getEntryName(int pos) {
         byte[] cen = res.zsrc.cen;
         int nlen = CENNAM(cen, pos);
-        ZipCoder zc = useUTF8Coder(cen, pos) ? ZipCoder.UTF8 : zipCoder;
+        ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
         return zc.toString(cen, pos + CENHDR, nlen);
     }
 
@@ -646,7 +652,7 @@
         }
         if (clen != 0) {
             int start = pos + CENHDR + nlen + elen;
-            ZipCoder zc = useUTF8Coder(cen, pos) ? ZipCoder.UTF8 : zipCoder;
+            ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
             e.comment = zc.toString(cen, start, clen);
         }
         lastEntryName = e.name;
@@ -678,11 +684,11 @@
 
         Source zsrc;
 
-        CleanableResource(ZipFile zf, Charset charset, File file, int mode) throws IOException {
+        CleanableResource(ZipFile zf, Charset charset, ZipCoder zipCoder, File file, int mode) throws IOException {
             this.cleanable = CleanerFactory.cleaner().register(zf, this);
             this.istreams = Collections.newSetFromMap(new WeakHashMap<>());
             this.inflaterCache = new ArrayDeque<>();
-            this.zsrc = Source.get(file, (mode & OPEN_DELETE) != 0, charset);
+            this.zsrc = Source.get(file, (mode & OPEN_DELETE) != 0, charset, zipCoder);
         }
 
         void clean() {
@@ -1472,7 +1478,7 @@
         private static final java.nio.file.FileSystem builtInFS =
                 DefaultFileSystemProvider.theFileSystem();
 
-        static Source get(File file, boolean toDelete, Charset charset) throws IOException {
+        static Source get(File file, boolean toDelete, Charset charset, ZipCoder zipCoder) throws IOException {
             final Key key;
             try {
                 key = new Key(file,
@@ -1489,7 +1495,7 @@
                     return src;
                 }
             }
-            src = new Source(key, toDelete);
+            src = new Source(key, toDelete, zipCoder);
 
             synchronized (files) {
                 Source prev = files.putIfAbsent(key, src);
@@ -1511,7 +1517,7 @@
             }
         }
 
-        private Source(Key key, boolean toDelete) throws IOException {
+        private Source(Key key, boolean toDelete, ZipCoder zipCoder) throws IOException {
             this.key = key;
             if (toDelete) {
                 if (OperatingSystem.isWindows()) {
@@ -1525,7 +1531,7 @@
                 this.zfile = new RandomAccessFile(key.file, "r");
             }
             try {
-                initCEN(-1);
+                initCEN(-1, zipCoder);
                 byte[] buf = new byte[4];
                 readFullyAt(buf, 0, 4, 0);
                 this.startsWithLoc = (LOCSIG(buf) == LOCSIG);
@@ -1680,7 +1686,7 @@
         }
 
         // Reads ZIP file central directory.
-        private void initCEN(int knownTotal) throws IOException {
+        private void initCEN(int knownTotal, ZipCoder zipCoder) throws IOException {
             // Prefer locals for better performance during startup
             byte[] cen;
             if (knownTotal == -1) {
@@ -1736,31 +1742,19 @@
             int idx = 0; // Index into the entries array
             int pos = 0;
             manifestNum = 0;
-            ZipCoder zipCoder = null;
             int limit = cen.length - CENHDR;
             while (pos <= limit) {
                 if (idx >= entriesLength) {
                     // This will only happen if the ZIP file has an incorrect
                     // ENDTOT field, which usually means it contains more than
                     // 65535 entries.
-                    initCEN(countCENHeaders(cen));
+                    initCEN(countCENHeaders(cen), zipCoder);
                     return;
                 }
 
                 int entryPos = pos + CENHDR;
-                // the ZipCoder for any non-UTF8 entries
-                final ZipCoder entryZipCoder;
-                if (useUTF8Coder(cen, pos)) {
-                    // entry name explicitly wants UTF-8 Charset
-                    entryZipCoder = ZipCoder.UTF8;
-                } else {
-                    // use the ZipCoder corresponding to the Charset that
-                    // was provided when constructing the ZipFile
-                    if (zipCoder == null) {
-                        zipCoder = ZipCoder.get(key.charset);
-                    }
-                    entryZipCoder = zipCoder;
-                }
+                // the ZipCoder for this entry
+                final ZipCoder entryZipCoder = zipCoderFor(cen, pos, zipCoder);
                 // Checks the entry and adds values to entries[idx ... idx+2]
                 int nlen = checkAndAddEntry(pos, idx, entryZipCoder);
                 idx += 3;
@@ -1853,15 +1847,7 @@
                     int noff = pos + CENHDR;
                     int nlen = CENNAM(cen, pos);
 
-                    final ZipCoder zc;
-                    if (useUTF8Coder(cen, pos)) {
-                        // entry name explicitly wants UTF-8 Charset
-                        zc = ZipCoder.UTF8;
-                    } else {
-                        // use the ZipCoder corresponding to the Charset that
-                        // was provided when constructing the ZipFile
-                        zc = zipCoder;
-                    }
+                    final ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
                     // Compare the lookup name with the name encoded in the CEN
                     switch (zc.compare(name, cen, noff, nlen, addSlash)) {
                         case ZipCoder.EXACT_MATCH:

Copy link
Member Author

@jaikiran jaikiran Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Eirik,

zipCoderForPos included a short-circuit optimization logic which skipped reading the CENFLG field in the case where the opening charset was UTF-8 (which it commonly is, and always for JarFile). This optimization seems lost in the currrent state of this PR

That's a good point. I'll update this PR shortly, borrowing from your proposed diff/patch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Eirik, I have now updated the PR to borrow a major part of the diff that you provided. This now brings back the zipCoderFor() method and at the same time, we don't end up holding on to the ZipCoder in the the Source class, which is expected to be thread safe.

entries[index++] = hash;
entries[index++] = next;
entries[index ] = pos;
entries[index] = hash;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems unrelated to the issue at hand. Perhaps better left for a separate PR, backed by a benchmark.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a performance related change. I merely marked the method parameters as final and the index method parameter was being updated here.

private final Charset entryNameCommentCharset;

public Key(File file, BasicFileAttributes attrs, ZipCoder zc) {
public Key(File file, BasicFileAttributes attrs, Charset entryNameCommentCharset) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this parameter and the field it is assigned to could also just be called charset. The comment has the information about what it's used for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the PR to rename this to charset.

@Override
public int hashCode() {
long t = utf8 ? 0 : Long.MAX_VALUE;
long t = entryNameCommentCharset.hashCode();
Copy link
Contributor

@eirbjo eirbjo Mar 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This represents a behavioral change, right? Is a CSR warranted?

EDIT: Scratch that, I guess the caching mechanism here is unspecified and and more of implementation detail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal implementation detail, so doesn't require a CSR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Jaikiran,

Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".

This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.

For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.

The bug could be solved in a separate PR, however I feel it's also ok to fix it in this PR, since moving the ZipCoder instance to ZipFile seems to incidentally solve the issue as well.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.

Consider the following test, which when added to ZipCoding.java fails in master but succeeds in this PR:

/**
 * Verifies that opening a ZipFile with an incorrect charset does not
 * prevent a later opening of the same file using the correct charset.
 *
 * This may happen if the two ZipFile instances have identical
 * ZipFile.Source.Key, thus mapping to the same Source instance.
 *
 * @throws IOException if an unexpected I/O error occurs
 */
@Test
public void differentNonUTF8Charsets() throws IOException {
    // Using ISO-8859-15 and ISO-8859-1 here, since they are similar
    // encodings with different characters for some bytes

    // The byte 0xA4 is "Euro sign" in 8859-15, "Currency sign (generic)" in 8859-1
    String euroSign = "\u20AC";
    String currencySign = "\u00A4";

    // Create a ZIP, encoded in ISO-8859-15
    byte[] zip = createZIP("ISO-8859-15", euroSign, euroSign);
    Path f = Path.of("zf-non-utf.zip");
    Files.write(f, zip);

    // Open a ZipFile instance using (incorrect) encoding ISO-8859-1
    // Then open a ZipFile instance using (correct) encoding ISO-8859-15
    try (ZipFile incorrect = new ZipFile(f.toFile(), StandardCharsets.ISO_8859_1);
         ZipFile correct = new ZipFile(f.toFile(), Charset.forName("ISO-8859-15"))) {

        // Correct encoding should resolve euro sign
        assertNotNull(correct.getEntry(euroSign));
        // Correct encoding should not resolve currency sign
        assertNull(correct.getEntry(currencySign));

        // Incorrect encoding should resolve currency sign
        assertNotNull(incorrect.getEntry(currencySign));
        // Incorrect encoding should not resolve euro sign
        assertNull(incorrect.getEntry(euroSign));
    }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Eirik,

Before this change, two ZipFile instances opened using different (non-UTF-8) charsets would have equal keys, and thus be backed by the same Source and ZipCoder instance, whichever ZipFile constructed first would "win".
This seems like a separate bug, independent of the concurrency concerns described JDK-8347712.
For the benefit of future maintainers, I think this independent bug should be described in a separate JBS issue.

That's right - when fixing 8347712 I saw this issue with Key class and addressed it in this PR. You have a good point that having a JBS issue specifically for this will be useful. I have now created https://bugs.openjdk.org/browse/JDK-8355975 and linked that issue in this current PR. What helped was your test case example that uses ISO-8859-15 and ISO-8859-1 charsets to trigger this issue. I wasn't aware of that difference between ISO-8859-15 and ISO-8859-1 charsets. Thank you, I've now added that test in this PR.

}

int entryPos = pos + CENHDR;
final ZipCoder entryZipCoder;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to construct this lazily, then I think a comment documenting that purpose would be useful here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@eirbjo
Copy link
Contributor

eirbjo commented Mar 11, 2025

Agreed. I've officially marked this as requiring 2 reviewers.

Did you mean to do /reviewers 2 reviewer ?

@jaikiran
Copy link
Member Author

Agreed. I've officially marked this as requiring 2 reviewers.

Did you mean to do /reviewers 2 reviewer ?

I didn't know there was a difference, but reading through the skara command documentation, it appears there is. So I've now marked this PR appropriately.

@jaikiran
Copy link
Member Author

/reviewers 2 reviewer

@openjdk
Copy link

openjdk bot commented Mar 12, 2025

@jaikiran
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 2 Reviewers).

Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a pass over the test, adding some comments.

}

/**
* Creates multiple instances of java.util.zip.ZipFile with the given {@code entryNameCharset}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment could be interpreted as saying that the construction of the ZipFile instances happens up-front, on the main test thread. However, looking at the actual code, the construction of the ZipFile instance AND the enumeration happens concurrently.

Perhaps we could trim the "how" part here, and instad just say that the test verifies that concurrent construction and enumeration of java.util.zip.ZipFile instances backed by a the same ZIP file does not cause unexpected encoding-related concurrency failures?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've now updated the test method comment to clarify what the test does.

// create a new instance of ZipFile and iterate over the entries
try (final ZipFile zf = new ZipFile(this.file.toFile(), this.entryNameCharset)) {
final var entries = zf.entries();
while (entries.hasMoreElements()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be good to include a call to ZipFile::getEntry here as well, since that exercises a separate code path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. The updated PR now does a ZipFile.getEntry() call too.

private final Charset entryNameCharset;
private final CountDownLatch startLatch;

private ZipEntryIteratingTask(final Path file, final Charset entryNameCharset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Charset parameter/field could be renamed to charset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

*/
@ParameterizedTest
@MethodSource("charsets")
void testMultipleZipFileInstances(final Charset entryNameCommentCharset) throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Charset parameter could be renamed to charset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jaikiran
Copy link
Member Author

/contributor add eirbjo

@openjdk
Copy link

openjdk bot commented Mar 19, 2025

@jaikiran
Contributor Eirik Bjørsnøs <eirbjo@openjdk.org> successfully added.

* for {@linkplain java.nio.charset.StandardCharsets#UTF_8 UTF-8 Charset} is thread-safe
* and can be used concurrently by multiple threads. All other {@code ZipCoder} instances
* are not thread-safe and external synchronization is required by callers, if those
* instances are to be used concurrently by multiple threads.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can trim this down to just say that the UTF-8 ZipCoder is thread safe, ZipCoder for other charsets require synchronization.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - the PR has been updated to trim this text to what you suggested.

* @param pos the ZIP entry's position in CEN
* @param fallback the fallback ZipCoder to return if the entry doesn't require UTF-8
*/
private static ZipCoder zipCoderFor(final byte[] cen, final int pos, final ZipCoder fallback) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried putting an instance method on ZipFile instead as it has access to the zip coder. That would give you better abstraction and avoid needing to introduce "fallback" as that is confusing to see here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Alan that an instance method would provide better abstraction and avoid the somewhat clumsy "fallback" parameter introduced in my previous patch.

Here's a patch exploring the instance method:

Index: src/java.base/share/classes/java/util/zip/ZipFile.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/java.base/share/classes/java/util/zip/ZipFile.java b/src/java.base/share/classes/java/util/zip/ZipFile.java
--- a/src/java.base/share/classes/java/util/zip/ZipFile.java	(revision 83ac59fcd292c21bbff4e0ac243a6bf07b4b21dc)
+++ b/src/java.base/share/classes/java/util/zip/ZipFile.java	(date 1742719028475)
@@ -202,7 +202,7 @@
         long t0 = System.nanoTime();
 
         this.zipCoder = ZipCoder.get(charset);
-        this.res = new CleanableResource(this, zipCoder, file, mode);
+        this.res = new CleanableResource(this, file, mode);
 
         PerfCounter.getZipFileOpenTime().addElapsedTimeFrom(t0);
         PerfCounter.getZipFileCount().increment();
@@ -292,7 +292,7 @@
             // Look up the name and CEN header position of the entry.
             // The resolved name may include a trailing slash.
             // See Source::getEntryPos for details.
-            EntryPos pos = res.zsrc.getEntryPos(name, true, zipCoder);
+            EntryPos pos = res.zsrc.getEntryPos(name, true, this);
             if (pos != null) {
                 entry = getZipEntry(pos.name, pos.pos);
             }
@@ -332,7 +332,7 @@
             if (Objects.equals(lastEntryName, entry.name)) {
                 pos = lastEntryPos;
             } else {
-                EntryPos entryPos = zsrc.getEntryPos(entry.name, false, zipCoder);
+                EntryPos entryPos = zsrc.getEntryPos(entry.name, false, this);
                 if (entryPos != null) {
                     pos = entryPos.pos;
                 } else {
@@ -374,26 +374,25 @@
      * <p>
      * A ZIP entry's name and comment fields may be encoded using UTF-8, in
      * which case this method returns a UTF-8 capable {@code ZipCoder}. If the
-     * entry doesn't require UTF-8, then this method returns the {@code fallback}
-     * {@code ZipCoder}.
+     * entry doesn't require UTF-8, then this method returns the
+     * {@code ZipCoder} of the ZipFile.
      *
      * @param cen the CEN
      * @param pos the ZIP entry's position in CEN
-     * @param fallback the fallback ZipCoder to return if the entry doesn't require UTF-8
      */
-    private static ZipCoder zipCoderFor(final byte[] cen, final int pos, final ZipCoder fallback) {
-        if (fallback.isUTF8()) {
-            // the fallback ZipCoder is capable of handling UTF-8,
+    private ZipCoder zipCoderFor(final byte[] cen, final int pos) {
+        if (zipCoder.isUTF8()) {
+            // the ZipCoder is capable of handling UTF-8,
             // so no need to parse the entry flags to determine if
             // the entry has UTF-8 flag.
-            return fallback;
+            return zipCoder;
         }
         if ((CENFLG(cen, pos) & USE_UTF8) != 0) {
             // entry requires a UTF-8 ZipCoder
             return ZipCoder.UTF8;
         }
         // entry doesn't require a UTF-8 ZipCoder
-        return fallback;
+        return zipCoder;
     }
 
     private static class InflaterCleanupAction implements Runnable {
@@ -594,7 +593,7 @@
     private String getEntryName(int pos) {
         byte[] cen = res.zsrc.cen;
         int nlen = CENNAM(cen, pos);
-        ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
+        ZipCoder zc = zipCoderFor(cen, pos);
         return zc.toString(cen, pos + CENHDR, nlen);
     }
 
@@ -665,7 +664,7 @@
         }
         if (clen != 0) {
             int start = pos + CENHDR + nlen + elen;
-            ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
+            ZipCoder zc = zipCoderFor(cen, pos);
             e.comment = zc.toString(cen, start, clen);
         }
         lastEntryName = e.name;
@@ -697,12 +696,11 @@
 
         Source zsrc;
 
-        CleanableResource(ZipFile zf, ZipCoder zipCoder, File file, int mode) throws IOException {
-            assert zipCoder != null : "null ZipCoder";
+        CleanableResource(ZipFile zf, File file, int mode) throws IOException {
             this.cleanable = CleanerFactory.cleaner().register(zf, this);
             this.istreams = Collections.newSetFromMap(new WeakHashMap<>());
             this.inflaterCache = new ArrayDeque<>();
-            this.zsrc = Source.get(file, (mode & OPEN_DELETE) != 0, zipCoder);
+            this.zsrc = Source.get(file, (mode & OPEN_DELETE) != 0, zf);
         }
 
         void clean() {
@@ -1492,12 +1490,12 @@
         private static final java.nio.file.FileSystem builtInFS =
                 DefaultFileSystemProvider.theFileSystem();
 
-        static Source get(File file, boolean toDelete, ZipCoder zipCoder) throws IOException {
+        static Source get(File file, boolean toDelete, ZipFile zipFile) throws IOException {
             final Key key;
             try {
                 key = new Key(file,
                         Files.readAttributes(builtInFS.getPath(file.getPath()),
-                                BasicFileAttributes.class), zipCoder.charset());
+                                BasicFileAttributes.class), zipFile.zipCoder.charset());
             } catch (InvalidPathException ipe) {
                 throw new IOException(ipe);
             }
@@ -1509,7 +1507,7 @@
                     return src;
                 }
             }
-            src = new Source(key, toDelete, zipCoder);
+            src = new Source(key, toDelete, zipFile);
 
             synchronized (files) {
                 Source prev = files.putIfAbsent(key, src);
@@ -1531,7 +1529,7 @@
             }
         }
 
-        private Source(Key key, boolean toDelete, ZipCoder zipCoder) throws IOException {
+        private Source(Key key, boolean toDelete, ZipFile zipFile) throws IOException {
             this.key = key;
             if (toDelete) {
                 if (OperatingSystem.isWindows()) {
@@ -1545,7 +1543,7 @@
                 this.zfile = new RandomAccessFile(key.file, "r");
             }
             try {
-                initCEN(-1, zipCoder);
+                initCEN(-1, zipFile);
                 byte[] buf = new byte[4];
                 readFullyAt(buf, 0, 4, 0);
                 this.startsWithLoc = (LOCSIG(buf) == LOCSIG);
@@ -1700,7 +1698,7 @@
         }
 
         // Reads ZIP file central directory.
-        private void initCEN(final int knownTotal, final ZipCoder zipCoder) throws IOException {
+        private void initCEN(final int knownTotal, final ZipFile zipFile) throws IOException {
             // Prefer locals for better performance during startup
             byte[] cen;
             if (knownTotal == -1) {
@@ -1762,13 +1760,13 @@
                     // This will only happen if the ZIP file has an incorrect
                     // ENDTOT field, which usually means it contains more than
                     // 65535 entries.
-                    initCEN(countCENHeaders(cen), zipCoder);
+                    initCEN(countCENHeaders(cen), zipFile);
                     return;
                 }
 
                 int entryPos = pos + CENHDR;
                 // the ZipCoder for any non-UTF8 entries
-                final ZipCoder entryZipCoder = zipCoderFor(cen, pos, zipCoder);
+                final ZipCoder entryZipCoder = zipFile.zipCoderFor(cen, pos);
                 // Checks the entry and adds values to entries[idx ... idx+2]
                 int nlen = checkAndAddEntry(pos, idx, entryZipCoder);
                 idx += 3;
@@ -1842,7 +1840,7 @@
          * to the specified entry name, or {@code null} if not found.
          */
         private EntryPos getEntryPos(final String name, final boolean addSlash,
-                                     final ZipCoder zipCoder) {
+                                     final ZipFile zipFile) {
             if (total == 0) {
                 return null;
             }
@@ -1861,7 +1859,7 @@
                     int noff = pos + CENHDR;
                     int nlen = CENNAM(cen, pos);
 
-                    final ZipCoder zc = zipCoderFor(cen, pos, zipCoder);
+                    final ZipCoder zc = zipFile.zipCoderFor(cen, pos);
                     // Compare the lookup name with the name encoded in the CEN
                     switch (zc.compare(name, cen, noff, nlen, addSlash)) {
                         case ZipCoder.EXACT_MATCH:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Alan,

Have you tried putting an instance method on ZipFile instead as it has access to the zip coder. That would give you better abstraction and avoid needing to introduce "fallback" as that is confusing to see here.

When I experimented with a fix for this issue, several weeks back, my initial version included a variant of what Eirik shows as a diff in his comment. I decided to shelve that in favour of what I currently have in my PR. The root of the issue is that we use the ZipCoder at two categorically different moments in the lifecycle of a ZipFile instance. Once when the ZipFile is still being constructed and once after the ZipFile instance is fully constructed.

In the constructor of the ZipFile, the code leads to the static Source class. A ZipCoder that's an instance member/method of ZipFile can be used but that would then mean that various code paths that spawn out of the Source class, while parsing the CEN, may end up operating on a ZipFile instance (while obtaining the ZipCoder) that isn't fully constructed yet. It would not be immediately obvious while reading or updating such code that the ZipFile instance may not yet have been fully constructed. So to reduce such usage I decided to pursue this alternate way where using these static methods would prevent such access to the ZipFile instance. It has it's own drawbacks like you note about the confusing method parameters. But I felt that this is better of the two approaches.

If you think we should consider the instance method approach, then let me know and I'll experiment more and see if any issues related to that can be minimized.

Maybe in some larger future work this entire code path can be redone in a manner that a ZipFile instance that isn't yet constructed, doesn't have such tight integration with the Source/cache class.

@jaikiran
Copy link
Member Author

Thank you Alan, Eirik and others for the review comments so far. There's one review comment from Eirik which I plan to respond and update this PR with. I got involved with something else and couldn't focus on this PR the past few weeks. I plan to refresh this in the coming few days and have it ready well in advance of RDP1 of 25.

@liach
Copy link
Member

liach commented Apr 25, 2025

Thanks, I think our only requests were for comment updates to ease maintenance down the road, as the original issue was caused by incorrect assumptions that ZipCoder is stateless. The code itself looks robust already.

@jaikiran
Copy link
Member Author

/issue add JDK-8355975

@openjdk
Copy link

openjdk bot commented Apr 30, 2025

@jaikiran
Adding additional issue to issue list: 8355975: ZipFile uses incorrect Charset if another instance for the same ZIP file was constructed with a different Charset.

@mlbridge
Copy link

mlbridge bot commented Apr 30, 2025

Mailing list message from Eirik Bjørsnøs on core-libs-dev:

ons. 30. apr. 2025 kl. 14:13 skrev Jaikiran Pai <jpai at openjdk.org>:

Hello Eirik,

Before this change, two ZipFile instances opened using different
(non-UTF-8) charsets would have equal keys, and thus be backed by the same
Source and ZipCoder instance, whichever ZipFile constructed first would
"win".
This seems like a separate bug, independent of the concurrency concerns
described JDK-8347712.
For the benefit of future maintainers, I think this independent bug
should be described in a separate JBS issue.

That's right - when fixing 8347712 I saw this issue with `Key` class and
addressed it in this PR. You have a good point that having a JBS issue
specifically for this will be useful. I have now created
https://bugs.openjdk.org/browse/JDK-8355975 and linked that issue in this
current PR. What helped was your test case example that uses `ISO-8859-15`
and `ISO-8859-1` charsets to trigger this issue. I wasn't aware of that
difference between `ISO-8859-15` and `ISO-8859-1` charsets. Thank you, I've
now added that test in this PR.

Thanks Jaikiran for addressing this bad other comments. I?m limited to the
mobile UI of GitHub right now, but everything looks good. Thanks for adding
the test ?

Eirik.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/core-libs-dev/attachments/20250430/45731cd0/attachment.htm>

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jai,

A few minor comments on the last set of changes. Going to make another pass through the previous changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest throwing SkippedException otherwise junit throws org.opentest4j.TestAbortedException If I understand correctly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Lance, I was under the impression that the jtreg framework would mark an aborted junit test as skipped. Now that you mentioned this, I checked locally and in its current form, jtreg reports this test as:

STARTED    ZipFileCharsetTest::testCachedZipFileSource 'testCachedZipFileSource()'
org.opentest4j.TestAbortedException: Assumption failed: skipping test since ISO-8859-15 charset isn't available
ABORTED    ZipFileCharsetTest::testCachedZipFileSource 'testCachedZipFileSource()' [37ms]

[ JUnit Containers: found 4, started 4, succeeded 4, failed 0, aborted 0, skipped 0]
[ JUnit Tests: found 1, started 1, succeeded 0, failed 0, aborted 1, skipped 0]


So it's being classified by jtreg as aborted instead of skipped.

I then took your suggestion to throw jtreg.SkippedException:

+import jtreg.SkippedException;
-import static org.junit.jupiter.api.Assumptions.assumeTrue;
 
+ * @library /test/lib
  * @run junit ZipFileCharsetTest
  */
 public class ZipFileCharsetTest {
 
         // ISO-8859-15 is not a standard charset in Java. We skip this test
         // when it is unavailable
-        assumeTrue(Charset.availableCharsets().containsKey(ISO_8859_15_NAME),
-                "skipping test since " + ISO_8859_15_NAME + " charset isn't available");
+        if (!Charset.availableCharsets().containsKey(ISO_8859_15_NAME)) {
+            throw new SkippedException("skipping test since " + ISO_8859_15_NAME
+                    + " charset isn't available");
+        }

That however makes jtreg mark this test as failed instead of skipped:

STARTED    ZipFileCharsetTest::testCachedZipFileSource 'testCachedZipFileSource()'
jtreg.SkippedException: skipping test since ISO-8859-15 charset isn't available
    at ZipFileCharsetTest.testCachedZipFileSource(ZipFileCharsetTest.java:70)
    at java.base/java.lang.reflect.Method.invoke(Method.java:565)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1604)
FAILED     ZipFileCharsetTest::testCachedZipFileSource 'testCachedZipFileSource()' [28ms]
JavaTest Message: JUnit Platform Failure(s): 1

[ JUnit Containers: found 4, started 4, succeeded 4, failed 0, aborted 0, skipped 0]
[ JUnit Tests: found 1, started 1, succeeded 0, failed 1, aborted 0, skipped 0]

So it looks like we have an issue here with jtreg when it runs a junit test and the test throws jtreg.SkippedException. There appears to be an open issue for that (although in context of testng) https://bugs.openjdk.org/browse/CODETOOLS-7902676.

Given this, I can't think of a different way to handle this situation. Would it be OK to continue using assumeTrue(...) for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Jai,

It is the lessor of 2 evils WRT the reporting isn't it. It is unfortunate that this feature is not working. I think throwing the SkippedException is the right way to go long term but I assume this shows as a reported failure in Mach5 and in the case of assumeTrue it does not result in a failure being reported?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing the SkippedException is the right way to go long term but I assume this shows as a reported failure in Mach5 and in the case of assumeTrue it does not result in a failure being reported?

That is correct, use of assumeTrue will skip the test (which is what we want) but it will be classified as aborted instead of skipped. Throwing jtreg.SkippedException however fails that test which isn't a good thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think throwing the SkippedException is the right way to go long term but I assume this shows as a reported failure in Mach5 and in the case of assumeTrue it does not result in a failure being reported?

That is correct, use of assumeTrue will skip the test (which is what we want) but it will be classified as aborted instead of skipped. Throwing jtreg.SkippedException however fails that test which isn't a good thing.

So I think it is important that the reporting is addressed WRT SkippedException, especially in light of some of the discussions raised in previous threads for Mach5.

So in order to keep the Mach5 pipeline happy, I see the options is simply returning or use assumeTrue providing this does not create any additional reporting noise.

It might be worthwhile creating a follow on bug to address this in the test, linking to the JtREG bug so that it does not get lost/forgotten

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit

'The test here verifies' -> "The test verifies'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@jaikiran
Copy link
Member Author

I've now gone through all the review comments and have either addressed them or replied to them. So I don't have any additional changes in mind other than any additional review suggestions that may come up.

tier testing is currently in progress.

Comment on lines +1238 to +1240
entries[index] = hash;
entries[index + 1] = next;
entries[index + 2] = pos;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
entries[index] = hash;
entries[index + 1] = next;
entries[index + 2] = pos;
entries[index ] = hash;
entries[index + 1] = next;
entries[index + 2] = pos;

Aligned code is more readable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello Shaojin, leaving one or more spaces before a closing ] bracket of array access contradicts the style that I've seen used often in the code. At least it's not a common style used in this area of the code in the JDK.

I think it's fine in the current form and changing it to align with subsequent lines isn't necessary.

Copy link
Contributor

@eirbjo eirbjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, I have no further input.

Reviewing this part of ZipFile reminds me that this codes begs for some refactoring. Neither the RandomAccessFile nor the CEN bytes change between different charsets so I believe there may be potential for some untangling and simplification here.

But that’s for another PR :)

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the efforts on this Jai.

The changes look good

Copy link
Member

@cl4es cl4es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM

While this PR fixes the bug reported in JDK-8355975 I'm not sure how the changes in JDK-8243254 are to blame for that particular bug. This since no aspect of the charset was part of Key before. The provided test case fails in an unrelated way (unmappable character) when I try building and testing on JDK 8 and 11, though. It would be good to make sure this is not an issue on older JDKs?

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 6, 2025
@jaikiran
Copy link
Member Author

jaikiran commented May 6, 2025

Hello Claes, thank you for the review.

While this PR fixes the bug reported in JDK-8355975 I'm not sure how the changes in JDK-8243254 are to blame for that particular bug. This since no aspect of the charset was part of Key before.

You are right and I've updated the JBS issue to correct myself that 8355975 isn't the cause for this and it likely exists even in older versions. I overlooked that even though 8355975 introduced the charset to be part of the Key, the underlying issue relates to considering two ZipFile instances opened with different Charsets as having the same entry names.

The provided test case fails in an unrelated way (unmappable character) when I try building and testing on JDK 8 and 11, though. It would be good to make sure this is not an issue on older JDKs?

I will take a look at why it fails with that error on those versions and see if the test can be adjusted (outside of this PR) for those releases.

Thank you everyone for the reviews and the inputs. Alan had started reviewing this last week, so I'll wait for his review before integrating this.

@jaikiran
Copy link
Member Author

jaikiran commented May 6, 2025

Hello Eirik,

Neither the RandomAccessFile nor the CEN bytes change between different charsets so I believe there may be potential for some untangling and simplification here.
...
But that’s for another PR :)

You are right and I noticed that and a few other related things when working on this change. Given the role this ZIP/JAR parsing code plays in the JDK and various historical iterations it has gone through, doing any refactoring would be better done in independent and reasonably sized chunks if possible. I haven't however done a thorough analysis of whether it's feasible.

Copy link
Contributor

@AlanBateman AlanBateman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation changes look good and nice to get this one fixed.

I didn't spend time on the tests but I see others have reviewed in detail so you don't need me.

@jaikiran
Copy link
Member Author

Thank you everyone for the reviews. tier1, tier2 and tier3 tests continue to pass with this change against latest master branch. I'll go ahead and integrate this now.

@jaikiran
Copy link
Member Author

/integrate

@openjdk
Copy link

openjdk bot commented May 14, 2025

Going to push as commit 2c4e8d2.
Since your change was applied there have been 241 commits pushed to the master branch:

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label May 14, 2025
@openjdk openjdk bot closed this May 14, 2025
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels May 14, 2025
@openjdk
Copy link

openjdk bot commented May 14, 2025

@jaikiran Pushed as commit 2c4e8d2.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@jaikiran jaikiran deleted the 8347712 branch May 14, 2025 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

7 participants