8303920: Avoid calling out to python in DataDescriptorSignatureMissing test #12959

Closed
wants to merge 13 commits into from

Conversation

eirbjo
Copy link
Contributor

@eirbjo eirbjo commented Mar 9, 2023

Please review this PR which brings the DataDescriptorSignatureMissing test back to life.

This test currently calls out to Python to create a test vector ZIP with a Data Descriptor without the recommended but optional signature. The Python dependency has turned out to be very brittle, so the test is currently marked with @ignore.

The PR replaces the Python callouts with code that creates the test vector ZIP directly in the test itself. We can then remove the @ignore tag and run this useful test automatically.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8303920: Avoid calling out to python in DataDescriptorSignatureMissing test (Enhancement - P4)

Reviewers

Contributors

  • Jaikiran Pai <jpai@openjdk.org>

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/12959/head:pull/12959
$ git checkout pull/12959

Update a local copy of the PR:
$ git checkout pull/12959
$ git pull https://git.openjdk.org/jdk.git pull/12959/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 12959

View PR using the GUI difftool:
$ git pr show -t 12959

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/12959.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Mar 9, 2023

👋 Welcome back eirbjo! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Mar 9, 2023

@eirbjo The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label Mar 9, 2023
@eirbjo eirbjo marked this pull request as ready for review March 9, 2023 20:03
@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 9, 2023
@mlbridge
Copy link

mlbridge bot commented Mar 9, 2023

* optional signature.
* - ZipInputStream cannot handle the missing signature
*
* descriptor signatures.
Copy link
Member

Hello Eirik, I think the summary should no longer have references to python. Since this test was introduced for https://bugs.openjdk.org/browse/JDK-8056934, perhaps we can just change the @summary of this test definition to say:

@summary Verify the ability to read zip files whose local header data descriptor is missing the optional signature

Copy link
Contributor Author

Thanks, I updated the summary according to your suggestion. I left it there to keep the history/context of the test, but I guess that's a bit detailed and is already captured in the bug.

@@ -1,5 +1,5 @@
 /*
- * Copyright 2012 Google, Inc. All Rights Reserved.
+ * Copyright 2012, 2023 Google, Inc. All Rights Reserved.
Copy link
Member

It's my understanding that unless you are doing this change on behalf of Google, you shouldn't be changing that line. Instead, one has to add another line below that, of the form:

Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.

Copy link
Contributor Author

I have no affiliation to Google. I reverted the change to their copyright line and added an Oracle one.

There is very little code left from Martin here (the Zip64 note in the header, the class name and some whitespace), but let's keep it anyhow.

@jaikiran
Copy link
Member

jaikiran commented Mar 10, 2023

It's good that this test is being revived and no longer relies on an external tool/program to generate a zip which was triggering the issue in https://bugs.openjdk.org/browse/JDK-8056934.

The changes look good to me. In order to verify that without the fix in https://bugs.openjdk.org/browse/JDK-8056934, this test fails (i.e. reproduces the issue), I reverted the fix that was done in that issue (git revert 3951dda4cf06c6e61e19d3df26a792022c1701b9) and then built the JDK and ran this updated test. It runs into a NullPointerException within the test because the second entry in the zip is missing after the zip is read through the ZipInputStream. Could you add this following patch to your test:

diff --git a/test/jdk/java/util/zip/DataDescriptorSignatureMissing.java b/test/jdk/java/util/zip/DataDescriptorSignatureMissing.java
index 5efdc59de63..636cecb4851 100644
--- a/test/jdk/java/util/zip/DataDescriptorSignatureMissing.java
+++ b/test/jdk/java/util/zip/DataDescriptorSignatureMissing.java
@@ -39,6 +39,7 @@ import java.nio.charset.StandardCharsets;
 import java.util.zip.*;
 
 import static org.testng.Assert.assertEquals;
+import static org.testng.Assert.assertNotNull;
 
 public class DataDescriptorSignatureMissing {
 
@@ -55,10 +56,12 @@ public class DataDescriptorSignatureMissing {
         try (ZipInputStream in = new ZipInputStream(
                 new ByteArrayInputStream(zip))) {
             ZipEntry first = in.getNextEntry();
+            assertNotNull(first, "Zip file is unexpectedly missing first entry");
             assertEquals(first.getName(), "first");
             assertEquals(in.readAllBytes(), "first".getBytes(StandardCharsets.UTF_8));
 
             ZipEntry second = in.getNextEntry();
+            assertNotNull(second, "Zip file is unexpectedly missing second entry");
             assertEquals(second.getName(), "second");
             assertEquals(in.readAllBytes(), "second".getBytes(StandardCharsets.UTF_8));
         }

so that instead of running into a NullPointerException, the test will (rightly) reproduce and report the missing second entry?

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

/contributor add @jaikiran

@openjdk
Copy link

openjdk bot commented Mar 10, 2023

@eirbjo jaikiran was not found in the census.

Syntax: /contributor (add|remove) [@user | openjdk-user | Full Name <email@address>]. For example:

  • /contributor add @openjdk-bot
  • /contributor add duke
  • /contributor add J. Duke <duke@openjdk.org>

User names can only be used for users in the census associated with this repository. For other contributors you need to supply the full name and email address.

Copy link
Member

@jaikiran jaikiran left a comment

This now looks good to me. Thank you for making these changes. I'll run this test on our CI just to be sure there aren't any obvious issues.

Before integrating, please wait for another review from Lance or others who have more knowledge of this area.

@openjdk
Copy link

openjdk bot commented Mar 10, 2023

@eirbjo This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8303920: Avoid calling out to python in DataDescriptorSignatureMissing test

Co-authored-by: Jaikiran Pai <jpai@openjdk.org>
Reviewed-by: jpai, lancea, iris

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 274 new commits pushed to the master branch:

  • fe0ccdf: 8319640: ClassicFormat::parseObject (from DateTimeFormatter) does not conform to the javadoc and may leak DateTimeException
  • 1802cb5: 8319570: Change to GCC 13.2.0 for building on Linux at Oracle
  • d992033: 8317562: [JFR] Compilation queue statistics
  • 965ae72: 8319753: Duration javadoc has "period" instead of "duration" in several places
  • 115b074: 8319944: Remove DynamicDumpSharedSpaces
  • c0507af: 8319818: Address GCC 13.2.0 warnings (stringop-overflow and dangling-pointer)
  • 3684b4b: 8306116: Update CLDR to Version 44.0
  • 88ccd64: 8296250: Update ICU4J to Version 74.1
  • 03db828: 8319650: Improve heap dump performance with class metadata caching
  • b41b00a: 8319820: Use unnamed variables in the FFM implementation
  • ... and 264 more: https://git.openjdk.org/jdk/compare/a876beb63d5d509b80366139ae4c6abe502efe1e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@jaikiran, @LanceAndersen, @irisclark) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Mar 10, 2023
@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

/contributor add jpai

@openjdk
Copy link

openjdk bot commented Mar 10, 2023

@eirbjo
Contributor Jaikiran Pai <jpai@openjdk.org> successfully added.

@eirbjo
Copy link
Contributor Author

eirbjo commented Mar 10, 2023

This now looks good to me.

Thanks for taking time to do this thorough review and especially for running the regressing case. Much appreciated!

I'll wait for a second review before integrating.

@eirbjo
Copy link
Contributor Author

eirbjo commented Apr 14, 2023

Since we strip 4 bytes from the first entry's data descriptor, we need to account for this by reducing the second CEN header's LOC offset by 4. Similarly, the END header's CEN offset also needs adjustment.
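
The offset fix-up described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: the class name `SignaturelessDescriptorSketch` and its helper methods are hypothetical, and it assumes a two-entry, non-ZIP64 archive written by `ZipOutputStream` with no archive comment (so the END header is the last 22 bytes).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration only; not part of the PR.
public class SignaturelessDescriptorSketch {

    static final int EXT_SIG = 0x08074b50; // optional data descriptor signature

    // Builds a two-entry ZIP; streamed entries get data descriptors with a signature.
    static byte[] twoEntryZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zo = new ZipOutputStream(out)) {
            zo.putNextEntry(new ZipEntry("first"));
            zo.write("first".getBytes(StandardCharsets.UTF_8));
            zo.putNextEntry(new ZipEntry("second"));
            zo.write("second".getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Removes the 4-byte signature from the first entry's data descriptor
    // and shifts the offsets that point past the removed bytes.
    static byte[] stripFirstDescriptorSignature(byte[] zip) {
        ByteBuffer in = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int end = zip.length - 22;            // END header (assumes no comment)
        int cen = in.getInt(end + 16);        // offset of the first CEN header
        int csize = in.getInt(cen + 20);      // compressed size of the first entry
        int loc = in.getInt(cen + 42);        // LOC offset of the first entry
        int nameLen = in.getShort(loc + 26) & 0xFFFF;
        int extraLen = in.getShort(loc + 28) & 0xFFFF;
        int desc = loc + 30 + nameLen + extraLen + csize; // data descriptor start
        if (in.getInt(desc) != EXT_SIG) {
            throw new IllegalStateException("No data descriptor signature found");
        }
        // The second CEN header follows the first one's variable-length fields.
        int cen2 = cen + 46
                + (in.getShort(cen + 28) & 0xFFFF)   // name length
                + (in.getShort(cen + 30) & 0xFFFF)   // extra length
                + (in.getShort(cen + 32) & 0xFFFF);  // comment length

        byte[] result = new byte[zip.length - 4];
        System.arraycopy(zip, 0, result, 0, desc);
        System.arraycopy(zip, desc + 4, result, desc, zip.length - desc - 4);

        ByteBuffer out = ByteBuffer.wrap(result).order(ByteOrder.LITTLE_ENDIAN);
        // Everything after the descriptor moved 4 bytes towards the start.
        out.putInt(cen2 - 4 + 42, in.getInt(cen2 + 42) - 4); // second entry's LOC offset
        out.putInt(end - 4 + 16, cen - 4);                   // END header's CEN offset
        return result;
    }

    // Reads entry names in stream order using ZipInputStream.
    static List<String> readNames(byte[] zip) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry e;
            while ((e = in.getNextEntry()) != null) {
                names.add(e.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readNames(stripFirstDescriptorSignature(twoEntryZip())));
    }
}
```

The first entry's own CEN record needs no adjustment in this sketch because its LOC header precedes the removed bytes.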

@bridgekeeper
Copy link

bridgekeeper bot commented May 12, 2023

@eirbjo This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 9, 2023

@eirbjo This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jun 9, 2023
@openjdk openjdk bot reopened this Oct 28, 2023
@openjdk
Copy link

openjdk bot commented Oct 28, 2023

@eirbjo This pull request is now open

@eirbjo
Copy link
Contributor Author

eirbjo commented Oct 28, 2023

Reopening this PR.

Before being closed for inactivity, this PR was reviewed by @jaikiran, who requested that another reviewer with experience in this area also take a look at it before integration.

I think it would be good to have this regression test run automatically again.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Thanks for trying to move this forward. Please see my comments below.

Another question: is the zip generated by this test readable by other zip tools such as info-zip, Apache Commons Compress, and WinZip?

@@ -1,5 +1,6 @@
 /*
  * Copyright 2012 Google, Inc. All Rights Reserved.
+ * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved.
Copy link
Contributor

I am not sure the copyright can be updated this way. @irisclark, could you provide guidance?

Copy link
Contributor Author

This way of updating the copyright was suggested by @jaikiran in the March 10th comment above. Would be nice to get this clarified, yes.

Copy link
Contributor Author

There is actually very little left of Martin's code after my rewrite, besides whitespace, some curly braces and a couple of imports. Hardly defensible IP, but then I Am Not a Lawyer :-)

Copy link
Contributor

I exchanged messages with Iris and she is comfortable with the updates to the copyright

* No way to adapt the technique in this test to get a ZIP64 zip file
* without data descriptors was found.
*
* @ignore 8303920 This test has brittle dependencies on an external working python.
* @run testng DataDescriptorSignatureMissing
*/
Copy link
Contributor

Can we please convert this to use JUnit?

Copy link
Contributor Author

Converted to junit as suggested.

}
/**
* Produce a ZIP file where the first entry has a signature-less data descriptor
*/
Copy link
Contributor

I think it would be useful to show what the internal zip representation of the LOC and CEN looks like, to make it clear to future maintainers what a signature-less data descriptor is meant to be.

Copy link
Contributor Author

Added a comment including some structural examples. (I personally feel it maybe ended up a bit excessive)

@eirbjo
Copy link
Contributor Author

eirbjo commented Oct 30, 2023

Another question: is the zip generated by this test readable by other zip tools such as info-zip, Apache Commons Compress, and WinZip?

  • info-zip: Does not support unzipping from a zip, so uses the CEN instead of the data descriptor.
  • Apache commons-compress: Reads the signature-less ZIP just fine.
  • winzip: I do not currently have easy access to Windows, so can't test this. But I would assume it also uses the CEN when unzipping
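
To illustrate the distinction drawn in the list above, here is a rough sketch contrasting a reader that uses the CEN with one that streams through the archive. The class `CenVersusStreamSketch` and its methods are hypothetical names for illustration. `ZipFile` looks entries up through the central directory, while `ZipInputStream` only learns a streamed entry's size once it has consumed the data and parsed the data descriptor, which is why the missing signature matters only to stream-based readers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper class for illustration only.
public class CenVersusStreamSketch {

    // A one-entry ZIP; ZipOutputStream writes a data descriptor for streamed entries.
    static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zo = new ZipOutputStream(out)) {
            zo.putNextEntry(new ZipEntry("first"));
            zo.write("first".getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // CEN-based access: ZipFile takes the size from the central directory.
    static long sizeFromCen(byte[] zip, String name) throws IOException {
        Path tmp = Files.createTempFile("cen-sketch", ".zip");
        try {
            Files.write(tmp, zip);
            try (ZipFile zf = new ZipFile(tmp.toFile())) {
                return zf.getEntry(name).getSize();
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }

    // Stream-based access: the size becomes available only after the entry's
    // data has been consumed and the trailing data descriptor has been read.
    static long sizeFromStream(byte[] zip) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            ZipEntry entry = in.getNextEntry();
            in.readAllBytes(); // reaching end of entry data triggers descriptor parsing
            return entry.getSize();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = sampleZip();
        System.out.println("CEN size:    " + sizeFromCen(zip, "first"));
        System.out.println("Stream size: " + sizeFromStream(zip));
    }
}
```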

@LanceAndersen
Copy link
Contributor

Thinking some more about this, I would like to see us keep the zip generated by Python and store it in a byte array (or equivalent), as it also validates that we can still process the zip, given that this was the original test and the zip is generated by a third-party tool.

This would be an additional test along with your proposed enhancements

@eirbjo
Copy link
Contributor Author

eirbjo commented Nov 2, 2023

Thinking some more about this, I would like to see us keep the zip generated by Python and store it in a byte array (or equivalent), as it also validates that we can still process the zip, given that this was the original test and the zip is generated by a third-party tool.

Lance,

I was finally able to reproduce the original issue using Python 3.4.4. (Which was a challenge to install given its archaic dependencies!)

I was able to verify that the missing signature is the ONLY difference between the input and output files (except for the updated LOC and CEN offsets accounting for the missing bytes). Additionally, I independently removed the signature files from the input file; this produced an output file binary identical to Python's.

Given that the one and only difference introduced by the Python script is covered by the test in this PR, I'm not sure I see any additional value in adding a test with the binary test vector produced by Python. I think it will just increase our maintenance costs, without adding any real value or coverage.

If you see this differently, that's of course ok. Just let me know and I'll create the test with the encoded binary ZIP (which I have easily available now).

Waiting for your guidance, thanks :-)

@LanceAndersen
Copy link
Contributor

If you verified that we are a complete match, I am good with that. Thank you for validating

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

I think you are in a good place with the test clean up. Thanks for your efforts here

I can kick off a test run internally next week or perhaps Sunday.

@eirbjo
Copy link
Contributor Author

eirbjo commented Nov 3, 2023

I can kick off a test run internally next week or perhaps Sunday

Thanks for your reviews, Lance and Iris!

FWIW, the test ran fine on GitHub Actions, including on linux-x86 (which is 32-bit, right?):

TEST: java/util/zip/DataDescriptorSignatureMissing.java
  build: 0.081 seconds
  compile: 0.081 seconds
  junit: 0.055 seconds
TEST RESULT: Passed. Execution successful

https://github.com/eirbjo/jdk/actions/runs/6696663322/job/18196803138

@eirbjo
Copy link
Contributor Author

eirbjo commented Nov 8, 2023

I can kick off a test run internally next week or perhaps Sunday

FWIW, the test ran fine on GitHub Actions, including on linux-x86 (which is 32-bit, right?):

Disregard this comment, since it confuses this PR with #12991 :-)

@LanceAndersen
Copy link
Contributor

Have an internal mach5 run going and will let you know when it completes

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Latest Mach 5 run looks fine

@eirbjo
Copy link
Contributor Author

eirbjo commented Nov 9, 2023

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Nov 9, 2023
@openjdk
Copy link

openjdk bot commented Nov 9, 2023

@eirbjo
Your change (at version 4a541ff) is now ready to be sponsored by a Committer.

Copy link
Member

@jaikiran jaikiran left a comment

Hello Eirik, thank you for this change - it looks really good.

@jaikiran
Copy link
Member

Lance and Iris too have approved the PR and Lance has noted that the mach5 run came back fine. I'll go ahead and sponsor this now.

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 14, 2023

Going to push as commit 07eaea8.
Since your change was applied there have been 274 commits pushed to the master branch:

  • fe0ccdf: 8319640: ClassicFormat::parseObject (from DateTimeFormatter) does not conform to the javadoc and may leak DateTimeException
  • 1802cb5: 8319570: Change to GCC 13.2.0 for building on Linux at Oracle
  • d992033: 8317562: [JFR] Compilation queue statistics
  • 965ae72: 8319753: Duration javadoc has "period" instead of "duration" in several places
  • 115b074: 8319944: Remove DynamicDumpSharedSpaces
  • c0507af: 8319818: Address GCC 13.2.0 warnings (stringop-overflow and dangling-pointer)
  • 3684b4b: 8306116: Update CLDR to Version 44.0
  • 88ccd64: 8296250: Update ICU4J to Version 74.1
  • 03db828: 8319650: Improve heap dump performance with class metadata caching
  • b41b00a: 8319820: Use unnamed variables in the FFM implementation
  • ... and 264 more: https://git.openjdk.org/jdk/compare/a876beb63d5d509b80366139ae4c6abe502efe1e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Nov 14, 2023
@openjdk openjdk bot closed this Nov 14, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Nov 14, 2023
@openjdk
Copy link

openjdk bot commented Nov 14, 2023

@jaikiran @eirbjo Pushed as commit 07eaea8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Labels
core-libs core-libs-dev@openjdk.org integrated Pull request has been integrated
4 participants