Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8276766: Enable jar and jmod to produce deterministic timestamped content #6481

Closed

Conversation

andrew-m-leonard
Copy link

@andrew-m-leonard andrew-m-leonard commented Nov 19, 2021

Add a new --source-date (epoch seconds) option to jar and jmod to allow specification of time to use for created/updated jar/jmod entries. This then allows the ability to make the content deterministic.

Signed-off-by: Andrew Leonard anleonar@redhat.com


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed
  • Change requires a CSR request to be approved

Issues

  • JDK-8276766: Enable jar and jmod to produce deterministic timestamped content
  • JDK-8277755: Enable jar and jmod to produce deterministic timestamped content (CSR)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6481/head:pull/6481
$ git checkout pull/6481

Update a local copy of the PR:
$ git checkout pull/6481
$ git pull https://git.openjdk.java.net/jdk pull/6481/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6481

View PR using the GUI difftool:
$ git pr show -t 6481

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6481.diff

…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@andrew-m-leonard andrew-m-leonard marked this pull request as draft Nov 19, 2021
@andrew-m-leonard andrew-m-leonard marked this pull request as ready for review Nov 19, 2021
@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Nov 19, 2021

👋 Welcome back aleonard! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Nov 19, 2021

@andrew-m-leonard The following labels will be automatically applied to this pull request:

  • compiler
  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added core-libs compiler labels Nov 19, 2021
…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@openjdk openjdk bot added the rfr label Nov 19, 2021
@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Nov 19, 2021

/csr

@openjdk openjdk bot added the csr label Nov 19, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Nov 19, 2021

@andrew-m-leonard has indicated that a compatibility and specification (CSR) request is needed for this pull request.
@andrew-m-leonard please create a CSR request for issue JDK-8276766. This pull request cannot be integrated until the CSR request is approved.

@jgneff
Copy link
Member

@jgneff jgneff commented Nov 22, 2021

Thank you for this timely pull request, Andrew! I need this pull request and also #6395 to enable reproducible builds in JavaFX. I drove myself crazy this weekend with time zones, and if I understand your proposed changes correctly, it looks as if you're hitting the same problems as I did:

  1. The SOURCE_DATE_EPOCH environment variable is defined as the number of seconds since the epoch of 1970-01-01T00:00:00Z, but the new command option is defined as the number of milliseconds. That makes it difficult to set --source-date=$SOURCE_DATE_EPOCH on the command line.

  2. Calling the method ZipEntry.setTime(long) will not allow for reproducible builds when the builds run in different time zones.

For the second problem, run the included Java Time program as shown below:

$ javac Time.java 
$ echo $SOURCE_DATE_EPOCH
1637085342
$ date --date="@$SOURCE_DATE_EPOCH"
Tue 16 Nov 2021 09:55:42 AM PST
$ java Time
Build timestamp = 2021-11-16T17:55:42Z
$ for f in *.zip; do zipinfo -v $f | grep -e Archive -e modified; done
Archive:  FailsInNome.zip
  file last modified on (DOS date/time):          2021 Nov 16 08:55:42
Archive:  FailsInRome.zip
  file last modified on (DOS date/time):          2021 Nov 16 18:55:42
Archive:  WorksInNome.zip
  file last modified on (DOS date/time):          2021 Nov 16 17:55:42
Archive:  WorksInRome.zip
  file last modified on (DOS date/time):          2021 Nov 16 17:55:42
import java.io.FileOutputStream;
import java.io.IOException;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Time {

    static void writeZipFile(String name, ZipEntry entry) throws IOException {
        var output = new ZipOutputStream(new FileOutputStream(name));
        output.putNextEntry(entry);
        output.closeEntry();
        output.close();
    }

    public static void main(String[] args) throws IOException {
        var instant = Instant.now().truncatedTo(ChronoUnit.SECONDS);
        var sourceDateEpoch = System.getenv("SOURCE_DATE_EPOCH");
        if (sourceDateEpoch != null) {
            long seconds = Long.parseLong(sourceDateEpoch);
            instant = Instant.ofEpochSecond(seconds);
        }
        System.out.println("Build timestamp = " + instant);

        var entry = new ZipEntry("Entry");

        long newTime = 1000 * instant.getEpochSecond();
        TimeZone.setDefault(TimeZone.getTimeZone("America/Nome"));
        entry.setTime(newTime);
        writeZipFile("FailsInNome.zip", entry);
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Rome"));
        entry.setTime(newTime);
        writeZipFile("FailsInRome.zip", entry);

        var dosTime = LocalDateTime.ofInstant(instant, ZoneOffset.UTC);
        TimeZone.setDefault(TimeZone.getTimeZone("America/Nome"));
        entry.setTimeLocal(dosTime);
        writeZipFile("WorksInNome.zip", entry);
        TimeZone.setDefault(TimeZone.getTimeZone("Europe/Rome"));
        entry.setTimeLocal(dosTime);
        writeZipFile("WorksInRome.zip", entry);
    }
}

jgneff added a commit to jgneff/jfx that referenced this issue Nov 22, 2021
Create the build timestamp as a zoned date and time in UTC instead
of a local date and time. If SOURCE_DATE_EPOCH is defined, set the
last modification time of ZIP and JAR entries to the local date and
time in UTC of the instant defined by SOURCE_DATE_EPOCH.

Add a comment as a reminder to make JMOD files deterministic when
the following enhancements are complete:

* Enable deterministic file content ordering for Jar and Jmod
  https://bugs.openjdk.java.net/browse/JDK-8276764
  openjdk/jdk#6395

* Enable jar and jmod to produce deterministic timestamped content
  https://bugs.openjdk.java.net/browse/JDK-8276766
  openjdk/jdk#6481
@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Nov 23, 2021

@jgneff Hi John, thanks for the comment, I hadn't realized that aspect, but sort of obvious when you look at the ZIP spec for dostime, which has no timezone info.
So with this in mind the --source-date= option I am looking to add for jar/jmod, will need to state the timestamp is an "Epoch timestamp", ie. seconds since Jan1 1970 in UTC, and that as such the jar/jmod dostime will represent UTC "local time", akin to the EPOCH_SOURCE_DATE spec: https://reproducible-builds.org/specs/source-date-epoch/
I will update my PR to to use this.
thanks

…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@openjdk openjdk bot removed the rfr label Nov 23, 2021
…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@openjdk openjdk bot added the rfr label Nov 23, 2021
…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
Copy link
Member

@magicus magicus left a comment

Looks good to me, but you need more reviewers

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Thank you for your efforts here.

I just happened to notice that you created a CSR, which is great, thank you.

Please add an explanation as to why the compatibility risk is minimal as that field is currently empty to the CSR.

The current version of the PR looks reasonable. Please see a few additional comments below

test/jdk/tools/jar/JarEntryTime.java Outdated Show resolved Hide resolved
src/jdk.jartool/share/man/jar.1 Outdated Show resolved Hide resolved
src/jdk.jartool/share/classes/sun/tools/jar/Main.java Outdated Show resolved Hide resolved
@mbreinhold
Copy link
Member

@mbreinhold mbreinhold commented Nov 24, 2021

A user who’s not familiar with the lingo of reproducible builds will be mystified by an option named --source-date. A user who just wants to set the timestamp of new entries won’t be looking for an option whose name includes “source.”

Please consider providing a more general option, say --date, which takes an ISO 8601 date/time string. That would solve the general problem while also satisfying the requirement for reproducible builds. In the build it’s easy enough to convert the seconds-since epoch value of SOURCE_DATE_EPOCH to an ISO 8601 string (date -d @$SOURCE_DATE_EPOCH --iso-8601=seconds).

@jgneff
Copy link
Member

@jgneff jgneff commented Nov 25, 2021

I just tested this pull request by building my JavaFX pull request on six Linux architectures, and it works beautifully!

First I built this pull request branch of OpenJDK for Linux on the following hardware platforms: amd64 (x86_64), arm64 (aarch64), armhf (armv7l), i386 (i636), ppc64el (ppc64le), and s390x (s390x).

Then I added three lines of Groovy code to the JavaFX build.gradle file where it invokes the jmod tool:

    if (sourceDateEpoch != null) {
        args("--source-date", sourceDateEpoch)
    }

Then I built JavaFX twice for each of the architectures, with each build on a clean, isolated container created from scratch. The result is 6 architectures × 4,360 files = 26,160 identical files between the two sets of builds. The JMOD archives were the last missing piece for JavaFX, so thank you, Andrew, for completing the puzzle.

By the way, my builds of OpenJDK did not include your jarjmodorder branch, yet the order for the entries in the JMOD archives are the same. In fact, I checked my old builds and the entry order has always been the same between builds for the JMOD archives. Only the entry timestamps were different. Is that expected? I mean, although the order is not deterministic, is it expected that they would end up in the same order when the archives are built in the same environment by the same build process?

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Nov 25, 2021

I just tested this pull request by building my JavaFX pull request on six Linux architectures, and it works beautifully!

First I built this pull request branch of OpenJDK for Linux on the following hardware platforms: amd64 (x86_64), arm64 (aarch64), armhf (armv7l), i386 (i636), ppc64el (ppc64le), and s390x (s390x).

Then I added three lines of Groovy code to the JavaFX build.gradle file where it invokes the jmod tool:

    if (sourceDateEpoch != null) {
        args("--source-date", sourceDateEpoch)
    }

Then I built JavaFX twice for each of the architectures, with each build on a clean, isolated container created from scratch. The result is 6 architectures × 4,360 files = 26,160 identical files between the two sets of builds. The JMOD archives were the last missing piece for JavaFX, so thank you, Andrew, for completing the puzzle.

By the way, my builds of OpenJDK did not include your jarjmodorder branch, yet the order for the entries in the JMOD archives are the same. In fact, I checked my old builds and the entry order has always been the same between builds for the JMOD archives. Only the entry timestamps were different. Is that expected? I mean, although the order is not deterministic, is it expected that they would end up in the same order when the archives are built in the same environment by the same build process?

@jgneff thank you for testing, that's great news
on the "ordering" issue, it depends on your OS architecture among probably other things... for example try this:

  • Build 1: on a Ubuntu 20.04 running on an Intel processor
  • Build 2: on a Ubuntu 20.04 running on an AMD processor
    I suspect you will find the order will differ...

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Nov 25, 2021

A user who’s not familiar with the lingo of reproducible builds will be mystified by an option named --source-date. A user who just wants to set the timestamp of new entries won’t be looking for an option whose name includes “source.”

Please consider providing a more general option, say --date, which takes an ISO 8601 date/time string. That would solve the general problem while also satisfying the requirement for reproducible builds. In the build it’s easy enough to convert the seconds-since epoch value of SOURCE_DATE_EPOCH to an ISO 8601 string (date -d @$SOURCE_DATE_EPOCH --iso-8601=seconds).

Thanks @mbreinhold , good point, i'll update to use --date=<iso-8601-date>

@jgneff
Copy link
Member

@jgneff jgneff commented Nov 25, 2021

Please consider providing a more general option, say --date, which takes an ISO 8601 date/time string.

The jar and jmod tools will need to truncate, restrict, or convert the time zone and fractional seconds permitted by the ISO 8601 date and time format.

The only way I found to store a timestamp using the methods of java.util.zip.ZipEntry that was independent of the system's time zone was by passing a local date and time in UTC to setTimeLocal(LocalDateTime), truncated to seconds.

You can store a zoned date and time indirectly (in seconds) by adding an extra extended timestamp field to each entry with setLastModifiedTime(FileTime). Unfortunately, that method also stores the "MS-DOS date and time" as the local date and time in the default time zone of the JVM (the time zone of the build machine). Furthermore, the extra field added to each entry increases the size of the archive.

The beauty of the SOURCE_DATE_EPOCH value is that it avoids any confusion in its interpretation while providing a straightforward solution to the only users ever known to need it.

…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@openjdk
Copy link

@openjdk openjdk bot commented Nov 29, 2021

@andrew-m-leonard this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout jarjmodtimestamps
git fetch https://git.openjdk.java.net/jdk master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot removed the compiler label Dec 10, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 10, 2021

@vicente-romero-oracle
The compiler label was successfully removed.

…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

@LanceAndersen thank you for the review comments, i've just done a new commit with them all done.. assuming all the tests run ok, i'll integrate

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Hi Andrew,

Thank you for the changes. This looks much much better. Please see additional simplification comment below.

I kicked off a Mach5 run in the meantime

test/jdk/tools/jar/SourceDateDataProvider.java Outdated Show resolved Hide resolved
…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

@LanceAndersen let me know if mach5 looks good please, then I think we're good to go..

@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Dec 10, 2021

@andrew-m-leonard if this enhancement is now intended to go into JDK 19, then you can simply integrate it into jdk mainline when it is ready. In that case, the fix version of the CSR should be adjusted to 19 to match.

If, however, you would still like to get it into JDK 18, you will need to make a late enhancement request per JEP 3, and then recreate this PR against the jdk18 stabilization repo.

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

@kevinrushforth thanks Kevin, I had totally missed that RDP1 was yesterday! i'll have some thought on what's best, cheers

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

@jgneff John, I know you have an interest in this, what is your urgency for this support? jdk-18 or 19 ?

@jgneff
Copy link
Member

@jgneff jgneff commented Dec 10, 2021

@jgneff John, I know you have an interest in this, what is your urgency for this support? jdk-18 or 19 ?

It's not urgent. I'm just being impatient. 😄

If this pull request is integrated only into JDK 19, JavaFX won't be able to support reproducible builds until OpenJFX 20 in March 2023. Java projects in general are late to the reproducible builds party. Debian, for example, builds 31,571 packages and less than three percent fail to build in a reproducible manner. Those failing packages include OpenJDK and OpenJFX. Debian plans eventually to make reproducibility a requirement, and other distributions may follow.

The only true urgency, of course, is to provide Java project owners better tools to detect the next supply chain attack on the packages they distribute.

@LanceAndersen
Copy link
Contributor

@LanceAndersen LanceAndersen commented Dec 10, 2021

@LanceAndersen let me know if mach5 looks good please, then I think we're good to go..

As soon as the Mach5 finishes, I will let you know

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

@jgneff thanks John, i'm going to raise the JEP 3 request and see where I get, as I concur with your statement:

The only true urgency, of course, is to provide Java project owners better tools to detect the next supply chain attack on the packages they distribute.

The risk is minimal, also given the extensive testing we have done.

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 10, 2021

…tent

Signed-off-by: Andrew Leonard <anleonar@redhat.com>
@kevinrushforth
Copy link
Member

@kevinrushforth kevinrushforth commented Dec 10, 2021

@AlanBateman will need to review your request.

Copy link
Contributor

@LanceAndersen LanceAndersen left a comment

Hi Andrew,

The latest Mach5 runs remain clean and the updates look good so you are good to integrate when you are ready!

test/jdk/tools/jar/ReproducibleJar.java Outdated Show resolved Hide resolved
test/jdk/tools/jar/ReproducibleJar.java Outdated Show resolved Hide resolved
test/jdk/tools/jar/ReproducibleJar.java Outdated Show resolved Hide resolved
test/jdk/tools/jar/ReproducibleJar.java Outdated Show resolved Hide resolved
@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 11, 2021

Hi Andrew,

The latest Mach5 runs remain clean and the updates look good so you are good to integrate when you are ready!

Thank you @LanceAndersen

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 11, 2021

/integrate

@openjdk
Copy link

@openjdk openjdk bot commented Dec 11, 2021

Going to push as commit db68a0c.
Since your change was applied there have been 139 commits pushed to the master branch:

  • 6eb6ec0: 8278525: Additional -Wnonnull errors happen with GCC 11
  • 81c56c7: 8278456: Define jtreg jdk_desktop test group time-based sub-tasks for use by headful testing.
  • 61736f8: Merge
  • 0602f4c: 8277621: ARM32: multiple fastdebug failures with "bad AD file" after JDK-8276162
  • 3df8dc4: 8278538: Test langtools/jdk/javadoc/tool/CheckManPageOptions.java fails after the manpage was updated
  • ed5d53a: 8273179: Update nroff pages in JDK 18 before RC
  • afd065b: 8278415: [TESTBUG] vmTestbase/nsk/stress/stack/stack018.java fails with "java.lang.Error: TEST_RFE"
  • 4f594e6: 8278381: [GCC 11] Address::make_raw() does not initialize rspec
  • 8eb453b: 8277072: ObjectStreamClass caches keep ClassLoaders alive
  • 3e0b083: 8278533: Remove some unused methods in c1_Instruction and c1_ValueMap
  • ... and 129 more: https://git.openjdk.java.net/jdk/compare/45da3aea22fd85f214e661b2c98631cb91ddb55d...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Dec 11, 2021
@openjdk openjdk bot added integrated and removed ready rfr labels Dec 11, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 11, 2021

@andrew-m-leonard Pushed as commit db68a0c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@andrew-m-leonard
Copy link
Author

@andrew-m-leonard andrew-m-leonard commented Dec 14, 2021

@LanceAndersen fyi.i've raised a docs enhancement for the closed repo "man" pages to be updated for this: https://bugs.openjdk.java.net/browse/JDK-8278764

jgneff added a commit to jgneff/openjfx that referenced this issue Dec 16, 2021
Build with the OpenJDK Snap package that includes the pull request:

8276766: Enable jar and jmod to produce deterministic timestamped
content
openjdk/jdk#6481
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core-libs integrated
9 participants