AddModuleInfo does not generate reproducible archives (regression) #199

Closed
agentgt opened this issue May 19, 2023 · 13 comments · Fixed by #211
Labels
released Issue has been released

Comments

@agentgt

agentgt commented May 19, 2023

#185 does not appear to be fixed for me.

The issue appears to be that the JDK NIO Filesystem Zip abstraction changes the zip file every time regardless of timestamp:

try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {

I think moditect needs to use the Plexus Archiver instead, which I believe uses Apache Commons Compress.

@aalmiray
Contributor

Is it really the archive generator or could it be the timestamp parser at a previous step?

@agentgt
Author

agentgt commented May 19, 2023

Per my comments on #185 I don't think it is the timestamp.

I tried building with -Duser.timezone=UTC, which made all the time fields the same across builds, but it still produced a different jar.

I checked the module-info.class files and they are the same.

I will try an isolated test of FileSystems.newFileSystem later.

@agentgt
Author

agentgt commented May 19, 2023

Darn my isolated test does not show a problem:

EDIT: I can reproduce it!

First copy a regular jar built by Maven, name it original.jar, and put it in the current working directory.

Now run this test which will pass but take note of the hash.

Now run the test again and the hash will change.

It appears that the Zip filesystem will produce the same results within the same JVM launch but changes across executions.

public class ZipTest {

	@Test
	public void testName()
			throws Exception {
		String hash1 = run();
		System.out.println(hash1);
		String hash2 = run();
		System.out.println(hash2);
		assertEquals(hash1, hash2);

	}
	
	String run() throws Exception {
		var original = Path.of("original.jar");
		var outputJar = Path.of("some.jar");
		Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		Map<String, String> env = new HashMap<>();
		env.put("create", "true");
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		URI uri = URI.create("jar:" + outputJar.toUri());
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
			Path path = zipfs.getPath("module-info.txt");
			Files.write(
					path,
					clazz,
					StandardOpenOption.CREATE,
					StandardOpenOption.WRITE,
					StandardOpenOption.TRUNCATE_EXISTING);
			Files.setLastModifiedTime(path, ft);
		}
		return sha256(outputJar);
		
	}
	
	String sha256(Path path) throws NoSuchAlgorithmException, IOException {
		var bytes = Files.readAllBytes(path);
		MessageDigest digest = MessageDigest.getInstance("SHA-256");
		byte[] hash = digest.digest(bytes);
		return HexFormat.of().formatHex(hash);
	}
}

@agentgt
Author

agentgt commented May 19, 2023

Here is an easy way to try it.

Save the below as ZipMain.java

import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.time.Instant;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;


public class ZipMain {

	public static void main(
			String[] args) {
		try {
			var hash = run(Path.of(args[0]));
			System.out.println(hash);
		}
		catch (Exception e) {
			e.printStackTrace();
		}
	}
	
	static String run(Path original) throws Exception {
		var outputJar = Path.of("output.jar");
		System.out.println("Copying " + original + " to " + outputJar);
		Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		Map<String, String> env = new HashMap<>();
		env.put("create", "true");
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		URI uri = URI.create("jar:" + outputJar.toUri());
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
			Path path = zipfs.getPath("module-info.txt");
			Files.write(
					path,
					clazz,
					StandardOpenOption.CREATE,
					StandardOpenOption.WRITE,
					StandardOpenOption.TRUNCATE_EXISTING);
			Files.setLastModifiedTime(path, ft);
		}
		return sha256(outputJar);
		
	}
	
	static String sha256(Path path) throws NoSuchAlgorithmException, IOException {
		var bytes = Files.readAllBytes(path);
		MessageDigest digest = MessageDigest.getInstance("SHA-256");
		byte[] hash = digest.digest(bytes);
		return HexFormat.of().formatHex(hash);
	}
}

Now run it:

java ZipMain.java some.jar

Take note of the hash.

Run it again:

java ZipMain.java some.jar

Different hash.

@agentgt
Author

agentgt commented May 19, 2023

If you use Apache Commons Compress:

Replace the run method with this one:

	String run() throws Exception {
		var original = Path.of("original.jar");
		var outputJar = Path.of("some.jar");
		//Files.copy(original, outputJar, StandardCopyOption.REPLACE_EXISTING);
		byte[] clazz = "Lets use a string".getBytes(StandardCharsets.UTF_8);
		Instant timestamp = Instant.ofEpochSecond(1671757006);
		FileTime ft = FileTime.from(timestamp);
		try (
				JarArchiveInputStream jis = new JarArchiveInputStream(Files.newInputStream(original));
				JarArchiveOutputStream jout = new JarArchiveOutputStream(
						Files.newOutputStream(
								outputJar,
								StandardOpenOption.CREATE,
								StandardOpenOption.WRITE,
								StandardOpenOption.TRUNCATE_EXISTING))) {
			ChangeSet cs = new ChangeSet();
			JarArchiveEntry entry = new JarArchiveEntry("module-info.txt");
			entry.setLastModifiedTime(ft);
			cs.add(entry, new ByteArrayInputStream(clazz), true);
			ChangeSetPerformer performer = new ChangeSetPerformer(cs);
			performer.perform(jis, jout);
		}
		return sha256(outputJar);
		
	}

It returns the same hash across executions.

@agentgt
Author

agentgt commented May 22, 2023

Here is my current workaround, which does not require a change to moditect. I have moditect generate my module-info.class with the normal add-module-info goal (is there a way to generate module-info.class without updating the jar?).

I then put that module-info.class somewhere and rename it to avoid issues. Then I use ant to update the jar:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>3.1.0</version>
        <executions>
          <execution>
            <id>add-module-info</id>
            <phase>package</phase>
            <goals>
              <goal>run</goal>
            </goals>
            <configuration>
              <target>
                <copy file="${project.build.sourceDirectory}/module-info.klass" tofile="${project.build.directory}/antrun/module-info.class" />
                <jar update="true" jarfile="${project.build.directory}/${project.artifactId}-${project.version}.jar" modificationtime="${project.build.outputTimestamp}000">
                  <fileset file="${project.build.directory}/antrun/module-info.class" />
                </jar>
              </target>
            </configuration>
          </execution>
        </executions>
      </plugin>

Ant apparently updates the jar safely without changing the hash. It doesn't appear to use Apache Commons Compress, but it also does not use the NIO virtual filesystem.

The zip NIO virtual filesystem appears to be the problem. I'm not sure what metadata it's adding, as it's barely a byte's worth of changes according to diffoscope.
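
For anyone without diffoscope at hand, a rough way to find where two builds of the same jar diverge is to ask the JDK for the first differing byte. This is only a sketch: the class name JarDiff and its helper methods are made up for illustration, and it reports a raw file offset, not which zip header field that offset belongs to. It assumes Java 17 or later (Files.mismatch needs 12+, HexFormat needs 17+).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HexFormat;

// Hypothetical helper, not part of moditect or any library.
final class JarDiff {

    /** Offset of the first differing byte, or -1 if both files are identical. */
    static long firstDifference(Path a, Path b) throws IOException {
        return Files.mismatch(a, b);
    }

    /** Hex dump of up to 'length' bytes starting at 'offset', clamped to the file size. */
    static String hexWindow(Path file, long offset, int length) throws IOException {
        byte[] all = Files.readAllBytes(file);
        int from = (int) Math.max(0, Math.min(offset, all.length));
        int to = Math.min(all.length, from + length);
        return HexFormat.of().formatHex(all, from, to);
    }

    public static void main(String[] args) throws IOException {
        Path first = Path.of(args[0]);
        Path second = Path.of(args[1]);
        long pos = firstDifference(first, second);
        if (pos < 0) {
            System.out.println("Files are byte-identical");
            return;
        }
        // Show a few bytes from each file starting at the first difference.
        System.out.println("First difference at offset " + pos);
        System.out.println(first + ": " + hexWindow(first, pos, 16));
        System.out.println(second + ": " + hexWindow(second, pos, 16));
    }
}
```

Run it against the output of two separate builds, for example: java JarDiff.java first-build.jar second-build.jar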

Ant's jar code could be something moditect borrows or calls instead of Commons Compress.

I just can't believe I'm the only one experiencing this... it is a big deal because I have an annotation processor library and I absolutely want that one jar to be reproducible for security reasons (since the compiler kicks it off).

@cowtowncoder (it appears jackson is using moditect) or @gunnarmorling

Have you guys tried running:

mvn clean install 
mvn clean verify artifact:compare

https://maven.apache.org/guides/mini/guide-reproducible-builds.html

@hboutemy
Contributor

@aalmiray today, I finally found time to dig into the jackson-databind reproducibility issue, and I used zipdetails to dig into jar files details to find where the non-reproducible bits come from.
Here is the result https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/com/fasterxml/jackson/databind/jackson-databind-2.15.2.diffoscope
It seems the current code sets the usual modification time correctly but forgets to set the access time and change time: fields that the usual zip tools do not display but that are stored in the zip file.
I did not study how NIO can set these yet...
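
One way NIO can set those extra fields is BasicFileAttributeView.setTimes, which takes the modification, access and creation times in a single call. The sketch below is an assumption about what could work, not a description of what PR #211 does: the class PinnedZipTimes, its methods and the Times record are hypothetical names, and it only checks that the zip filesystem reports the pinned values back, not that the resulting jar is byte-identical across runs. It assumes Java 17 or later.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributeView;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.Map;

// Hypothetical sketch, not moditect code.
final class PinnedZipTimes {

    // Same fixed instant as the reproducer earlier in this thread.
    static final FileTime FIXED = FileTime.from(Instant.ofEpochSecond(1671757006));

    /** The three timestamps the zip filesystem reports for one entry. */
    record Times(FileTime modified, FileTime accessed, FileTime created) {}

    /** Writes one entry into the jar (creating the jar if needed) and pins all three timestamps. */
    static void writeEntry(Path jar, String name, byte[] content) throws IOException {
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of("create", "true"))) {
            Path entry = zipfs.getPath(name);
            Files.write(entry, content);
            // Argument order is: lastModifiedTime, lastAccessTime, createTime.
            Files.getFileAttributeView(entry, BasicFileAttributeView.class)
                    .setTimes(FIXED, FIXED, FIXED);
        }
    }

    /** Reopens the jar and reads the entry's timestamps back. */
    static Times readTimes(Path jar, String name) throws IOException {
        URI uri = URI.create("jar:" + jar.toUri());
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, Map.of())) {
            BasicFileAttributes attrs =
                    Files.readAttributes(zipfs.getPath(name), BasicFileAttributes.class);
            // Copy the values out before the filesystem is closed.
            return new Times(attrs.lastModifiedTime(), attrs.lastAccessTime(), attrs.creationTime());
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = Path.of(args[0]);
        writeEntry(jar, "module-info.txt", "Lets use a string".getBytes(StandardCharsets.UTF_8));
        System.out.println(readTimes(jar, "module-info.txt"));
    }
}
```

If only Files.setLastModifiedTime is called, as in the reproducer, the other two timestamps are left to the zip filesystem, which would be consistent with the diffoscope finding above.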

@aalmiray
Contributor

@hboutemy thank you for that. I can follow up.

hboutemy added a commit to hboutemy/moditect that referenced this issue Sep 14, 2023
@hboutemy
Contributor

Hi @aalmiray , PR #211 created, can you merge and plan the next bugfix release, please?

aalmiray pushed a commit that referenced this issue Sep 14, 2023
@aalmiray aalmiray added the released Issue has been released label Nov 5, 2023
@aalmiray
Contributor

aalmiray commented Nov 5, 2023

🎉 This issue has been resolved in 1.1.0 (Release Notes)

@hboutemy
Contributor

hboutemy commented Nov 16, 2023

@aalmiray I'm happy to confirm that the latest Jackson release, 2.16.0, is now fully reproducible thanks to this 1.1.0 moditect release.
https://github.com/jvm-repo-rebuild/reproducible-central/blob/master/content/com/fasterxml/jackson/databind/README.md (will be updated tonight)

@cowtowncoder

Whoa! Update to latest Moditect for Jackson builds paid off.

@agentgt
Author

agentgt commented Nov 16, 2023

Yes, this is especially helpful for any project that is an annotation processor, as having the module-info in an annotation processor project can be tricky (the processor will accidentally get loaded while compiling itself).

And annotation processors should be reproducible because they get kicked off by the compiler, so there are security concerns there.

@hboutemy has been doing a fantastic job on reproducible builds and deserves a ton of praise for the PR and the reproducible project!
