Working Group Proposal - Binary Reproducibility #54907
Ladicek started this conversation in Design Discussions
WG board ready: WG - Binary Reproducibility
It's possible to prove that builds are not reproducible, but it is really hard, if not impossible, to prove that they are. What are the acceptance criteria?
Objective
Ensure that Quarkus application builds are binary reproducible. That is, building the same source with the same toolchain produces bit-for-bit identical output.
The Problem
Binary reproducibility (the property that building the same source code with the same tools produces identical artifacts) is a cornerstone of supply chain security and build trust. It allows independent parties to verify that a published binary was genuinely built from its claimed source, and it simplifies debugging, caching, and compliance audits.
Quarkus performs significant build-time work: it runs deployment processors, generates bytecode via recorders or Gizmo, and assembles the final application artifact. Each of these steps is a potential source of non-determinism. When iteration order over sets, maps, or annotation metadata is not stable, the generated JARs can differ between builds, even though the inputs are identical.
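To make the iteration-order problem concrete, here is a minimal sketch in plain Java. The types `IdentityKey` and `ValueKey` and the method `emitInIterationOrder` are hypothetical stand-ins invented for illustration; they are not Quarkus or Jandex classes. The sketch assumes a build step that writes output in whatever order a `HashSet` hands elements back.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IterationOrderDemo {

    // Hypothetical metadata type that does NOT override hashCode().
    // It inherits Object.hashCode(), an identity hash that typically
    // differs from one JVM run to the next.
    static final class IdentityKey {
        final String name;
        IdentityKey(String name) { this.name = name; }
    }

    // Same data, but equals()/hashCode() are derived from the name only,
    // so the hash is the same in every JVM run.
    static final class ValueKey {
        final String name;
        ValueKey(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof ValueKey && ((ValueKey) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Stand-in for a build step that emits output in set iteration order.
    static List<String> emitInIterationOrder(List<String> names) {
        Set<ValueKey> keys = new HashSet<>();
        for (String n : names) {
            keys.add(new ValueKey(n));
        }
        List<String> out = new ArrayList<>();
        for (ValueKey k : keys) {
            out.add(k.name);
        }
        return out;
    }
}
```

If `emitInIterationOrder` stored `IdentityKey` instances instead, the bucket each element lands in would depend on its identity hash, so the emitted order could change between two builds of identical input.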
Known sources of non-determinism
Several sources of non-determinism have already been identified and fixed:
- hashCode() in Jandex ClassInfo: the absence of a proper hashCode() implementation led to unpredictable behavior when ClassInfo instances were stored in hash-based collections, propagating non-determinism into any build step that iterated over such collections.
- Build steps that used HashMap or HashSet where insertion order matters for bytecode generation, or used non-comparable MultiBuildItems as an input to bytecode generation.
The Proposed Solution
Finish fixing remaining sources of non-determinism
Continue the work already in progress to identify and fix non-deterministic behavior in Quarkus core and extensions. This includes stabilizing bytecode generation, ensuring deterministic ordering in build items, and fixing any remaining issues in the augmentation pipeline.
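One shape such a fix can take, sketched in plain Java under assumptions: the class `StableClassIndex` and its methods are hypothetical and do not come from the Quarkus code base. The idea is to key an index by class name in a sorted map instead of keying a hash-based map by `Class` objects, since `java.lang.Class` inherits the identity-based `Object.hashCode()`.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class StableClassIndex {

    // Keys are fully qualified class names rather than Class objects.
    // String has a specified hashCode() and a natural ordering, so a
    // TreeMap keyed by name iterates in the same order on every run.
    static Map<String, Class<?>> indexByName(Collection<Class<?>> classes) {
        Map<String, Class<?>> index = new TreeMap<>();
        for (Class<?> c : classes) {
            index.put(c.getName(), c);
        }
        return index;
    }

    // The order in which a (hypothetical) generator would visit the classes.
    static List<String> namesInGenerationOrder(Collection<Class<?>> classes) {
        return new ArrayList<>(indexByName(classes).keySet());
    }
}
```

The visiting order here depends only on the class names, not on the order in which the classes were discovered or on any hash value.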
Establish reproducibility verification criteria
Define what "reproducible" means in precise, testable terms for Quarkus applications. The criteria are conceptually obvious (same input -> same output), but the details matter: which artifacts are compared, how dev-services-related bytecode is handled, what constitutes "same toolchain", and how to account for legitimate differences (e.g., build timestamps in Maven metadata).
Build automated reproducibility testing into CI
Add CI infrastructure that performs reproducibility checks (building the same application multiple times and comparing the output) to catch regressions early. This is already underway with the reproducibility test mechanism introduced in #54420 and the proposed nightly CI workflow in #54895.
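As an illustration of what "comparing the output" could look like, the following sketch digests every entry of two JAR files and reports the entries that differ. It is not the mechanism from #54420 or #54895; the class `JarComparison` and its method names are assumptions made up for this example, and it uses only standard JDK APIs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

class JarComparison {

    // Maps each non-directory entry name to the SHA-256 of its content.
    static Map<String, String> entryDigests(Path jar)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> digests = new TreeMap<>();
        try (JarFile jf = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jf.entries();
            while (entries.hasMoreElements()) {
                JarEntry e = entries.nextElement();
                if (e.isDirectory()) {
                    continue;
                }
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                try (InputStream in = jf.getInputStream(e)) {
                    byte[] buf = new byte[8192];
                    int n;
                    while ((n = in.read(buf)) != -1) {
                        md.update(buf, 0, n);
                    }
                }
                digests.put(e.getName(), hex(md.digest()));
            }
        }
        return digests;
    }

    // Names of entries that are missing from one JAR or differ in content.
    static List<String> differingEntries(Path first, Path second)
            throws IOException, NoSuchAlgorithmException {
        Map<String, String> a = entryDigests(first);
        Map<String, String> b = entryDigests(second);
        Set<String> names = new TreeSet<>(a.keySet());
        names.addAll(b.keySet());
        List<String> differing = new ArrayList<>();
        for (String name : names) {
            if (!Objects.equals(a.get(name), b.get(name))) {
                differing.add(name);
            }
        }
        return differing;
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte x : bytes) {
            sb.append(String.format("%02x", x));
        }
        return sb.toString();
    }
}
```

Note that this per-entry comparison ignores entry order and entry timestamps, so it is weaker than a bit-for-bit check of the whole file. It is mainly useful as a diagnostic: when two builds differ, it points at the entries whose content changed.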
Produce guidelines for extension developers
Extensions contribute bytecode and resources to the final application. Extension authors need clear guidance on how to write build steps that preserve reproducibility: for example, using sorted collections, stable comparators, and deterministic code generation patterns. These guidelines will help both Quarkus core extensions and Quarkiverse extensions.
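A minimal sketch of the "sorted collections, stable comparators" advice, in plain Java. `RouteItem` is a hypothetical stand-in for a build item that several build steps may produce in any order, and `generate` stands in for code generation; neither is a Quarkus API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

class DeterministicGeneration {

    // Hypothetical item type; the arrival order of instances is not guaranteed.
    static final class RouteItem {
        final String path;
        final String handlerClassName;
        RouteItem(String path, String handlerClassName) {
            this.path = path;
            this.handlerClassName = handlerClassName;
        }
    }

    // Total order over every field that influences the output, so that
    // ties cannot silently fall back to arrival order.
    static final Comparator<RouteItem> STABLE_ORDER =
            Comparator.comparing((RouteItem r) -> r.path)
                    .thenComparing(r -> r.handlerClassName);

    // Sorts a copy of the input before emitting anything.
    static String generate(Collection<RouteItem> items) {
        List<RouteItem> sorted = new ArrayList<>(items);
        sorted.sort(STABLE_ORDER);
        StringBuilder out = new StringBuilder();
        for (RouteItem r : sorted) {
            out.append(r.path).append(" -> ").append(r.handlerClassName).append('\n');
        }
        return out.toString();
    }
}
```

Because the comparator covers both fields, `generate` returns the same text for any permutation of the same items. A comparator that looked only at `path` would leave the relative order of items with equal paths dependent on the input order.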
It is my personal belief that for the foreseeable future, we should be able to rely on stable iteration of HashSets and HashMaps, even if these collections explicitly document that their iteration order is undefined, provided that the keys have a deterministic hashCode(). Keys almost always have one, or can be given one; one class that is often used as a key and doesn't have a deterministic hashCode() is java.lang.Class.
Definition of Done
Scope of Work
In Scope
Out of Scope (for now)
Organizing the Work
Communication
Timeline
This working group is not tied to a specific Quarkus release. The work is incremental (each fix independently improves the situation) and will continue until the definition of done is satisfied.
Existing Work
Significant progress has already been made. Notable PRs include:
And many more fixes across individual extensions.