New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducible builds via Docker #799
Conversation
This is great! I think |
Benchmark for Linux-cuda
Benchmark for Linux-default
Benchmark for macOS-default
Benchmark for macOS-metal
|
It'd be great if we had a test for this somehow. I'm also interested in what instructions we need to have in docs for users on various platforms. |
Yeah I agree. I'm all open for testing suggestions. So far I came up with the following idea:
|
What if we did a manual build locally on any platform, then we checked in the resulting IMAGE_ID. Then in CI, we just run the build and compare? |
Great idea! that would work! However we must be sure that the Cargo.lock used to do the local build is the same that gets checked in CI. I can try to put together a test for this 👍 |
branches: [ main, 'release-*' ] | ||
pull_request: | ||
branches: [ main, 'release-*' ] | ||
|
||
workflow_dispatch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally I think this type of PR checker would live as job
inside of the main.yml
like: https://github.com/risc0/risc0/blob/main/.github/workflows/main.yml#L195
Any reason here to have it broken off?
clap = { version = "4.0", features = ["derive"] } | ||
const_format = "0.2" | ||
dirs = "5.0" | ||
docker-generate = "0.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL about this crate! Super useful
@@ -0,0 +1,11 @@ | |||
# To build run: docker build -f Dockerfile.release -t risczero/risc0-guest-builder:v0.17 . | |||
FROM ubuntu:20.04@sha256:3246518d9735254519e1b2ff35f95686e4a5011c90c85344c1f38df7bae9dd37 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small nit 20.04 will EOL in 2024, vs 22.04 which is is EOL (base support tier) in 2026
Should we jump to 22.04 right now to extend the lifecycle of these images?
RUN curl --proto '=https' --tlsv1.2 --retry 10 --retry-connrefused -fsSL 'https://sh.rustup.rs' | sh -s -- -y | ||
ENV PATH="/root/.cargo/bin:${PATH}" | ||
RUN cargo install cargo-risczero | ||
RUN cargo risczero install |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we update the remote toolchain, I suspect this could impact the image to change? @flaub is it possible to risczero install <VERSION>
so we could optionally lock to a rust release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the next evolution of the install command is to have it specify version.
The docker container already uses |
Benchmark for Linux-cuda
Benchmark for Linux-default
Benchmark for macOS-default de0c3baClick to hide benchmark
Benchmark for macOS-metal de0c3baClick to hide benchmark
|
My first thought it that we're using the term "reproducible build" in a non-standard way. For an example of historical context, the organization I've supported since my Docker days goes into the motivations: https://reproducible-builds.org. A reproducible build implies: identical source code -> identical binary via a deterministic compiler, a guarantee that is portable across disparate environments into perpetuity. This is important for distributing applications that can be rebuilt anywhere with guaranteed security, a primary motivation behind reproducible builds in OSS. This is not the contract we are implying with our tool. We are concerned about Image ID stability. As I understand, the Image ID ignores parts of the ELF that are unimportant to the program semantics, so binary comparison is unimportant in the zkVM security model. In our model, rustc could produce binaries who's binary hashes do not match but have the same Image ID -- that distinction might not be important in our VM security model, but that is not generalizable to meet the same bar implied by a "reproducible build" of a compiled program. So, I'd recommend renaming this feature, preferably avoiding anything that implies a deterministic build. Then document it heavily, so people aren't confused what we are/aren't guaranteeing. |
I am confused about this because we can emit a MemoryImage object reproduce-ably and deterministically using this system (ideally assuming we squash most of the entropy bugs). The ELF might not be, but you don't need a ELF, you can skip right to the memory image as your primary input on both the ZKVM and Bonsai. |
I just double checked and while the ImageID is derived from most members of the MemoryImage object, I think its possible to have the data on disk change without the ImageID change because of the If that is the case then I think my above comment is probably not accurate. |
Benchmark for Linux-cuda b24512eClick to hide benchmark
Benchmark for Linux-default b24512eClick to hide benchmark
Benchmark for macOS-default b24512eClick to hide benchmark
Benchmark for macOS-metal
|
The ImageID computation is a function that has a type like:
where:
If any of the parameters change, you get a different ImageID. The part that we're concerned with here for a "reproducible build" is just the With this approach (using docker) or any other approach we attempt, the goal is to be able to ensure that if multiple users submit the same source code, they get identical |
risc0/cargo-risczero/Cargo.toml
Outdated
reqwest = { version = "0.11", default-features = false, features = [ | ||
"blocking", | ||
"json", | ||
"rustls-tls", | ||
"gzip", | ||
] } | ||
risc0-zkvm = { workspace = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to directly depend on risc0-binfmt
here since all the types you're using from risc0-zkvm
are just re-exports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done 👍
#[ignore] // requires Docker to be installed | ||
fn test_reproducible_multiply_method() { | ||
let multiply_test = Tester { | ||
manifest_path: "examples/factors/methods/guest/Cargo.toml".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something a bit more persistent would be risc0/zkvm/methods/guest/Cargo.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not committing the Cargo.lock for that method though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should. I'm thinking all guests should have their lock files committed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok added 👍
/// Overwrites if an ELF with the same name already exists. | ||
fn build(&self) -> Result<()> { | ||
Command::new("docker") | ||
.args(["build", "--output=elfs/", "--target=binary", "."]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the output should land under target/
, so maybe target/riscv-guest/riscv32im-risc0-zkvm-elf/docker
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
modified 👍
.gitignore
Outdated
rust-project.json | ||
target/ | ||
tmp/ | ||
elfs/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer the ELFs to land under target/
someplace
This highlights my concern. We aren't concerned about the resulting binary on disk being identical through separate builds and we aren't testing binary hashes in our build system. In this scenario, we could have different binaries on disk produce the same ID because parts are ignored -- that would be an example of a violation of reproducible builds as I've seen understood in OSS and in PL/compiler space. For the Web3 example, the NEAR documentation linked in Slack does claim to build identical binaries: https://docs.near.org/sdk/rust/building/reproducible-builds
In the NEAR case, two slightly different on disk but identical semantically contracts would not meet their definition of reproducibility. We are claiming that some derivative of those compiled binaries + other inputs will be identical. That's a different definition, and one that requires its own terminology IMO. Or, at least, very careful explanation how/why our definition is different. |
We are also coming up with a new Operating System, which is the zkvm. What I claim is that a MemoryImage is equivalent to what an ELF or PE does in other OSes. If we could have |
In fact, I think we should also tell users that they should check that their ELF binary is identical after using this docker build. I'm also fine with doing this comparison in our CI tests. |
Benchmark for Linux-cuda
Benchmark for Linux-default ff8635aClick to hide benchmark
Benchmark for macOS-default ff8635aClick to hide benchmark
Benchmark for macOS-metal ff8635aClick to hide benchmark
|
Benchmark for Linux-cuda 60989d8Click to hide benchmark
Benchmark for Linux-default
Benchmark for macOS-default 60989d8Click to hide benchmark
Benchmark for macOS-metal
|
Benchmark for Linux-cuda 8dbd3c5Click to hide benchmark
Benchmark for Linux-default
Benchmark for macOS-default 8dbd3c5Click to hide benchmark
Benchmark for macOS-metal 8dbd3c5Click to hide benchmark
|
This PR extends the
cargo risczero
tool with a newbuild
command.It is based on a docker image built with this Dockerfile and pushed to the risczero Docker hub account.
Installation
To install the tool just run:
Example usage
# Build the factors example cargo risczero build --manifest-path examples/factors/methods/guest/Cargo.toml
This will generate an output similar to:
Packages that generate multiple ELFs are supported as well:
# Build the chess example cargo risczero build --manifest-path risc0/zkvm/methods/guest/Cargo.toml
This will generate an output similar to:
Resolves #755
Resolves #116