Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add information for reproducible builds to metadata #525

Closed
5 tasks
cmichi opened this issue Apr 23, 2022 · 13 comments
Closed
5 tasks

Add information for reproducible builds to metadata #525

cmichi opened this issue Apr 23, 2022 · 13 comments
Assignees
Labels
enhancement New feature or request

Comments

@cmichi
Copy link
Collaborator

cmichi commented Apr 23, 2022

Needed to enable reproducible builds for use-ink/ink#1226.

We need:

  • rust version
  • cargo contract version
  • release vs. debug build
  • wasm-opt version

ToDo

  • Emit the information to the metadata on cargo contract build
  • Add cargo contract verify [TARGET.contract] which reads the information from the metadata, builds the current workspace according to it and compares the resulting output.contract file against the TARGET
  • Check that no environment variables are passed through from the user to rustc
  • Create follow-up issue for Solang to do the same.
  • Create follow-up issue for ask! to do the same.
@cmichi cmichi added the enhancement New feature or request label Apr 23, 2022
@athei athei changed the title Add rustc + cargo-contract version to Wasm blob Add information for reproducible builds to wasm custom section May 20, 2022
@cmichi
Copy link
Collaborator Author

cmichi commented May 27, 2022

We should also check for Rust compiler env variables that can influence the build process and persist them in the custom section.

@athei
Copy link
Contributor

athei commented Jun 5, 2022

Don't we control the build through cargo contract. Meaning: We don't pass through any environment variables. If we do: We should filter everything out. Otherwise we would always miss something.

@athei athei changed the title Add information for reproducible builds to wasm custom section Add information for reproducible builds to metadata Jul 7, 2022
@athei
Copy link
Contributor

athei commented Jul 7, 2022

I updated the issue so that it says that we want to emit this data into the metadata. Reasoning is that this makes the life of everyone much easier without really preventing any use case: We always want to validate metadata and wasm anyways. Having a validated wasm with wrong metadata is just as dangerous as invalidated. Additionally, the information included are partly specific to ink! and should not leak to pallet-contracts.

The verify command in his first instantiation should only do a check whether the used nightly is the same as the one specified in the contract to be verified. On mismatch it should error out. Ultimately we want it to download the correct toolchain and use it.

@HCastano
Copy link
Contributor

HCastano commented Sep 7, 2022

I haven't been able to create reproducible build artifacts while working through #696
across the different machines I have. This is gonna be an update on my findings so far.

At the moment I'm building contracts manually using cargo instead of using
cargo-contract (but matching the settings used by cargo-contract). This reduces the
possible variability in the build introduced by other steps in the cargo-contract build
pipeline (e.g wasm-opt) (thanks for that suggestion Michi).

#!/bin/bash
set -eux

# Our default profile from https://github.com/paritytech/cargo-contract/blob/master/src/workspace/profile.rs#L30-L39
echo '[profile.release]
opt-level = "z"
lto = "fat"
codegen-units = 1
overflow-checks = true
panic = "abort"' >> Cargo.toml

# Our flags from https://github.com/paritytech/cargo-contract/blob/master/src/cmd/build.rs#L277-L302
export RUSTC_BOOTSTRAP=1
export RUSTFLAGS="-C link-arg=-zstack-size=65536 -C link-arg=--import-memory -Clinker-plugin-lto"

# Try removing user/system specific paths from the build artifacts, hasn't helped
# though.
#
# From here: https://github.com/cbeuw/lotus/blob/master/cargo-build.sh
export CARGO_HOME=$HOME/.cargo
RUSTFLAGS="$RUSTFLAGS -Z remap-cwd-prefix=. --remap-path-prefix $CARGO_HOME=cargo_home"

cargo build --no-default-features --release --target=wasm32-unknown-unknown \
    -Zbuild-std -Zbuild-std-features=panic_immediate_abort

# Quick way to compare the build artifacts
sha256sum target/wasm32-unknown-unknown/release/erc20.wasm

I've got three Linux boxes that I've been using for testing (I also have some Macs, but
I'm gonna ignore those for now). Machines A and B are running Ubunutu 20.08 and 18.04
respectively, and Machine C is running Debian 5.16.

I've been able to get matching build artifacts on Machines A and B, but not on C.

They all have the same rustc, cargo, and LLVM versions.

I've found differences in the versions of clang, and cc/gcc, which according to
rust-secure-code/cargo-repro#3 could be part of the issue.

> the rustc binary is identical
> cargo build --locked is used
> the build environment is sufficiently similar, for example if two systems have a
> different version of gcc installed you're most likely going to get a different
> result. In the same way, having a different openssl version installed is most likely
> also going vary the output

(Happy to check others libraries which you think may affect the build)

The Wat files between Machine {A, B} and Machine C are pretty similar. From what I can
tell it's function naming and a little bit of function re-ordering that's different. You
can see a Gist of the Wat files here.

I think we can get reproducible builds on similar-ish setups (e.g Linux x86 vs. Linux
x86) but not different vastly different ones (Linux x86 vs. macOS x86, or macOS M1 vs.
macOS x86).

However, I'm not entirely sure what compiler settings or system libraries need to match
in order to make this happen. I'll do a bit more reading, but I think we should start
going down the Dockerfile route for now and circle back non-Dockerized builds later.

Breadcrumbs

Here's a list of resources I've used so that somebody can follow the breadcrumbs in the
future if needed.

@athei
Copy link
Contributor

athei commented Sep 8, 2022

I guess docker is the way then. But let's only use it for what it is really necessary: Making sure the same operating system environment is used. So we still emit toolchain version etc. into the metadata. But additionally we also emit which version of the docker file is used (or None when it is build without the container). So the verify command will error out if it is not run in the same container.

One problem that arises is that people will not be aware that they need to build their contract within docker to have any chance of having it verified. They then build it on whatever machine and deploy it to chain. Then it is too late. So should we maybe have a warning somewhere while building to inform people to build in docker to have a chance to verify?

Alternatively, we just ignore this problem and don't do docker: It is the block explorers job to cover the main architectures and build it on the appropriate one.

@HCastano
Copy link
Contributor

HCastano commented Sep 8, 2022

I guess docker is the way then. But let's only use it for what it is really necessary:
Making sure the same operating system environment is used. So we still emit
toolchain version etc. into the metadata. But additionally we also emit which version
of the docker file is used (or None when it is build without the container). So the
verify command will error out if it is not run in the same container.

I think the approach we should take here is to provide a "reference" environment with a
Docker image, but not necessarily force people who run cargo contract verify to use it.
This means we shouldn't add this info to the metadata (although that field in the metadata
is open to whatever, so we people could write info about the Docker environment if they
wanted to).

One problem that arises is that people will not be aware that they need to build their
contract within docker to have any chance of having it verified. They then build it on
whatever machine and deploy it to chain. Then it is too late. So should we maybe have
a warning somewhere while building to inform people to build in docker to have a chance
to verify?

I would say that's similar to the srtool workflow, in that if you want to share your contract
with people and get it verified you have to use the same underlying environment (so image).
We can mention it somewhere in the cargo-contract output too.

Alternatively, we just ignore this problem and don't do docker: It is the block explorers job
to cover the main architectures and build it on the appropriate one.

I'd tie this in with the first thing I mentioned, of just providing the reference image and people
can do as they wish after that.

@athei
Copy link
Contributor

athei commented Sep 8, 2022

I am just arguing to keep this "reference environment" as slim as possible. Only operating system. Cause it will eventually get stale and people will get angry that they need to use old compilers and stuff. So don't just emit "ref env v3" into the metadata. Cause then you force people to use its compiler version.

@cmichi
Copy link
Collaborator Author

cmichi commented Sep 8, 2022

Argh. All solutions are really not ideal.

@HCastano Thanks for the detailed writeup, that's very helpful for following along. Did you find a difference in openssl between the machines? Maybe we could still avoid Docker by persisting any third-party dependency version in the metadata.

@cmichi
Copy link
Collaborator Author

cmichi commented Sep 14, 2022

Thinking out loud here: What if we work around mismatching Wasm bytecode hashes by hashing some other representation of the contract that can be deterministically built?

Specifically: check if the LLVM IR is deterministic for builds ‒ so an earlier representation of the contract. The LLVM IR can be emitted via cargo rustc -- --emit=llvm-ir.

cargo-contract would for release builds have to hash this other representation, we would need a new metadata field (deterministic_hash or sth), and also add this hash in a custom section of the on-chain Wasm. The cargo contract verify command would then do the same build and compare the deterministic_hash against the one on-chain in the custom Wasm section.

This solution would also not be ideal, due to the increased blob size. But we would gain the ability for everyone to upload contract source code + metadata to a block explorer, no matter if the contract was build in the reference environment. My concern is that only few users will go through the additional hoop of setting up the reference environment.

@athei
Copy link
Contributor

athei commented Sep 15, 2022

Apart from the fact that this is a moonshot anyways (I don't think this byte code will keep matching) this won't work: A malicious contract author can freely choose this hash to be anything they want and can henceforth make you believe any source code will match. The only way is to verify the wasm that is actually executing.

@h4x3rotab
Copy link
Contributor

Please note that no only the environment, but also the crates dependencies can also affect the final contract. Just fixing the cargo.lock file may not be enough. It doesn't guarantee the deps in reproduce are identical to the original build.

@ascjones
Copy link
Collaborator

Related #1148

@SkymanOne
Copy link
Contributor

Closing the issue as this it resolved by #1148 and #1306

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants