Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reproducing the precompiled executable #2575

Closed
bjorn3 opened this issue Aug 18, 2023 · 24 comments
Closed

Reproducing the precompiled executable #2575

bjorn3 opened this issue Aug 18, 2023 · 24 comments

Comments

@bjorn3
Copy link

bjorn3 commented Aug 18, 2023

It seems you used a nightly version of rustc (rustc version 1.73.0-nightly (f3623871c 2023-08-06)), but which linker version and which musl version? And what Cargo.lock was used? How can I reproduce a bit for bit identical executable?

https://reproducible-builds.org/

@dvdsk
Copy link

dvdsk commented Aug 19, 2023

I think we can take this a bit further. How about building the executable on Github CI? Then the method of creating the executable is documented in the repo. It would be trivial to replicate (fork the repo and run CI). The executable itself might even be added/replaced on each release by CI. That might ease maintenance.

Can anyone with experience and or real knowledge with supply chain security pitch in if this could work?

(at the very least this approach seems to assume GitHub is trusted).

kayabaNerve added a commit to serai-dex/serai that referenced this issue Aug 19, 2023
Achieves three notable updates.

1) Resolves RUSTSEC-2022-0093 by updating libp2p-identity.
2) Removes 3 old rand crates via updating ed25519-dalek (a dependency of
libp2p-identity).
3) Sets serde_derive to 1.0.171 via updating to time 0.3.26 which pins at up to
1.0.171.

The last one is the most important. The former two are niceties.

serde_derive, since 1.0.171, ships a non-reproducible binary blob in what's a
complete compromise of supply chain security. This is done in order to reduce
compile times, yet also for the maintainer of serde (dtolnay) to leverage
serde's position as the 8th most downloaded crate to attempt to force changes
to the Rust build pipeline.

While dtolnay's contributions to Rust are respectable, being behind syn, quote,
and proc-macro2 (the top three crates by downloads), along with thiserror,
anyhow, async-trait, and more (I believe also being part of the Rust project),
they have unfortunately decided to refuse to listen to the community on this
issue (or even engage with counter-commentary). Given their political agenda
they seem to try to be accomplishing with force, I'd go as far as to call their
actions terroristic (as they're using the threat of the binary blob as
justification for cargo to ship 'proper' support for binary blobs).

This is arguably representative of dtolnay's past work on watt. watt was a wasm
interpreter to execute a pre-compiled proc macro. This would save the compile
time of proc macros, yet sandbox it so a full binary did not have to be run.

Unfortunately, watt (while decreasing compile times) fails to be a valid
solution to supply chain security (without massive ecosystem changes). It never
implemented reproducible builds for its wasm blobs, and a malicious wasm blob
could still fundamentally compromise a project. The only solution for an end
user to achieve a secure pipeline would be to locally build the project,
verifying the blob aligns, yet doing so would negate all advantages of the
blob.

dtolnay also seems to be giving up their role as a FOSS maintainer given that
serde no longer works in several environments. While FOSS maintainers are not
required to never implement breaking changes, the version number is still 1.0.
While FOSS maintainers are not required to follow semver, releasing a very
notable breaking change *without a new version number* in an ecosystem which
*does follow semver*, then refusing to acknowledge bugs as bugs with their work
does meet my personal definition of "not actively maintaining their existing
work". Maintenance would be to fix bugs, not introduce and ignore.

For now, serde > 1.0.171 has been banned. In the future, we may host a fork
without the blobs (yet with the patches). It may be necessary to ban all of
dtolnay's maintained crates, if they continue to force their agenda as such,
yet I hope this may be resolved within the next week or so.

Sources:

serde-rs/serde#2538 - Binary blob discussion

This includes several reports of various workflows being broken.

serde-rs/serde#2538 (comment)

dtolnay commenting that security should be resolved via Rust toolchain edits,
not via their own work being secure. This is why I say they're trying to
leverage serde in a political game.

serde-rs/serde#2526 - Usage via git broken

dtolnay explicitly asks the submitting user if they'd be willing to advocate
for changes to Rust rather than actually fix the issue they created. This is
further political arm wrestling.

serde-rs/serde#2530 - Usage via Bazel broken

serde-rs/serde#2575 - Unverifiable binary blob

https://github.com/dtolnay/watt - dtolnay's prior work on precompilation
@messense
Copy link

How about building the executable on Github CI?

+1 for this. It'd be nice to also publish sigstore signatures using sigstore/gh-action-sigstore-python.

@frank10gm
Copy link

It would be preferable to avoid using precompiled binaries altogether.

@dvdsk
Copy link

dvdsk commented Aug 19, 2023

It would be preferable to avoid using precompiled binaries altogether.

Lets keep on subject before this issue derails and any useful comments get drowned out. If you have unrelated problems or ideas please add them to an existing issue that specifically addresses those if it exists or open a new issue.

Though it might make Microsoft very happy lets not turn GitHub into a new social media platform :)

@dtolnay
Copy link
Member

dtolnay commented Aug 19, 2023

From examining the diffoscope comparison in #2538 (comment) between my build and one inside rust:alpine (https://pkgbuild.com/~kpcyrd/2023-08-18-serde-derive-forensics/diffoscope-serde_derive-1.0.177.html), I think the differences are:

  1. Initially, a different choice of nightly rustc, which means the standard library contains different code and the compiler contains different optimizations. I think build.sh should be updated to run cargo +nightly-YYYY-MM-DD build with a particular date, rather than latest +nightly.

  2. Different linker — this would account for the presence or absence of a GNU build-id note in the program header, leading to various 1-word offsets.

  3. Different version of the quote and syn dependencies. I checked the changes here since July 27 and I don't immediately see any that would make a difference other than in debuginfo (which is not present), so I don't know if these are a factor, but it's something that needs to be addressed anyway with a lockfile going forward.

For number 2, probably docker is the easiest way to reconcile this? If so, I would love to accept a PR that updates build.sh to run docker (/ podman?) with some usefully pinned base image and copies the result out.

@dtolnay dtolnay changed the title How to reproduce the precompiled executables? Reproducing the precompiled executable Aug 19, 2023
@aDotInTheVoid
Copy link

Would you mind sharing what linux distro and linker version you used to build the existing precompilled binaries?

@dtolnay
Copy link
Member

dtolnay commented Aug 19, 2023

$ cat /etc/issue
Ubuntu 22.04.3 LTS \n \l

$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

@aDotInTheVoid
Copy link

I've got marginally closer to reproducing, using the same rustc version and linker. It now has the same elf header. Relocations still seem different.

Code here. Sorry it's a bit of a mess. My hypothesis is different dependency versions may change relocations, even if not code. I'll investigate more tomorrow (unless someone else solves it)

bors added a commit to rust-lang/miri that referenced this issue Aug 20, 2023
pin a version of serde without intransparent unreproducible binary blobs

Serde is [shipping a binary blob in its derive crate](serde-rs/serde#2538), which is highly unexpected and subverts user trust. To make matters worse, the binary is [not even reproducible](serde-rs/serde#2575), making the crate largely unauditable and relinquishing the security benefits of open-source software. Build times are not nearly painful enough to justify forcing users to trust binary blobs.
@V0ldek
Copy link

V0ldek commented Aug 20, 2023

Since the idea of relying on GitHub CI was thrown around, let me chime in that if the binary is indeed the direction going forward, it should be the only acceptable choice.

If serde wants to do this, do it right: maintain and verify the binary's provenance.

Implementing SLSA for Rust builds is very easy in GitHub CI (see SLSA GitHub Generator - Provenance for Rust. It would give all users an attestation that the binary included with their serde was verifiably built by this repo's CI, from available source, without tampering.

There's quite a few high-profile projects that implement SLSA (e.g. Python's flask). It would be a good thing for the Rust community to have a high-profile project that also treats supply-chain vulnerabilities with a high level of importance.

@LikeLakers2
Copy link

LikeLakers2 commented Aug 20, 2023

I support V0ldek's suggestion here. Normally I would simply leave a react, but I feel it's important to explain why I support it.

I'm a user of serde, and frankly I'm not a big fan of the precompiled executable stuff as it currently is. The main problem (but definitely not the only problem) I have with it is that it forces me to trust in a manner that makes me (and, as I can tell, many others) uncomfortable: We have to trust that dtolnay (or even a third-party) hasn't tampered with the executable that he's packaged into serde_derive. I would much rather just build serde_derive from scratch every build than need to trust in that manner.

If the precompiled executable was opt-in, that would be the end of that. However, assuming an opt-in function wasn't on the table (for whatever reason), the suggestion proposed in V0ldek's comment still feels far more comfortable to me. It allows dtolnay to precompile serde_derive without me needing to worry about if the binary was edited before distribution.

Hence, I support V0ldek's suggestion.

@MinisculeGirraffe
Copy link

We have to trust that dtolnay (or even a third-party) hasn't tampered with the executable that he's packaged into serde_derive

I'd say that it's perfectly reasonable to trust @dtolnay, I think the third party part is the more important risk.

Binary blobs are a lot like STDs in the sense that you're not just dealing with just this binary, you're dealing with all the other software that's running on the machine at compilation time. Unless you can completely trust the machine the binary was built on, you can't trust the binary. Trust in the maintainer has nothing to do with it. It's better to get hacked because my machine was running insecure software, rather than getting hacked because someone else's machine was.

If an opt-in approach to it isn't possible, a verifiably built binary from the repo's CI is a significant improvement from the existing model

@dbischof90
Copy link

dbischof90 commented Aug 20, 2023

We have to trust that dtolnay (or even a third-party) hasn't tampered with the executable that he's packaged into serde_derive

I'd say that it's perfectly reasonable to trust @dtolnay, I think the third party part is the more important risk.

Binary blobs are a lot like STDs in the sense that you're not just dealing with just this binary, you're dealing with all the other software that's running on the machine at compilation time. Unless you can completely trust the machine the binary was built on, you can't trust the binary. Trust in the maintainer has nothing to do with it. It's better to get hacked because my machine was running insecure software, rather than getting hacked because someone else's machine was.

If an opt-in approach to it isn't possible, a verifiably built binary from the repo's CI is a significant improvement from the existing model

To be fair, that would make @dtolnay s credentials the potential weak spot, if those are compromised, that person has a direct attack vector. Just because his accounts commits and signs a check does not mean it's literally him. I would also, if absolutely necessary, not have that assumption as the point of failure in hundreds of projects, including the compiler itself.

Are there mechanics to have a separately signed binary stored with an audited service, whose authenticity can be checked before I update my crates? That should be a reasonable minimal base for requiring users to download a binary based on good faith.

@LikeLakers2
Copy link

LikeLakers2 commented Aug 20, 2023

I'd say that it's perfectly reasonable to trust @dtolnay, I think the third party part is the more important risk.

I do trust dtolnay. But I shouldn't need to have faith that dtolnay is doing everything right. That's why I'm against the precompiled executable stuff - and for the suggestion V0ldek made.

If I can verify an executable I downloaded against one that was compiled by a source I trust (i.e. Github Actions, or one compiled by my own computer), I will feel a lot more comfortable using that executable. But right now, we aren't able to verify that dtolnay isn't tampering with the executable in ways not exposed to the github repository.

@kayabaNerve
Copy link

To clarify, if a GH CI-based path is taken, how feasible would that to be locally reproduce? I understand how to locally reproduce Docker-based solutions, yet haven't tried locally setting up a GH CI runner.

@yaleman
Copy link

yaleman commented Aug 20, 2023 via email

@V0ldek
Copy link

V0ldek commented Aug 20, 2023

To clarify, if a GH CI-based path is taken, how feasible would that to be locally reproduce? I understand how to locally reproduce Docker-based solutions, yet haven't tried locally setting up a GH CI runner.

SLSA provenance would give enough information to fully reproduce the build – GH CI can be run locally.

That's in theory. In practice it would be much easier for people to reproduce if it was a Docker-based build (most people interested most likely have Docker, or would be able to set it up quickly enough).

SLSA has a generator for Docker-based builds as well.


As for some of the other discussion, I think it's good to grasp the benefit that SLSA lvl. 3 provenance would give – anyone would be able to independently check that the binary used during build of serde_derive is:

  1. produced by the CI Action in this repository,
  2. from a particular run of the Action,
  3. which is 1-1 tied to a particular commit hash.

The source code used for the build (and the pipeline logic) is auditable and reproducible. Then, you can also automatically verify that the bin included in the crate is actually the result of that specific build. Provenance is signed by GH, so to break all this the attacker would need to compromise both the authors' creds to crates.io, and hosted GH runners; and at that point we can all pack and go home anyway, the binary isn't even necessary for such a powerful attacker model.

@kayabaNerve
Copy link

If in practice a Docker-based solution would be "much easier", that sounds like the goal to have.

@jayvdb
Copy link

jayvdb commented Aug 21, 2023

This issue may no longer be relevant since https://github.com/serde-rs/serde/releases/tag/v1.0.184 removed the pre-compiled binaries.

@dtolnay
Copy link
Member

dtolnay commented Aug 21, 2023

I intentionally kept this open because some people are likely still interested in ascertaining exactly what got run in their environment over the past 4 weeks, for forensic purposes.

@dtolnay
Copy link
Member

dtolnay commented Aug 21, 2023

The Docker / SLSA discussion is no longer going to be relevant, but #2575 (comment) + #2575 (comment) are still important if anyone is trying to reproduce the old builds.

kayabaNerve added a commit to serai-dex/serai that referenced this issue Aug 21, 2023
* dalek 4.0

* cargo update

Moves to a version of Substrate which uses curve25519-dalek 4.0 (not a rc).
Doesn't yet update the repo to curve25519-dalek 4.0 (as a branch does) due
to the official schnorrkel using a conflicting curve25519-dalek. This would
prevent installation of frost-schnorrkel without a patch.

* use half-aggregation for tm messages

* fmt

* fix pr comments

* cargo update

Achieves three notable updates.

1) Resolves RUSTSEC-2022-0093 by updating libp2p-identity.
2) Removes 3 old rand crates via updating ed25519-dalek (a dependency of
libp2p-identity).
3) Sets serde_derive to 1.0.171 via updating to time 0.3.26 which pins at up to
1.0.171.

The last one is the most important. The former two are niceties.

serde_derive, since 1.0.171, ships a non-reproducible binary blob in what's a
complete compromise of supply chain security. This is done in order to reduce
compile times, yet also for the maintainer of serde (dtolnay) to leverage
serde's position as the 8th most downloaded crate to attempt to force changes
to the Rust build pipeline.

While dtolnay's contributions to Rust are respectable, being behind syn, quote,
and proc-macro2 (the top three crates by downloads), along with thiserror,
anyhow, async-trait, and more (I believe also being part of the Rust project),
they have unfortunately decided to refuse to listen to the community on this
issue (or even engage with counter-commentary). Given their political agenda
they seem to try to be accomplishing with force, I'd go as far as to call their
actions terroristic (as they're using the threat of the binary blob as
justification for cargo to ship 'proper' support for binary blobs).

This is arguably representative of dtolnay's past work on watt. watt was a wasm
interpreter to execute a pre-compiled proc macro. This would save the compile
time of proc macros, yet sandbox it so a full binary did not have to be run.

Unfortunately, watt (while decreasing compile times) fails to be a valid
solution to supply chain security (without massive ecosystem changes). It never
implemented reproducible builds for its wasm blobs, and a malicious wasm blob
could still fundamentally compromise a project. The only solution for an end
user to achieve a secure pipeline would be to locally build the project,
verifying the blob aligns, yet doing so would negate all advantages of the
blob.

dtolnay also seems to be giving up their role as a FOSS maintainer given that
serde no longer works in several environments. While FOSS maintainers are not
required to never implement breaking changes, the version number is still 1.0.
While FOSS maintainers are not required to follow semver, releasing a very
notable breaking change *without a new version number* in an ecosystem which
*does follow semver*, then refusing to acknowledge bugs as bugs with their work
does meet my personal definition of "not actively maintaining their existing
work". Maintenance would be to fix bugs, not introduce and ignore.

For now, serde > 1.0.171 has been banned. In the future, we may host a fork
without the blobs (yet with the patches). It may be necessary to ban all of
dtolnay's maintained crates, if they continue to force their agenda as such,
yet I hope this may be resolved within the next week or so.

Sources:

serde-rs/serde#2538 - Binary blob discussion

This includes several reports of various workflows being broken.

serde-rs/serde#2538 (comment)

dtolnay commenting that security should be resolved via Rust toolchain edits,
not via their own work being secure. This is why I say they're trying to
leverage serde in a political game.

serde-rs/serde#2526 - Usage via git broken

dtolnay explicitly asks the submitting user if they'd be willing to advocate
for changes to Rust rather than actually fix the issue they created. This is
further political arm wrestling.

serde-rs/serde#2530 - Usage via Bazel broken

serde-rs/serde#2575 - Unverifiable binary blob

https://github.com/dtolnay/watt - dtolnay's prior work on precompilation

* add Rs() api to  SchnorrAggregate

* Correct serai-processor-tests to dalek 4

* fmt + deny

* Slash malevolent validators  (#294)

* add slash tx

* ignore unsigned tx replays

* verify that provided evidence is valid

* fix clippy + fmt

* move application tx handling to another module

* partially handle the tendermint txs

* fix pr comments

* support unsigned app txs

* add slash target to the votes

* enforce provided, unsigned, signed tx ordering within a block

* bug fixes

* add unit test for tendermint txs

* bug fixes

* update tests for tendermint txs

* add tx ordering test

* tidy up tx ordering test

* cargo +nightly fmt

* Misc fixes from rebasing

* Finish resolving clippy

* Remove sha3 from tendermint-machine

* Resolve a DoS in SlashEvidence's read

Also moves Evidence from Vec<Message> to (Message, Option<Message>). That
should meet all requirements while being a bit safer.

* Make lazy_static a dev-depend for tributary

* Various small tweaks

One use of sort was inefficient, sorting unsigned || signed when unsigned was
already properly sorted. Given how the unsigned TXs were given a nonce of 0, an
unstable sort may swap places with an unsigned TX and a signed TX with a nonce
of 0 (leading to a faulty block).

The extra protection added here sorts signed, then concats.

* Fix Tributary tests I broke, start review on tendermint/tx.rs

* Finish reviewing everything outside tests and empty_signature

* Remove empty_signature

empty_signature led to corrupted local state histories. Unfortunately, the API
is only sane with a signature.

We now use the actual signature, which risks creating a signature over a
malicious message if we have ever have an invariant producing malicious
messages. Prior, we only signed the message after the local machine confirmed
it was okay per the local view of consensus.

This is tolerated/preferred over a corrupt state history since production of
such messages is already an invariant. TODOs are added to make handling of this
theoretical invariant further robust.

* Remove async_sequential for tokio::test

There was no competition for resources forcing them to be run sequentially.

* Modify block order test to be statistically significant without multiple runs

* Clean tests

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>

* Add DSTs to Tributary TX sig_hash functions

Prevents conflicts with other systems/other parts of the Tributary.

---------

Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
RalfJung pushed a commit to RalfJung/rust that referenced this issue Aug 21, 2023
pin a version of serde without intransparent unreproducible binary blobs

Serde is [shipping a binary blob in its derive crate](serde-rs/serde#2538), which is highly unexpected and subverts user trust. To make matters worse, the binary is [not even reproducible](serde-rs/serde#2575), making the crate largely unauditable and relinquishing the security benefits of open-source software. Build times are not nearly painful enough to justify forcing users to trust binary blobs.
@wucke13
Copy link
Contributor

wucke13 commented Aug 21, 2023

Just another throw-in: If reproducibility is a concern, it would be possible to provide a Nix build script for the binary. Provided that one has any recent version of nix, that should take out almost every variability out of the build.

@faulesocke
Copy link

Am I correct with the assumption, that the pre-compiled binary is gone and will stay gone and this issue only exists for documentation purposes now?

@jayvdb
Copy link

jayvdb commented Aug 21, 2023

@faulesocke , yes, this issue only exists for anyone who is worried that the binary that they were using could have been nefarious. I believe there is an extremely small number of people for whom that is an ongoing concern , given the binary behaved as it should have, had all the expected symbols, and the author is respected and is not compromised.

... and will stay gone ...

According to the release :

eventually we'd like to use a first-class precompiled macro if such a thing becomes supported by cargo / crates.io

So it should come back in the future, but only when that approach is blessed & supported by the Rust team.

@dtolnay dtolnay closed this as completed Aug 23, 2023
@jhpratt
Copy link

jhpratt commented Aug 24, 2023

@dtolnay Were the binaries fully reproduced? If not, why close the issue? As you said, it's still useful to know for the ~month that they were present.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests