New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reproducing the precompiled executable #2575
Comments
I think we can take this a bit further. How about building the executable on Github CI? Then the method of creating the executable is documented in the repo. It would be trivial to replicate (fork the repo and run CI). The executable itself might even be added/replaced on each release by CI. That might ease maintenance. Can anyone with experience and or real knowledge with supply chain security pitch in if this could work? (at the very least this approach seems to assume GitHub is trusted). |
Achieves three notable updates. 1) Resolves RUSTSEC-2022-0093 by updating libp2p-identity. 2) Removes 3 old rand crates via updating ed25519-dalek (a dependency of libp2p-identity). 3) Sets serde_derive to 1.0.171 via updating to time 0.3.26 which pins at up to 1.0.171. The last one is the most important. The former two are niceties. serde_derive, since 1.0.171, ships a non-reproducible binary blob in what's a complete compromise of supply chain security. This is done in order to reduce compile times, yet also for the maintainer of serde (dtolnay) to leverage serde's position as the 8th most downloaded crate to attempt to force changes to the Rust build pipeline. While dtolnay's contributions to Rust are respectable, being behind syn, quote, and proc-macro2 (the top three crates by downloads), along with thiserror, anyhow, async-trait, and more (I believe also being part of the Rust project), they have unfortunately decided to refuse to listen to the community on this issue (or even engage with counter-commentary). Given their political agenda they seem to try to be accomplishing with force, I'd go as far as to call their actions terroristic (as they're using the threat of the binary blob as justification for cargo to ship 'proper' support for binary blobs). This is arguably representative of dtolnay's past work on watt. watt was a wasm interpreter to execute a pre-compiled proc macro. This would save the compile time of proc macros, yet sandbox it so a full binary did not have to be run. Unfortunately, watt (while decreasing compile times) fails to be a valid solution to supply chain security (without massive ecosystem changes). It never implemented reproducible builds for its wasm blobs, and a malicious wasm blob could still fundamentally compromise a project. The only solution for an end user to achieve a secure pipeline would be to locally build the project, verifying the blob aligns, yet doing so would negate all advantages of the blob. dtolnay also seems to be giving up their role as a FOSS maintainer given that serde no longer works in several environments. While FOSS maintainers are not required to never implement breaking changes, the version number is still 1.0. While FOSS maintainers are not required to follow semver, releasing a very notable breaking change *without a new version number* in an ecosystem which *does follow semver*, then refusing to acknowledge bugs as bugs with their work does meet my personal definition of "not actively maintaining their existing work". Maintenance would be to fix bugs, not introduce and ignore. For now, serde > 1.0.171 has been banned. In the future, we may host a fork without the blobs (yet with the patches). It may be necessary to ban all of dtolnay's maintained crates, if they continue to force their agenda as such, yet I hope this may be resolved within the next week or so. Sources: serde-rs/serde#2538 - Binary blob discussion This includes several reports of various workflows being broken. serde-rs/serde#2538 (comment) dtolnay commenting that security should be resolved via Rust toolchain edits, not via their own work being secure. This is why I say they're trying to leverage serde in a political game. serde-rs/serde#2526 - Usage via git broken dtolnay explicitly asks the submitting user if they'd be willing to advocate for changes to Rust rather than actually fix the issue they created. This is further political arm wrestling. serde-rs/serde#2530 - Usage via Bazel broken serde-rs/serde#2575 - Unverifiable binary blob https://github.com/dtolnay/watt - dtolnay's prior work on precompilation
+1 for this. It'd be nice to also publish sigstore signatures using sigstore/gh-action-sigstore-python. |
It would be preferable to avoid using precompiled binaries altogether. |
Lets keep on subject before this issue derails and any useful comments get drowned out. If you have unrelated problems or ideas please add them to an existing issue that specifically addresses those if it exists or open a new issue. Though it might make Microsoft very happy lets not turn GitHub into a new social media platform :) |
From examining the diffoscope comparison in #2538 (comment) between my build and one inside
For number 2, probably docker is the easiest way to reconcile this? If so, I would love to accept a PR that updates build.sh to run docker (/ podman?) with some usefully pinned base image and copies the result out. |
Would you mind sharing what linux distro and linker version you used to build the existing precompilled binaries? |
$ cat /etc/issue
Ubuntu 22.04.3 LTS \n \l
$ ld --version
GNU ld (GNU Binutils for Ubuntu) 2.38
Copyright (C) 2022 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty. |
I've got marginally closer to reproducing, using the same rustc version and linker. It now has the same elf header. Relocations still seem different. Code here. Sorry it's a bit of a mess. My hypothesis is different dependency versions may change relocations, even if not code. I'll investigate more tomorrow (unless someone else solves it) |
pin a version of serde without intransparent unreproducible binary blobs Serde is [shipping a binary blob in its derive crate](serde-rs/serde#2538), which is highly unexpected and subverts user trust. To make matters worse, the binary is [not even reproducible](serde-rs/serde#2575), making the crate largely unauditable and relinquishing the security benefits of open-source software. Build times are not nearly painful enough to justify forcing users to trust binary blobs.
Since the idea of relying on GitHub CI was thrown around, let me chime in that if the binary is indeed the direction going forward, it should be the only acceptable choice. If serde wants to do this, do it right: maintain and verify the binary's provenance. Implementing SLSA for Rust builds is very easy in GitHub CI (see SLSA GitHub Generator - Provenance for Rust. It would give all users an attestation that the binary included with their There's quite a few high-profile projects that implement SLSA (e.g. Python's flask). It would be a good thing for the Rust community to have a high-profile project that also treats supply-chain vulnerabilities with a high level of importance. |
I support V0ldek's suggestion here. Normally I would simply leave a react, but I feel it's important to explain why I support it. I'm a user of If the precompiled executable was opt-in, that would be the end of that. However, assuming an opt-in function wasn't on the table (for whatever reason), the suggestion proposed in V0ldek's comment still feels far more comfortable to me. It allows dtolnay to precompile Hence, I support V0ldek's suggestion. |
I'd say that it's perfectly reasonable to trust @dtolnay, I think the third party part is the more important risk. Binary blobs are a lot like STDs in the sense that you're not just dealing with just this binary, you're dealing with all the other software that's running on the machine at compilation time. Unless you can completely trust the machine the binary was built on, you can't trust the binary. Trust in the maintainer has nothing to do with it. It's better to get hacked because my machine was running insecure software, rather than getting hacked because someone else's machine was. If an opt-in approach to it isn't possible, a verifiably built binary from the repo's CI is a significant improvement from the existing model |
To be fair, that would make @dtolnay s credentials the potential weak spot, if those are compromised, that person has a direct attack vector. Just because his accounts commits and signs a check does not mean it's literally him. I would also, if absolutely necessary, not have that assumption as the point of failure in hundreds of projects, including the compiler itself. Are there mechanics to have a separately signed binary stored with an audited service, whose authenticity can be checked before I update my crates? That should be a reasonable minimal base for requiring users to download a binary based on good faith. |
I do trust dtolnay. But I shouldn't need to have faith that dtolnay is doing everything right. That's why I'm against the precompiled executable stuff - and for the suggestion V0ldek made. If I can verify an executable I downloaded against one that was compiled by a source I trust (i.e. Github Actions, or one compiled by my own computer), I will feel a lot more comfortable using that executable. But right now, we aren't able to verify that dtolnay isn't tampering with the executable in ways not exposed to the github repository. |
To clarify, if a GH CI-based path is taken, how feasible would that to be locally reproduce? I understand how to locally reproduce Docker-based solutions, yet haven't tried locally setting up a GH CI runner. |
Part of the attestation should include information on how it was generated, package versions, environment etc, but Docker-in-GHA is also an option to make that easier.
…On 2023-08-21 07:57 Luke Parker wrote:
To clarify, if a GH CI-based path is taken, how feasible would that to be locally reproduce? I understand how to locally reproduce Docker-based solutions, yet haven't tried locally setting up a GH CI runner.
—
Reply to this email directly, view it on GitHub <#2575 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABJB7H5U4TKJVIFPUXF2ODXWKB6HANCNFSM6AAAAAA3VRLAWE>.
You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>
|
SLSA provenance would give enough information to fully reproduce the build – GH CI can be run locally. That's in theory. In practice it would be much easier for people to reproduce if it was a Docker-based build (most people interested most likely have Docker, or would be able to set it up quickly enough). SLSA has a generator for Docker-based builds as well. As for some of the other discussion, I think it's good to grasp the benefit that SLSA lvl. 3 provenance would give – anyone would be able to independently check that the binary used during build of
The source code used for the build (and the pipeline logic) is auditable and reproducible. Then, you can also automatically verify that the bin included in the crate is actually the result of that specific build. Provenance is signed by GH, so to break all this the attacker would need to compromise both the authors' creds to crates.io, and hosted GH runners; and at that point we can all pack and go home anyway, the binary isn't even necessary for such a powerful attacker model. |
If in practice a Docker-based solution would be "much easier", that sounds like the goal to have. |
This issue may no longer be relevant since https://github.com/serde-rs/serde/releases/tag/v1.0.184 removed the pre-compiled binaries. |
I intentionally kept this open because some people are likely still interested in ascertaining exactly what got run in their environment over the past 4 weeks, for forensic purposes. |
The Docker / SLSA discussion is no longer going to be relevant, but #2575 (comment) + #2575 (comment) are still important if anyone is trying to reproduce the old builds. |
* dalek 4.0 * cargo update Moves to a version of Substrate which uses curve25519-dalek 4.0 (not a rc). Doesn't yet update the repo to curve25519-dalek 4.0 (as a branch does) due to the official schnorrkel using a conflicting curve25519-dalek. This would prevent installation of frost-schnorrkel without a patch. * use half-aggregation for tm messages * fmt * fix pr comments * cargo update Achieves three notable updates. 1) Resolves RUSTSEC-2022-0093 by updating libp2p-identity. 2) Removes 3 old rand crates via updating ed25519-dalek (a dependency of libp2p-identity). 3) Sets serde_derive to 1.0.171 via updating to time 0.3.26 which pins at up to 1.0.171. The last one is the most important. The former two are niceties. serde_derive, since 1.0.171, ships a non-reproducible binary blob in what's a complete compromise of supply chain security. This is done in order to reduce compile times, yet also for the maintainer of serde (dtolnay) to leverage serde's position as the 8th most downloaded crate to attempt to force changes to the Rust build pipeline. While dtolnay's contributions to Rust are respectable, being behind syn, quote, and proc-macro2 (the top three crates by downloads), along with thiserror, anyhow, async-trait, and more (I believe also being part of the Rust project), they have unfortunately decided to refuse to listen to the community on this issue (or even engage with counter-commentary). Given their political agenda they seem to try to be accomplishing with force, I'd go as far as to call their actions terroristic (as they're using the threat of the binary blob as justification for cargo to ship 'proper' support for binary blobs). This is arguably representative of dtolnay's past work on watt. watt was a wasm interpreter to execute a pre-compiled proc macro. This would save the compile time of proc macros, yet sandbox it so a full binary did not have to be run. Unfortunately, watt (while decreasing compile times) fails to be a valid solution to supply chain security (without massive ecosystem changes). It never implemented reproducible builds for its wasm blobs, and a malicious wasm blob could still fundamentally compromise a project. The only solution for an end user to achieve a secure pipeline would be to locally build the project, verifying the blob aligns, yet doing so would negate all advantages of the blob. dtolnay also seems to be giving up their role as a FOSS maintainer given that serde no longer works in several environments. While FOSS maintainers are not required to never implement breaking changes, the version number is still 1.0. While FOSS maintainers are not required to follow semver, releasing a very notable breaking change *without a new version number* in an ecosystem which *does follow semver*, then refusing to acknowledge bugs as bugs with their work does meet my personal definition of "not actively maintaining their existing work". Maintenance would be to fix bugs, not introduce and ignore. For now, serde > 1.0.171 has been banned. In the future, we may host a fork without the blobs (yet with the patches). It may be necessary to ban all of dtolnay's maintained crates, if they continue to force their agenda as such, yet I hope this may be resolved within the next week or so. Sources: serde-rs/serde#2538 - Binary blob discussion This includes several reports of various workflows being broken. serde-rs/serde#2538 (comment) dtolnay commenting that security should be resolved via Rust toolchain edits, not via their own work being secure. This is why I say they're trying to leverage serde in a political game. serde-rs/serde#2526 - Usage via git broken dtolnay explicitly asks the submitting user if they'd be willing to advocate for changes to Rust rather than actually fix the issue they created. This is further political arm wrestling. serde-rs/serde#2530 - Usage via Bazel broken serde-rs/serde#2575 - Unverifiable binary blob https://github.com/dtolnay/watt - dtolnay's prior work on precompilation * add Rs() api to SchnorrAggregate * Correct serai-processor-tests to dalek 4 * fmt + deny * Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com> * Add DSTs to Tributary TX sig_hash functions Prevents conflicts with other systems/other parts of the Tributary. --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
pin a version of serde without intransparent unreproducible binary blobs Serde is [shipping a binary blob in its derive crate](serde-rs/serde#2538), which is highly unexpected and subverts user trust. To make matters worse, the binary is [not even reproducible](serde-rs/serde#2575), making the crate largely unauditable and relinquishing the security benefits of open-source software. Build times are not nearly painful enough to justify forcing users to trust binary blobs.
Just another throw-in: If reproducibility is a concern, it would be possible to provide a Nix build script for the binary. Provided that one has any recent version of nix, that should take out almost every variability out of the build. |
Am I correct with the assumption, that the pre-compiled binary is gone and will stay gone and this issue only exists for documentation purposes now? |
@faulesocke , yes, this issue only exists for anyone who is worried that the binary that they were using could have been nefarious. I believe there is an extremely small number of people for whom that is an ongoing concern , given the binary behaved as it should have, had all the expected symbols, and the author is respected and is not compromised.
According to the release :
So it should come back in the future, but only when that approach is blessed & supported by the Rust team. |
@dtolnay Were the binaries fully reproduced? If not, why close the issue? As you said, it's still useful to know for the ~month that they were present. |
It seems you used a nightly version of rustc (rustc version 1.73.0-nightly (f3623871c 2023-08-06)), but which linker version and which musl version? And what
Cargo.lock
was used? How can I reproduce a bit for bit identical executable?https://reproducible-builds.org/
The text was updated successfully, but these errors were encountered: