Make build more reproducible #182

Closed
sssoleileraaa opened this issue Jul 22, 2020 · 6 comments

Comments

@sssoleileraaa
Contributor

Description

Right now, to build a workstation package, we run

PKG_PATH=path/to/tarball PKG_VERSION=x.y.z make some-package

If we make the tarballs reproducible, then we could just run:

PKG_VERSION=x.y.z make some-package

Which would also allow us to skip manual tarball checksum verification and signing. The Python wheels would still be left over as the remaining piece that is not reproducible, but as @emkll put it earlier today, we're working our way down the chain of reproducibility. Curious what others know and think about this.

@eloquence eloquence added this to Next sprint candidates in SecureDrop Team Board Jul 23, 2020
@eloquence
Member

@conorsch has expressed interest in doing a first investigation of tarball reproducibility during the 7/23-8/5 sprint.

FWIW, some ideas for reproducible/idempotent tarballs here: https://stackoverflow.com/a/54908072 (@rmol also mentioned using --mtime, which per that comment may not be fully sufficient).

@eloquence eloquence moved this from Next sprint candidates to SecureDrop Sprint #55 - 7/23-8/5 in SecureDrop Team Board Jul 23, 2020
@conorsch conorsch moved this from SecureDrop Sprint #55 - 7/23-8/5 to In Development in SecureDrop Team Board Jul 28, 2020
@conorsch
Contributor

conorsch commented Jul 28, 2020

Which would also allow us to skip manual tarball checksum verification and signing.

Good point. Zooming out a bit, we've got two fundamental goals with the build logic:

  1. Be absolutely certain that the code we're shipping is the code we reviewed and tagged with the prod key.
  2. Enable anyone, SD maintainer or not, to rebuild the exact same .deb files to confirm that the binary debs we ship match what's in our open source repos.

As @creviera mentions, the tarballs (created via python setup.py sdist in the various project repos) are not yet fully reproducible, mostly due to a small bit of variation in the metadata. Fortunately, support for the SOURCE_DATE_EPOCH env var is coming to a setup.py near you (python/cpython#20331), but that doesn't help us today. Because the tarballs aren't reproducible, we're signing them and committing them to this repository. We're also setting SOURCE_DATE_EPOCH to a static value for the Debian packages: https://github.com/freedomofpress/securedrop-debian-packaging/blob/f6879bfe320158ef8e0e83323f12733dffce3462/scripts/build-debianpackage#L88-L89

Since we already have a method for ensuring reproducible builds of the binary Debian packages, we technically don't need fully reproducible tarballs in order to satisfy goal 2 above. So, we can remove some of the manual steps that @creviera rightly points out aren't adding much, and simply run:

# Current method
PKG_PATH=path/to/tarball PKG_VERSION=x.y.z make some-package

# Potential new method
make some-package

The PKG_PATH can be optional—if a local tarball exists, fine, but otherwise, we can simply recreate one from the source code—that's the point of reproducible builds, after all! As for PKG_VERSION, that can also be made optional: if not provided, assume that we're trying to (re)build the most recent signed tag on the upstream repo. So, for the case of securedrop-client:

$ PKG_PATH=tarballs/securedrop-client-0.2.1.tar.gz PKG_VERSION=0.2.1 make securedrop-client
PKG_NAME="securedrop-client" ./scripts/build-debianpackage

[..snip..]

$ sha256sum /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb
f683a95f1afd11675bffcca8bb2e5f981de64f8d0775ca8ed34a91b90855e56f  /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb

# Store the deb for later, so we can confirm reproducibility
$ mkdir /tmp/reproducible-builds-test
$ cp /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb /tmp/reproducible-builds-test/sdc1.deb
$ rm /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb
$ git checkout -
Switched to branch 'repro-without-tarball'

# Note the lack of PKG_PATH and PKG_VERSION in the next command.
# PKG_VERSION will be looked up from the repo (0.2.1 is latest),
# and PKG_PATH will be a tarball built from that verified tag.
$ make securedrop-client

[..snip..]

$ sha256sum /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb
f683a95f1afd11675bffcca8bb2e5f981de64f8d0775ca8ed34a91b90855e56f  /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb
$ cp /home/user/debbuild/packaging/securedrop-client_0.2.1+buster_all.deb /tmp/reproducible-builds-test/sdc2.deb
$ sha256sum /tmp/reproducible-builds-test/*
f683a95f1afd11675bffcca8bb2e5f981de64f8d0775ca8ed34a91b90855e56f  /tmp/reproducible-builds-test/sdc1.deb
f683a95f1afd11675bffcca8bb2e5f981de64f8d0775ca8ed34a91b90855e56f  /tmp/reproducible-builds-test/sdc2.deb
$ diffoscope /tmp/reproducible-builds-test/* && echo "SUCCESS: packages are identical"
SUCCESS: packages are identical
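
For reference, the PKG_VERSION fallback described above could be sketched roughly like this. This is only an illustration, not the logic on the branch: `pick_version` is a hypothetical helper name, and it assumes GNU `sort` (for `-V`) and that release tags are plain version strings such as 0.2.1.

```shell
# Hypothetical helper (not from the repo): choose the version to build.
#   $1    - an explicitly requested version; may be empty
#   stdin - candidate tag names, one per line
pick_version() {
    if [ -n "$1" ]; then
        # An explicit PKG_VERSION always wins.
        printf '%s\n' "$1"
    else
        # Otherwise take the highest tag in version order (GNU sort -V).
        sort -V | tail -n 1
    fi
}

# Assumed usage inside a build script, where $repo is a clone of the
# upstream project:
#   PKG_VERSION="$(git -C "$repo" tag --list | pick_version "${PKG_VERSION:-}")"
#   git -C "$repo" tag -v "$PKG_VERSION"   # verify the tag signature before building
```

Release-candidate or otherwise non-numeric tags would likely need to be filtered out before sorting, and the `git tag -v` check only helps if the expected signing key is the one in the local keyring.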

It'd be a good idea for us to provide an overview of what's currently reproducible and what's not in our packaging logic, since there are so many moving pieces—tarballs, wheels, and binary debian packages—with recommended next steps. For now, though, the slight win of eliminating manual steps by developers while still retaining the ability to create byte-for-byte identical packages seems worthwhile, while we monitor for broader support in upstream tooling.

@conorsch
Contributor

the tarballs (created via python setup.py sdist in the various project repos) are not yet fully reproducible, mostly due to a small bit of variation in the metadata

It's worth pointing out that technically speaking, we could make the tarballs fully reproducible on our own, by:

  1. running python setup.py sdist
  2. unpacking the tarball into a dir
  3. setting TAR_{M,C,A}TIME env vars to the same value as SOURCE_DATE_EPOCH, re-tarring
  4. re-gzip with --no-name passed to gzip

Then we get a .tar.gz file that's byte-for-byte reproducible, which we could have CI build and publish anywhere we like, e.g. to GitHub release objects, or to the "securedrop-debian-packaging" repo, as we do now. Given that the Python community is hopefully going to provide first-class support of SOURCE_DATE_EPOCH, making steps 2-4 above unnecessary, I'd prefer to wait and let upstream handle it.
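
If we did want to do it ourselves, steps 2-4 above might look something like the sketch below. It is a rough illustration under a few assumptions: `repack_sdist` is a hypothetical helper name, GNU tar 1.28 or newer is available (for `--sort`), and it uses GNU tar's `--mtime`, `--owner`, `--group` and `--numeric-owner` flags to pin the metadata instead of the TAR_{M,C,A}TIME variables mentioned in step 3.

```shell
# Hypothetical helper (not from the repo):
#   repack_sdist <input.tar.gz> <output.tar.gz>
# Assumes GNU tar >= 1.28 and gzip. SOURCE_DATE_EPOCH defaults to 0
# here purely for illustration.
repack_sdist() {
    local src="$1" dest="$2"
    local epoch="${SOURCE_DATE_EPOCH:-0}"
    local workdir status
    workdir="$(mktemp -d)" || return 1

    # Step 2: unpack the sdist into a scratch directory.
    tar -xzf "$src" -C "$workdir" || { rm -rf "$workdir"; return 1; }

    # Steps 3-4: re-tar with a fixed member order, timestamp and ownership,
    # then gzip without storing a file name or timestamp in the header.
    # The glob assumes the sdist has no hidden top-level entries.
    ( cd "$workdir" && tar --sort=name --mtime="@${epoch}" \
          --owner=0 --group=0 --numeric-owner -cf - * ) \
        | gzip --no-name > "$dest"
    status=$?

    rm -rf "$workdir"
    return "$status"
}
```

Running it twice on sdists of the same tag and comparing the outputs with sha256sum or diffoscope would be the check. File modes are not normalized in this sketch, so two builders with different umasks could still produce different archives.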

During standup today, @kushaldas expressed concerns about dynamically generating tarballs, rather than posting static ones as a historical artifact for later reference. A bit more discussion among the team is warranted before we make a call on how to proceed.

@eloquence
Member

We discussed this further at today's tech mtg.

  • We don't have consensus yet on removing the tarballs/ directory from this repo, but we want to at least see if we can automate/simplify the process of populating it as much as possible.
  • @conorsch will take a stab at a workaround like the one described above to also make the tarball part of the pipeline reproducible, possibly as part of the "Builds source tarballs dynamically" PR (#185).
  • It's worth restating that the Debian packages are already reproducible without that step.

@eloquence eloquence moved this from In Development to Near Term - SD Workstation in SecureDrop Team Board Aug 20, 2020
@eloquence
Member

Can we consider this issue resolved by #185?

@eloquence
Member

Closing per above, but we'll file some follow-up issues for next steps in verifying reproducibility & automating builds.

SecureDrop Team Board automation moved this from Near Term - SD Workstation to Done Sep 8, 2020