support for SOURCE_DATE_EPOCH in sdist. #2133

Open
Carreau opened this issue May 24, 2020 · 3 comments

Comments

@Carreau
Contributor

Carreau commented May 24, 2020

SOURCE_DATE_EPOCH is useful for reproducible builds: when it is set, no timestamp in the build output should be later than this value.

It seems that setuptools' sdist does not support SOURCE_DATE_EPOCH. I've traced it to the following:

sdist inherits from Command, which leads to these successive calls:

Lib/distutils/cmd.py:Command.make_archive
Lib/distutils/archive_util.py:make_archive
Lib/distutils/archive_util.py:ARCHIVE_FORMATS
Lib/distutils/archive_util.py:make_tarball

make_tarball seems to be the right place to monkeypatch to look for SOURCE_DATE_EPOCH, since it can itself pass a filter to tarfile.add(), which would ensure the mtime is bounded (it already passes a filter to set uid/gid).
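
As a rough illustration of the filter idea above, here is a minimal sketch and not the actual distutils or setuptools code. The helper names `read_source_date_epoch` and `add_with_bounded_mtime` are hypothetical; the only library feature relied on is the `filter` argument of `TarFile.add()`.

```python
import os
import tarfile


def read_source_date_epoch(environ=None):
    """Return SOURCE_DATE_EPOCH as an int, or None when it is unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    return int(value) if value else None


def add_with_bounded_mtime(tar, path, arcname, epoch):
    """Add `path` to an open TarFile, capping its mtime at `epoch`.

    Hypothetical helper: when `epoch` is None the member is added
    unchanged, mirroring the "only when set" behaviour discussed above.
    """
    def bound_mtime(tarinfo):
        # Only lower timestamps that are newer than the epoch.
        if epoch is not None and tarinfo.mtime > epoch:
            tarinfo.mtime = epoch
        return tarinfo

    tar.add(path, arcname=arcname, filter=bound_mtime)
```

A real patch would presumably need to combine this with the existing uid/gid filter rather than replace it.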

With this, most sdist formats (except tgz) are reproducible. tgz has one remaining problem: GzipFile writes time.time() into the header, and that is a bit harder to patch.
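
For the gzip header, one possible direction (a sketch outside of distutils, not a proposed patch) is to construct the `GzipFile` explicitly, since its constructor accepts an `mtime` argument. The function `build_tgz` and its `members` mapping are hypothetical, and the archive is built in memory to keep the example self-contained.

```python
import gzip
import io
import tarfile


def build_tgz(members, epoch):
    """Build a .tar.gz in memory from {name: bytes}, with fixed timestamps.

    Hypothetical helper: `epoch` stands in for SOURCE_DATE_EPOCH.
    """
    raw = io.BytesIO()
    # filename="" keeps the file name out of the gzip header;
    # mtime=epoch replaces the default time.time() value.
    with gzip.GzipFile(filename="", mode="wb", fileobj=raw, mtime=epoch) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for name in sorted(members):  # stable member order
                data = members[name]
                info = tarfile.TarInfo(name)
                info.size = len(data)
                info.mtime = epoch
                tar.addfile(info, io.BytesIO(data))
    return raw.getvalue()
```

Calling `build_tgz` twice with the same inputs should then yield identical bytes, which is the property the issue is after.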

Carreau added a commit to Carreau/setuptools that referenced this issue May 24, 2020
This pulls in just enough of distutils and modifies the make_tarball
function to respect SOURCE_DATE_EPOCH; this ensures that, _when set_,
no timestamp in the final archive is greater than that value.

This allows (but is not always sufficient for) byte-for-byte
reproducible builds. For example, the build is still not reproducible:

 - with `gztar`, because gzip embeds a timestamp in the header, which
 is currently `time.time()` in the standard library.

 - if some fields passed to setup.py have non-deterministic ordering
 (for example, using sets for dependencies).

 Partial work toward pypa#2133. With this I was able to make two
 byte-identical sdists of IPython.
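
The ordering caveat in the commit message can be illustrated with a small sketch. The helper `stable_requirements` is hypothetical; the idea is simply to sort any set-derived value before it reaches setup(), so that the generated metadata does not depend on set iteration order.

```python
def stable_requirements(requirements):
    """Return requirements de-duplicated and in a deterministic order."""
    # Iterating a set of strings can differ between interpreter runs
    # (hash randomization); sorting removes that source of variation.
    return sorted(set(requirements))


# Assumed usage in a setup.py: install_requires=stable_requirements(deps)
deps = {"traitlets", "pygments", "decorator"}
ordered = stable_requirements(deps)
```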
Carreau added a commit to Carreau/setuptools that referenced this issue May 25, 2020
You will see three types of modifications:

 - Referring explicitly to parts of the distutils namespace in a couple
 of places, to avoid duplicating more code. Note that although some
 names do _not_ change, name resolution is relative to the current
 module, so unchanged functions will now use our modified versions.

 - Override `make_archive` in sdist to use our patched versions of the
 functions in archive_util.

 - Update make_tarball to look for SOURCE_DATE_EPOCH in the environment
 and set up a filter to modify mtime while tarring.
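
The name-resolution remark in the first bullet can be demonstrated with a toy example. This is only a sketch: the module names `upstream_archive_util` and `vendored_archive_util` and the function bodies are made up for illustration and are not the real distutils code.

```python
import types

# The same source is loaded into two separate module namespaces.
SOURCE = '''
def make_tarball(name):
    return "original:" + name

def make_archive(name):
    # make_tarball is looked up in this module's globals at call time.
    return make_tarball(name)
'''


def load_module(name, source):
    """Create a throwaway module object from a source string."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module


upstream = load_module("upstream_archive_util", SOURCE)
vendored = load_module("vendored_archive_util", SOURCE)


def patched_make_tarball(name):
    return "patched:" + name


# Replace make_tarball in the copied module only.
vendored.make_tarball = patched_make_tarball

upstream_result = upstream.make_archive("pkg")
vendored_result = vendored.make_archive("pkg")
```

The untouched copy of `make_archive` in the vendored module picks up the patched `make_tarball`, while the upstream module keeps its original behaviour.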
@joshuagl

joshuagl commented Feb 9, 2021

There's some excellent work towards this started in #2136, thanks @Carreau! Are you planning to pick this up? If not, perhaps I could help finish up this work?

We would like to be able to produce reproducible sdists for python-tuf. (Curious readers can see: theupdateframework/python-tuf#1269)

@Carreau
Contributor Author

Carreau commented Feb 9, 2021

> Are you planning to pick this up? If not, perhaps I could help finish up this work?

At some point, but I don't have much time these days; feel free to take over.

@tiran
Contributor

tiran commented Mar 18, 2021

I'm interested in reproducible sdists, too. Reproducible artifacts make it much easier to verify the provenance of code.
