Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Respect SOURCE_DATE_EPOCH when taring sdist. #2136

Closed
wants to merge 3 commits into from

Conversation

Carreau
Copy link
Contributor

@Carreau Carreau commented May 24, 2020

This pulls just enough of distutils' and modify the make_tarball
function in order to respect SOURCE_DATE_EPOCH; this will ensure that
when set no timestamp in the final archive is greater than timestamp.

This allows (but is not always sufficient), to make bytes for bytes
reproducible build for example:

  • This does not work with gztar, and zip does embed a timestamp in
    the header which currently is time.time() in the standard library.

  • if some fields passed to setup.py have on determinstic ordering (for
    example using sets for dependencies).

Partial work toward #2133, with this I was able to make two bytes-identical
sdist of IPython.

Summary of changes

pull many function as-is from distutils with the exception of make_tarball to which a couple of lines are added.

Pull Request Checklist

  • Changes have tests
  • News fragment added in changelog.d. See documentation for details

No tests yet, nor changelog, I want to make sure this is the right approach first.

@jaraco
Copy link
Member

jaraco commented May 25, 2020

This is great. Thanks for putting this together.

I would recommend one change to your approach - if you could separate the commits of (a) copying code from distutils from (b) tweaking that code, it will make it easier to later reconcile any differences that might need to be merged from the original.

As is, if I try to review this, I can't tell what functionality was copied and what was changed. If you can separate those changes in the commit history, I'm pretty sure the changes here are acceptable.

As a side note, there is a larger effort to incorporate all of distutils, which if implemented would have made this change more straightforward.

@Carreau
Copy link
Contributor Author

Carreau commented May 25, 2020

Yes, I can do this, no problem. The one issue though will be flake 8 will complain about the raw distutils code, but it should be pretty straitforward.

function copy-pasted as is from cpython  Lib/distutils/archive_utils.py
2602d97a0ae92b2d320909024e901c202b003e14 as of May 25 2020.

Add relevant import at top of file as well
This pulls just enough of distutils' and modify the make_tarball
function in order to respect SOURCE_DATE_EPOCH; this will ensure that
_when set_ no timestamp in the final archive is greater than timestamp.

This allows (but is not always sufficient), to make bytes for bytes
reproducible build for example:

 - This does not work with `gztar`, and zip does embed a timestamp in
 the header which currently is `time.time()` in the standard library.

 - if some fields passed to setup.py have on determinstic ordering (for
 example using sets for dependencies).

 Partial work toward pypa#2133, with this I was able to make two bytes-identical
 sdist of IPython.

You will see three types of modifications:

 - Referring explicitly to some of distutils namespace in a couple of
 places, to avoid duplicating more code. Note that despite some names
 _not_ changing as the name resolution is with respect to current
 module, unchanged functions will now use our modified version.

 - overwrite `make_archive` in sdist to use our patched version of the
 functions in archive_utils.

 - update make_tarball to look for SOURCE_DATE_EPOCH in environment and
 setup a filter to modify mtime while taring.
@Carreau
Copy link
Contributor Author

Carreau commented May 25, 2020

Force-pushed with proper commit descriptions. I'd appreciate a pointer to a test that check a tar (not tar.gz as those are not yet reproducible), and can modify it to make sure the hash can be reproduced.

@benoit-pierre
Copy link
Member

Have you checked #1512?

@Carreau
Copy link
Contributor Author

Carreau commented May 25, 2020

I have not, but thanks, you definitely found the same issues that I did, though I went for much smaller and only tackle the minimal amount in one PR.

I'll likely get inspiration from your tests.

@kushaldas
Copy link

Any update on this?

@jaraco
Copy link
Member

jaraco commented Aug 14, 2020

@Carreau From your last comment, it wasn't clear to me whether you wanted to do more work on this or not. Can you clarify what the status is of this request?

@jaraco jaraco marked this pull request as draft September 23, 2020 20:34
@jaraco jaraco removed the draft label Sep 23, 2020
@jaraco
Copy link
Member

jaraco commented Oct 17, 2020

I'm closing this for now, but happy to revive the effort at any point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants