Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding tests to ensure pip install is reproducible #7808

Open
FRidh opened this issue Mar 1, 2020 · 16 comments
Open

Adding tests to ensure pip install is reproducible #7808

FRidh opened this issue Mar 1, 2020 · 16 comments
Labels
C: tests Testing and related things state: awaiting PR Feature discussed, PR is needed

Comments

@FRidh
Copy link

FRidh commented Mar 1, 2020

Environment

  • Nixpkgs at 9a4d723e436d8c1b94782dbff6d2d8dd4bf5dce5
  • pip version: 20.0.2
  • Python version: CPython 3.7
  • OS: Any Linux OS using Nix builds

Description
Using pip at 19.3.1 building of Python packages was reproducible, however, since we upgraded to 20.0.2 that is no longer the case (NixOS/nixpkgs#81441). The issue is that bytecode now refers to a unpacked-wheel folder (see e.g. https://r13y.com/diff/b03e875784ba16e60ec59d1d5e720b13fc7ae1311edf409b819b0b264acd5393-aba2cac73c5234af8fbf694100bc631800ca397ad7afeed335d38895322a7c71.html).

The function for this unpacked-wheel was added in #7483, I have not bisected this any further though cc @chrahunt @raboof

Expected behavior
Reproducible build.

How to Reproduce

@chrahunt
Copy link
Member

chrahunt commented Mar 2, 2020

This makes sense if during the build you were specifying --build-dir. This should be resolved by #6030, since we will byte-compile the .py files in-place.

@FRidh
Copy link
Author

FRidh commented Mar 2, 2020

We build wheels using

python setup.py bdist_wheel

and install them using

python -m pip install ./*.whl --no-index --prefix="$out" --no-cache $pipInstallFlags --build tmpbuild

@chrahunt
Copy link
Member

chrahunt commented Mar 3, 2020

Thanks. --build would have forced pip to use the same directory for the initial wheel extraction, assuming that directory was always the same during installation. Then since we do the byte-compilation on the extraction directory (instead of the install directory), it would have had the same embedded directory path each time.

A straightforward way to fix this without requiring too much change is to recurse over the unpacked wheel directory explicitly and use py_compile.compile with a dfile parameter that matches the expected install directory.

@chrahunt
Copy link
Member

chrahunt commented Jul 5, 2020

This should be fixed by #8541, however we will need tests before calling this done. Tests would look like:

  1. create a wheel in-memory that contains some .py files (with tests.lib.wheel.make_wheel)
  2. pip install that wheel with SOURCE_DATE_EPOCH set
  3. hash all the files
  4. uninstall that package, verify all files are gone
  5. set tmpdir that pip would use to a different directory
  6. pip install that wheel again
  7. hash all the files and compare against previous digests

and would probably go in tests/functional/test_install_wheel.py (which can be referenced for some of the other steps).

@pfmoore
Copy link
Member

pfmoore commented Jul 5, 2020

pip install that wheel with SOURCE_DATE_EPOCH set

Am I missing something here? There's no mention of SOURCE_DATE_EPOCH in the pip codebase, tests, or documentation. So I don't think we should be making any commitments (much less mandating via tests) of how pip will behave if that's set.

@chrahunt
Copy link
Member

chrahunt commented Jul 5, 2020

It's not us, but the standard library. If we're using either of compileall.compile_file or py_compile.compile it should be handled for us automatically.

I don't have a stake in this, so I won't drive the discussion more than that. Anyone can feel free to pick this up and put time into it. :)

@pfmoore
Copy link
Member

pfmoore commented Jul 5, 2020

It's not us, but the standard library.

Ah, OK. In which case, I'd say that pip's test should simply be that we pass None to the stdlib function (and hence that we follow stdlib behaviour). We shouldn't test that the stdlib does what it says it will do.

If, however, we want to document for pip what guarantees we give around reproducibility, then yes we should have a test that ensures we deliver that behaviour. But I'd say document first, then test - we don't need any more tests that check undocumented behaviour 🙁

I don't have a stake in this

Nor do I, beyond not wanting to be expected to support a behaviour that I don't understand and personally have no need for. So I'll say no more for now. As you say, someone who has an interest in this can define what they want and push the discussion on that basis.

zimbatm added a commit to zimbatm/nixpkgs that referenced this issue Oct 31, 2020
The previous attempt wasn't covering all of the bases. It relied on
invoking that pip-install-hook, and didn't apply to pip itself.

The core issue is that the generated .pyc files embed some of the
temporary paths, which are randomly generated. See
https://r13y.com/diff/bf8c3ca3148ebff9ecf41f294cc60b9f209c006d49699e356969ff32d736f1c6-8806a7cca91fdd300e48736bfcd57c4d0b54c1cc2fd61609f35143170862b59c.html

In this new attempt, the approach is to patch the TempFile
implementation directly, so that it creates stable temporary
directories. We also assume that if SOURCE_DATE_EPOCH is set, we are in
a scenario where reproducible builds are desirable and enter that
branch.

See also pypa/pip#7808
zimbatm added a commit to NixOS/nixpkgs that referenced this issue Oct 31, 2020
The previous attempt wasn't covering all of the bases. It relied on
invoking that pip-install-hook, and didn't apply to pip itself.

The core issue is that the generated .pyc files embed some of the
temporary paths, which are randomly generated. See
https://r13y.com/diff/bf8c3ca3148ebff9ecf41f294cc60b9f209c006d49699e356969ff32d736f1c6-8806a7cca91fdd300e48736bfcd57c4d0b54c1cc2fd61609f35143170862b59c.html

In this new attempt, the approach is to patch the TempFile
implementation directly, so that it creates stable temporary
directories. We also assume that if SOURCE_DATE_EPOCH is set, we are in
a scenario where reproducible builds are desirable and enter that
branch.

See also pypa/pip#7808
@nanonyme
Copy link

nanonyme commented Nov 3, 2020

@pfmoore if PR was made that would make argument of said compile functions None to use standard library convention instead, would it be accepted? This is where it's documented https://docs.python.org/3/whatsnew/3.7.html#py-compile

@nanonyme
Copy link

nanonyme commented Nov 3, 2020

The end result is essentially checked hash so bytecode is still validated against source for changes but not based on mtime but instead through hash.

@nanonyme
Copy link

nanonyme commented Nov 3, 2020

Looks like https://github.com/pypa/pip/blob/master/src/pip/_vendor/distlib/util.py#L595 is fine as is. It will result in whatever standard library uses.

@nanonyme
Copy link

nanonyme commented Nov 3, 2020

Similarly https://github.com/pypa/pip/blob/master/src/pip/_internal/operations/install/wheel.py#L715 looks fine as is. The default is checked hash assuming Python works as intended.

@nanonyme
Copy link

nanonyme commented Nov 3, 2020

Any chance the title could be reflected that this issue is not in fact about fixing pip but adding tests? :)

@FRidh
Copy link
Author

FRidh commented Nov 5, 2020

I can confirm the issue is indeed resolved with 20.2. Thank you. It would be great to have tests to avoid this issue from occurring again.

FRidh added a commit to NixOS/nixpkgs that referenced this issue Nov 5, 2020
Reproducible builds of pyproject projects using pip is resolved.

Fixes pypa/pip#7808
Fixes #81441

The more recent c409f69 caused trouble
with pyproject troubles and had to be reverted anyway.
#102222 (comment)

Revert "pythonPackages.pip: make reproducible (#102222)"

This reverts commit c409f69.

Revert "python3Packages.pip: allow setting reproducible temporary directory via NIX_PIP_INSTALL_TMPDIR"

This reverts commit aedbade.
FRidh added a commit to NixOS/nixpkgs that referenced this issue Nov 5, 2020
Reproducible builds of pyproject projects using pip is resolved.

Fixes pypa/pip#7808
Fixes #81441

The more recent c409f69 caused trouble
with pyproject troubles and had to be reverted anyway.
#102222 (comment)

Revert "pythonPackages.pip: make reproducible (#102222)"

This reverts commit c409f69.

Revert "python3Packages.pip: allow setting reproducible temporary directory via NIX_PIP_INSTALL_TMPDIR"

This reverts commit aedbade.
@pradyunsg pradyunsg changed the title pip install no longer reproducible Addiing tests to ensure pip install is reproducible Nov 5, 2020
@pradyunsg pradyunsg changed the title Addiing tests to ensure pip install is reproducible Adding tests to ensure pip install is reproducible Nov 5, 2020
@pradyunsg
Copy link
Member

PRs adding tests are welcome!

@pradyunsg pradyunsg added state: awaiting PR Feature discussed, PR is needed C: tests Testing and related things and removed type: bug A confirmed bug or unintended behavior labels Nov 5, 2020
@nanonyme
Copy link

Uagh. I just read through a reproducible report out of our build system and apparently without the nixos patch pip using non-deterministic pip-req-build path results in eg in debuginfo for cython ending up with these paths hardcoded. I think the core thing here is that you can't externally affect the seed of mkdtemp. (which is not really a problem in pip but in CPython)

@nanonyme
Copy link

There was a bug open about this https://bugs.python.org/issue32276 but it was rejected. IMHO it's silly to blame everything around CPython about not being reproducible when the source of unreproducibility is in CPython.

SomberNight added a commit to SomberNight/electrum that referenced this issue Aug 6, 2022
We compile from tar.gz, instead of using pre-built binary wheels from PyPI.
(or if the dep is pure-python, use tar.gz instead of "source-only" wheel)

-----
Some unorganised things below for future reference.

```
$ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory
---
triple:          'x86_64-apple-darwin'
binary-path:     'dist1/hid.cpython-39-darwin.so'
...
```

```
$ nm -pa dist1/hid.cpython-39-darwin.so
```

- https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable
- pypa/pip#6505
- pypa/pip#7808 (comment)
- NixOS/nixpkgs#91272
- cython/cython#1576
- https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
SomberNight added a commit to SomberNight/electrum that referenced this issue Aug 18, 2022
We compile from tar.gz, instead of using pre-built binary wheels from PyPI.
(or if the dep is pure-python, use tar.gz instead of "source-only" wheel)

-----
Some unorganised things below for future reference.

```
$ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory
---
triple:          'x86_64-apple-darwin'
binary-path:     'dist1/hid.cpython-39-darwin.so'
...
```

```
$ nm -pa dist1/hid.cpython-39-darwin.so
```

- https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable
- pypa/pip#6505
- pypa/pip#7808 (comment)
- NixOS/nixpkgs#91272
- cython/cython#1576
- https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
SomberNight added a commit to SomberNight/electrum that referenced this issue Aug 30, 2022
We compile from tar.gz, instead of using pre-built binary wheels from PyPI.
(or if the dep is pure-python, use tar.gz instead of "source-only" wheel)

-----
Some unorganised things below for future reference.

```
$ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory
---
triple:          'x86_64-apple-darwin'
binary-path:     'dist1/hid.cpython-39-darwin.so'
...
```

```
$ nm -pa dist1/hid.cpython-39-darwin.so
```

- https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable
- pypa/pip#6505
- pypa/pip#7808 (comment)
- NixOS/nixpkgs#91272
- cython/cython#1576
- https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
SomberNight added a commit to SomberNight/electrum that referenced this issue Sep 26, 2022
We compile from tar.gz, instead of using pre-built binary wheels from PyPI.
(or if the dep is pure-python, use tar.gz instead of "source-only" wheel)

-----
Some unorganised things below for future reference.

```
$ dsymutil -dump-debug-map dist1/hid.cpython-39-darwin.so
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hid.o unable to open object file: No such file or directory
warning: (x86_64) /private/var/folders/1n/zc14m3td0rg4nt0ftklmm7z00000gn/T/pip-install-bm88zvc1/hidapi_cd307bc31ab34252b77d11d6d7212fc5/build/temp.macosx-10.9-x86_64-3.9/hidapi/mac/hid.o unable to open object file: No such file or directory
---
triple:          'x86_64-apple-darwin'
binary-path:     'dist1/hid.cpython-39-darwin.so'
...
```

```
$ nm -pa dist1/hid.cpython-39-darwin.so
```

- https://stackoverflow.com/questions/10044697/where-how-does-apples-gcc-store-dwarf-inside-an-executable
- pypa/pip#6505
- pypa/pip#7808 (comment)
- NixOS/nixpkgs#91272
- cython/cython#1576
- https://github.com/cython/cython/blob/9d2ba1611b28999663ab71657f4938b0ba92fe07/Cython/Compiler/ModuleNode.py#L913
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: tests Testing and related things state: awaiting PR Feature discussed, PR is needed
Projects
None yet
Development

No branches or pull requests

5 participants