Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement provenance_url.json #11865

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Conversation

fridex
Copy link

@fridex fridex commented Mar 14, 2023

Signed-off-by: Fridolin Pokorny <fridolin.pokorny@datadoghq.com>
@fridex
Copy link
Author

fridex commented Mar 15, 2023

Checking the failed test:

FAILED tests/functional/test_wheel.py::test_pip_wheel_ext_module_with_tmpdir_inside - AssertionError: Script returned code: 1

I'm not sure whether this PR introduces the given failure. Also, it looks like it fails only on Mac.

@webknjaz webknjaz added skip news Does not need a NEWS file entry (eg: trivial changes) and removed skip news Does not need a NEWS file entry (eg: trivial changes) labels Mar 16, 2023
@fridex fridex marked this pull request as draft March 16, 2023 21:39
url=self.download_info.redacted_url,
info=ArchiveInfo("sha256=" + sha256.hexdigest()),
provenance_file=True,
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a quick drive-by comment. I'll review in depth when the pep is accepted. I note however that all the pieces should be in place to implement this in pip quite easily, since download_info is actually the same as provenance_url and is available when obtaining wheels from cache too.

To avoid confusion and adding a non-standard provenance_file flag to DirectUrl, I'd add a new provenance_url argument to install_wheel(), to which we can likely pass download_info as-is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See also #11875

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still confused as to what the "url" should be in the case of something like pip install requests (and whether the download_info field holds that data) - in particular, given that pip builds and caches wheels locally, so that there may not actually be a URL for the wheel that got installed.

To be clear I think this is something that needs addressing in the PEP, not here, but I think it's an important difference between this case and the direct_url.json case (where there is a URL by definition). I don't think the PEP can assume that pip does the right thing in this case - it needs to say what it wants, and then we can check if download_info matches that (and if it does, following that spec becomes a new requirement on download_info for future changes to keep in mind).

Copy link
Member

@sbidoul sbidoul Mar 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In its current version the PEP is indeed unclear on this. I am assuming download_info corresponds to the intent of the PEP, but that intent absolutely needs to be made explicit in the PEP.

For reference, download_info is the URL of original artifact that was downloaded. So the sdist URL if the wheel was built locally. It works also when the locally built wheel comes from the cache, since we now record this URL in the cache entry. This is documented in the installation report specification.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid confusion and adding a non-standard provenance_file flag to DirectUrl, I'd add a new provenance_url argument to install_wheel(), to which we can likely pass download_info as-is.

Thanks, will incorporate this suggestion.

I'm still confused as to what the "url" should be in the case of something like pip install requests (and whether the download_info field holds that data) - in particular, given that pip builds and caches wheels locally, so that there may not actually be a URL for the wheel that got installed.

pip creates origin.json in pip's cache (for both, cached wheels from http and the ones built from sdists). Info from this file seems to be used and correctly propagated internally inside pip source code (using pip with changes in this PR):

$ pip cache purge
$ pip uninstall requests
Found existing installation: requests 2.28.2
Uninstalling requests-2.28.2:
  Would remove:
    /Users/fridolin.pokorny/git/fridex/pip/.venv/lib/python3.8/site-packages/requests-2.28.2.dist-info/*
    /Users/fridolin.pokorny/git/fridex/pip/.venv/lib/python3.8/site-packages/requests/*
Proceed (Y/n)? y
  Successfully uninstalled requests-2.28.2
$ pip install requests
Collecting requests
  Downloading requests-2.28.2-py3-none-any.whl (62 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 62.8/62.8 kB 1.8 MB/s eta 0:00:00
Requirement already satisfied: charset-normalizer<4,>=2 in ./.venv/lib/python3.8/site-packages (from requests) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in ./.venv/lib/python3.8/site-packages (from requests) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.venv/lib/python3.8/site-packages (from requests) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in ./.venv/lib/python3.8/site-packages (from requests) (2022.12.7)
Installing collected packages: requests
Successfully installed requests-2.28.2
$ cat .venv/lib/python3.8/site-packages/requests-2.28.2.dist-info/provenance_url.json
{"archive_info": {"hash": "sha256=64299f4909223da747622c030b781c0d7811e359c37124b4bd368fb8c6518baa", "hashes": {"sha256": "64299f4909223da747622c030b781c0d7811e359c37124b4bd368fb8c6518baa"}}, "url": "https://files.pythonhosted.org/packages/d2/f4/274d1dbe96b41cf4e0efb70cbced278ffd61b5c7bb70338b62af94ccb25b/requests-2.28.2-py3-none-any.whl"}%                                                                                                                                                         $ pip uninstall requests
Found existing installation: requests 2.28.2
Uninstalling requests-2.28.2:
  Would remove:
    /Users/fridolin.pokorny/git/fridex/pip/.venv/lib/python3.8/site-packages/requests-2.28.2.dist-info/*
    /Users/fridolin.pokorny/git/fridex/pip/.venv/lib/python3.8/site-packages/requests/*
Proceed (Y/n)? y
  Successfully uninstalled requests-2.28.2
$ pip install requests                                                                                                      
Collecting requests
  Using cached requests-2.28.2-py3-none-any.whl (62 kB)
Requirement already satisfied: charset-normalizer<4,>=2 in ./.venv/lib/python3.8/site-packages (from requests) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in ./.venv/lib/python3.8/site-packages (from requests) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.venv/lib/python3.8/site-packages (from requests) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in ./.venv/lib/python3.8/site-packages (from requests) (2022.12.7)
Installing collected packages: requests
Successfully installed requests-2.28.2
$ cat .venv/lib/python3.8/site-packages/requests-2.28.2.dist-info/provenance_url.json
{"archive_info": {"hash": "sha256=64299f4909223da747622c030b781c0d7811e359c37124b4bd368fb8c6518baa", "hashes": {"sha256": "64299f4909223da747622c030b781c0d7811e359c37124b4bd368fb8c6518baa"}}, "url": "https://files.pythonhosted.org/packages/d2/f4/274d1dbe96b41cf4e0efb70cbced278ffd61b5c7bb70338b62af94ccb25b/requests-2.28.2-py3-none-any.whl"}%                                                                                                                                                         $

Similarly, provenance_url.json is also created when installing sdists (but wheel must be installed, not legacy installations).

@pfmoore Thanks for bringing this - we can definitely add this to the PEP to make it explicit.

@github-actions github-actions bot added the needs rebase or merge PR has conflicts with current master label Mar 27, 2023
Copy link
Contributor

@frenzymadness frenzymadness left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very fast first review.

@@ -0,0 +1,2 @@
Implement storing provenance_url.json file as specified in PEP 9999.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the number won't change any more.

Suggested change
Implement storing provenance_url.json file as specified in PEP 9999.
Implement storing provenance_url.json file as specified in PEP 710.

direct_url_path = os.path.join(dest_info_dir, PROVENANCE_URL_METADATA_NAME)
else:
direct_url_path = os.path.join(dest_info_dir, DIRECT_URL_METADATA_NAME)
print(direct_url_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug print?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bot:chronographer:provided needs rebase or merge PR has conflicts with current master
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants