Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle locking all direct reference URL forms. #2060

Merged
merged 4 commits into from
Feb 23, 2023

Conversation

jsirois
Copy link
Member

@jsirois jsirois commented Feb 17, 2023

Previously, it was assumed that direct reference URLs were either VCS
URLs or else wheel or sdist URLs. This neglected two remaining cases,
both of which failed to lock with better or worse error messages. The
two missed cases were:

  1. URLs of source archives not conforming to the sdist quasi-standard
    naming convention of <project name>-<version>.{.tar.gz,.zip}.
  2. Local file:// URLs pointing at project directories.

A notable case of the 1st are project archives provided by GitHub.
A notable need for the 2nd case comes from Pants where Pip proprietary
requirement strings are not handled (e.g.: /path/to/project) and a
direct reference URL must be used instead (e.g.: projectname @ file:///path/to/project).

Fixes #2057

@@ -213,8 +213,9 @@ def _chmod(self, info, path):
# This magic works to extract perm bits from the 32 bit external file attributes field for
# unix-created zip files, for the layout, see:
# https://www.forensicswiki.org/wiki/ZIP#External_file_attributes
attr = info.external_attr >> 16
os.chmod(path, attr)
if info.external_attr > 0xFFFF:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a bug exposed by GitHub zips, they have no perm bits set (signs of MS I think). Their tarballs present no such problems.

components = fname.rsplit("-", 1)
if len(components) == 2:
project_name, version = components
return cls(project_name=project_name, version=version)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this cause issues if the URL coincidentally happens to have a single - in it, but isn't an sdist? For instance: given https://example.com/some-package.zip (or some other suffix that's more version-like), it seems like this'll happily return ProjectNameAndVersion(project_name="some", version="package") and thus _extract_resolve_data and its callers might not fall into the correct code path (and/or crash)?

If I'm understanding this correctly, one idea might be not trying to handle sdists specially, and let them fall through to this new code for arbitrary archives, since I imagine the pip logging is similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the wrong place to fix and, as you suggest, just not trying to handle non-wheels at all in the locker should work now.

Copy link
Member Author

@jsirois jsirois Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right - no. This is complicated. I'll need to revisit the case of additional artifacts. You can't get additional artifacts for an archive URL direct reference - direct reference requirements are always singular. But for a normal sdist from PyPI, it can be just one of many archives that serve as artifacts for a universal lock of a particular pin - and this is where the current importance of identifying a Pin cheaply lies. More work on this tomorrow am...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this was indeed complicated, but really only due to organic growth in the Locker logic. I took a clean-slate approach to that and it's now much simpler. The upshot is that parsing name and version from an artifact is now only ever done for wheels - which it's guaranteed to work for, and for links Pip identified as candidates via the file name / index structure. These can include sdists that are never built and we can rely on Pip here and avoid a build just to get name / version it has already blessed.

pex/resolve/locker.py Outdated Show resolved Hide resolved
@jsirois
Copy link
Member Author

jsirois commented Feb 23, 2023

This last commit represents a very large change. I may close this out after vetting green across the various Pip permutations checked by CI and post a new PR with the changes split up.

Previously, it was assumed that direct reference URLs were either VCS
URLs or else wheel or sdist URLs. This neglected two remaining cases,
both of which failed to lock with better or worse error messages. The
two missed cases were:
1. URLs of source archives not conforming to the sdist quasi-standard
   naming convention of `<project name>-<version>.{.tar.gz,.zip}`.
2. Local file:// URLs pointing at project directories.

A notable case of the 1st are project archives provided by GitHub.
A notable need for the 2nd case comes from Pants where Pip proprietary
requirement strings are not handled (e.g.: `/path/to/project`) and a
direct reference URL must be used instead (e.g.: `projectname @
/path/to/project`).

Fixes pex-tool#2057
Avoid parsing name and version from anything but wheels, where there is
a hard standard.
@jsirois
Copy link
Member Author

jsirois commented Feb 23, 2023

Alright 83583e2 is now solid. PTAL.

hashing.file_hash(source_archive_path, digest)
source_fingerprint = Fingerprint.from_digest(digest)
self._selected_path_to_pin[selected_path] = build_result.pin
elif "file" == artifact_url.scheme:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard for me to tell by eyeballing, but are we certain we will never encounter a schemeless URL (i.e., just a file path) in the parsed output, that will eventually get passed here?

In other words, should also check for a None scheme and treat it like a file: ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, certain. ITs cover that case across Pex's closed universe of Pips. Pip records all of ., $PWD and file://$PWD as file://$PWD in the logs. The only anomaly is in its Saved ... log lines (which are used by the lock log analyzer to filter out backtracks). In those lines local file paths that are directories never appear. This one anomaly carries a fair bit of supporting code and tests of its own.

@jsirois jsirois merged commit 93e904a into pex-tool:main Feb 23, 2023
@jsirois jsirois deleted the issues/2057 branch February 23, 2023 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants