Overhaul of venv_metadata_inspector.py, now using modern python libraries #588

Closed
wants to merge 199 commits into from

Contributor

@itsayellow itsayellow commented Dec 18, 2020

  • I have added an entry to docs/changelog.md

Summary of changes

Closes #587. Closes #534. Closes #528.

This revamps venv_metadata_inspector.py to use modern Python libraries like packaging and importlib.metadata / importlib_metadata instead of pkg_resources for venv metadata inspection.

This fixes some things we didn't know we had problems with (see the issues above). One thing I noticed about the old venv_metadata_inspector.py that isn't captured in any issue is that it completely ignores environment markers and just assumes every entry in install_requires is equally valid. The new code correctly evaluates both markers and the presence or absence of package extras (which may be present in dependencies).
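For readers unfamiliar with marker evaluation, here is a minimal sketch of the idea using packaging. The helper name requirement_applies and its extras parameter are illustrative assumptions, not the code in this PR.

```python
from packaging.requirements import Requirement


def requirement_applies(req_string, extras=()):
    """Return True if a requirement string applies in the current environment.

    `extras` holds the extras requested for the parent package
    (hypothetical parameter, for illustration only).
    """
    req = Requirement(req_string)
    if req.marker is None:
        # No marker means the requirement is unconditional.
        return True
    # A marker may reference `extra`, so evaluate once with no extra
    # and once for each requested extra.
    candidates = [""] + list(extras)
    return any(req.marker.evaluate({"extra": extra}) for extra in candidates)
```

A requirement such as 'sphinx; extra == "docs"' is skipped unless "docs" is among the requested extras, while a requirement with no marker always applies.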

I started with #344, and made changes.

The biggest changes are to venv_metadata_inspector.py and shared_libs.py

I have two questions:

  • Do we need to list importlib_metadata in pipx's install_requires if pipx doesn't actually use it in pipx code, but rather installs it in the shared venv? RESOLVED--Removed it from install_requires
  • The list of libraries installed in the shared venv is: pip, wheel, packaging, importlib-metadata, and setuptools. Are all of these necessary? RESOLVED--Keeping all of them.

After this PR merges I hope to release pipx 0.16.0.0!

Test plan

The code was extensively tested with our new "slow" tests and 62 packages:
https://github.com/itsayellow/pipx/actions/runs/429239351

There were no errors about pipx's venv metadata inspection missing apps or missing apps of dependencies (This is specifically checked in the slow tests). The only problems were packages that would not install at all on the CI systems due to missing wheels and/or missing header files on the system.

@itsayellow itsayellow changed the title No pkg resources Overhaul of venv_metadata_inspector.py, now using modern python libraries Dec 18, 2020
setup.cfg (outdated, resolved)
Comment on lines +38 to +40
"packaging",
"importlib-metadata",
"setuptools",
Contributor

Why do we need packaging and importlib metadata as shared lib?

Contributor Author

@itsayellow itsayellow Dec 18, 2020

This is the major change to venv_metadata_inspector.py. Instead of it using setuptools' pkg_resources to inspect each venv, it now uses packaging and importlib[.-]metadata.

From packaging.requirements it uses Requirement, and from packaging.utils it uses canonicalize_name().
From importlib[.-]metadata it uses distribution(), distributions(), and type Distribution.
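As a rough illustration of those two pieces, the sketch below lists the distributions visible to the running interpreter and normalizes their names. It uses only the standard library: the canonical helper is a hand-written stand-in that applies the same PEP 503 rule as packaging.utils.canonicalize_name, and installed_versions is a hypothetical name.

```python
import re
from importlib import metadata


def canonical(name):
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_versions():
    """Map canonical distribution name -> version for the current environment."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip metadata directories that lack a Name field
            versions[canonical(name)] = dist.metadata.get("Version")
    return versions
```

Normalizing names first matters because "Importlib_Metadata" and "importlib-metadata" refer to the same distribution.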

I couldn't figure out a way to get along without both of those dependencies if we transition away from pkg_resources. If anyone can think of a way I'd be happy to pursue it! Adding more libraries to the shared libs does increase the chance that we will conflict with requirements in the venvs.

Member

I think we can avoid importlib-metadata in 3.8+ environments by adding a marker here (plus try-except imports in venv_metadata_inspector)?

Contributor Author

I was guarding against the user possibly having a mix of 3.8+ and <3.8 venvs, in which case the shared libs would need importlib-metadata even if pipx and possibly other venvs were running in 3.8+.

Contributor Author

@itsayellow itsayellow Dec 19, 2020

Although now that I think about it, if the shared libs are installed 3.8+ but used with a venv that is <3.8, does that imply that the <3.8 venv gets to use importlib.metadata?

That doesn't seem to work unfortunately. I just tried it.

Contributor Author

And there is already a try-except sequence in venv_metadata_inspector.py to first attempt to use importlib.metadata and then to fall back on using importlib-metadata
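The fallback pattern described here typically looks something like the following sketch. The dist_version helper is a hypothetical addition to show the imported module in use; it is not taken from venv_metadata_inspector.py.

```python
try:
    # Python 3.8+ ships this module in the standard library.
    from importlib import metadata
except ImportError:
    # Older interpreters need the importlib-metadata backport from PyPI.
    import importlib_metadata as metadata


def dist_version(name):
    """Return the installed version of `name`, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None
```

Binding both variants to the same name keeps the rest of the module independent of which one was imported.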

@itsayellow itsayellow requested a review from a team December 18, 2020 20:54
Contributor

@gaborbernat gaborbernat left a comment

@itsayellow
Contributor Author

@cs01, any thoughts? I realize it's around Christmas and you probably have better things occupying yourself. Just thought I'd try to at least give you an opportunity to chime in before I start releasing things. 🙂

Happy Holidays everyone, btw 🎄

except FileNotFoundError:
    pass

# not sure what is found here
Member

Is this still the case? I wonder if we can remove this section.

Contributor Author

I wondered too...but I was also really worried about corner cases because the universe of strange python packages is vast.

I wonder if this duplicates the for loop above it (for path in dist.files). Maybe one of the other Python packaging experts can give some guidance?

Contributor Author

@itsayellow itsayellow Dec 24, 2020

@uranusjr, @gaborbernat, does either of you know if installed-files.txt is only used for Python 2.*? It's hard to find documentation on it, but what I find all seems to refer to Python 2.*. If installed-files.txt is only used for Python 2.*, then we can definitely remove the code that searches it.

Member

I don’t think it has anything to do with Python 2/3. AFAIK installed-files.txt is a setuptools/distutils artifact from the old days before .dist-info was specified. Newer setuptools versions should never write this.

Contributor Author

I find references to ancient pip generating this file (there is a mention in changelog for pip 0.3 https://pip.pypa.io/en/stable/news/#id556)

Contributor Author

Regardless of the library of origin, it seems very old. If we are updating our shared libraries this code that views installed-files.txt seems like it can be removed.

Let me know if there are any objections, otherwise I will remove it.

Member

I didn't read the code closely; can someone explain what it actually does? While new(-ish) packaging tools would never generate the file, it may still be present on systems that are based on Python packaging but were never updated.

Contributor Author

@itsayellow itsayellow Dec 24, 2020

It just goes through every line in file installed-files.txt and sees if any of them is in the venv bin/ subdirectory. Basically exactly the same thing as the for loop above it does by searching the iterable Distribution.files from importlib[.-]metadata.
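A minimal sketch of that Distribution.files scan might look like this. The function name apps_in_bin is hypothetical and the real inspector does more work; this only shows how recorded file paths can be matched against the venv's bin directory.

```python
from importlib import metadata
from pathlib import Path


def apps_in_bin(dist, bin_dir):
    """Return sorted names of files that `dist` installed directly in `bin_dir`."""
    bin_dir = Path(bin_dir).resolve()
    apps = set()
    # dist.files is None when no file listing was recorded for the distribution.
    for path in dist.files or []:
        # locate() turns the recorded relative path into an absolute one;
        # resolve() collapses any ".." components before comparing.
        if Path(path.locate()).resolve().parent == bin_dir:
            apps.add(path.name)
    return sorted(apps)
```

Entries in RECORD are relative to site-packages, so console scripts usually appear with leading ".." components, which is why the paths are resolved before the comparison.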

If this is part of ancient pip, I'm wondering if it was created each time on install on the client machine, and not made as part of the package distribution?

Contributor Author

One of the few mentions I can find on the internet, from a setuptools Issue discussing a possible solution (June 2015):
pypa/setuptools#371 (comment)

Just have setuptools add installed-files.txt to the egg-info directory. pip has been doing this for a long time before .dist-info was on the radar.

@cs01
Member

cs01 commented Dec 24, 2020

Wow this is great, an early Christmas present :). This and will handle all the annoying corner cases, and be a nice foundation for the future of pipx.

The list of libraries installed in the shared venv is: pip, wheel, packaging, importlib-metadata, and setuptools. Are all of these necessary?

The packages installed into the shared libs were originally introduced in https://github.com/pipxproject/pipx/pull/168/files.

As you know, the shared libs are essentially "merged" at runtime with pipx's package-specific venvs via the pth file. This is to save time by avoiding installing a bunch of the same things over and over for each venv. It forms a common environment for the app's venvs to be able to run pip and the metadata inspector.
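To make the pth mechanism concrete, here is a small sketch of how a .pth file in one site-packages directory can pull another directory onto sys.path. The file name shared_libs_demo.pth and both function names are assumptions for illustration; they are not pipx's actual names.

```python
import site
import sys
from pathlib import Path


def link_shared_libs(venv_site_packages, shared_site_packages,
                     pth_name="shared_libs_demo.pth"):
    """Write a .pth file in one site-packages directory that points at another."""
    pth_file = Path(venv_site_packages) / pth_name
    # Each line of a .pth file naming an existing directory is added to sys.path.
    pth_file.write_text(str(shared_site_packages) + "\n", encoding="utf-8")
    return pth_file


def simulate_startup(venv_site_packages):
    """Process .pth files the way the interpreter does for site-packages at startup."""
    site.addsitedir(str(venv_site_packages))
    return list(sys.path)
```

Because the extra directory is added to sys.path rather than copied, every venv that carries such a file sees the same shared installation.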

pip is definitely required to install packages. pip often needs to build things from source. The source build system is often setuptools, so that is required in case the package is an sdist. To build into a wheel, it needs the wheel package. packaging and importlib_metadata are used directly by venv_metadata_inspector.py, so they are required too.

This LGTM!

Comment on lines +35 to +40
self.required_packages = [
    "pip",
    "wheel",
    "packaging",
    "importlib-metadata",
    "setuptools",
Member

@cs01 cs01 Dec 24, 2020

I just realized there are no version constraints on these package specifiers. I have been getting bitten by some semver changes in my other projects, so it has been on my mind. However, thinking about these a little, it's likely we always want to be on the latest version of all of them since they all work in tandem, and we'll want the latest metadata changes available.

Adding more libraries to the shared libs does increase the chance that we will conflict with requirements in the venvs.

I wonder what would happen here. The shared lib packages are added via the pth file, and aren't part of the requirements of the package being installed. I assume pip (and its new dependency resolver) only applies to the requirements/package being installed (and its dependencies). The shared lib pip invocation is separate from app package installation and manipulation, so maybe there won't be any issues here?

Member

I wonder what would happen here. The shared lib packages are added via the pth file, and aren't part of the requirements of the package being installed. I assume pip (and its new dependency resolver) only applies to the requirements/package being installed (and its dependencies). The shared lib pip invocation is separate from app package installation and manipulation, so maybe there won't be any issues here?

It depends on how and when the shared libs are injected. pip determines whether a package is installed by inspecting directories on sys.path, so if the shared libs are injected when pip install is run, pip will consider those packages as “installed”. If you’re going to add version constraints to those libraries, they could end up conflicting with applications users want to install.

I think maybe the best way to implement venv_metadata_inspector.py is to run it as a zipapp instead. pipx can create a zipapp venv_metadata_inspector.pyz that contains the script and all its dependencies in a cache directory, and run python venv_metadata_inspector.pyz instead. The same technique can actually be used for pip install to avoid potential edge cases where an application depends on a certain pip and setuptools version not compatible with pipx (not likely to happen though).
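For context, building and running a zipapp with the standard library could look roughly like the sketch below. The function names are hypothetical, and the sketch assumes the source directory already contains a __main__.py plus any bundled dependencies.

```python
import subprocess
import sys
import zipapp
from pathlib import Path


def build_inspector_zipapp(source_dir, target):
    """Bundle `source_dir` (which must contain __main__.py) into a .pyz archive."""
    zipapp.create_archive(source_dir, target)
    return Path(target)


def run_zipapp(pyz_path, python=sys.executable):
    """Run the archive with the given interpreter and return its stdout."""
    result = subprocess.run(
        [str(python), str(pyz_path)],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing a venv's own interpreter as `python` would run the bundled script in that venv's context without installing anything into it.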

Contributor Author

The zipapp idea is very interesting. Wouldn't pipx still need to be running inside of the target venv anyway for importlib[.-]metadata.Distribution to get information? And wouldn't that end up with the same version conflicts?

I don't see any way to point importlib[.-]metadata to examine another venv than the current one, in which case I'm worried even a zipapp would encounter the same version conflicts.

Is there a way around these problems?

Contributor Author

Oooh once I actually wrote out "any way to point importlib[.-]metadata to examine another venv than the current one" I realized that is the key, and I found this pointer, which may show the way to do that:
https://docs.python.org/3/library/importlib.metadata.html#extending-the-search-algorithm

I'm wondering if we could hook into that to facilitate running the code in one venv and inspecting another?

Contributor Author

If we could do THAT, then we could make venv inspection just another part of pipx and not a separate runnable file. That would really be cool.

Contributor Author

Basically I'm only changing sys.path in order to change what importlib[.-]metadata is looking at when we get info about the distributions. Otherwise, the code is still part of the pipx package and not in a separate venv.

Member

@uranusjr uranusjr Dec 25, 2020

Ah OK, so you do all the inspection outside of the target virtual environment? That should work.

I believe you can even avoid sys.path manipulation altogether by passing a path context, e.g. importlib_metadata.distributions(path=path_to_venv_site_packages) (not sure if this would work, @jaraco is the expert on this).

Contributor Author

I believe you can even avoid sys.path manipulation altogether by passing a path context, e.g. importlib_metadata.distributions(path=path_to_venv_site_packages) (not sure if this would work, @jaraco is the expert on this).

That would be even better. Let me see what I can do.

Member

The path parameter is meant to be a sequence like sys.path so something like importlib.metadata.distributions(path=[path_to_venv_site_packages]) may be what you're after.
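A short sketch of that suggestion, inspecting one site-packages directory from outside the venv. The function name list_venv_distributions is hypothetical; the path keyword is the one described above.

```python
from importlib import metadata


def list_venv_distributions(site_packages_dir):
    """Return {name: version} for distributions found in one site-packages directory."""
    found = {}
    # `path` takes a list of directories, like sys.path, not a single string.
    for dist in metadata.distributions(path=[str(site_packages_dir)]):
        name = dist.metadata.get("Name")
        if name:
            found[name] = dist.version
    return found
```

Only the directories listed in `path` are searched, so the interpreter running the inspection does not need to belong to the venv being examined.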

Contributor Author

Thanks, @jaraco. I think I've got that working in another branch.

@itsayellow itsayellow mentioned this pull request Dec 27, 2020
1 task
@itsayellow
Contributor Author

Closing in favor of #593.

@itsayellow itsayellow closed this Dec 27, 2020
@itsayellow itsayellow deleted the no_pkg_resources branch April 13, 2021 22:12
5 participants