Please provide a way to install pipenv without using vendored dependencies #1887

Closed
venthur opened this issue Apr 1, 2018 · 29 comments
Labels
Type: Vendored Dependencies This issue affects vendored dependencies within pipenv.

Comments

@venthur commented Apr 1, 2018

As discussed in #1885, it is very hard for downstream (e.g. Debian) maintainers to package pipenv when it ships almost all of its dependencies in the vendor and patched directories.

Quoting myself from #1885 (comment)

My main concern is still security: if your dependencies also handle their dependencies by vendoring them (and theirs), you'll end up with multiple versions of the same package inside your vendor directory (e.g. the requests package). If one of those packages needs an update, you'll have a hard time fixing all of them. For downstream maintainers (Debian, Suse, etc.) it is even worse, as we usually just update the problematic library and assume the bug is fixed for all packages depending on it. Now we actively have to search for copies of that library everywhere.

So in order to make life easier for downstream package maintainers, please provide a way -- similar to what pip does -- to install pipenv "unbundled", so that it uses the packages installed on the target system.

@ncoghlan (Member) commented Apr 1, 2018

Reference for the way pip handles its dependencies: https://github.com/pypa/pip/blob/master/src/pip/_vendor/README.rst

Most of the constraints and rationale there hold for pipenv as well, with one exception: pipenv currently allows patched dependencies, in order to work around some challenges with the affected upstream libraries. Ideally, those patches will eventually be replaced either by merged upstream changes or by new vendored dependencies, but actually doing that is outside the scope of this issue. I've filed #1889 to cover looking at options to improve the way we handle that.
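
For context, the pip document linked above describes a debundling escape hatch: pip's _vendor/__init__.py carries a DEBUNDLED flag that downstream distributions flip with a small patch, after which imports of pip._vendor.<name> are aliased to the system-installed <name>. A minimal sketch of that pattern applied to a hypothetical pipenv/vendor/__init__.py follows; the names and package list are illustrative, not pipenv's actual code:

# Hypothetical pipenv/vendor/__init__.py, modeled on pip's _vendor shim.
# Distro maintainers would flip DEBUNDLED to True in a one-line patch so the
# vendored tree is bypassed in favour of system-installed packages.
import importlib
import sys

DEBUNDLED = False

def vendored(modulename):
    vendored_name = "{}.{}".format(__name__, modulename)
    if DEBUNDLED:
        # Alias the system-installed module under the vendored name so code
        # written as "from pipenv.vendor import requests" keeps working.
        sys.modules[vendored_name] = importlib.import_module(modulename)

if DEBUNDLED:
    for _name in ("requests", "semver", "jinja2"):  # illustrative subset
        vendored(_name)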

@venthur (Author) commented Apr 1, 2018

Trying to replace some vendored packages with globally installed ones revealed interesting errors. E.g. when replacing semver I suddenly get ModuleNotFoundError: No module named 'semver' when running the test suite. I suspect that the unittests are doing something with sys.path before running certain tests, because pipenv run python -m semver runs just fine while pipenv run pytest tests/ fails with the ModuleNotFoundError.

Any pointers on what to watch out for when tackling this issue?

@ncoghlan (Member) commented Apr 1, 2018

Note: I closed #1889 again, as it turns out I had misremembered how the pip vendoring works. Their approach does support patching the vendored packages; it's just that, as a matter of policy, they aim to apply only the minimal patches needed to make the vendoring work. And even pip doesn't adhere solely to that policy, as the build-time parts of setuptools are all dropped (keeping only pkg_resources).

@techalchemy (Member) commented:

@venthur pipenv run puts the vendor/ and patched/ directories at the head of sys.path and runs in a virtualenv with no global site packages. If you need to use site packages you can install with pipenv install --site-packages.
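
In other words, something roughly like the following happens early in pipenv's entry points. This is a simplified sketch for illustration only, not pipenv's actual source:

# Simplified sketch of the path setup described above: the patched and vendored
# trees are pushed to the front of sys.path so they shadow anything installed
# in the environment, including a de-vendored copy sitting in site-packages.
import os
import sys

PIPENV_ROOT = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(PIPENV_ROOT, "patched"))
sys.path.insert(0, os.path.join(PIPENV_ROOT, "vendor"))

import semver  # resolves to pipenv/vendor/semver, not the site-packages copy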

@venthur (Author) commented Apr 1, 2018

@techalchemy Wait, I've removed semver from the vendor directory and pip-installed it in the virtualenv. Running pipenv run python -m semver does work, but pipenv run pytest tests/ does not. If pipenv ignored the packages already installed in that virtualenv, how could the first command work without an import error?

I suspect that something happens in pipenv.vendor.delegator.run, which is used by PipenvInstance in tests/test_pipenv.py.
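
For what it's worth, delegator.run just spawns a child process, so the sys.path that matters is the one built up inside that child, not the one the pytest process sees. A tiny illustration, assuming the vendored delegator API of run() returning an object with .out and .return_code:

# The parent (pytest) process may import semver fine, while a child spawned
# through delegator builds its own sys.path and cannot.
from pipenv.vendor import delegator

c = delegator.run('python -c "import sys; print(sys.path[:3])"')
print(c.return_code)
print(c.out)  # the child's path, which need not match the parent's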

@techalchemy (Member) commented:

@venthur I believe delegator.run won't work as expected for the pipenv run pytest tests/ use case unless your environment is configured correctly for pipenv development and you are in the root folder of the project. In this instance it typically means you would have to run

pip install -e .
pipenv install --dev

When I test locally I usually work on a ramdisk and create a dedicated virtualenv for this.

@venthur (Author) commented Apr 1, 2018

This is exactly what I'm doing. When I said I pip-installed semver, I meant that I added semver to setup.py so it gets installed when I call python setup.py develop followed by pipenv install --dev.

Maybe this diff makes it a bit clearer what I'm doing: master...venthur:fix/unbundle. I basically removed jinja2 and semver (no specific reason, you have to start somewhere) and added them to setup.py.

Then I followed the advice in CONTRIBUTING.rst

$ cd pipenv
$ virtualenv ~/pipenv-venv  # You can use a different path if you like.
$ source ~/pipenv-venv/bin/activate
$ python setup.py develop
$ pipenv install --dev


$ # To run the test suite locally:
$ pipenv run pytest tests

and get the import error for semver. Calling

$ pipenv run python -m semver

works, however, which is the confusing part.

@techalchemy (Member) commented:

@venthur I think I understand what's happening...

Is this fixed if you delete tests/__init__.py?

@venthur (Author) commented Apr 1, 2018

@techalchemy unfortunately not. Same error as before.

@techalchemy (Member) commented:

@venthur OK, this relates specifically to how pytest handles paths -- unless invoked with python -m, I think it might not set up the virtualenv's python path correctly, because we are creating further virtualenvs during the testing process. It's kind of complicated, but the -m invocation is likely the most correct one for now.
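
A quick way to see what this means in practice (a diagnostic sketch, not part of pipenv or its test suite): drop a hook like the one below into conftest.py, then run both pipenv run pytest tests/ and pipenv run python -m pytest tests/. The -m form prepends the current directory to sys.path, so the two invocations can resolve a de-vendored package differently.

# Hypothetical conftest.py probe: print the head of sys.path as the test
# process sees it, so the two invocation styles can be compared directly.
import sys

def pytest_configure(config):
    # pytest calls this hook once at startup, before tests are collected.
    print("sys.path head:", sys.path[:4])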

@venthur (Author) commented Apr 1, 2018

I'm fairly certain it has something to do with how the new process is created in tests.test_pipenv.PipenvInstance using pipenv.vendor.delegator.run. While I can still import semver in the test process (i.e. everything that is called by pytest), I can't import it anymore as soon as I am in the context of the new subprocess created by delegator.run (e.g. in resolver.py) -- I've spent a few hours digging through the code but I can't find anything.

Here's a quick way to reproduce the problem:

  1. remove semver from pipenv/vendor
  2. install semver manually in your virtualenv (emulating pip install pipenv with semver in setup.py's install_requires)
  3. run: pipenv run pytest tests/test_pipenv.py::TestPipenv::test_local_zipfiles

@techalchemy (Member) commented:

@venthur as I mentioned, this is because resolver.py is not run simply inside a delegator.run call, but inside a separate virtualenv which is created specifically for testing (without access to global site packages) and therefore won't be aware of anything that is installed in the context of the originating environment. As @ncoghlan mentioned, the ecosystem is quite complex, which partly accounts for why we vendor things.

@techalchemy (Member) commented:

Generally speaking, resolver.py is invoked inside the context of the virtualenv we resolve dependencies against. Roughly:

$ pipenv install <somepackage> 
=> calls site-packages/lib/pipenv/cli.py
==> ensures virtualenv in $WORKON_HOME/$(cwd)-$(deterministic_hash) or $(cwd)/.venv
====> generates lockfile by calling $virtualenv/bin/python $global_pipenv_dir/resolver.py <whatever>

You can see that the last step runs in the context of a new virtualenv, which no longer has the originating environment's python path -- which is why debundled packages go missing. You'll find that in cli.py and other parts of our code we specifically insert pipenv onto sys.path at runtime to account for this.
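
A condensed sketch of that hand-off (illustrative only, not the actual cli.py or resolver.py code; paths and names are simplified): the parent launches the project virtualenv's interpreter on resolver.py, and the child re-inserts pipenv's own location, but nothing ever re-adds the originating environment's site-packages, which is exactly where a de-vendored dependency would live.

import os
import subprocess
import sys

def spawn_resolver(package):
    # Parent side, roughly what cli.py does: run resolver.py with the project
    # virtualenv's interpreter, which starts with its own, separate sys.path.
    venv_python = os.path.join(os.environ["VIRTUAL_ENV"], "bin", "python")
    resolver = os.path.join(os.path.dirname(os.path.abspath(__file__)), "resolver.py")
    subprocess.check_call([venv_python, resolver, package])

def fix_up_child_path():
    # Child side, roughly what resolver.py does before importing anything:
    # put the directory containing the pipenv package back on sys.path so
    # "import pipenv" and the vendored tree resolve.  The originating
    # environment's site-packages is never re-added, which is why a
    # de-vendored dependency goes missing at exactly this point.
    pipenv_parent = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
    sys.path.insert(0, pipenv_parent)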

@techalchemy (Member) commented:

@venthur one step closer as of #2035, which brings in the tooling from pip to automate updates of the vendored packages and the patches involved. It's not completely organized yet (the patches in the patched directory, for example, are more verbose and don't have automated import rewrites), but this should help at least.

@uranusjr (Member) commented Jun 6, 2018

Is there anything left to do on this one?

@eli-schwartz (Contributor) commented:

Given that no progress was reported on this issue (though I haven't been watching the commit logs to see if something just went unmentioned), my question would be whether anything got done.

@uranusjr (Member) commented Jun 6, 2018

@eli-schwartz There has been a lot of effort to clean up patch generation and licensing issues around vendored libraries; I read something like ten of them, all related to this issue. I asked whether there is anything left because I know things have already been done.

@eli-schwartz (Contributor) commented:

Patch generation is related inasmuch as it's a prerequisite to effectively getting changes to stick, but I'm not seeing how it itself results in pipenv being able to function with devendored dependencies. I'd see it as a "dependent issue needing to be resolved before this one can be worked on", not "part of this issue". All it really says is "great, now we can finally get started on trying to make devendoring work".

I'm not seeing how licensing is related at all. It's a completely separate issue that simply happened to be discovered during the course of someone's attempts to devendor pipenv. In fact, it's only when pip is vendored that there are licensing issues at all -- removing them (by devendoring them) automatically resolves the licensing issues, because the only thing that needs to be licensed is pipenv itself, as the devendored dependencies coming from the system already come with licenses.

@techalchemy (Member) commented:

Hm, not much to discuss here honestly. Licensing was the issue that had the domino effect of bringing in the vendoring and patching tooling, because of other distro maintainers' needs. As for tooling to de-vendor our vendored dependencies, someone could port the pip tooling over if they were so inclined, but it's not really a high priority for us to handle this ourselves right now.

@techalchemy (Member) commented:

Basically I don't have any issue with having this, but I doubt we are going to build it, so if distro maintainers want it, the fastest route is going to be to PR in the tooling from pip.

@eli-schwartz (Contributor) commented:

If I understand @venthur's previous comments, pipenv doesn't use custom import paths at all, but relies on running some bits of its own code inside the pipenv-created virtualenv (so site-packages cannot be used) and then inserting its own "vendor" and "patched" directories onto the path. I'm not sure how pip's tooling would help there.

However, it looks like #1758 would fix this for the case mentioned earlier here, so I guess that would be something that gets us closer.

@techalchemy (Member) commented:

@eli-schwartz that is just not correct; we have custom import paths all over the codebase.

@jayvdb (Contributor) commented Mar 20, 2019

I've gone through the process of completely de-vendoring pipenv/vendor, with only a few non-internet tests failing. Also see #3630 for the tests which are failing, probably because they are not declared as needing internet or because of bugs in the mocking. I am now going through the process a second time, hopefully more cleanly, to avoid the functional breakages caused by my first de-vendoring.

IMO the first step should be more clarity between pipenv/vendor and pipenv/patched. There are quite a few packages in pipenv/vendor which are actually functionally patched; IMO they should either be moved to patched, or vendor and patched should be merged and the patches split more clearly into functional-changes-only and vendoring-only patches, ideally with the functional-changes patches applied first (as upstream and de-vendoring need them to apply cleanly to the upstream sdist) and the vendoring patches based on top of them where applicable.

A concerted effort also needs to be put into upstreaming the functional-changes patches. I've found several patches which haven't been submitted upstream, and sometimes I couldn't even find an issue raised upstream. There are probably good reasons for this, but it is work to be done. There are also a few functional changes which are obviously not acceptable upstream, such as the sys.path.append at https://github.com/pypa/pipenv/blob/dc41e66/pipenv/vendor/pipdeptree.py#L17 ; as others have indicated, those are hacks which need to be removed before the packages can be de-vendored normally, and some are probably not easy to remove (and not a priority for the core devs). IMO people wanting to de-vendor are going to need to help tackle these problems.
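
For readers who haven't opened the linked line, this is the general shape of such a vendoring-only hack (modeled on it, not copied verbatim):

# The vendored file adjusts sys.path relative to its own location so that its
# imports resolve against pipenv's bundled copies rather than whatever is
# installed.  Lines like this have no place upstream and must be dropped
# before the file can track the upstream release again.
import os
import sys

sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))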

Anyone interested can find all pipenv/vendor and some pipenv/patched packages as .spec files in https://build.opensuse.org/project/show/devel:languages:python , and https://build.opensuse.org/project/show/home:jayvdb:coala:python3-bears has some patches applied. A static copy of the messy first iteration of the de-vendored pipenv .spec can be found at https://build.opensuse.org/package/show/home:jayvdb:branches:home:jayvdb:coala:python3-bears/python-pipenv

The following are my notes on the patches in the 2018.11.26 release -- some of these are now irrelevant, but packagers of the 2018.11.26 release will still benefit from them.

Most important is be3c771 (not a listed patch), which replaces cursor with vistir due to recently uncovered legal problems with using the GPL-licensed cursor in a non-GPL project.

vendor/patches:

patched/patches:

There are also a few other patches possibly needed for de-vendoring.

@uranusjr (Member) commented:

Regarding vendored/patched, we switched to a more programmatic patching workflow a while ago, so there's functionally no difference between vendored and patched (they exist only because of legacy code paths). Whether a project is patched is indicated only by whether it has an entry in tasks/patching.

@jayvdb (Contributor) commented Mar 20, 2019

@uranusjr, but there are two distinct types of patches -- patches only needed for the vendoring to work, and patches which change the vendored package.

@uranusjr (Member) commented:

Yes, I don't object to your proposal of repurposing them to distinguish between different kinds of patching; my comment was merely explaining why they seem to lack clarity.

@jayvdb (Contributor) commented Apr 3, 2019

I've mostly finished a second pass of de-vendoring, and anyone attempting it should be aware that there are bugs caused by the de-vendoring which are not fully diagnosed, but which do appear to be quite serious and not easily fixed. The only counter-point is that Red Hat has a partially de-vendored pipenv package, and I haven't seen many complaints about it.

You can see some of the problems in PRs #3645 and #3670, and in issue #3644.

In short, any Pipfile containing a reference to a package which is also a dependency of pipenv may be buggy when pipenv is de-vendored. Expected behaviour of pipenv is broken in many, many ways. This has so far only been verified to occur with requests, but it is a reasonable assumption that the same occurs for the other de-vendored dependencies as well.

As requests is a dependency of both pipenv and notpip, some of these failures may only occur for de-vendored dependencies which are also dependencies of notpip. But notpip has a large dependency list, so this is still a large set of problematic packages which won't work correctly in a Pipfile. Locking and uninstalling are the two main problem areas; installing editable versions also appears problematic.

More analysis is needed. Probably a good place to start is creating Pipfile tests which include packaging as a dependency (it occurs in both pipenv/vendor and pipenv/patched/notpip/_vendor) and pkg_resources (setuptools), which is in pipenv/patched/notpip/_vendor and is also an installed part of any venv. Then add tests for specific versions of those, and of requests, and see if/how pipenv hides the venv-installed version of those packages and keeps them from interfering with the Pipfile-managed version. That might expose actual bugs in pipenv that occur even when not de-vendored. Or, if it can properly handle packaging, the same mechanism might be expanded to handle de-vendored packages. If I recall correctly there is special logic inside pip/notpip which hides the venv pre-installed packages, but it may not work correctly when the Pipfile also lists the same packages.

If the goal of de-vendoring is only to avoid duplicates and licensing issues, IMO a better approach is to have the pipenv package depend on all of the vendored dependencies (so any change in a dependency triggers a rebuild), and then re-vendor the dependencies at build time. That way pipenv still ships a copy of the dependencies, but those copies are identical to the real packages provided by the distro, and any fixes in the dependencies are automatically replicated into pipenv. I've done this, and it avoids the problems mentioned above. The downside is that the pipenv package contents will be modified quite frequently, for any minor modification in any dependency, so the package version will be bumped regularly and users will be downloading new versions of the (quite large) pipenv package frequently.
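
A rough sketch of that build-time re-vendoring step (purely illustrative: the paths and the package list are placeholders, and the real .spec does the equivalent in shell):

# Copy the distro-provided copies of pipenv's dependencies back into
# pipenv/vendor at build time, so the bundled code is byte-identical to the
# system packages and inherits any distro fixes automatically.
import importlib.util
import os
import shutil

VENDOR_DIR = "build/pipenv/vendor"           # placeholder build location
REVENDOR = ["requests", "jinja2", "semver"]  # in practice: every vendored dep

os.makedirs(VENDOR_DIR, exist_ok=True)
for name in REVENDOR:
    spec = importlib.util.find_spec(name)
    if spec.submodule_search_locations:      # a package: copy its directory
        shutil.copytree(spec.submodule_search_locations[0],
                        os.path.join(VENDOR_DIR, name), dirs_exist_ok=True)
    else:                                    # a single-file module like semver
        shutil.copy(spec.origin, VENDOR_DIR)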

Another word of caution: even very minor packaging bugs in the dependencies can cause test failures, such as bad constraints in the egg-info requires.txt files, because the test suite uses the pkg_resources distribution routines, reads every requires.txt it can find, and fails badly on encountering anything unexpected. In my case, it was a few bad fdupes which were triggering this.
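
To illustrate that failure mode, here is a minimal standalone example (not taken from the test suite): pkg_resources is strict about requirement syntax, so a single malformed line in some installed dependency's requires.txt surfaces as an error in any code that merely parses that package's metadata.

# pkg_resources rejects malformed requirement strings outright; a line like
# this inside an installed package's requires.txt breaks anything that walks
# the working set and reads that metadata.
import pkg_resources

try:
    pkg_resources.Requirement.parse("requests >>> 2.0")  # invalid operator
except Exception as exc:  # a RequirementParseError in current setuptools
    print(type(exc).__name__, exc)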

@jayvdb (Contributor) commented Apr 5, 2019

Latest spec and patches are at https://build.opensuse.org/package/show/home:jayvdb:coala:python3-bears/python-pipenv , with specs and patches for the de-vendored dependencies all in https://build.opensuse.org/project/show/devel:languages:python .

It is passing all unit and integration tests except for about eight that are explicitly de-selected with reasons given, and a few test scenarios at sarugaku/requirementslib#152 which are not fully understood yet.

@matteius (Member) commented:

At this time, all vendored and patched dependencies in pipenv are intentionally bundled together to ship a working pipenv. A package maintainer would not be advised to include only a subset of pipenv in its current form (say, removing vendor and relying on system dependencies for the rest), because you may end up with a difference from the intended behavior. The best example I can provide is the more recent patches to support index-restricted packages. I feel compelled to close this issue out, but if more discussion is required we can revisit or reopen as needed.

@matteius added the "Type: Vendored Dependencies" label on Jun 28, 2022