Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove virtualenv usage from mach, pep-370 is fine #11060

Closed
wants to merge 1 commit into from

Conversation

@gdamjan
Copy link

gdamjan commented May 6, 2016

use pep-370 and PYTHONUSERBASE env variable instead

./mach is modified so it sets PYTHONUSERBASE to python/_virtualenv
also, it tries to import mach.main and if that fails it'll invoke
pip install. when that finishes it re-execs itself (otherwise python won't pick
up the PYTHONUSERBASE as a library path).

all of the virtualenv supporting code is removed from python/mach_bootstrap.py

Why:
virtualenv is an ugly hack, and most of the times a pain. it depends on a copy of the python executable, is
very fragile, and also makes non-relocatable environments.

pep-370 and PYTHONUSERBASE fix all of those, and is officially supported by the
python interpreter.


This change is Reviewable

@highfive
Copy link

highfive commented May 6, 2016

Thanks for the pull request, and welcome! The Servo team is excited to review your changes, and you should hear from @asajeffrey (or someone else) soon.

@highfive
Copy link

highfive commented May 6, 2016

Heads up! This PR modifies the following files:

@mbrubeck
Copy link
Contributor

mbrubeck commented May 7, 2016

@highfive highfive assigned frewsxcv and unassigned asajeffrey May 7, 2016
@highfive
Copy link

highfive commented May 7, 2016

New code was committed to pull request.

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

huh, an interesting breaking of trevis since it already uses a virtualenv (facepalm)

@highfive
Copy link

highfive commented May 7, 2016

New code was committed to pull request.

@larsbergstrom
Copy link
Contributor

larsbergstrom commented May 7, 2016

We should make sure this passes AppVeyor. @vuuk and I had a lot of fun playing with the various pythons in that setting.

Also, I'd be interested in whether @indygreg has any opinions on making this kind of transition away from using virtualenv!

@highfive
Copy link

highfive commented May 7, 2016

New code was committed to pull request.

1 similar comment
@highfive
Copy link

highfive commented May 7, 2016

New code was committed to pull request.

@indygreg
Copy link
Contributor

indygreg commented May 7, 2016

Nice hack to use PYTHONUSERBASE. I may have to steal that idea.

One of the nice things you lose by not using the virtualenv is you don't get pip and a nice python that comes with the virtualenv "activated." It may also still inherit system site-packages, which is undesirable because it undermines isolation and system installed packages may cause conflicts. But you should be able to mimic whatever hacks virtualenv is doing without going through the full virtualenv installer.

Aside from concerns that this sacrifices isolation, I like.

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

setting up PYTHONUSERBASE is just a one liner the same as activating the virtualenv. then you have the shell and pip.

isolation: setting up PYTHONUSERBASE makes that path come before the system site-packages, and if you invoke pip --user --upgrade ... basically the virtualenv package will override the system one (if present). I can add that to the script if needed?

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

if {when :)} the appveyor test passes, I'd like to squash those travis commits, and add small modification to ./mach too. that is ok, right?

@larsbergstrom
Copy link
Contributor

larsbergstrom commented May 7, 2016

@indygreg Thanks for providing the feedback! We look to you for mach/python wizardry :-)

@gdamjan Yes, please feel free to squash and cleanup if/when the AppVeyor build succeeds! Unfortunately, it takes ~62 minutes right now.

@@ -6,7 +6,11 @@
# The beginning of this script is both valid shell and valid python,
# such that the script starts with the shell and is reexecuted with
# the right python.
'''which' python2.7 > /dev/null && exec python2.7 "$0" "$@" || exec python "$0" "$@"
'''true'
HERE=`readlink -f $0`

This comment has been minimized.

Copy link
@frewsxcv

frewsxcv May 7, 2016

Member

This doesn't work for me on OSX:

coreyf@frewbook-pro ~/D/r/servo (remove-virtualenv-from-mach) [1]> ./mach build -d
readlink: illegal option -- f
usage: readlink [-n] [file ...]
usage: dirname path

This comment has been minimized.

Copy link
@gdamjan

gdamjan May 7, 2016

Author

hm, I can make it simpler, I could use $PWD/python/_virtualenv or just remove the readlink altogether.
I've used the readlink/dirname recipe to find the absolute real directory where servo source code is.

$PWD would work if people only ever run ./mach from the source dir.

@frewsxcv
Copy link
Member

frewsxcv commented May 7, 2016

Related issue: #9046

@indygreg
Copy link
Contributor

indygreg commented May 7, 2016

Regarding isolation, the system site-packages will still be on sys.path. This can be problematic for a number of reasons. To understand why, we need to understand what happens when Python starts.

When a Python process starts, it implicitly does an import site (this can be disabled with python -S). Your Python distribution ships a default site.py https://hg.python.org/cpython/file/2.7/Lib/site.py#l523 is the most important part. You can see there is a main() that gets called automatically. And it adds system and user site packages. It's worth noting that PYTHONUSERBASE is not read directly in site.py. Instead, it is consulted by a helper function in sysconfig.py.

site.py executes sitecustomize.py as well. And here's where things get interesting. Various packages register hooks in sitecustomize.py. There are even people on StackOverflow recommending people hack up sitecustomize.py to do things like call sys.setdefaultencoding() which is evil. So basically loading the system or any untrusted site.py/sitecustomize.py can result in unknown and inconsistent behavior.

Sadly, the built-in site.py does not provide a mechanism to disable the system sys.path setting or the loading of sitecustomize.py: there is just python -S to disable site.py.

virtualenv works around this limitation by - get this: installing its own site.py. Embedded within virtualenv.py is a base64 encoded zlib version of a hacked up site.py based on the site.py that ships with CPython. Its main() does things like look for a no-global-site-packages.txt file whose presence tells it not to load the system site-packages. I encourage you to diff a virtualenv's site.py vs your system's to see what virtualenv does.

Anyway, my recollection is the path to site.py is "embedded" in the Python executable via hard-coded path lookup rules e.g. $prefix/lib/python2.7/site.py. When virtualenv creates a bin/python it can create its custom site.py at lib/python2.7/site.py and it gets loaded automatically. Furthermore - and I could be wrong about this - I have vague recollections of virtualenv also doing hacks on the binary Python executable or something equally scary. The virtualenv code is gnarly and I don't feel like verifying that right now.

The important takeaway is that by creating a new bin/python, virtualenv is able to install a custom site.py in a location in its control and it is able to prevent the loading of the global site.py, which ensures only the initialization code in the virtualenv is executed during interpreter start.

If you don't want to create your own bin/python (and at the point you consider that you should just use virtualenv otherwise you are reinventing the wheel), you /could/ in theory provide your own startup .py script containing the bits of site.py you need. You would then python -S this file, do your site.py like foo, massage sys.argv then execfile(). I'm not sure if this would actually work, however: most people just use virtualenv.

The tl;dr is you can't easily isolate from the system site.py and protect yourself against unwanted changes without creating your own bin/python and lib/python2.7/ structure. If you stop using a virtualenv, you'll have random contributors experiencing random failures because some random Python package or user action did something to the system site install that is causing issues with Servo's Python environment. Why take the risk: use a virtualenv.

@indygreg
Copy link
Contributor

indygreg commented May 7, 2016

All that being said, Firefox's mach doesn't activate a virtualenv upon invocation like Servo's does. We do get random bug reports from time to time. We do, however, have a virtualenv that we create for the Firefox build system. It is isolated from the system site.py so we can better guarantee reproducible builds. Many of Firefox's mach commands actually invoke a separate Python process from the build system's virtualenv to achieve isolation.

use pep-370 and PYTHONUSERBASE env variable instead

./mach is modified so it sets PYTHONUSERBASE to python/_virtualenv
also, it tries to import mach.main and if that fails it'll invoke
pip install. when that finishes it re-execs itself (otherwise python won't pick
up the PYTHONUSERBASE as a library path).

all of the virtualenv supporting code is removed from python/mach_bootstrap.py

Why:
virtualenv is an ugly hack, it depends on a copy of the `python` executable, is
very fragile, and also makes non-relocatable environments.

pep-370 and PYTHONUSERBASE fix all of those, and is officially supported by the
python interpreter.
@highfive
Copy link

highfive commented May 7, 2016

New code was committed to pull request.

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

@indygreg I'm nost sure what situation you're thinking can get to conflicts?

using PYTHONUSERBASE and pip --user --ignore-installed will make sure that:

  • the userbase is before any distro installed packages (in site-packages or dist-packages in /usr{/local}/lib/...) - but not before the core python modules (which is fine)
  • --ignore-installed will anyway install needed packages even in the case they are installed by the distro

Why take the risk: use a virtualenv.

Now I agree this change would be a bit controversial. But "just use virtualenv" is not as harmless as it sounds. I'm on several python related irc channels, and virtualenv is incredibly common pain point cause it breaks badly as soon you have a bit more complicated python situation (not to mention upgrades of the system python). Even this PR was motivated because a person had these problems on #servo irc channel.

@autrilla
Copy link
Contributor

autrilla commented May 7, 2016

I'm not sure if this is the proper place to start this discussion, but I don't think this would be a good change to make. virtualenv is actually the de-facto standard for developing Python projects, and moving away from it would be a mistake in my opinion.

Virtualenvs don't actually contain a copy of your system python, but a symlink to them (at least on unixes, not sure about what it does on Windows), and while it's true that you can't simply relocate a virtualenv by moving the directory somewhere else, relocating it is just a matter of pip freezeing and then creating the virtualenv and pip installing.

Also, there's a certain advantage to not relying on environment variables, which are simply extra state. If you use a virtualenv, you can be sure that whenever you run _virtualenv/bin/whatever, it's actually the one you intend, and that gets complicated when you're tinkering with PYTHONUSERBASE.

PEP 370 is 8 years old now, and while some people (myself included) do use pip install --user, which installs to ~/.local (the default PYTHONUSERBASE), I don't know of anyone who actually uses it for anything else, in my opinion for good reasons.

@frewsxcv
Copy link
Member

frewsxcv commented May 7, 2016

Even this PR was motivated because a person had these problems on #servo irc channel.

Do you remember which issue with virtualenv they were encountering?

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

@autrilla

virtualenv is actually the de-facto standard

that's called cargo cult programming :)
(ps. most people don't even know about pep-370 or think it's just ~/.local)

Virtualenvs don't actually contain a copy of your system python

there are some copies, and some symlinks. bin/python is a hard link or a copy (in the case when the virtualenv and /usr are on different filesystems).

there's a certain advantage to not relying on environment variables which are simply extra state.

well, the activate thing from virtualenv is the same thing - also doesn't make any difference for ./mach here

I don't know of anyone who actually uses it for anything else, in my opinion for good reasons

what are those?

@autrilla
Copy link
Contributor

autrilla commented May 7, 2016

@frewsxcv to be honest, I'm somewhat active on #python on freenode, and virtualenv is way more often a solution to someone's problem than it is the problem itself.

@gdamjan

that's called cargo cult programming :)
(ps. most people don't even know about pep-370 or think it's just ~/.local)

That's not really true. In the software world, the community is important, specially in open source. It's better if we use tools most people are familiar with, instead of fiddling with environment variables. Most experienced Python programmers know about it, and they don't use it.

there are some copies, and some symlinks. bin/python is a hard link or a copy (in the case when the virtualenv and /usr are on different filesystems).

bin/python is a symlink on Linux, at least. I think it's libraries that are hard links.

well, the activate thing from virtualenv is the same thing - also doesn't make any difference for ./mach here

You don't have to use activate. afaik mach does not use it. You do have to set the environment variables if you use PYTHONUSERBASE

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

bin/python is a symlink on Linux, at least.

it's a copy on Arch, Ubuntu 16.04 and Debian Jessie (standard distro virtualenv). at 4mb even.

You don't have to use activate. afaik mach does not use it.

it activates it by itself (look at the monstrocity that mach_bootstrap.py is)

Anyway,
I'm gonna update the patch as soon as I find a fix for readlink -f on Mac, then vote it away.

@autrilla
Copy link
Contributor

autrilla commented May 7, 2016

@gdamjan I'm aware that some things are done suboptimally in mach_bootstrap. Some of them are due to the design and limitations of mach, and some of them could be improved. It's something I plan to work on.

It's not really relevant, but on arch: 7641656 lrwxrwxrwx 1 autrilla autrilla 9 May 7 17:30 python -> python2.7

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

It's not really relevant, but on arch: 7641656 lrwxrwxrwx 1 autrilla autrilla 9 May 7 17:30 python -> python2.7

and where is that python2.7?

@autrilla
Copy link
Contributor

autrilla commented May 7, 2016

@gdamjan it's on /usr/bin/.

5690796 -rwxr-xr-x 1 root root 6280 Mar 31 08:30 /usr/bin/python2.7

Notice the inodes are different.

My bad, you are right, python2 and python on the virtualenv just point to python2.7 in the virtualenv, which is indeed a copy.

@frewsxcv
Copy link
Member

frewsxcv commented May 7, 2016

Even this PR was motivated because a person had these problems on #servo irc channel.

Do you remember which issue with virtualenv they were encountering?

I asked this because I don't think there are any issues with virtualenv for Servo. From what I can tell, this pull request is trying to solve a problem that doesn't exist. The only legitimate issues surrounding virtualenv that I know of are related to handling of spaces and unicode characters in paths.

@gdamjan
Copy link
Author

gdamjan commented May 7, 2016

@frewsxcv I don't know what the source issue was, and I don't think the person even found out in the end.
from some of his findings he had several pythons laying around. The problem is, virtualenv is very very fragile in those situations.

@cbrewster
Copy link
Member

cbrewster commented May 7, 2016

I believe it was dherman on IRC that had that issue and he did find out he had an old version of pip laying around. Removing it ended up resolving his issue.

http://logs.glob.uno/?c=mozilla%23servo&s=6+May+2016&e=6+May+2016#c424332

@mbrubeck
Copy link
Contributor

mbrubeck commented May 7, 2016

The root cause of dherman's issue is that there was a pip-2.7 in his $PATH that was outdated, and mach_bootstrap.py has pip-2.7 first in its priority list, so it ran the outdated pip rather than the newer one.

@bors-servo
Copy link
Contributor

bors-servo commented May 7, 2016

The latest upstream changes (presumably #11067) made this pull request unmergeable. Please resolve the merge conflicts.

@jdm
Copy link
Member

jdm commented May 25, 2016

We need to make a decision here.

@larsbergstrom
Copy link
Contributor

larsbergstrom commented May 25, 2016

I believe that we are not ready to make this move at this time. We've only had two types of virtualenv issues:

  1. incorrect binaries on the path, before we even get to virtualenv stuff
  2. an aborted creation of virtualenv with {python-a} and then attempting to run and use that virtualenv with {python-b}

Given the additional analysis in this thread, I'm not convinced that this change would solve any issues that our developers are hitting today, and it risks hitting on new failure paths.

@gdamjan, thanks for opening this so that we could have the discussion! We may revisit this later as we continue to change our build tools stack.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

You can’t perform that action at this time.