New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate PEX interpreter selection based on target-level interpreter compatibility constraints #5160

Merged
merged 44 commits into from Dec 20, 2017

Conversation

Projects
None yet
5 participants
@CMLivingston
Copy link
Contributor

CMLivingston commented Dec 5, 2017

Problem

Pants is unaware of the new pexrc variable PEX_PYTHON_PATH and cannot make use of it during interpreter selection. PEX_PYTHON_PATH specifies a colon-separated list of paths to python interpreter binaries to use when building and executing a pex. As of PEX version 1.2.14, PEXs can accept interpreter constraints at build time to use for filtering against candidate interpreters specified in a PEX_PYTHON_PATH defined in a pexrc file.

Pants also has no way of passing interpreter compatibility constraints (as specified in a target definition) to a PEX for persistence in PEX-INFO metadata.

Solution

Add logic for adding the interpreter compatibility requirements of targets to the v2 Python backend tasks that handle building/running/testing PEXs, and make Pants interpreter selection tasks aware of PEX_PYTHON_PATH in pexrc files on disk.

Result

Users will now be able to define a list of python interpreters in a pexrc to build/run/test python targets against. The compatibility field in a target will be used to filter candidate interpreters for a lowest-versioned match. PEXs that result from these tasks will contain the compatibility constraints from their respective target definitions in their PEX-INFO metadata.

Chris Livingston added some commits Dec 5, 2017

@CMLivingston CMLivingston changed the title WIP: First pass at integrating PEX interpreter selection Integrate PEX interpreter selection based on target-level interpreter compatibility constraints Dec 5, 2017

Chris Livingston added some commits Dec 5, 2017

@stuhood stuhood requested review from benjyw and kwlzn Dec 5, 2017

# We filter the interpreter cache itself (and not just the interpreters we pull from it)
# because setting up some python versions (e.g., 3<=python<3.3) crashes, and this gives us
# an escape hatch.
filters = filters if any(filters) else self._python_setup.interpreter_constraints
if not any(filters) and not pex_python_path_interpreter_paths:

This comment has been minimized.

@UnrememberMe

UnrememberMe Dec 5, 2017

Contributor

what if filters set contains conflicting filter requirements? This can possibly be true when running pants test :: while having target level constraints.

':lib'
]
)

This comment has been minimized.

@UnrememberMe

UnrememberMe Dec 5, 2017

Contributor

suggest to add a CPython<3 target here and run pants test python_3_binary:

@kwlzn
Copy link
Member

kwlzn left a comment

LGTM w/ a handful of mostly nits.

some tests (both unit and integration) would also be a good idea AFAICT.

@@ -39,6 +40,22 @@ def _matching(cls, interpreters, filters):
if cls._matches(interpreter, filters):
yield interpreter

@property

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

this could probably be @memoized_property

:return: paths to interpreters as specified by PEX_PYTHON_PATH
:rtype: list
"""

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

pants docstring style standard should look like this (note the indentation):

"""A one line summary with punctuation.

Additional descriptions. Would go here. And might
wrap etc.

:return: A description of the thing it returns.
:rtype: list
"""
@@ -39,6 +40,22 @@ def _matching(cls, interpreters, filters):
if cls._matches(interpreter, filters):
yield interpreter

@property
def pex_python_path_list(self):

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

def pex_python_paths ?

# We filter the interpreter cache itself (and not just the interpreters we pull from it)
# because setting up some python versions (e.g., 3<=python<3.3) crashes, and this gives us
# an escape hatch.
filters = filters if any(filters) else self._python_setup.interpreter_constraints
if not any(filters) and not pex_python_path_interpreter_paths:

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

why the and not pex_python_path_interpreter_paths here? shouldn't the interpreter_constraints be applied unilaterally?

setup_paths = (paths
or pex_python_path_interpreter_paths
or self._python_setup.interpreter_search_paths

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

should probably include a note on the option for [python-setup] interpreter-search-paths that a pexrc config defining PPP will supersede it.

# Copyright 2017 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

# This BUILD file is intended to be an example of python 3 compatibile targets.

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

s/compatibile/compatible/

source='main.py',
compatibility=['CPython>3'],
dependencies=[
':lib'

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

indent

def say_something():
print('I am a python 3 library method.')
assert 1/2 == 0.5
return 1/2 # for testing the `./pants run` task

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

would be good to add a comment here noting that the assert 1/2 == 0.5 bit is a python3 specific behavior check for casual onlookers - or else use something more explicitly python3-y in this example, like asyncio.

v = sys.version_info
print(sys.executable)
print('%d.%d.%d' % v[0:3])
return say_something()

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

indent is off here and below



# A simple example to test a python 3 binary target
# Note that 1/2 = 0 in python 2 and 1/2 = 0.5 in python 3

This comment has been minimized.

@kwlzn

kwlzn Dec 5, 2017

Member

..or move this comment closer to the code.

@benjyw
Copy link
Contributor

benjyw left a comment

Thanks for this - very useful!

Just one thing to streamline, and a grammar nit.

@@ -64,7 +64,9 @@ def register_options(cls, register):
'If unspecified, a standard path under the workdir is used.')
register('--interpreter-search-paths', advanced=True, type=list, default=[],
metavar='<binary-paths>',
help='A list of paths to search for python interpreters.')
help='A list of paths to search for python interpreters. Note that if a PEX_PYTHON_PATH '
'variable is defined in a pexrc file, those interpreter paths will take precedence over '

This comment has been minimized.

@benjyw

benjyw Dec 6, 2017

Contributor

Is this the right precedence, and why? Should the paths be merged instead?

I don't have an opinion, just trying to understand the design choice.

This comment has been minimized.

@kwlzn

kwlzn Dec 7, 2017

Member

we use /etc/pexrc both at build-time and at (pex) run-time for a globally consistent view of "blessed" interpreters, so the original thinking was to align strictly to that or else fallback to pants' config.. but I could see merging being useful for testing purposes.

will leave it up to @CMLivingston to make the final decision.

This comment has been minimized.

@CMLivingston

CMLivingston Dec 7, 2017

Contributor

To me it makes the most sense to give it precedence over the setup.interpreter_constraints so we know we are using the same interpreter across all tasks and the runtime of the pex in play. I could see a situation where an interpreter is selected for subsequent tasks because it is a lower version than all interpreters in PPP, and it is possible that this could lead to unexpected errors (although tbh I don't know what they would be exactly because the interpreter product is used in multiple tasks).

For now I will go with giving PPP precedence and writing a log.warn if interpreter-search-paths is passed to let the user know that they have a pexrc somewhere. To me, if someone defines a pexrc, they are saying that they want created pexes to use this config and as of right now I don't see any reason why the Pants case should be an exception.

This comment has been minimized.

@benjyw

benjyw Dec 13, 2017

Contributor

SG

@@ -103,6 +103,10 @@ def _create_binary(self, binary_tgt, results_dir):

builder = PEXBuilder(path=tmpdir, interpreter=interpreter, pex_info=pex_info, copy=True)

# Add binary target's interpreter compatibility to pex info

This comment has been minimized.

@benjyw

benjyw Dec 6, 2017

Contributor

Nit: Period at end of sentence.

@@ -116,6 +120,10 @@ def _create_binary(self, binary_tgt, results_dir):
for tgt in binary_tgt.closure(exclude_scopes=Scopes.COMPILE):
if has_python_sources(tgt) or has_resources(tgt):
source_tgts.append(tgt)
# Add target's interpreter compatibility constraints to pex info

This comment has been minimized.

@benjyw

benjyw Dec 6, 2017

Contributor

Ditto, and throughout.

@@ -103,6 +103,10 @@ def _create_binary(self, binary_tgt, results_dir):

builder = PEXBuilder(path=tmpdir, interpreter=interpreter, pex_info=pex_info, copy=True)

# Add binary target's interpreter compatibility to pex info
for constraint in binary_tgt.compatibility:

This comment has been minimized.

@benjyw

benjyw Dec 6, 2017

Contributor

I think you can scrap this:

The closure iterated over a few lines down includes binary_tgt itself. And has_python_sources is true for PythonBinary targets. So as far as I can tell binary_tgt's constraints will be added there, and there's no need to do so here.

# Copyright 2017 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

# This BUILD file is intended to be an example of python 3 compatible targets.

This comment has been minimized.

@benjyw

benjyw Dec 6, 2017

Contributor

Neat!

setup_paths = (paths
or pex_python_paths

This comment has been minimized.

@kwlzn

kwlzn Dec 7, 2017

Member

would directly reference self.pex_python_paths here vs using the temporary pex_python_paths var once.

# We filter the interpreter cache itself (and not just the interpreters we pull from it)
# because setting up some python versions (e.g., 3<=python<3.3) crashes, and this gives us
# an escape hatch.
filters = filters if any(filters) else self._python_setup.interpreter_constraints
if not any(filters):
filters = self._python_setup.interpreter_constraints

This comment has been minimized.

@kwlzn

kwlzn Dec 7, 2017

Member

seems like there's a tension here between the notion of global interpreter constraints defined on PythonSetup and target level constraints as it pertains to interpreter setup - and I'd expect to see something connecting the dots at least at a validation level.

what happens now if the global interpreter constraints conflict with target level constraints (e.g. if global constraints only ensure a python2.7 interpreter in the interpreter cache, but then a target has constraints pinning to python3.6)? what should happen?

seems to me like currently, this might entirely bypass the interpreter cache which could lead to breakage down the road. a test around that case (and any required fixes) would be good.

# Licensed under the Apache License, Version 2.0 (see LICENSE).

from __future__ import (absolute_import, division, generators, nested_scopes, print_function,
unicode_literals, with_statement)

This comment has been minimized.

@kwlzn

kwlzn Dec 7, 2017

Member

from __future__ should no longer be needed with python3 code afaict.

return say_something()

if __name__ == '__main__':
main()

This comment has been minimized.

@kwlzn

kwlzn Dec 7, 2017

Member

indent

Chris Livingston added some commits Dec 7, 2017

Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
# Return reduce function for testing purposes.
# Note that reduce exists as a built-in in Python 2 and
# does not exist in Python 3
ret = reduce

This comment has been minimized.

@CMLivingston

CMLivingston Dec 11, 2017

Contributor

Using asserts to check for Python version had strange behavior when this is run under pytest so I switched it to reduce for more consistent testing.

os.remove(temp_pexrc)


def bootstrap_python_installer(location):

This comment has been minimized.

@CMLivingston

CMLivingston Dec 11, 2017

Contributor

This method and ensure_python_interpreter will be refactored in the pex library and removed from Pants in a follow-up PR.

This comment has been minimized.

@kwlzn

kwlzn Dec 14, 2017

Member

including a ticket in a comment here for that would be good.

Chris Livingston added some commits Dec 11, 2017

@kwlzn kwlzn requested a review from jsirois Dec 14, 2017

Chris Livingston added some commits Dec 14, 2017

@UnrememberMe

This comment has been minimized.

Copy link
Contributor

UnrememberMe commented Dec 15, 2017

LGTM. Thanks!

@jsirois
Copy link
Member

jsirois left a comment

1 bug noted - the test comment is just sensitivity to shard times which are large already.

install_location = os.path.join(bootstrap_python_installer(location), version)
if not os.path.exists(install_location):
os.environ['PYENV_ROOT'] = os.path.join(location, '.pyenv_test')
subprocess.call([os.path.join(location, '.pyenv_test/bin/pyenv'), 'install', version])

This comment has been minimized.

@jsirois

jsirois Dec 15, 2017

Member

Wow, this seems pretty heavyweight. Searching the PATH for, say, a 2.7.x interpreter and a 3.6.x interpreter, and then running tests with the actual found versions iff found would accomplish the same ends in CI where travis assures us both 2.7. and 3.6. This general tack is taken, for example, in https://github.com/pantsbuild/pants/blob/master/tests/python/pants_test/python/test_interpreter_selection_integration.py#L24-L40

:return: paths to interpreters as specified by PEX_PYTHON_PATH
:rtype: list
"""
ppp = Variables.from_rc().get('PEX_PYTHON_PATH', '')

This comment has been minimized.

@jsirois

jsirois Dec 15, 2017

Member

Maybe just .get('PEX_PYTHON_PATH').

@@ -51,6 +51,9 @@ def execute(self):
python_tgts = self.context.targets(lambda tgt: isinstance(tgt, PythonTarget))
fs = PythonInterpreterFingerprintStrategy()
with self.invalidated(python_tgts, fingerprint_strategy=fs) as invalidation_check:
if PythonSetup.global_instance().interpreter_search_paths and PythonInterpreterCache.pex_python_paths:

This comment has been minimized.

@jsirois

jsirois Dec 15, 2017

Member

PythonInterpreterCache.pex_python_paths is defined above as an instance method, here you access it as a class attribute. Need some fixup in one place or the other, but I expect you want to fix PythonInterpreterCache.pex_python_paths to be a memoized classmethod and call PythonInterpreterCache.pex_python_paths() here.

This comment has been minimized.

@CMLivingston

CMLivingston Dec 18, 2017

Contributor

Fixed it to be a class method call, however I tried memoizing with multiple decorators from pants.util.memo (memoized, memoized_method) in tandem with @classmethod and none were able to function properly in testing and manual runs. I would prefer to follow up in a separate PR to address this if the workaround is not clear here.

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

Still broken:

if PythonSetup.global_instance().interpreter_search_paths and PythonInterpreterCache.pex_python_paths:

Did you forget to push a diff?

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

FWIW, the secret sauce is the following ordering:

@classmethod
@memoized_method
def foo(cls):
  ...

A git grep -n -C1 "@memoized_method" | grep -C1 "@classmethod" will turn up a bunch of examples. I'm not hung up on this however for this PR.

Chris Livingston added some commits Dec 15, 2017

Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
Chris Livingston
@CMLivingston

This comment has been minimized.

Copy link
Contributor

CMLivingston commented Dec 18, 2017

@jsirois this is ready for another look when you are able to. Thanks for the assistance.

)
self.assert_success(pants_run_27)
# Interpreter selection for Python 2 is problematic in CI due to multiple virtualenvs in play.
if not os.getenv('CI'):

This comment has been minimized.

@CMLivingston

CMLivingston Dec 18, 2017

Contributor

I added this line to skip checking for the pexrc interpreter in stdout when run in CI. This test is green locally, and I have been spending a significant amount of time trying to debug why a different interpreter is selected in CI. This case is tricky because there are multiple virtuavenvs in play containing Python 2 (pants venv, travis virtualenv) and my best guess here is that some kind of symlinking or interpreter resolve manipulation is resulting in a different Python interpreter path printed to stdout (from sys.executable) on ./pants run. I think the printed interpreter is somehow symlinked or in some way underlying a travis virtualenv interpreter. Regardless, I think this test is sufficient for in CI as long as self.assert_success(pants_run_27) is successful because an incompatible Python interpreter would have thrown an error and caused the assertion to fail. The finer-grain test on line 80 should pass on local machines as I would expect a more consistent and standardized Python interpreter setup than in Travis. The fact that the Python 3 check on line 88 is successful in CI asserts that the interpreter selection mechanism of pex/interpreter_cache is functioning properly, and I suspect that this issue with Python 2 interpreters has to do with the aforementioned info.

Chris Livingston
@jsirois
Copy link
Member

jsirois left a comment

Some smaller nits to take or leave.

self.assert_success(pants_run_3)
# Protection for when the sys.executable path underlies a symlink pointing to 'python' without '3'
# at the end of the basename.
assert py3_path.split(py3)[0] in pants_run_3.stdout_data

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

Prefer unittest constructs, self.assertIn in this case (https://docs.python.org/2/library/unittest.html#unittest.TestCase.assertIn). Although not documented, there is consensus that using py.test assert magic mixed in with unittest (xunit) style testing is not desirable.

with temporary_dir() as interpreters_cache:
pants_ini_config = {'python-setup': {'interpreter_cache_dir': interpreters_cache}}
pants_run_27 = self.run_pants(
command=['clean-all', 'run', '{}:main_py2'.format(os.path.join(self.testproject, 'python_3_selection_testing'))],

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

Why is the clean-all needed? This is a red-flag and at least deserves a comment.

py_path = subprocess.check_output(['python%s' % version,
'-c',
'import sys; print(sys.executable)']).strip()
return py_path

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

Not sure is os.path.realpath'ing this would save gymnastics in the use-site, but an idea.

@@ -124,6 +124,20 @@ def has_python_version(cls, version):
except OSError:
return False

@classmethod
def python_interpreter_path(cls, version):

This comment has been minimized.

@jsirois

jsirois Dec 20, 2017

Member

The has_python_version method above should probably chain to this new method, returning python_interpreter_path(version) is not None.

@jsirois jsirois merged commit 3fc2616 into pantsbuild:master Dec 20, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

baroquebobcat added a commit to twitter/pants that referenced this pull request Dec 28, 2017

Integrate PEX interpreter selection based on target-level interpreter…
… compatibility constraints (pantsbuild#5160)

### Problem

Pants is unaware of the new pexrc variable `PEX_PYTHON_PATH` and cannot make use of it during interpreter selection. `PEX_PYTHON_PATH` specifies a colon-separated list of paths to python interpreter binaries to use when building and executing a pex. As of PEX version 1.2.14, PEXs can accept interpreter constraints at build time to use for filtering against candidate interpreters specified in a `PEX_PYTHON_PATH` defined in a pexrc file. 

Pants also has no way of passing interpreter compatibility constraints (as specified in a target definition) to a PEX for persistence in PEX-INFO metadata.

### Solution

Add logic for adding the interpreter compatibility requirements of targets to the v2 Python backend tasks that handle building/running/testing PEXs, and make Pants interpreter selection tasks aware of `PEX_PYTHON_PATH` in pexrc files on disk. 

### Result

Users will now be able to define a list of python interpreters in a pexrc to build/run/test python targets against. The compatibility field in a target will be used to filter candidate interpreters for a lowest-versioned match. PEXs that result from these tasks will contain the compatibility constraints from their respective target definitions in their PEX-INFO metadata.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment