Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rerun `select-interpreter` if global Python interpreter constraints have changed #7586

Merged

Conversation

Projects
None yet
5 participants
@Eric-Arellano
Copy link
Contributor

commented Apr 18, 2019

Problem

Currently we only check for changes to the file's compatibility entry in its BUILD file, but really we should be also checking global constraints.

As it stands now, changing --python-setup-interpreter-constraints will have no impact until running ./pants clean-all, which is not meant to be.

This will close #7510.

Solution

Use PythonSetup's compatibility_or_constraints in the fingerprint strategy, which first checks if the targets have compatibility marked in their BUILD entries, and then falls back to the global interpreter constraints.

Alternative solution: invalidate upon subsystem changes

Alternatively, we could add self.fingerprint to our file naming scheme, so that any invalidation to PythonSetup or to PythonInterpreterCache would also invalidate this task. See d35e80c#diff-3d23a0b5af958441c095d5acd551ac16R101 for this implementation.

We did not go down this route because it would result in more invalidation than necessary. This task is only ever impacted by changes to --python-setup-interpreter-constraints and compatibility entries.

Also revert auto clean-all

#7151 and #7522 were workarounds to this underlying issue.

Result

Calling PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS="['CPython==3.6.*']" ./pants test --cache-ignore tests/python/pants_test/util:strutil the first time will cause an invalidation, then the task will not run upon subsequent invocations.

Running PANTS_PYTHON_SETUP_INTERPRETER_CONSTRAINTS="['CPython==2.7.*']" ./pants test --cache-ignore tests/python/pants_test/util:strutil right after will properly cause an invalidation and result in Python 2.7 being used for the tests.

Eric-Arellano added some commits Apr 18, 2019

Initial fix for interpreter constraints invalidation
This is probably not going to be the final design, but this does solve the problem!
Add initial test, even though it's failing
I think the issue is with the test itself, that it's failing to properly update the global options.

Eric-Arellano added some commits Apr 18, 2019

Simplify test to focus only on invalidation, not which interpreter is…
… used

The unit test should only be doing one thing, and before it was doing multiple. This makes the test simpler to understand and isolates the behavior better.
Rely on PythonSetup subsystem invalidating itself, rather than the Ta…
…sk's fingerprint

A Task's fingerprint() function first checks the options for its subsystems that have fingerprint=True marked https://github.com/pantsbuild/pants/blob/7819724acb0f0e6fd83a3cd9ec6b5b72a9799721/src/python/pants/task/task.py#L310.

This is a much more idiomatic approach. A fingerprint strategy should only concern itself with targets, and not subsystems.

However, we can only choose the custom fingerprint strategy or this default fingerprint() function at the moment. This is not expected, and the next step is to fix that bug.
Restore support for custom fingerprint strategy
The issue was not only checking all_vts to determine if the interpreter needed to change. This does not work because some targets have no compatibility, so return None.

Instead, we need to track both the self.fingerprint and target_set_id.

@Eric-Arellano Eric-Arellano marked this pull request as ready for review Apr 18, 2019

@Eric-Arellano Eric-Arellano changed the title WIP: Clear the Python `select-interpreter` cache if global interpreter constraints have changed Clear the Python `select-interpreter` cache if global interpreter constraints have changed Apr 18, 2019

Use PythonSetup.compatibility_or_constraints()
This is a dedicated function to first check for compatibility and then to fallback on global constraints.

This is a much more focused fingerprint strategy that reduces unnecessary invalidation and involves less code changes.

Thanks John!
Show resolved Hide resolved src/python/pants/backend/python/subsystems/python_setup.py
@@ -65,15 +66,17 @@ def execute(self):
if not python_tgts_and_reqs:
return
python_tgts = [tgt for tgt in python_tgts_and_reqs if isinstance(tgt, PythonTarget)]
fs = PythonInterpreterFingerprintStrategy()
interpreter_cache = PythonInterpreterCache.global_instance()
fs = PythonInterpreterFingerprintStrategy(python_setup=interpreter_cache._python_setup)

This comment has been minimized.

Copy link
@Eric-Arellano

Eric-Arellano Apr 18, 2019

Author Contributor

I think this is right to refer to the interpreter cache's value of PythonSetup, rather than this file making its own explicit link to it.

But this is a private member, so not sure?

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 18, 2019

Contributor

If we wanted to do this, I would probably create a PythonInterpreterFingerprintStrategy in PythonInterpreterCache and allow for just:

    fs = PythonInterpreterCache.global_instance().fingerprint_strategy

or it could be a .get_fingerprint_strategy() method instead of a property.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 18, 2019

Contributor

I know we agreed that it would probably be best to have PythonInterpreterCache expose ._python_setup, but after seeing DependencyContext in the jvm backend, I'm thinking that it might be more idiomatic to have the PythonInterpreterCache subsystem produce the fingerprint strategy directly:

def create_fingerprint_strategy(self, classpath_products):
return ResolvedJarAwareFingerprintStrategy(classpath_products, self)

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 18, 2019

Member

This strategy is (currently) very specific to this task and to selecting a global interpreter for a context. In particular, it doesn't fingerprint anything else about the target: only its constraints. ResolvedJarAwareFingerprintStrategy falls back to the target fingerprint, so is a bit more general.

Should it be fingerprinting more of the target so that it could be used in more places? Yes, probably. For example, if there are linters that consume these constraints and which are fingerprinted, they should re-run when the constraints have changed... BUT, I think that those tasks are already dynamically selecting interpreters per-target, and thus aren't using this product.

All this to say: this feels like a good incremental step. Looking at the other Task.invalidated blocks for python tasks and seeing whether they're accounting for interpreters would be good. But I don't think it's a blocker here.

Show resolved Hide resolved src/python/pants/backend/python/tasks/select_interpreter.py
@benjyw

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Still not entirely clear to me why simply fingerprint=True on that option doesn't work.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

commented Apr 18, 2019

Still not entirely clear to me why simply fingerprint=True on that option doesn't work.

My impression is that this is because the invalidation from that option itself works, but the current interpreter file path finding doesn't use the task fingerprint in the path it generates relative to self.workdir, in:

return os.path.join(self.workdir, target_set_id, 'interpreter.binary')

Eric-Arellano added some commits Apr 18, 2019

Bump implementation version
We're fixing the caching issue, so should bump the version. Also we change the name `no_targets` to `no_constraints`.
Remove fingerprint=True from --interpreter-constraints
As explained in #7586 (comment), this has no impact on select_interpreter.py.

@Eric-Arellano Eric-Arellano changed the title Clear the Python `select-interpreter` cache if global interpreter constraints have changed Rerun `select-interpreter` if global Python interpreter constraints have changed Apr 18, 2019

@stuhood
Copy link
Member

left a comment

Thanks for diving in on this Eric!

Show resolved Hide resolved src/python/pants/backend/python/tasks/select_interpreter.py Outdated
@@ -44,7 +46,7 @@ class SelectInterpreter(Task):
def implementation_version(cls):
# TODO(John Sirois): Fixup this task to use VTS results_dirs. Right now version bumps aren't

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 18, 2019

Member

This TODO is relevant to this issue. But I think that it should be removed.

Because this task is unioning a bunch of different cache keys in order to come up with a single interpreter for all of them, it will not be able to actually use the "automatic caching" as described here: https://www.pantsbuild.org/dev_tasks.html#enabling-artifact-caching-for-tasks ... and it won't be able to use the results dirs. It's similar to an ivy or coursier resolve in that regard (...and probably to the pyprep tasks as well).

As an additional defense beyond the unioned target cache keys (which should be sufficient) it should probably write any data it produces under:

@memoized_property
def versioned_workdir(self):
"""The Task.workdir suffixed with a fingerprint of the Task implementation version.
When choosing whether to store values directly in `self.workdir` or below it in
the directory returned by this property, you should generally prefer this value.
:API: public
"""
versioned_workdir = os.path.join(self.workdir, self.implementation_version_slug())
safe_mkdir(versioned_workdir)
return versioned_workdir

...but that will only include the task fingerprint, so the unioned target keys are still necessary.

This comment has been minimized.

Copy link
@cosmicexplorer

cosmicexplorer Apr 18, 2019

Contributor

(that won't be the whole task fingerprint though, including the fingerprints of transitively dependent subsystems, just the implementation version -- is there a reason that's sufficient here, or would we need to modify self.versioned_workdir?)

This comment has been minimized.

Copy link
@stuhood

stuhood Apr 18, 2019

Member

There is some annoying redundancy in there, but I'm not sure it's worth tackling here.

The long multi-part path that is generated under .pants.d for "automatic caching" (linked above) is generated here:

def _results_dir_path(self, key, stable):
"""Return a results directory path for the given key.
:param key: A CacheKey to generate an id for.
:param stable: True to use a stable subdirectory, false to use a portion of the cache key to
generate a path unique to the key.
"""
# TODO: Shorten cache_key hashes in general?
return os.path.join(
self._results_dir_prefix,
key.id,
self._STABLE_DIR_NAME if stable else sha1(key.hash.encode('utf-8')).hexdigest()[:12]
)

...but that only actually kicks in when Task.create_target_dirs or Task.cache_target_dirs are set for automatic caching. It would be nice to move all of the task identity/fingerprinting stuff out of that codepath (self._results_dir_prefix in there) and into Task itself (to replace Task.workdir and Task.versioned_workdir)... but that would be good to do in a separate change. And I've waffled about touching it because I have hoped that we will be able to focus on v2 tasks more exclusively soon.

Use @Property instead of @memoized_property for global_instance()
global_instance() is already memoized efficiently. Simpler is better here.
@benjyw

benjyw approved these changes Apr 18, 2019

Revert "Remove fingerprint=True from --interpreter-constraints"
This reverts commit 0dc915f.

Was decided it's safer to add than not.

@Eric-Arellano Eric-Arellano merged commit e1d16a8 into pantsbuild:master Apr 19, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@Eric-Arellano Eric-Arellano deleted the Eric-Arellano:interpreter-constraints-cache branch Apr 19, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.