Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python_dist doesn't safeguard against pex by-version caching, may provide stale dists when used #5449

Closed
CMLivingston opened this issue Feb 7, 2018 · 6 comments · Fixed by #6022
Assignees

Comments

@CMLivingston
Copy link
Contributor

As of #5141, Pants has a new target that allows users to define setup.py-based distributions. A new Pants task will run the user-provided setup.py and produce a wheel for consumption by other Python targets.

Currently, the pex resolver handles resolving these dists into the output pex, and caches them using the name of the wheel in the pex resolver cache. This can be problematic if two python_dist targets in the same project have the same name and version info in their respective setup.py's, leading to undefined behavior in the event that there is a collision in the pex resolver cache.

One solution is to write a setuptools plugin to hijack setup.py execution and inject a local version identifier into the wheel name within the context of Pants task execution, allowing the resolver to safely cache wheels because wheel names will contain their corresponding python_dist target's unique sha.

cc @kwlzn

@stuhood stuhood added the python label Feb 7, 2018
@kwlzn kwlzn changed the title setuptools plugin for injecting local version identifier into python_dist wheel names python_dist doesn't safeguard against pex by-version caching, may provide stale dists when used Feb 8, 2018
@kwlzn
Copy link
Member

kwlzn commented Feb 8, 2018

I started an initial scaffold project for the setuptools plugin last week and will bang on it some more this week time permitting - but this ticket should really be concerned with the python_dist-side consumption of that (if viable) to ensure cache safety. the plugin code itself is likely to live in a separate thirdparty dist due to the egregious monkey-patching required.

for the plugin approach I suggested, the API I'm thinking of would be something along the lines of:

$ EXTRA_VERSION_IDENTIFIER=xxx python setup.py bdist_wheel

where the value of EXTRA_VERSION_IDENTIFIER would be suffixed onto the version as local version identifier (package-1.0+xxx) or appended to an existing local version identifier if present (package-1.0+twtrxxx). fully commanding the version seemed like it'd be tricky, since we'd have to somehow parse the existing version first to do any suffixing given the calling context. let me know if this sounds like a reasonable interface to consume.

one other important aspect to point out here is that for this to work end to end, the setuptools plugin requirement would also need to be implicitly injectable into the running SetupPyRunner context (as opposed to being able to rely on setup_requires as most setuptools plugins do). seems like it that part might be slightly tricky.

@kwlzn
Copy link
Member

kwlzn commented Feb 8, 2018

the other alternative here of course would be to improve the pex resolver cache for this case if you want to look into that. that could potentially be simpler and a better net payoff - but I know we have other uses for the setuptools plugin in e.g. pants release land, so don't mind hacking up a plugin either way.

@CMLivingston
Copy link
Contributor Author

CMLivingston commented Feb 8, 2018

How would the pex resolver cache be changed to solve this problem? Seems like that route would require a special ttl /not caching at all based on specific criteria?

Regarding the API above, that looks sensible to me, however that approach would require modifying the InstallerBase class in the pex codebase to accept params for prepending to the install command. However, if there was any way we could add EXTRA_VERSION_IDENTIFIER after bdist_wheel, we could append that to the command for SetupPyRunner.

Another easy solution to integration with py_dist build task is to write the plugin such that it looks for an environment variable to append and if it is there, it does it, and if not, no-op. That way, we could simply set os.environ before the call to SetupPyRunner and pop that variable from the env afterwards.

@kwlzn
Copy link
Member

kwlzn commented Feb 9, 2018

to be clear: the EXTRA_VERSION_IDENTIFIER=xxx bit above is demonstrating the use of an environment variable in the shell and is not part of the command being executed. e.g.:

[omerta ~]$ env | grep XXX
[omerta ~]$ XXX=1 env | grep XXX
XXX=1

for the resolver cache, there are two sides there: build-time and run-time resolve caching.

it's not clear to me if the build time resolver cache is a factor here based on how dists are directly included in the pex vs resolved via file:// find-links repos. you could confirm this by testing whether or not the correct python_dist wheel always wound up in a pex/chroot with mods to the python_dist code run over run.

for the runtime resolver cache, since the dists being cached are local (one is in ~/.pex and one is in e.g. ./currently_executing.pex) it should be easy enough to fingerprint these somehow (ideally as cheaply as possible) and include the fingerprint as part of the cache key/cache metadata. I think both resolver modes key into the same resolver cache, so figuring out how to align those and still net the benefit of fingerprinting on the runtime side (and potentially for file:// find-links on the build-time side) would be cool. this approach would have the added benefit of applying forward to make all pex resolution more robust - python_dist or not - and thus would be a more general solution that needn't know specifics of anything to do with pants.

ultimately, both appending a local version identifier to the dist or improving the cache key for the pex resolver cache attack the same fundamental problem of keying into distinct cache paths - just from two different angles. I'm mostly convinced that the pex resolver path would be a better net win overall (and definitely more maintainable/less hacky).

before targeting any one solution tho, a good first step here would be to characterize the current bug
via repro and/or failing integration test and expand the description - afaik, this is all still theoretical?

@CMLivingston
Copy link
Contributor Author

CMLivingston commented Feb 9, 2018

I am clear on the shell env variable, I was saying that to do that approach, I think it would require changing the InstallerBase internals to prepend that to the cmdline in the SetupPyRunner context.

This is not theoretical; I bumped into this a few times testing py_dist where a bad pex was built and on [run], pants errors out saying no compatible dists could be found (“missing the following dependencies for target: blah==1.0 blah==2.0”), and when the same task re-runs with working dists, it still errors out, leading me to believe that this is a build time caching issue. It only crops up when two dists have the exact same name and version, like in our basic test cases. Furthermore, the cached wheel sits in ~/.cache/pants/..../my.whl, leading me to believe that this is likely pants caching and not pex resolver...maybe. I’m unsure because I would think that if it is the pex resolver, it would be in the ~/.pex directory. In any case, I will get a repro together to expose the bug.

@kwlzn
Copy link
Member

kwlzn commented Feb 9, 2018

I am clear on the shell env variable, I was saying that to do that approach, I think it would require changing the InstallerBase internals to prepend that to the cmdline in the SetupPyRunner context.

you would only prepend the env var to the commandline if you were calling it from the shell interactively as a user - the snippit I sent is just a way of documenting the CLI with a shell usage example. in the code, you'd just set the env var in a contextmanager before calling SetupPyRunner assuming no env scrubbing takes place there:

from pants.util.contextutil import environment_as
with environment_as(EXTRA_VERSION_IDENTIFIER='xxx'):
  SetupPyRunner(...)

This is not theoretical; I bumped into this a few times testing py_dist where a bad pex was built and on [run], pants errors out saying no compatible dists could be found (“missing the following dependencies for target: blah==1.0 blah==2.0”), and when the same task re-runs with working dists, it still errors out, leading me to believe that this is a build time caching issue. It only crops up when two dists have the exact same name and version, like in our basic test cases. Furthermore, the cached wheel sits in ~/.cache/pants/..../my.whl, leading me to believe that this is likely pants caching and not pex resolver...maybe. I’m unsure because I would think that if it is the pex resolver, it would be in the ~/.pex directory. In any case, I will get a repro together to expose the bug.

if it's happening under [run] then it could be caused by either the build or runtime resolve, afaict. either way, these sorts of details would be good to include when filing bugs - and the mystery/uncertainty here is precisely why starting from a repro is a great idea.

cosmicexplorer added a commit that referenced this issue Jul 17, 2018
…we don't use the first cached one (#6022)

### Problem

I made #5479 because the dist resolved for a `python_dist()` would not change after changing any of the source files, and this persisted even after a `clean-all`. This issue is more thoroughly explained in #5449 (which I literally didn't see before making this PR, but describes almost exactly what this PR does, it seems -- great minds think alike?). We were building the `python_dist()` targets each time they were invalidated, but caching the package locally in `~/.cache/pants/` after we built it the first time and using that instead, because the version string was the same for both the first and subsequent builds of the `python_dist()` target.

### Solution

- Move argv generation for setup.py invocations into `BuildLocalPythonDistributions`.
- Append the `python_dist()` target's fingerprint to the version string using `--tag-build`. This conforms to the "local" version specifier format as per [PEP 440](https://www.python.org/dev/peps/pep-0440/#local-version-identifiers). This tagged version is then used in the synthetic `PythonRequirement` stitched into the build graph for the local dist.
- Add an integration test `test_invalidation()` using `mock_buildroot()` (which fails on master, but not in this PR) where we run a binary depending on a `python_dist()`, rewrite the contents of a source file to change the output, then run pants again to verify that the new dist is resolved.

*Separately:*

- Made `PythonDistribution` subclass `PythonTarget` to avoid duplicated logic. It's not clear what the original reason for not subclassing it was at first, but it seems to be not be in effect anymore.
- Changed the `dir` field name for the `Manager` namedtuple in `mock_buildroot()` to be `new_buildroot` so it doesn't get highlighted as a builtin in certain text editors.

### Result

When we resolve local dists in any python requirements task, we get the dist corresponding to the target we just built. Resolves #5449.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants