New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add the target fingerprint to the version of each local dist so that we don't use the first cached one #6022

Merged
merged 24 commits into from Jul 17, 2018

Conversation

Projects
None yet
6 participants
@cosmicexplorer
Copy link
Contributor

cosmicexplorer commented Jun 26, 2018

Problem

I made #5479 because the dist resolved for a python_dist() would not change after changing any of the source files, and this persisted even after a clean-all. This issue is more thoroughly explained in #5449 (which I literally didn't see before making this PR, but describes almost exactly what this PR does, it seems -- great minds think alike?). We were building the python_dist() targets each time they were invalidated, but caching the package locally in ~/.cache/pants/ after we built it the first time and using that instead, because the version string was the same for both the first and subsequent builds of the python_dist() target.

Solution

  • Move argv generation for setup.py invocations into BuildLocalPythonDistributions.
  • Append the python_dist() target's fingerprint to the version string using --tag-build. This conforms to the "local" version specifier format as per PEP 440. This tagged version is then used in the synthetic PythonRequirement stitched into the build graph for the local dist.
  • Add an integration test test_invalidation() using mock_buildroot() (which fails on master, but not in this PR) where we run a binary depending on a python_dist(), rewrite the contents of a source file to change the output, then run pants again to verify that the new dist is resolved.

Separately:

  • Made PythonDistribution subclass PythonTarget to avoid duplicated logic. It's not clear what the original reason for not subclassing it was at first, but it seems to be not be in effect anymore.
  • Changed the dir field name for the Manager namedtuple in mock_buildroot() to be new_buildroot so it doesn't get highlighted as a builtin in certain text editors.

Result

When we resolve local dists in any python requirements task, we get the dist corresponding to the target we just built. Resolves #5449.

@cosmicexplorer cosmicexplorer requested review from CMLivingston , benjyw and kwlzn Jun 26, 2018

@CMLivingston
Copy link
Contributor

CMLivingston left a comment

Added a few comments.

This is a good starting point but this problem should really be solved at the pex resolver cache level by something along the lines of my PR at pantsbuild/pex#453.

I’d be interested in @kwlzn’s take on the refactoring of the python_dist target to inherit from PythonTarget. There was a reason we avoided this but I cannot recall at the moment.

Overall, looks good, just wondering whether this is the right thing to be doing/the right UX change we want to introduce.

from setuptools import setup, find_packages
from distutils.core import Extension


c_module = Extension(str('c_greet'), sources=[str('c_greet.c')])
cpp_module = Extension(str('cpp_greet'), sources=[str('cpp_greet.cpp')])

public_version = '1.0.0'
local_version = os.getenv('_SETUP_PY_LOCAL_VERSION')

This comment has been minimized.

@CMLivingston

CMLivingston Jun 26, 2018

Contributor

This places responsibility of this existence on the author, so we would need to document this or throw a warning if it is not present.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 26, 2018

Contributor

It would probably be super easy to just check the version when we make the local wheels product and make sure it has a PEP 440 "local" identifier at the end -- then, we allow the possibility of using setup.py projects generated in a different way if we ever need to, but still avoid the name clashing issue!

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 26, 2018

Contributor

Also, as was very effectively and thoroughly discussed in #5449, and as I have thought about before, there are many more ways to potentially make this part of the setup.py more ergonomic/natural (including making temporary files that the setup.py could import, for example?). I think the env var is a simple and effective answer right now but I'm 100% ready to go deeper into setuptools as need be.

This comment has been minimized.

@jsirois

jsirois Jun 26, 2018

Member

TIL the egg_info setup command. Maybe this helps?

$ ./pants setup-py --run='egg_info -b+bob sdist bdist bdist_wheel' src/python/pants:pants-packaged
...
$ ls -1 dist/pantsbuild.pants-1.9.0.dev0/dist/
pantsbuild.pants-1.9.0.dev0+bob.linux-x86_64.tar.gz
pantsbuild.pants-1.9.0.dev0+bob-py2-none-any.whl
pantsbuild.pants-1.9.0.dev0+bob.tar.gz

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 28, 2018

Contributor

@jsirois I didn't see that comment until now -- thanks! It would be utterly fantastic to be able to avoid making any changes to the setup.py to support this. Will look into using that now.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 28, 2018

Contributor

Of course it works -- added in 1fd8db1 so we can do this without any changes to setup.py at all! I should have probably expected this to exist and looked beforehand, so thanks again for mentioning it!

"""
if not 'setup.py' in sources:

This comment has been minimized.

@CMLivingston

CMLivingston Jun 26, 2018

Contributor

Why strip these out? Are you planning on adding setup.py checking back or is it happening elsewhere?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 26, 2018

Contributor

This should be the exact same as the previous target, just without unnecessarily repeated code, which I think should be a goal. It doesn't seem like too gross of a refactoring because it's limited to a single file. We call the superclass constructor (PythonTarget), which already does all of the checking that was removed from this file.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 26, 2018

Contributor

I think I remember vaguely that we don't subclass PythonTarget due to some method in pex_build_util.py using PythonTarget to mean something that we don't want to include python_dist() in. I don't know if that is still the case with the changes to that file. Will look into and leave a comment if this is ok or revert the change / do something else if not.

This comment has been minimized.

@CMLivingston

CMLivingston Jun 26, 2018

Contributor

Agreed on the redundancy, but I think that PythonTarget will not check for setup.py presence as of right now, so it would be good to add that back in to this one.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jun 28, 2018

Contributor

I don't understand -- we are still checking if setup.py is in sources, just doing it before we initialize anything else. I can move it back to after the super constructor call -- it's not clear which ordering would be desired here, I just usually try to do argument validation first.

This comment has been minimized.

@CMLivingston

CMLivingston Jul 17, 2018

Contributor

Hmm, weird, I must have read this wrong. This looks good to me regarding setup.py checking.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jun 26, 2018

That's what everyone said on #5479, which is why I closed it, but I don't think it's correct, and I don't think this particular issue is the same as the one resolved in pantsbuild/pex#453. I actually wrote another pex PR based on pantsbuild/pex#453 first, but after spending several hours inserting debug statements into various places, realized that does not solve the issue. EDIT: to clarify, using pantsbuild/pex#453 will end up ensuring the python requirement is updated after a clean-all and a clean build (which does not happen on master currently), but this PR will ensure that local dists are updated when invalidated, as expected (and they are already being built whenever invalidated, just not selected as the python requirement satisfying fasthello==1.0.0 in targets that depend on the local dist).

If you build the python_dist() into a wheel without somehow mangling the version, pants will helpfully place it in ~/.cache/pants, and then it will prefer that version forever (over anything in .pants.d/pyprep/build-local-dists until you delete it by hand. I can't see a way to avoid this besides modifying the version string as we do here, because if we don't, we will constantly be requesting e.g. fasthello==1.0.0 to try to get the local dist. I think this is not what we want for local dists, because it will ping pypi, and name collision with a remote package is not something I want to worry about when naming my packages -- we specifically want "the version of the package fasthello that I just built", and it seems like that should be encoded in our requirement strings with explicit fingerprints, in terms of correctness. It so happens that this version string mangling can be done idiomatically with PEP 440. There may be a way to e.g. delete the cached requirement that pex ends up preferring, but that sounds like a hack and I think this PR is the right solution for this particular problem.

@cosmicexplorer cosmicexplorer changed the title add the target fingerprint to the version of local dists so that python_dist() is usable add the target fingerprint to the version of each local dist so that we don't use the first cached one Jun 26, 2018

@CMLivingston

This comment has been minimized.

Copy link
Contributor

CMLivingston commented Jun 26, 2018

pantsbuild/pex#453 should fix this issue though because of what it is doing regarding the cache key strategy. Because that PR gets the size of the wheel and incorporates it into a cache key, when Pants uses the the pex resolver it will not find it in the cache (different code, same version), crawl the repo attached to the PythonRequirement object (the .pants.d location for build-local-dists), and use that one. The cache could also be made more robust by using actual hashing of the wheel contents. Afaict, invalidation is taken care of by Pants detecting changed sources, so wouldn’t pex/#453 solve the problem?

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jun 27, 2018

I have a PR that I can put up that hashes the wheel contents (a two line change from pantsbuild/pex#453), which I haven't yet because I wanted to get this in first. If you attempt to use pants with pantsbuild/pex#453, you will find that it does not solve the problem. This is why I decided to make this PR in the first place, because I branched off of pantsbuild/pex#453 and found that it did not solve the problem. If you use pantsbuild/pex#453, it will stop using the first built version of the python_dist() after a clean-all, but not before then, like it should be, for the reasons described above.

You can test a development version of pex with pants by deleting the pex line from 3rdparty/python/requirements.txt, adding the equivalent version of pex as a python requirement to 3rdparty/python/BUILD, then adding the line pip install -e /path/to/your/pex/checkout at this line in pants_venv -- you will be using your development version of pex, and can view the results for yourself.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jun 28, 2018

I'm going to implement @jsirois's suggestion to use the egg_info command and figure out how to make the contextmanager I added to PantsRunIntegrationTest that rewrites file contents work (it's currently failing in CI), so you can hold off on review for now.

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-pex-pydist-caching branch 2 times, most recently from 1fd8db1 to 298ea5e Jun 28, 2018

@stuhood

stuhood approved these changes Jul 2, 2018

Copy link
Member

stuhood left a comment

Looks great, thanks.

I suspect that Chris's comments on the python_dist arguments were a misunderstanding, so you're good to land this.

interpreter compatibility for this target, using the Requirement-style
format, e.g. ``'CPython>=3', or just ['>=2.7','<3']`` for requirements
agnostic to interpreter class.
:type sources: :class:`twitter.common.dirutil.Fileset` or list of strings. Must include

This comment has been minimized.

@stuhood

stuhood Jul 2, 2018

Member

Given that this changes the behaviour of sources, you might want to merge from master and re-test post #5908

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-pex-pydist-caching branch from 298ea5e to aaa9cbf Jul 2, 2018

@cosmicexplorer cosmicexplorer added this to the 1.9.x milestone milestone Jul 3, 2018

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-pex-pydist-caching branch from aaa9cbf to 14e2f0f Jul 3, 2018

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 3, 2018

So the OSX shard continues to fail in CI, although I've cleared caches twice. I can't seem to repro this locally and suspect that (as was mentioned earlier) it may require pantsbuild/pex#453 (or a modified version of it). That should be done anyway, so I will put up a new pex PR to take that over from @CMLivingston (we discussed this previously) and come back to this one.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 15, 2018

This was rebased against master but otherwise unchanged from the last time -- #6104 fixed the issue I was banging my head against the wall about. There was a lint error which I just fixed, which was because I removed the use of str from python_distribution.py. One of the unit test shards timed out, which is weird but seems unrelated, so crossing fingers for green this time.

setup(
name='ctypes_test',
version='0.0.1',
version='1.0.0',

This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

1.0.0 typically means stable, so this may not be yet? (your call)
A good reference would be https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/project_info/tasks/export.py#L51-L63

so maybe 0.1.0 if there is backward incompatibility?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Hm, I hadn't thought about the significance of this at all, mostly changed it to mimic the other python_dist() targets. I don't think it's stable yet, I was planning to switch the target names from ctypes_compatible_cpp_library() to just cpp_library() at that time as well. I'll revert this change.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Thanks a lot for the reference!

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

There is backwards incompatibility, but I don't think anyone would be relying on the broken behavior this PR fixes, except accidentally, so I think 0.0.1 is probably the right move here.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Reverted the version change in f88e427!

@@ -53,9 +56,25 @@ def test_pants_run(self):
# Check that text was properly printed to stdout.
self._assert_native_greeting(pants_run.stdout_data)

@staticmethod
def rewrite_hello_c_source(orig_content):
return re.sub('"Hello from C!"', '"Hello from C?"', orig_content)

This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

Since the usage of this function is fairly contained in the particular test below, I'd recommend moving it into the test itself. @staticmethod or not isn't important.

This comment has been minimized.

@cosmicexplorer
self.assertIn('Hello from C!\n', unmodified_pants_run.stdout_data)

c_source_file = '{}/c_greet.c'.format(self.fasthello_install_requires)
with self.rewrite_file_content(c_source_file, self.rewrite_hello_c_source):

This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

non-blocking: since you are modifying the content of the repo, it might create surprises if later tests happen to run against the touched content. It's best to create a target on the test run, but if it's hard to do, it would be good to restore it.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Yes, and I've had the file stuck in the intermediate "rewritten" state at least once while testing -- I'm not sure what caused that since it is wrapped in a try but I'll see if I can change the implementation to make a target for the test (and make sure that that implementation fails on current master so it's still a valid test). I have a hunch this won't be that big of a change, and for the reasons you mentioned I would prefer not to leave the rewrite_file_content method in the integration test base class.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Ended up using mock_buildroot() from PantsRunIntegrationTest, which is much more clear anyway -- see the new test here. Thanks for pointing this out!

@@ -8,6 +8,7 @@
import c_greet
import cpp_greet


This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

can leave this file untouched.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Fixed in a7d2f23!

@@ -23,6 +23,7 @@ def get_generated_shared_lib(lib_name):
asdf_c_lib = ctypes.CDLL(asdf_c_lib_path)
asdf_cpp_lib = ctypes.CDLL(asdf_cpp_lib_path)


This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

can leave this file untouched.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Fixed in a7d2f23!

@@ -4,5 +4,6 @@

from __future__ import absolute_import, division, print_function, unicode_literals


This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

can leave this file untouched.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Fixed in a7d2f23!

@@ -7,6 +7,7 @@
import c_greet
import cpp_greet


This comment has been minimized.

@wisechengyi

wisechengyi Jul 15, 2018

Contributor

can leave this file untouched.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Fixed in a7d2f23!

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-pex-pydist-caching branch from 4f8e9dc to a7d2f23 Jul 15, 2018

@jsirois
Copy link
Member

jsirois left a comment

The PR description could use a major rewrite at this point.


setup_py_command = (
egg_info_snapshot_tag_args + bdist_whl_args + platform_args + dist_dir_args)
return cls(source_dir, setup_py_command, **kw)

This comment has been minimized.

@jsirois

jsirois Jul 15, 2018

Member

This all seems very specific to your use case. Any reason not to just move all this logic there and call the constructor directly?

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

You're totally right. Will do that now.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Moved to _generate_snapshot_bdist_wheel_argv() of BuildLocalPythonDistributions in 2aa1a5f!

@@ -115,6 +117,26 @@ def _retrieve_single_product_at_target_base(self, product_mapping, target):
single_product = all_products[0]
return single_product

# TODO: This snapshot version tagging is only performed for python_dist() targets, but the
# setup.py invocation command line is generated in SetupPyRunner in setup_py.py. Could this

This comment has been minimized.

@jsirois

jsirois Jul 15, 2018

Member

And I think we agree on the problem I mentioned above!

This comment has been minimized.

@cosmicexplorer

cosmicexplorer Jul 15, 2018

Contributor

Indeed! Made the change in 2aa1a5f -- also moved some comments here into docstrings.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 15, 2018

Rewrote the OP to reflect what's currently happening in this PR, and also to be nicer (I think I had a wild headache, still not an excuse -- I'm sorry about that and will make an effort to recognize it in the future). I may have made a mistake when moving setup.py argv generation into BuildLocalPythonDistributions (but maybe not), so will watch CI until it's green.

I'm VERY pumped that this is finally close to being mergeable.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 15, 2018

The osx shard is failing again, but with a different error:

fatal: No annotated tags can describe '2e40fec1211ec356da0f0f0d4b99714fda8ef8db'.
                     E   	However, there were unannotated tags: try --tags.
                     E   	Failed to execute PEX file, missing macosx_10_12_x86_64-cp-27-cp27m compatible dependencies for:
                     E   	fasthello-test

I can investigate this -- the error message is more revealing this time. There is another failure on Linux that I can repro locally (edit: fixed in 09442b8).

The other failures appear to be spurious timeouts (ECONNRESET). @wisechengyi I added back two of the whitespace modifications (adding another newline before a def) in 2ef7bf7 because they were failing the lint shard -- the rest will stay.

@jsirois

This comment has been minimized.

Copy link
Member

jsirois commented Jul 16, 2018

@cosmicexplorer the most useful debug is the edit .travis.yml to 1 shard or better ./pants ... -- -vsk<failing test> as the .travis.yml script. Make sure the underlying failing run is using PEX_VERBOSE=9 and capture the full log.

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 16, 2018

Will do!

@cosmicexplorer cosmicexplorer force-pushed the cosmicexplorer:fix-pex-pydist-caching branch from f5f5b9d to 5f108db Jul 17, 2018

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 17, 2018

Rebased after #6141 landed and running through CI again. I had forgotten the point about how to identify the use of the apple-provided python interpreter and why that was a concern, so thank you for elaborating on that. We shall be able to see hopefully that #6141 makes things work as suspected -- will dig in if not.

cosmicexplorer added some commits Jun 25, 2018

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Jul 17, 2018

TRAVIS IS GREEN (without me having to do anything at all)!!! @CMLivingston @kwlzn if you could take another look at this that'd be neat -- I don't think there is anything of concern I missed but since this went through several iterations since last time (in response to changing conditions, namely the resolution errors getting fixed elsewhere in pants) I'd feel more comfortable if you could take a quick look again. Thanks.

I also want to specifically shout out @jsirois and @kwlzn for responding to a lot of questions I had on this PR in this thread and also on slack earlier when I didn't really know what was happening at all and was mega frustrated -- I really appreciated that.

@CMLivingston
Copy link
Contributor

CMLivingston left a comment

LGTM!

@cosmicexplorer cosmicexplorer merged commit dfd7f73 into pantsbuild:master Jul 17, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment