New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add python_app support #5704

Merged
merged 3 commits into from May 8, 2018

Conversation

Projects
None yet
6 participants
@traviscrawford
Copy link
Member

traviscrawford commented Apr 16, 2018

Pants supports the "app" concept - a binary + other loose files - but currently only has the jvm_app implementation. Here we add python_app which is given a python_binary and bundles to include in an archive. This enables users to easily create a pex along with other files needed in production and create a deployable archive.

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented Apr 16, 2018

A couple questions for the reviewer:

  • currently there is no App base class, and Bundle logic lives in the JVM backend. I looked around and did not see a clear place to put the App/Bundle bases classes - any recommendation?
  • Is updating the Legacy engine target adaptors still the right way of doing this? I know there's a lot of movement with the engine, so want to double-check updating the legacy code is right here.

If you like this general approach I can finish cleaning this up. My use-case I'd like to make a pex that includes Apache Airflow, plus our internal plugins, as well as our Airflow configs which are required to be loose files on the filesystem. I'd like to bundle this all up with pants to simplify deploys.

@traviscrawford traviscrawford force-pushed the traviscrawford:travis/python_app branch from fe7d2c9 to f04ff1b Apr 17, 2018

@cosmicexplorer

This comment has been minimized.

Copy link
Contributor

cosmicexplorer commented Apr 17, 2018

currently there is no App base class, and Bundle logic lives in the JVM backend. I looked around and did not see a clear place to put the App/Bundle bases classes - any recommendation?

If you were to ask me, my first suggestion might be in a new directory src/python/pants/backend/common or maybe .../util, as this functionality does seem very backend/-specific. The benefit of making a src/python/pants/backend/common/ dir might be that it could be structured the same as any specific backend, with targets/tasks/subsystems directories, but e.g. only with base classes and mixins (such as the App target introduced here).

Is updating the Legacy engine target adaptors still the right way of doing this? I know there's a lot of movement with the engine, so want to double-check updating the legacy code is right here.

A valid concern and I'm also not sure if there are intricacies I'm missing, but I think you're doing it correctly here (as in, I have done these exact changes when making new targets recently). I would want someone else with more experience with the v2 engine to chime in to be sure.

@traviscrawford traviscrawford force-pushed the traviscrawford:travis/python_app branch 3 times, most recently from ed3f982 to bccd755 Apr 17, 2018

@traviscrawford traviscrawford changed the title WIP: Add python_app support Add python_app support Apr 17, 2018

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented Apr 19, 2018

Here's an example use-case for this feature: https://github.com/traviscrawford/airflow-pex-example#python_app-example

Notice how this includes a PyPi package, along with some plugins in the PEX. Then, we also include the loose files that Airflow requires on the filesystem.

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented Apr 20, 2018

This is ready for review 🙏

@stuhood stuhood requested review from stuhood , jsirois and benjyw Apr 24, 2018

@stuhood
Copy link
Member

stuhood left a comment

Thanks! Sorry for the delayed review.

@@ -229,10 +50,6 @@ def __init__(self,
Class-Path set to those jars.
:param string archive: Create an archive of this type from the bundle.
"""
if archive and archive not in Archive.TYPE_NAMES:

This comment has been minimized.

@stuhood

stuhood Apr 24, 2018

Member

It looks like the archive, binary, basename, bundles, etc arguments should be lifted into the baseclass, as they are/should-be shared between subclasses. deployjar is I think a JVM-specific concept though, so can stay.

targets_to_bundle = self.context.targets(PythonApp.is_python_app)
dirutil.safe_mkdir(self._outdir)
for target in targets_to_bundle:
bundle_dir = os.path.join(self.workdir, target.id)

This comment has been minimized.

@stuhood

stuhood Apr 24, 2018

Member

Rather than doing this manually, would recommend setting the self.create_target_dirs property. Example:
https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/tasks/provide_tools_jar.py#L38-L41

Then you can use vt.results_dir as the unique directory for a target.

This comment has been minimized.

@benjyw

benjyw Apr 25, 2018

Contributor

I concur, but also, you're not invalidating here, so you have no vt at all... See the jvm backend's bundle_create.py for an example of how to only act on invalid targets (but still set up products and populate dist/ for all targets).

This comment has been minimized.

@traviscrawford

traviscrawford Apr 30, 2018

Member

Updated to use create_target_dirs.

Currently the logic is pretty similar, though there are slight differences between JVM/Python bundle logic. This is definitely an area for refactoring in the future. At this time I tried to be minimally invasive with the JVM bundle path.

for filename in filenames:
return os.path.join(basedir, filename)

def _symlink_bundles(self, app, bundle_dir):

This comment has been minimized.

@stuhood

stuhood Apr 24, 2018

Member

Can this be moved out as a static function somewhere? Or would it be useful to have a BundleTaskMixin to provide it?

@benjyw
Copy link
Contributor

benjyw left a comment

Great change, thanks for the contribution! Just a few comments.

@@ -0,0 +1,259 @@
# coding=utf-8

This comment has been minimized.

@benjyw

benjyw Apr 25, 2018

Contributor

I don't love pants.backend.common for this, as pants.backend is where backends go, not base classes useful to backends. Right now every dir under src/python/pants/backend represents a backend, and this would no longer be true with this change.

I really think this should live under pants.build_graph. We already have language-neutral target-y things there, like resources.py.

return combine_hashes(map(BundleField._hash_bundle, self))


class App(Target):

This comment has been minimized.

@benjyw

benjyw Apr 25, 2018

Contributor

How about AppBase, to emphasize that you're expected to subclass it.

The docstring should also be explicit about that. E.g., A base class for deployable application targets. or something.

return {'globs': globs}

@classmethod
def _binary_target_type(cls):

This comment has been minimized.

@benjyw

benjyw Apr 25, 2018

Contributor

This should probably be called binary_target_type, as the leading underscore strongly implies "you can change this without checking for uses outside this file", but subclasses must implement this, so that implication would be wrong in this case.

targets_to_bundle = self.context.targets(PythonApp.is_python_app)
dirutil.safe_mkdir(self._outdir)
for target in targets_to_bundle:
bundle_dir = os.path.join(self.workdir, target.id)

This comment has been minimized.

@benjyw

benjyw Apr 25, 2018

Contributor

I concur, but also, you're not invalidating here, so you have no vt at all... See the jvm backend's bundle_create.py for an example of how to only act on invalid targets (but still set up products and populate dist/ for all targets).

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented Apr 26, 2018

Thanks for the feedback. I’ll make the suggested changes.

@baroquebobcat

This comment has been minimized.

Copy link
Contributor

baroquebobcat commented Apr 26, 2018

Could you call out the creation of an App superclass in the title / summary somehow? I think that's an important part of what this patch is doing.

@traviscrawford traviscrawford force-pushed the traviscrawford:travis/python_app branch from bccd755 to 9a08d83 Apr 30, 2018

@traviscrawford traviscrawford changed the title Add python_app support Add python_app support and AppBase helper Apr 30, 2018

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented Apr 30, 2018

Since this change was getting pretty large, I extracted out some of the refactoring into #5772 as a no-op refactor-only change. After that's merged to master, I'll rebase and we can review adding python_app separately. Thought that would be easier to review, and limit the regression risk while moving a bunch of stuff around.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Apr 30, 2018

Give us a ping when you want us to take another look at this, after #5772 is in.

@traviscrawford traviscrawford force-pushed the traviscrawford:travis/python_app branch from 23fa1a4 to c4b3b68 May 1, 2018

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented May 1, 2018

This is rebased against master and ready for review.

@jsirois
Copy link
Member

jsirois left a comment

Thanks Travis!

@@ -34,6 +36,7 @@
def build_file_aliases():
return BuildFileAliases(
targets={
PythonApp.alias():PythonApp,

This comment has been minimized.

@jsirois

jsirois May 1, 2018

Member

s/:/: /

pex_archives = self.context.products.get(self._PEX_ARCHIVES)
for basedir, filenames in pex_archives.get(target.binary).items():
for filename in filenames:
return os.path.join(basedir, filename)

This comment has been minimized.

@jsirois

jsirois May 1, 2018

Member

Why is the 1st basedir + filename ok to pick here? If the product is in-practice singleton it might be good to assert and/or comment about this. Longer term, this should switch product type to something that naturally represents exactly 1 pex.

This comment has been minimized.

@traviscrawford

traviscrawford May 2, 2018

Member

AFAICT PythonBinary targets always have exactly one pex product, so I think this is safe. I'll add an assert and update the comments.

That said, long-term I would love to have a generic bundle task that can produce a deployable archive with multiple binaries of potentially different languages. We already have a use-case where we bundle a jvm_app, then (with an external script) add a pex to the archive before deploying.

This comment has been minimized.

@cosmicexplorer

cosmicexplorer May 2, 2018

Contributor

@traviscrawford do you think the deployable archive could have a language-agnostic entry point (...doesn't seem ridiculous), or do you mean the output would remain some existing archive format (as in, we only produce language-specific archives like pex, but we can nest in other bundles)?

This comment has been minimized.

@traviscrawford

traviscrawford May 4, 2018

Member

Imagine you want to create an archive.tar.gz that looks like:

./bin/foo.pex
./bin/bar.pex
./bin/baz.jar
./etc/foo.ini
./etc/bar.cfg
./etc/baz.xml

Where it has multiple binaries, multiple "extra files", etc. That feels like something Pants could bundle up for you, rather than having the script multiple invocations of Pants to make all the artifacts, and to archive things up at the end.

This "fancy bundling" is not included in this change, but directionally I think the additional flexibility would be really valuable for users.

class BundleMixin(object):

@staticmethod
def register_common_options(register):

This comment has been minimized.

@jsirois

jsirois May 1, 2018

Member

Why a custom method instead of Tasks register_options? This at least deserves a comment that mixees need to ensure it is called even if they don't have their own options to register.

This comment has been minimized.

@traviscrawford

traviscrawford May 2, 2018

Member

Will update with a comment - this seems better than copy/paste of the options to ensure all bundles have consistent options. This seems like an opportunity to refactor into BundleBase in the future, especially to simplify adding bundles for more languages, or a language-neutral bundle target.

This comment has been minimized.

@benjyw

benjyw May 2, 2018

Contributor

I think the question is, why is this a separately-named method, and not an override of the regular register_options. Since all register_options implementations must call super(...).register_options, subclasses can always add their own options too.

This comment has been minimized.

@traviscrawford

traviscrawford May 4, 2018

Member

Updated so BundleMixin now subclasses TaskBase like other mixin's do, so now setting options works like you would expect. In PythonBundle we no longer even have a register_options method because we get it from the mixin, and in BundleCreate we get the common options by calling super, and add a custom option.

bundle_archive_product.add(
vt.target, os.path.dirname(archive_path)).append(os.path.basename(archive_path))

if vt.target in self.context.target_roots: # Always publish bundle/archive in dist

This comment has been minimized.

@jsirois

jsirois May 1, 2018

Member

The bundling above happens for any PythonApp.is_python_app in the active BUILD graph, even interior nodes. The publishing though only acts on root nodes. This seems like it would be confusing from a user standpoint. Is this intended?

This comment has been minimized.

@jsirois

jsirois May 1, 2018

Member

This points out that a test would be great.

This comment has been minimized.

@traviscrawford

traviscrawford May 2, 2018

Member

Looking at https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/tasks/bundle_create.py we see:

        # For root targets, create symlink.
        if vt.target in self.context.target_roots:
          self._store_results(vt, bundle_dir, archive_path, app)

Which is the behavior I followed for Python bundles. My thinking is having the same logic will simplify bringing these codepaths even closer together over time. I think keeping this as-is would be best for users, but that's not a strongly held view if you'd like me to change it.

This comment has been minimized.

@stuhood

stuhood May 8, 2018

Member

Yea, the current behaviour is somewhat intentional. We create the product for all transitive bundles, but then only expose it in dist for the roots.

traviscrawford added some commits Apr 14, 2018

Add python_app support for creating bundles
In Pants, `*_app` targets are conceptually a binary plus "other files", such
as configuration files, data files, etc. that - when `bundle`'d - creates a
deployable archive (e.g.: zip, tar.gz). Here we add support for `python_app`
for deploying a `python_binary` plus some other files.
Update BundleMixin to subclass TaskBase
Here we update `BundleMixin` to subclass `TaskBase` so `register_options`
works as expected. That is, we always get the base options, and you can add
additional options if you call super, then add your own options.

@traviscrawford traviscrawford force-pushed the traviscrawford:travis/python_app branch from 518ff1b to 22e396a May 4, 2018

@traviscrawford traviscrawford changed the title Add python_app support and AppBase helper Add python_app support May 4, 2018

@traviscrawford

This comment has been minimized.

Copy link
Member

traviscrawford commented May 8, 2018

I believe all the review comments are addressed and this is ready for another look 🙏

@stuhood

stuhood approved these changes May 8, 2018

Copy link
Member

stuhood left a comment

Thanks a lot Travis!

bundle_archive_product.add(
vt.target, os.path.dirname(archive_path)).append(os.path.basename(archive_path))

if vt.target in self.context.target_roots: # Always publish bundle/archive in dist

This comment has been minimized.

@stuhood

stuhood May 8, 2018

Member

Yea, the current behaviour is somewhat intentional. We create the product for all transitive bundles, but then only expose it in dist for the roots.

@stuhood stuhood merged commit 50a2a45 into pantsbuild:master May 8, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented May 8, 2018

@traviscrawford : If you have time to follow up with doc edits for this, it would be awesome. But either way, thanks!

@traviscrawford traviscrawford deleted the traviscrawford:travis/python_app branch May 12, 2018

stuhood added a commit that referenced this pull request May 13, 2018

Document python_app target in Python readme. (#5816)
Support for `python_app` targets was added in #5704 - here we add documentation in the Python readme.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment