New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A convenient mechanism for fetching binary tools via subsystems #5443

Merged
merged 12 commits into from Feb 13, 2018

Conversation

Projects
None yet
4 participants
@benjyw
Copy link
Contributor

benjyw commented Feb 6, 2018

Registration creates an option to set the tool's version,
and provides a convenient mechanism for retrieving
the tool lazily as needed.

Provides a convenient migration path from existing tool
version options to this standardized one.

This is similar to what we already do for JVM tools.
It will allow us to optionally eagerly fetch all binary
tools, e.g., to build a warm Docker image, or to enable
working offline.

This change switches just one usage to the new mechanism.
The others will be switched in followup changes.

Switch binary tool consumption to a registration model.
Registration creates an option to set the tool's version,
and provides a convenient mechanism for retrieving
the tool lazily as needed.

Provides a convenient migration path from existing tool
version options to this standardized one.

This is similar to what we already do for JVM tools.
It will allow us to optionally eagerly fetch all binary
tools, e.g., to build a warm Docker image, or to enable
working offline.

This change switches just one usage to the new mechanism.
The others will be switched in followup changes.

@benjyw benjyw requested review from jsirois , illicitonion and stuhood Feb 6, 2018

@illicitonion
Copy link
Contributor

illicitonion left a comment

Very nice! Thanks!

Is there possibly a test file your forgot to add? reset_registered_tools references being used in tests, but isn't actually called anywhere...

@@ -0,0 +1,98 @@
# coding=utf-8
# Copyright 2015 Pants project contributors (see CONTRIBUTORS.md).

This comment has been minimized.

@illicitonion

illicitonion Feb 6, 2018

Contributor

nit: 2018

This comment has been minimized.

@benjyw

benjyw Feb 6, 2018

Contributor

D'oh!

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 6, 2018

reset_registered_tools, and its reference to tests, was copy-pasted from similar code in jvm_tool_mixin. Possibly I can get rid of it, but I'd rather do that after I switch all consumers over, and verify that their tests pass.

@illicitonion
Copy link
Contributor

illicitonion left a comment

Sounds good :)

"""Needed only for test isolation."""
BinaryToolMixin._binary_tools = {}

_binary_tools = {} # name -> BinaryTool objects.

This comment has been minimized.

@stuhood

stuhood Feb 6, 2018

Member

This bit of global mutable state is unfortunate, because it will make it more challenging to move to v2. JVM bootstrap is maybe not the best thing to emulate in that regard.

It seems like there is a relatively clear path to avoiding putting mutable state in Tasks/Subsystems though: what about require tool registrations to be declarative at the register.py level? Or could we perhaps require this mixin to exist on a registered Optionable (ie, either a Task/Subsystem)?

If this were an operation outside of the task model, the only question would be how to trigger it... a bare invoke of ./pants --bootstrap-tools would be a bit odd.

The downside of that would be that it would not be a native inhabitant of the "v1 task model" (or v2, for that matter), but it would seem that "static external binary tools" are a bit of a special case in terms of bootstrapping, regardless of implementation.

This comment has been minimized.

@benjyw

benjyw Feb 7, 2018

Contributor

I don't like the idea of doing this in register.py. That seems like a weird bit of out-of-band state to have to maintain. For example, if I write a base class that uses a binary tool then every author of every subclass has to know to do something special in register.py, because of an implementation detail of a base class that may change in the future. This really should be done in the Task code.

We can require this to be mixed into Task or Subsystem (and ~do, since it must be mixed in on something that has a get_options()). Not sure how that helps though? So at eager bootstrap time we iterate over all registered tasks and type-test them for the mixin? That seems feasible.

I can imagine a ./pants warmup or something that does the eager bootstrapping.

This comment has been minimized.

@stuhood

stuhood Feb 7, 2018

Member

So at eager bootstrap time we iterate over all registered tasks and type-test them for the mixin? That seems feasible.

Yea, that's basically what I was thinking.

It's just the mutable state inside the Subsystem that is problematic.

This comment has been minimized.

@benjyw

benjyw Feb 8, 2018

Contributor

We do need to do something with the map from logical name to BinaryTool instance somewhere in register_options, which is a classmethod. So this information has to live in the class, not on the task instance. But I think it's OK if it lives in the task class (rather than the BinaryToolMixin base class)? We can guarantee that this state is not mutated after register_options completes.

This comment has been minimized.

@benjyw

benjyw Feb 8, 2018

Contributor

@stuhood Have a look at the latest commit and see if that addresses your concerns?

@mjschmidt

This comment has been minimized.

Copy link

mjschmidt commented Feb 6, 2018

getting on this thread so I get email updates ( :

@stuhood

stuhood approved these changes Feb 9, 2018

Copy link
Member

stuhood left a comment

This is better, thanks... it still has the mutable state, but this seems maybe easier to make idempotent?

I guess my last question is just: is there a way to make this declarative/static, so that you don't need to do anything during the register method? Honestly, it feels like it would be possible to do for all options:

def my_options(cls):
  """Returns a list of `register` kwargs dicts representing the options for the Task. ... Or an actual `Option` class?"""
  return super(...).my_options() + [
      Option('--version', advanced=True, fingerprint=True, help=...),
    ]
@stuhood

This comment has been minimized.

Copy link
Member

stuhood commented Feb 9, 2018

Oh! Here's an entirely different suggestion (take it or leave it, but I think it's good!): what if you declared dependencies on binary tools by declaring a dependency on a scoped instance of "BinarySubsystem" (names, whatever):

def subsystem_dependencies(..):
  return super(..) + BinarySubsystem.scoped("protoc")

...and then added a way to get all registered instances of a particular Subsystem in order to accomplish the introspection you're trying to do? This has the added benefit of being much, much more declarative. And I love Subsystems (prefer composition over inheritance, etc!)

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 9, 2018

Declarative options may be a good idea, but you're not seriously proposing that I introduce them in this change, I hope... :) That would be a maaaaaassssive change, that would have to be done separately. I can't even begin to tease out the implications.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 9, 2018

But the scoped subsystem might work. Let me chew on that.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 10, 2018

OK, based on my preliminary sketch, your subsystem idea is extremely sound. It has several advantages, including the ability to subclass the generic binary tool subsystem if you need extra options for some specific binary (beyond --version). I will get that working and update this PR.

benjyw added some commits Feb 11, 2018

WIP
An easy way to generate subsystems representing binary tools.
The subsystem registers an option to set the tool's version,
and provides a convenient mechanism for retrieving the tool
lazily as needed.

Provides a convenient migration path from existing tool
version options to this standardized one.

This change switches just one usage to the new mechanism.
The others will be switched in followup changes.
@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 11, 2018

OK, new implementation is radically different, so PTAL. @stuhood, your subsystem idea was an excellent one!

Don't deprecate the old select_binary yet.
It causes an integration test to fail.

@benjyw benjyw changed the title Switch binary tool consumption to a registration model. A convenient mechanism for fetching binary tools via subsystems Feb 11, 2018

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 11, 2018

Best to view the entire diff rather than individual commits, as the implementation changed drastically.

benjyw added some commits Feb 12, 2018

return binary_util.select_binary(self.get_options().supportdir,
self.get_options().version,
'protoc')
return Protoc.global_instance().select(context=self.context)

This comment has been minimized.

@illicitonion

illicitonion Feb 12, 2018

Contributor

Is there a way we can better guarantee that the subsystem dependency is actually declared? (e.g. adding a subsystem_dependencies instance method which returns the instantiated versions of the classes returned by subsystem_dependencies, maybe as a dict<class,instance>)

I can't think of a nice way of doing this, and I suspect no way of trying would be more likely to offer the guarantee because there's nothing stopping you from poking at the global instance anyway, but if you have any ideas, I'd be interested...

This comment has been minimized.

@benjyw

benjyw Feb 12, 2018

Contributor

That's a general gotcha with subsystems, but not one I want to tackle here.

This comment has been minimized.

@stuhood

stuhood Feb 12, 2018

Member

It will be a thing that we can solve by supporting Subsystems natively in v2... will be one of the first things we need to do, in fact.

benjyw added some commits Feb 12, 2018

Get rid of the options_scope/name duplication.
We can always reintroduce it if we need to.
@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 12, 2018

@stuhood When you can, let me know what you think of this new implementation. I like it a lot better.

@stuhood
Copy link
Member

stuhood left a comment

Awesome. Thanks Benjy!

It doesn't look like scoping the Subsystem has a downside, so might be worth considering.

support_dir = 'bin/protobuf'
default_version = '2.4.1'

replaces_scope='gen.protoc'

This comment has been minimized.

@stuhood

stuhood Feb 13, 2018

Member

Can this be a literal deprecated_options_scope?

Also, the whitespace around = here is inconsistent.

This comment has been minimized.

@benjyw

benjyw Feb 13, 2018

Contributor

deprecated_options_scope means that all options can be read from the other scope. In the specific case of protoc that would work, because the old option was also named --version, but that need not be the case in general. Once I've migrated all our uses, if it turns out we can get away with deprecated_options_scope then I'll remove this machinery. Note that even if not, this is a just a temporary aide to migrating to this new scheme. We can delete it once that's achieved.

extra_version_option_kwargs = None

@classmethod
def register_options(cls, register):

This comment has been minimized.

@stuhood

stuhood Feb 13, 2018

Member

Should this call super?

This comment has been minimized.

@benjyw

benjyw Feb 13, 2018

Contributor

Good call.

This comment has been minimized.

@benjyw

benjyw Feb 13, 2018

Contributor

No pun intended...

old_opts = context.options.for_scope(self.replaces_scope)
if not old_opts.is_default(self.replaces_name):
version = old_opts.get(self.replaces_name)
return BinaryUtil.Factory.create().select(

This comment has been minimized.

@stuhood

stuhood Feb 13, 2018

Member

Can this portion be memoized by calling into a helper method for the selected version?

This comment has been minimized.

@benjyw

benjyw Feb 13, 2018

Contributor

Done, although I'm not sure there's much benefit, since almost no work is happening if a binary has already been fetched.

from pants.util.process_handler import subprocess


class ProtobufGen(SimpleCodegenTask):

@classmethod
def subsystem_dependencies(cls):
return super(ProtobufGen, cls).subsystem_dependencies() + (BinaryUtil.Factory,)
return super(ProtobufGen, cls).subsystem_dependencies() + (Protoc,)

This comment has been minimized.

@stuhood

stuhood Feb 13, 2018

Member

Can we think of a reason not to scope this under the current task? Might someone want a different protoc here than in another task?

The "effect all consumers" option would end up looking the same, afaik. But you'd also be able to affect one consumer:

# affect all consumers
--protoc-version=
# affect one consumer
--protoc-gen-protoc-version=

This comment has been minimized.

@benjyw

benjyw Feb 13, 2018

Contributor

No good reason not to, except that we tend to use global instances everywhere unless we know we shouldn't. But we can start breaking that habit here...

@stuhood
Copy link
Member

stuhood left a comment

Awesome. Thanks!

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 13, 2018

Thank you for the inspiration! Subsystems was the way to go.
We should get rid of the task/subsystem distinction and just have a bunch of Optionables floating around doing work...

Fix an indentation bug.
Damn you python.

@benjyw benjyw merged commit 7cdea9a into pantsbuild:master Feb 13, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@benjyw benjyw deleted the benjyw:register_binary_utils branch Feb 13, 2018

@mjschmidt

This comment has been minimized.

Copy link

mjschmidt commented Feb 13, 2018

Will this solve issue #5443, very much a pants newbie and still learning about the tool.

@benjyw

This comment has been minimized.

Copy link
Contributor

benjyw commented Feb 13, 2018

@mjschmidt your issue link points back to this PR...

@mjschmidt

This comment has been minimized.

Copy link

mjschmidt commented Feb 13, 2018

lol... #5253 this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment