Controllable pre/post task arguments #135

Closed
bitprophet opened this Issue Apr 17, 2014 · 30 comments

5 participants

@bitprophet
Member

I.e.:

@task
def pretask():
    pass

@task(pre='pretask')
def realtask(arg):
    pass

This will fail because at the bottom of Executor.execute() we build a task list and then give all of them the CLI arguments. I.e. pre/post tasks are given the same input as the "main" task.

The API we would need to fix this is probably:

@task
def pretask():
    pass

@task(pre={'pretask': []})
def realtask(arg):
    pass

Or, with a pretask that has an interesting signature:

@task
def pretask(myarg):
    pass

@task(pre={'pretask': ['foo']})
def realtask(arg):
    pass

Would be equivalent to calling pretask('foo') prior to calling realtask().
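To make the semantics concrete, the expansion of that dict form could be sketched like this (a hypothetical helper, not real invoke code — just illustrating the mapping described above):

```python
def expand_pre(pre):
    """Yield (task_name, args) pairs from a pre-task specification.

    A dict maps task names to positional-argument lists; a plain
    iterable of names means zero-argument calls.
    """
    if isinstance(pre, dict):
        for name, args in pre.items():
            yield name, list(args)
    else:
        for name in pre:
            yield name, []

# {'pretask': ['foo']} -> call pretask('foo') before the main task
assert list(expand_pre({'pretask': ['foo']})) == [('pretask', ['foo'])]
# ['a', 'b'] -> call a() then b() with no arguments
assert list(expand_pre(['a', 'b'])) == [('a', []), ('b', [])]
```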

Allowing a dict for keyword args would also make sense; however the full implementation would want both an args list and a kwargs dict, which would then both need wrapping in a tuple or list. That gets pretty fugly pretty fast:

@task(pre={'pretask': ([], {'kwarg': 'value'})})

Another possibility (this is actually what nate-pycon on IRC originally thought of) is to reuse the parser and say e.g.:

@task(pre=['pretask --kwarg=value'])

However I think even the most complex pure-Python version above is preferable, because:

  • it's not sufficient to be the ONLY way of doing this, because it forces a serialize/deserialize step when pure Python is much faster/easier;
  • if we make it just another alternative, that's almost definitely crossing my "too many ways to do it -> confusing and overly complex code" threshold
@bitprophet
Member

See also #120 and #123 though they're only related in that we might as well fix them when we're in this part of the code.

@bitprophet
Member

Should also make sure that the default behavior (i.e. @task(pre=['justthetaskname']) or @task('pretaskhere')) is to call the subtask with zero arguments.

@patcon patcon referenced this issue in gratipay/gratipay.com Apr 17, 2014
Open

Convert scripts into invoke tasks #2294

1 of 4 tasks complete
@xarg
xarg commented Apr 20, 2014

There is also a possibility to avoid using any args, kwargs in the decorator (I would actually prefer this one).

@task
def pretask(myarg):
    pass

@task('pretask')
def realtask(arg):
    pass

In order to pass myarg it should be supplied at the run time. Like so:

inv realtask --arg 123 --myarg 456

If you really need to execute a task from within another task with arguments, then it would be better to just call that task directly:

@task
def realtask(arg):
    execute('pretask', 123, 'abc', kwarg1=345, kwarg2='hello')
    ...
    execute('posttask', 123)

I also find the above preferable to the form below:

@task(pre={'pretask': ([123, 'abc'], {'kwarg1': 345, 'kwarg2': 'hello'})}, post={'posttask': ([123], {})})
def realtask(arg):
    ...
@patcon
patcon commented Apr 20, 2014

Forgive me the silly question, but why use execute() over simply running the function directly?

@xarg
xarg commented Apr 20, 2014

@patcon execute would preserve the context in which the task is run. However this function doesn't exist in invoke. I took inspiration from fabric's execute: http://docs.fabfile.org/en/1.8/api/core/tasks.html?highlight=execute#fabric.tasks.execute

@patcon
patcon commented Apr 20, 2014

Ah gotcha. Haven't gotten into contexts yet, but that makes sense :)

@matleh
matleh commented May 8, 2014

Is there another benefit besides deduplication of pre-tasks to using

@task('pretask')
def realtask():
    pass

as opposed to

def realtask():
    pretask()

?
Because, if that is the only benefit, wouldn't it be possible to have a run_task method on the Context which only runs the given task if it has not been run during that invocation? That would render the pre parameter unnecessary and also make the parameter passing between tasks and their pre-tasks explicit.
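Rough sketch of what I mean (a hypothetical API, not something invoke currently has):

```python
# A Context whose run_task() executes a given task at most once per
# invocation, making deduplication explicit instead of declarative.
class Context:
    def __init__(self):
        self._already_run = set()

    def run_task(self, task, *args, **kwargs):
        if task in self._already_run:
            return  # deduplicated: skip repeat invocations
        self._already_run.add(task)
        return task(self, *args, **kwargs)

def pretask(ctx):
    # count how many times the body actually runs
    ctx.calls = getattr(ctx, 'calls', 0) + 1

ctx = Context()
ctx.run_task(pretask)
ctx.run_task(pretask)  # body does not run a second time
assert ctx.calls == 1
```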

Any of the suggested solutions break as soon as a pre-task depends on one of the params of the real-task....

@bitprophet
Member

@humanfromearth Some thoughts:

  • Yes, executing (either directly or via some analogue to execute()) pre/post tasks "manually" within the task body can work in simple cases - totally true.
  • However, there are use cases where users will want the task body executed >1 time, while the pre/post tasks only run 1 time - that use case doesn't work well with the above.
    • Counterpoint: continue using execute() as is done with Fabric 1, with a top level task running once, and internally calling pre tasks, another "real" main task, and then post tasks.
    • However, it still feels like something the core API should be able to handle, if it can be done in a non crazy fashion.

Unrelated thought I had while rereading my original description: we do not need to do the complex args-and-kwargs version of the API, because all "args" can always be called as "kwargs" instead (there's no such thing as a truly positional-only argument in Python.)

So we could possibly implement that API as just ("just"...) two levels of dicts:

@task(pre={'othertask': {'somearg': 'someval'}})
def maintask(): pass

I think that this level of complexity is also a good argument for allowing additional decorators besides @task as it would allow making this less complex/ugly. E.g.:

@pre('othertask')
@task
def maintask(): pass

@pre('othertask', arg='val') # equiv to othertask(arg='val')
@task
def maintask2(): pass

Food for thought.

@bitprophet
Member

@matleh See my above response, it addresses your question as well to some degree (re: why not call pre-tasks within task bodies.)

@bitprophet
Member

I am working on this now, FTR.

@bitprophet bitprophet added a commit that referenced this issue May 8, 2014
@bitprophet bitprophet Failing test proving re #135 e46b366
@bitprophet
Member

Refactored somewhat and at least fixed it so pretasks are given no args by default.

Kicking around the ideas above about the best api for this: https://gist.github.com/bitprophet/16a273d07772207e6f9d

@bitprophet
Member

So far people on IRC seem to be leaning towards option 2, i.e. @pre, which is also what I had been leaning towards.

@shazow also reminded me that we don't need a decorator to get call semantics, i.e. this is possible:

@task(pre=[call(positional, kw=arg), ...])
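A minimal sketch of how such a call helper could work — the name and shape are taken from the suggestion above, not from any actual implementation:

```python
# call() freezes arguments at declaration time; the executor can later
# invoke .execute() to run the task with exactly those arguments.
class call:
    def __init__(self, task, *args, **kwargs):
        self.task = task
        self.args = args
        self.kwargs = kwargs

    def execute(self):
        return self.task(*self.args, **self.kwargs)

def greet(name, punctuation='!'):
    return 'hello ' + name + punctuation

c = call(greet, 'world', punctuation='?')
assert c.execute() == 'hello world?'
```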

Will need to think more & get more feedback.

@xarg
xarg commented May 9, 2014

@bitprophet agreed that it would be useful for multiple executions of the same tasks. And I also like @pre and maybe @post? Seems more natural. I also hope that execute('task') will be an option for tasks such as:

@task
def deploy():
    execute('update', *some_args)
    execute('compile', **kw)
    execute('push', x=1, y=2)
    execute('restart')
    execute('migrate')

Maybe even: s/execute/invoke/

@matleh
matleh commented May 9, 2014

@bitprophet "However, there are use cases where users will want the task body executed >1 time, while the pre/post tasks only run 1 time." Could you give an example of such an use-case?

I am just beginning to do some experiments with Invoke. I am currently using buildout and "normal" argparse based command line scripts and am looking for some better alternative.

Invoke looks nice, but from what I have tried and read so far, these declarative pre-tasks for me seem to work in the most simple cases (clean being the always-used example) but cause more headaches and complexity as soon as the use cases get a bit more involved.

For one thing, you have to declare the pre-task using the name it has in the Collection - but that name for me lives on a different layer of abstraction (they are CLI names, but not code-names, so to say). I might include tasks from multiple files and maybe even multiple sub-projects and then I do not want to use the CLI names of the tasks to declare their dependencies, because these CLI names might change and depend on where I call Invoke.

Currently I see no way to depend on another task without that task also being exposed to the CLI. Sure, I can just call it as a normal function, but then the dependencies of that task are not called.

Then, only in very simple use cases do the pre-tasks require no arguments or require arguments that are independent of the arguments of the main-task. How to declare these argument-dependencies between tasks? How to pass arguments to pre-tasks on the CLI? As was already discussed in this issue, these declarations quickly become very complex and ugly - whereas a method call is clear and requires little extra explanation. When a sub/pre/post-task of some main-task needs an argument, the main-task has to pass it and maybe expose it over its own signature.

What about deduping? Currently, there is a global switch to turn deduping on or off - but in the real world, that probably depends on the individual tasks and their dependencies. So that either needs to be part of the dependency declaration - or of that sub-task-invocation method call. The latter gives the flexibility to decide at run time whether deduping is required or not.

Overall, I cannot see the real benefit of the declarative sub/pre/post-task style. To my (very limited) understanding, most of these shortcomings (or of the required complexity to work around these shortcomings) would not exist with an explicit sub-task-invocation method call style.

@bitprophet
Member

@matleh, thanks for the feedback - here's some responses:

  • FYI, Invoke is still comparatively new and is undergoing a lot of dev right now, which is why a bunch of these features are basic in form. They'll continue getting fleshed out as people like you bang on it & try fitting it to their use cases.
  • Re: use case for "pre tasks -> main task N times -> post tasks", it's less useful for purely local tasks, but is crucial for mixed local/remote use cases such as deploying to servers. I've had many users request a way to do this in Fabric 1, e.g. "build deploy artifacts locally, once; distribute + install on N servers; perform cleanup at end."
    • This is doable with the "explicit subroutines" methodology, and is how Fabric 1 requires it to be done. It's possible we'll just require that here too - but it's a common enough pattern I wanted to see if we could create a useful API for it.
    • It's possible both options will be offered - a goal here is to find an appropriate middle ground between convenience and having too many APIs.
  • Typically, anything called a "task" (vs a non-@task-decorated function, aka a subroutine) is intended to be usable from the CLI. But I'm open to allowing tasks to be made not visible/callable on the CLI, e.g. @task(hidden=True).
  • Can you clarify your point about CLI-oriented vs code-oriented names, if it's not covered by the visibility point above? (A detailed example would be great.)
@bitprophet
Member

More notes to myself about the APIs:

  • Using @pre opens things up to ordering confusion - will they be applied in the order given? What if @task is given before or after them?
    • Given that @task is really just generating a Task object (i.e. any mutation of the input would be a temporary transmission mechanism and not the end result as it is in Fab 1), it would need to occur last in the chain - but due to how decorators work this means it'd appear first / at the "top".
  • There's also IMO a high chance of confusion re: @pre('string1', 'string2') - users might either read it, or try to write it, as being like @task(pre=['string1', 'string2']) - i.e. a list of multiple pretask names.
  • This means I lean more towards shazow's suggestion, @task(pre=[call('string1', 'string2'), ...]), as the ordering is easier to handle but we still preserve call semantics instead of requiring tuples/dicts/both.
    • I'm not sure I love the name call since it could imply verb instead of noun. But other obvious names (invocation, task, etc) are too verbose or confusing.
@shazow
shazow commented May 10, 2014

This is crazy but there's something beautiful about putting tasks inside of a task decorator. :) Adding an example in case it grows on you:

@task
def bar(...):
    ...

@task
def baz(...):
    ...

@task(pre=[task('bar', *args, **kw), task('baz', *args, **kw)])
def foo():
    ...

# Or even...
@task(pre=[task(bar, *args, **kw), task(baz, *args, **kw)])
def foo():
    pass

# Maybe this is more sane...
@task(pre=[bar.task(*args, **kw), baz.task(*args, **kw)])
def foo():
    pass

/me goes back in his hole.

@bitprophet
Member

@shazow I thought about it earlier and actually don't think it's possible - how would def task be able to tell whether it's being called as a parameterized decorator, vs as a pre-task description? I don't see a reliable way to do this. See e.g. https://github.com/pyinvoke/invoke/blob/a398256e35125b0eb6a7216a8dd490eb76d381e3/invoke/tasks.py#L256

I also feel like it'd be too confusing for novice/intermediate users :( it's clever, but that's not always good...

@shazow
shazow commented May 10, 2014

@bitprophet Yar don't know if you could do the first few unambiguously, but the last scenario would work. You can add a .task callable member to the decorated function.

Also, a nice thing about passing around function references instead of strings is that you avoid typo bugs (but you do have to be extra careful about circular dependencies).

Either way, using a different semantic name would work just as well but without the overloaded ambiguity. ¯\_(ツ)_/¯

@bitprophet
Member

I really don't like the bar.task idea, sorry :( that's going even farther in the "confusing to the average user" direction (also is rather verbose.)

Do you have any other ideas re: the name for this hypothetical callable? :) adding to my own brainstorm of "short words evoking a reference to a function call": ref, do, use. Still think call is winning.


You're not wrong about the general "callables vs strings" debate, though, I'm always very on the fence about that topic. Primary reason for strings, besides circular dependencies, is an eye on advanced use cases (touching on @matleh's post above) where we may need/want identifiers like foo.bar.

Counterpoint is that if we continue down the road the namespacing API has gone (where the emphasis is on importing & 'binding' callables) we could reasonably expect folks to ensure that tasks they want to reference in pre/post lists are imported locally. This would also work much better in nontrivial use cases where tasks and subcollections are bound in multiple locations, or shuttled around in ways the original definer can't be aware of.

As far as I'm aware, this sort of referential API always ends up caving to reality and allowing either-or (see e.g. Django's model references: https://docs.djangoproject.com/en/1.6/ref/models/fields/#foreignkey) so I'm thinking we probably must do the same here, though likely in a limited mode where we only allow strings to reference locally bound names (i.e. they no longer have anything to do with the focus task's encompassing collection.) This'd be to get around circular dependencies - pushing users to use callables otherwise.
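The either-or resolution could be as simple as this sketch (hypothetical helper, assumed names — strings only look up locally bound tasks, which is what sidesteps circular imports):

```python
# Resolve a pre/post reference that may be either a task object or a
# string naming a locally bound task.
def resolve_pre(ref, local_names):
    if isinstance(ref, str):
        try:
            return local_names[ref]
        except KeyError:
            raise ValueError('no locally bound task named %r' % ref)
    return ref  # already a task object / callable

def build():
    pass

assert resolve_pre('build', {'build': build}) is build
assert resolve_pre(build, {}) is build
```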

@bitprophet
Member

One too-clever idea could be to make Task objects' __call__ return a promise-like object (with direct, real execution being e.g. task_obj.call()), so the pre/post API looks like:

@task
def build():
    # ...

@task
def clean(which):
    # ...

@task(pre=[clean('all'), build()])
def deploy():
    # ...

This looks nice, but has the huge downside of being confusing 2x over:

  • Non-newbie Pythonistas reading the above will wonder why the heck we're actually-calling tasks at declaration time: normally, calling callables does not yield a delayed action.
  • Just about everybody and their uncle will try calling tasks directly (e.g. using task_obj() within the body of other tasks) and then wonder why the heck nothing is actually happening, and then have to be told "no no you need to use task_obj.call()! Duh!" which is horrible.

I'm putting this out there just for shits - again, it's far too clever. (Experience talking. I can see the support requests rolling in already...)
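For the record, the promise mechanics would look roughly like this (purely illustrative — assumed names, and deliberately NOT how invoke works):

```python
# A Call records a deferred invocation; Task.__call__ returns one
# instead of executing, and real execution lives on Task.call().
class Call:
    def __init__(self, task, *args, **kwargs):
        self.task, self.args, self.kwargs = task, args, kwargs

class Task:
    def __init__(self, body):
        self.body = body

    def __call__(self, *args, **kwargs):
        # Surprising: calling the task does NOT run it.
        return Call(self, *args, **kwargs)

    def call(self, *args, **kwargs):
        # Actual execution happens here.
        return self.body(*args, **kwargs)

def clean_body(which):
    return 'cleaned ' + which

clean = Task(clean_body)
promise = clean('all')  # nothing executes yet - this is the footgun
assert isinstance(promise, Call)
assert clean.call('all') == 'cleaned all'
```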

@matleh
matleh commented May 13, 2014

Referring to your last comment with the idea of tasks returning promise-like objects - do you know doit (www.pydoit.org)? There the task just returns the task-definition. Doit has some interesting concepts...

@matleh
matleh commented May 13, 2014

@bitprophet: Let me try to clarify my point about CLI and code names:

I am working on a bigger product, which consists of several python packages. Each package of course has some functionality and might define its own tasks. Then there is some kind of meta-package, which pulls these packages together and combines them into a product. (Depending on the product, the number of involved packages varies.) This meta-package has its own tasks and also uses and/or exposes the tasks of the individual packages.

foo-package

@task
def clean():
    pass

@task(pre=['clean'])
def build():
    pass

meta-package:

from foo import tasks
ns = Collection("foo", Collection.from_module(tasks))

Now, I want to be able to run invoke in the foo-package and also in the meta-package.
When I run it in foo, clean has the CLI name "clean" but when I run it in meta it has the CLI name "foo.clean" and the pre-declaration of "foo.build" breaks...

To my understanding, the function name of a task and the name under which it is exposed to the CLI live on two different layers. For me, the CLI name is just user-interface and should not be used from inside the "business logic", because what is exposed to the user and how it is named there might vary and change independently from the semantics of the "business logic".

Does that make any sense?

@bitprophet
Member

@matleh Yes, that maps to what I originally figured you probably meant - thanks a lot for verifying this!

Pretty sure that (as mentioned above) moving to preferring the task objects themselves, only using strings to get around dependency problems, will make that problem a lot easier to manage. Specifically, you'd say e.g. @task(pre=[clean]) - or to be more "correct" and/or if one were parameterizing the pre-task, @task(pre=[call(clean)]).

@bitprophet
Member

Implementing the next step of this now (allowing task objects instead of names; then to be followed by call for args) - ran into a problem which is that the shorthand form of @task(pre, tasks, here) is now indistinguishable from the non-parameterized form of @task because in both cases the first argument is a task/callable!

Hopefully solvable by just changing the "am I being called as a straight decorator" test to exclude callable objects that are already Tasks; otherwise, may need to limit the shorthand to strings only. Which is already veering into "too many ways to do it" territory :(
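The disambiguation check would look roughly like this (a simplified sketch based on the description above, not the actual invoke source):

```python
class Task:
    def __init__(self, body, pre=()):
        self.body = body
        self.pre = list(pre)

def task(*args, **kwargs):
    # Bare-decorator case: @task over a plain callable. Excluding Task
    # instances here is what resolves the ambiguity with @task(pretask).
    if len(args) == 1 and callable(args[0]) \
            and not isinstance(args[0], Task) and not kwargs:
        return Task(args[0])
    # Parameterized case: @task(pretask1, ...) or @task(pre=[...]).
    pre = list(args) + list(kwargs.get('pre', []))
    def inner(body):
        return Task(body, pre=pre)
    return inner

@task
def pretask():
    pass

@task(pretask)  # shorthand: first positional arg is a pre-task
def realtask():
    pass

assert isinstance(pretask, Task)
assert realtask.pre == [pretask]
```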

@bitprophet
Member

Yup that did the trick for now. Need to re-add ability to use strings again (meaning new tests), then implement call, then...done for now?

@bitprophet
Member

Actually, as I wrote out the docs for "why you might need to use string task references" I realized that once we implement recursive task resolution (#120), circular task dependencies would cause an infinite loop.

I can't think of other obvious reasons why one would need to use a string instead of an object; maybe some lazy eval scenario where one wants to refer to "the main task's collection's current task named "foo"" but that seems reasonably far-fetched and not worth considering for what is effectively a convenience/wrapper API.

Leaving strings out is backwards incompatible but otherwise makes things simpler, both conceptually and implementation-wise. Gonna go with that for now and see if anybody comes up with compelling counterarguments. Hooray pre-1.0 breaking changes...

@matleh
matleh commented May 16, 2014

Just another thought: is the pre-declaration really such a dominant use of task parameters that it justifies having the shorthand form? I think it would help readability and forward-compatibility to make all task parameters explicit.

I find it a bit hard to understand why 'pre' is more important than 'post' or 'name' or that really cool and breathtaking 'fluzzibux' parameter that will be implemented someday... :)

@bitprophet
Member

Pre-tasks definitely come up as more common than post-tasks in all the use cases I've seen, yea; and given the history of this tool space (notably make), if anything gets to live in the *args space, it would be "things required for this task to execute".

I agree that the cleanest approach is to force explicit kwargs for everything, and have already been waffling about removing the *args functionality, but it's simple enough to have in, and enough people have already asked for and/or started using it, that I think it's worthwhile for now.

@bitprophet
Member

Cleaned up more old references to string-based pre-tasks. Need to doc/test/implement call so this ticket's actual subject is satisfied, then changelog, then done.

@bitprophet bitprophet added a commit that referenced this issue May 26, 2014
@bitprophet bitprophet Changelog re #135, closes #135 c38ed4e