Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Task dependencies / before/after hooks #452

Closed
bitprophet opened this issue Oct 13, 2011 · 22 comments
Closed

Task dependencies / before/after hooks #452

bitprophet opened this issue Oct 13, 2011 · 22 comments
Milestone

Comments

@bitprophet
Copy link
Member

1.3 features execute() which allows for meta-tasks which call other tasks and honors their individual @hosts, etc settings. This solves many of the more complicated multi-task invocation problems people have run into.

However, there are some arguments in favor of having a flipped-around version of a meta-task, namely specifying that Task A should always execute Task B before or after it runs. (Versus creating a Task C consisting solely of execute(Task B); execute(Task A).) This is commonly referred to as "dependencies" or a "call chain", as seen in Rake/Capistrano/etc.

One such argument is just that "good" refactored design of tasks means you may often want to call 'subtasks' stand-alone, so "just put your pre-requisite in a meta-task" doesn't help. E.g. setting up an "environment" (in the staging/production sense) before executing.

I was -0 on this before because technically you can solve this solely with execute, but in more complex cases that starts to require a "runner" task that takes the real task to run as an argument. At that point, decorating a number of subtasks with "requires X to be run beforehand" is more Pythonic.


Closely related: the idea of having a queue instead of a loop in fabric.main.main, as mentioned in #391. In such a system, dependency decorators might act by simply loading the referenced items into the queue before/after the decorated task.

@ghost ghost assigned bitprophet Oct 13, 2011
@sartian
Copy link

sartian commented Dec 18, 2011

Our development team has migrated our search engine content build cycle from Ant to using Fabric. We love Fabric, but miss the dependency based tasks of Ant. (thats pretty much the only thing we miss)

If we were to build some data for deployment to the search cluster for indexing, running it currently looks something like this:

fab build production deploy

The script downloads data from a feed and transforms it (build) for loading into the search engine (deploy) as part of our production cluster (production)

We'd love it if we could just normally say:

fab production deploy

And the build task would be automatically triggered to check whether the feed has been updated and the content needs to be rebuilt.

Our scripts are smart enough to do delta builds when small parts of the feed change, but Fabric lacks a elegant way to trigger other tasks as prerequisites.

The loop is an interesting suggestion. I'll ponder an approach moving ahead to imagine what a solution would look like for us and if its generalizable. I suspect a decorator could accomplish the task. A rough pass (needs a better name) might be:

@task 
def extract():
   # extract datasets from data feed

@chain(extract)
@task 
def transform():
   # transform extracted datasets into documents for later indexing

@chain(transform)
@task 
def build():
  # using this approach, the other tasks do the heavy lifting, the only
  # code needed in build is the pre-deploy steps

@chain(build)
@task 
def deploy():
    # some deployment code in here

So, if we call deploy, it will call build - build shouldn't do anything unless needed, so it does make the code more complicated I think.

Hrm. This makes me think of a pipeline routing or flow based programming issue.

Maybe a better approach is to provide a chain or pipeline decorator that accepts multiple tasks for a default task:

@pipeline(extract, transform, load)
@task(default=True)
def all():
    # do entire cycle

I'll mull this over and see if I can think of an elegant approach without adding unwarranted complexity.

@sartian
Copy link

sartian commented Dec 18, 2011

Maybe its just a different kinda task, a @flowtask(default=True, subtasks=[extract, transform, load])

@bitprophet
Copy link
Member Author

@sartian -- thanks for the comments. I haven't sat down to mull over my personal API preferences (tho I did just update the description to be in line with my current thoughts re: necessity of this feature overall) but it might end up as simple as just a decorator taking a *args list of names-or-objects.

In my memory, this is usually implemented in one of two ways: the subtask saying "I should be called before/after tasks A, B, ..., Z", or (probably better/more intuitive, if sometimes requiring more LOC) the "main" tasks saying "Before calling me, call tasks X, Y, ...".

Offhand, I think this might work best as another set of kwargs to @task instead of being its own decorator: less constructs/LOC, builds on the "new task related features are only implemented for new-style tasks" philosophy, etc.


So to transform your example:

@task 
def extract():
   # extract datasets from data feed

@task(requires=[extract]) 
def transform():
   # transform extracted datasets into documents for later indexing

@task(requires=[transform])
def build():
  # using this approach, the other tasks do the heavy lifting, the only
  # code needed in build is the pre-deploy steps

@task(requires=[build])
def deploy():
    # some deployment code in here

### example part 2

@task(default=True, requires=[extract, transform, load])
def all():
    # do entire cycle

Possibly doing the usual "I can take a single non-iterable value too" option so base-cases like the above could drop the wrapping square-brackets.


More thoughts/concerns:

  • When an iterable is in play, this should call the required subtasks in the order given; users will need to remember this to prevent silly mistakes. Probably obvious, but I could see some people expecting it to somehow magically resolve to a different order.
  • Should the require'd subtasks' own required tasks be honored?
    • In the first part of the example, you're relying on that behavior to build a real chain
    • It feels like the most sensible/unsurprising default
    • Could complicate implementation and/or behavior in complex setups (i.e. resolving loops, multiple mentions of the same task, etc.)
    • In the second (or at least in other situations) I could see folks not wanting things double-called -- i.e. one is saying "I just want these and only these called beforehand."
    • Not sure if it's better to have multiple kwargs to allow control over this, or to pick one default behavior and stick with it.
  • requires is implicitly a "before" option -- what's a good inverse option? follows?
  • Global before/after hooks (meaning fire something before, or after, the entire session)?
    • Module-level executable code is already an implicit global-before hook, but probably good to have an explicit one.
    • What about global+every hooks (meaning fire something before or after each task)?
    • Basically, think unit test frameworks.

@bitprophet
Copy link
Member Author

More questions about how this should/would work (thanks @sjmh):

  • Do pre/post hook'd tasks run once per session or once per invoke of their related task?
    • It feels like the right answer is "once per session", as in, they operate on the "tasks to execute" list instead of acting like an implicit meta-task bundling the related tasks + main task into one unit.
    • I.e. if mytask @requires(subtask), it behaves like fab subtask mytask.
    • Goal being to encode what people do nowadays by slapping "pretasks" into the fab invocation, which set things up for the main task to run.
      • Thus, @require'd tasks typically would run once per session and set up hosts/roles for the main task, which would then run N times.
      • Versus all linked tasks running N times.
    • Other use cases might disagree and prefer one invoke of subtask per invoke of mytask, though (meaning 20 hosts => 20 calls to subtask and mytask each). Should poll.
  • How to get args to @require'd subtasks?
    • Copy/clone the args to the main task?
      • Confusing and runs into problems with our "function args == CLI args"
    • No args allowed?

@theladyjaye
Copy link

based on #628
Since this is the heart of the issue I will just keep the conversation here.

Your critique re: decorator vs kwarg is valid (again #628). I had originally implemented that with kwargs and only moved to the decorator because I didn't like how it looked:

    @task(depends=[1,2,3]) or: @task(depends=(1,2,3))

But I do think you are right for consistency's sake that it should be available as a kwarg. Would you be opposed to having the @Depends decorator basically be a shortcut?

Effectively providing 2 ways to do the same thing. The decorator would just set the attr on the Task class.

I can also appreciate not wanting to have 2 ways to handle the same problem as well. Lemme know.

@bitprophet
Copy link
Member Author

Yea, I'm torn on having both, it's convenient but also bloats the API (which is never a great argument re: a single addition, but add up a bunch of "more than one way to do it"...)

Another worry is how to handle somebody giving both options on a single task (unlikely but possible):

@task(alias='foo', depends=('a', 'b'))
@depends('c')
def mytask(): ...

What should the dependency list be here? If @task wins over @depends, it's a+b; if vice versa, it's c; if additive, it'd be all 3. Or throw an error: "task already has dependency list!". Which of these would the average Pythonista or newbie
expect?


For now, especially because I'm going to revisit this in Invoke soon and that'll replace this implementation for Fab 2 (eventually...and it's entirely possible I'll make the same decision made here) I'll let you decide what you think is best.

If you go with both, I think the error message is the safest route to take -- any automatic merging/overriding feels like it'd be confusing to somebody, especially given how decorator application order isn't super intuitive.

@gilgamezh
Copy link

I was thinking to implement after/before hooks and in the first search i found this issue. any resolution?

@bitprophet
Copy link
Member Author

@gilgamezh Invoke has been worked on a lot recently and is going to be out soon, with some experimental APIs for this kind of thing. (In fact I was just discussing it with another user on IRC yesterday.) Please keep an eye on the Twitter/mailing list/etc as we'll announce once Invoke has a 0.9.x release and is ready for feedback. Thanks!

@gilgamezh
Copy link

Great! thanks.

@andresriancho
Copy link

Was this included in any recent release / dev branch?

@bitprophet
Copy link
Member Author

@andresriancho Kinda-sorta, Invoke has it, but Fab 1.x doesn't. Fab 2 will be based on Invoke (or Invoke can be used by itself if you're not using Fab's SSH kit). See the roadmap for details.

@andresriancho
Copy link

Great, nice to see that 2.x is on it's way 👍

@manasapte
Copy link

do we have a hook to execute a function just once before iterating over all the hosts and executing a task on them? I wanna print out some summary when a task is called, like "I am going to execute this on etc. is there a way to do this right now?

@bitprophet bitprophet removed the 2.x label Aug 4, 2014
@bitprophet
Copy link
Member Author

This is implemented in Invoke already (happened last year-ish) so it'll definitely be available for Fabric 2.0 users.

@wardi
Copy link

wardi commented Nov 15, 2017

@bitprophet should this be working on the v2 branch right now? I can use @task(pre=...) with invoke but it doesn't seem to work when I run from the fabric script. Should I be using fabric only as a library?

@bitprophet
Copy link
Member Author

@wardi Think that's a bug/TODO, just confirmed it on my end, suspect it has to do with Fabric 2 having a custom executor subclass. Should be relatively easily fixed, will see if I can bang it out real quick.

@bitprophet
Copy link
Member Author

bitprophet commented Nov 20, 2017

Ah yea - seeing the code in question (& its comments) reminds me why this was deferred, it's a more complex question in Fabric than in regular Invoke, because of host parameterization.

Ye olde "task foo depends on task bar, how do?" is, in Invoke, pretty easy: just call bar, then foo`.

In Fabric, if foo is being run with a host list of A,B,C, do we run 'bar, foo-A, foo-B, foo-C' or do we run 'bar-A, foo-A, bar-B, foo-B, bar-C, foo-C'? (And the right answer there will differ between use cases as well.)

So right now, Fabric 2 decided to not even try answering that question and ignores pre/post.

There's a bunch of existing commentary about this in pyinvoke/invoke#461 and I even have a branch open with unfinished work towards a call-graph, pure-dependency-based setup. But barring completion of that, I'll have to decide on the most-applicable interpretation of the above problem as it applies to pre/post, and apply that to Fabric 2's executor class so it works for at least some use cases instead of none 🙃

@bitprophet
Copy link
Member Author

bitprophet commented Nov 20, 2017

FWIW, if I had to just "do it well enough" today, I'd go for the "pre-tasks once, main task N times across hosts, post-tasks once" version of the scenario, since the other scenario can be effected well-enough by users calling their pre/post tasks manually within the main task's body. I'm curious if that's what you (@wardi) would have expected? Or were you even using --hosts?

Of note, this also brings pyinvoke/invoke#261 into play (especially as pertaining to the hosts-list factor, since it's not even technically part of a task's signature but is a sort of metadata.) Complicated!

@wardi
Copy link

wardi commented Nov 20, 2017

@bitprophet thanks! Ignoring pre/post seems to make sense. Refuse to guess and all that.

I've gone the simple route and just call the pre function and implement my own check based on a context variable to see if it has already been run in the same session.

@bitprophet
Copy link
Member Author

bitprophet commented Nov 20, 2017

@wardi I've actually just pushed a change to the v2 branch that implements what I said I would do, above, re: pre -> main-host-1 -> ... -> main-host-N -> post. Since I would've had to do it anyways if I wanted to defer the "real" fix, and I had the patient on the table from simply looking at the problem & writing tests, I figured why not spend the extra 5 minutes.

It's still subject to the other problems at the Invoke level re: pre/post not getting all the args/kwargs from the main task, and so forth, but at least now the behavior is functionally identical instead of just not existing. (This also applies to "running fab as drop-in replacement for inv", i.e. running local-only tasks. Which was definitely 'broken' beforehand.)

Let me know how that works for you, if you've time to pull it down and check it out!

@wardi
Copy link

wardi commented Nov 22, 2017

not exactly, I was hoping that I could use

@task
def this_first(c): ...

@task(this_first)
def thing1(c): ...

@task(this_first)
def thing2(c): ...

to do

$ fab thing1 thing2
this_first
thing1
thing2

My specific "this_first" uses are "have the user enter their ssh password to set up c.config" and "pull all local git repos before they are deployed to various targets (in procedures given by thing1 and thing2)"

But as I said, I'm fine with implementing this in plain python instead of using pre/post if this isn't a common use case.

@bitprophet
Copy link
Member Author

Ah yea, that's part of the other outstanding issues re: real dependency graph tracking. The current setup is old and very literal/naive. So, keep an eye on pyinvoke/invoke#461!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants