EarlyTask plugins #1562

Closed
felixfontein opened this issue Dec 31, 2014 · 71 comments

Comments

@felixfontein
Contributor

Hi,

I'd like to have a plugin category EarlyTask, for tasks which are executed before the site is rendered (i.e. an analogue of LateTask). I need this for a plugin (or rather, a combination of plugins) I wrote. Currently I use the Task plugin class, but some of its tasks end up running after page compilation, while my page compiling plugin needs their results, and so it fails.

Does anyone mind if I add something like that? Or would it be better to have a general priority system, so you can assign a task a priority (usual tasks could get 10, and late tasks 100, so you could add a task with priority 2 and one with priority 7 to ensure that the one with priority 2 appears in the task list before the one with priority 7 and before all regular rendering tasks and late tasks)?

(I have the vague feeling that I already read something about EarlyTasks somewhere here, but I cannot remember where. So it's probably not my own idea :) )

Cheers,
Felix

@Kwpolska
Member

#1553

@felixfontein
Contributor Author

Yes, that was it. Thanks for linking it!
So the main question is probably: just add another category, or add a simple priority system so that there's only one category (Task, with LateTask being a special case of Task with higher priority)?

@felixfontein
Contributor Author

(Ah, I forgot the fine print: the EarlyTask plugins should run before any posts are scanned. For my case that doesn't matter, but maybe someone wants something to run after posts are scanned but before posts are rendered. A general priority system could run everything with a negative priority before posts are scanned and everything else afterwards, and rendering the site could have priority 10 or so, so that it is possible to squeeze something in between post scanning and site rendering.)
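
To make the priority idea concrete, here is a minimal sketch. All names (PrioritizedTask, SCAN_POSTS_PRIORITY, RENDER_PRIORITY, LATE_PRIORITY, split_by_phase) and the example task names are hypothetical and not part of Nikola; the sketch only illustrates ordering by priority number, with negative priorities landing before the post scan.

```python
from dataclasses import dataclass

# Hypothetical thresholds taken from the suggestion above; not Nikola settings.
SCAN_POSTS_PRIORITY = 0   # negative priorities run before posts are scanned
RENDER_PRIORITY = 10      # regular rendering tasks
LATE_PRIORITY = 100       # what LateTask would map to


@dataclass(frozen=True)
class PrioritizedTask:
    name: str
    priority: int = RENDER_PRIORITY


def split_by_phase(tasks):
    """Return (before_scan, after_scan), each sorted by priority.

    sorted() is stable, so tasks with equal priority keep their input order.
    """
    ordered = sorted(tasks, key=lambda t: t.priority)
    before = [t for t in ordered if t.priority < SCAN_POSTS_PRIORITY]
    after = [t for t in ordered if t.priority >= SCAN_POSTS_PRIORITY]
    return before, after


tasks = [
    PrioritizedTask("render_pages"),
    PrioritizedTask("sitemap", LATE_PRIORITY),
    PrioritizedTask("generate_posts", -5),
    PrioritizedTask("after_scan_hook", 7),
]
before_scan, after_scan = split_by_phase(tasks)
print([t.name for t in before_scan])
print([t.name for t in after_scan])
```

Most tasks would simply keep the default priority, so only the few that need to run early or late have to declare a number.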

@Kwpolska
Member

Creating a category is a total of five lines of code across two files (class EarlyTask(Task): pass in plugin_categories.py; a modified import and plugin load directive in nikola.py).
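
For illustration, a category in this style is just an empty subclass that a loader can filter on. The sketch below is self-contained and does not import Nikola: Task here is a stand-in base class, GeneratePosts and RenderPages are made-up plugins, and plugins_in_category is a hypothetical helper, not Nikola's actual loading code.

```python
class Task:
    """Stand-in for the Task plugin base class (not the real Nikola class)."""
    name = "task"

    def gen_tasks(self):
        return []


class EarlyTask(Task):
    """Marker category: same interface as Task, loaded as a separate group."""


class LateTask(Task):
    """Marker category for tasks generated after the regular ones."""


class GeneratePosts(EarlyTask):
    name = "generate_posts"


class RenderPages(Task):
    name = "render_pages"


def plugins_in_category(plugins, category):
    """Select plugins whose class derives directly from `category`.

    A direct-base check is used instead of isinstance(), because EarlyTask
    and LateTask both inherit from Task and would otherwise match it too.
    """
    return [p for p in plugins if category in type(p).__bases__]


plugins = [GeneratePosts(), RenderPages()]
print([p.name for p in plugins_in_category(plugins, EarlyTask)])
print([p.name for p in plugins_in_category(plugins, Task)])
```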

A priority system would be a huge mess, would require a lot of changes everywhere and giving out actual priorities to each and every task to make it flexible.

I’m pretty sure scan_posts is called by the first Task that gets loaded and wants it.

And I just created EarlyTask.

@Kwpolska Kwpolska added this to the v7.3.0 milestone Dec 31, 2014
@Kwpolska Kwpolska self-assigned this Dec 31, 2014
Kwpolska added a commit that referenced this issue Dec 31, 2014
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@punchagan
Member

Shouldn't EarlyTasks be added to the default tasks (https://github.com/getnikola/nikola/blob/master/nikola/__main__.py#L243) loaded by the NikolaTaskLoader?

@Kwpolska
Member

@punchagan Probably. I'll do it tomorrow.

@punchagan
Member

@Kwpolska Sure. I was trying to use this feature, and have something that works. You can probably review, (fix) and merge it. (Tomorrow).

Happy New Year! 🎆

@Kwpolska
Member

Where is it?

Kwpolska added a commit that referenced this issue Dec 31, 2014
hat tip @punchagan

Issue #1562

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska
Member

The loader should be fixed now. I managed to do it in 2014, and on my phone.

@punchagan
Member

Whoops, didn't see your question. I had referenced the PR in this commit, but there are no email notifications for those. :)

Thanks!

@Kwpolska
Member

You should've posted the commit SHA here; that way I would have noticed. Either way, it's now solved (and our solutions are the same in code, with just a small difference when it comes to English).

@punchagan
Member

Yep. Thanks!

@felixfontein
Contributor Author

Cool, everything's already done! :)
A priority system wouldn't be all that complicated, since most things can have the same priority -- there are still doit's dependencies to handle that. But that was just a suggestion; I'm perfectly happy with EarlyTask as it is now. Thanks!
Well, and of course, a Happy New Year to you all!

@schettino72
Member

Probably too late, but I will chime in anyway :)

doit has two distinct phases: task-creation and task-execution.

task-creation (generating task metadata) is usually done in doit using the task_xxx functions in a dodo.py module. But in Nikola it is done through the Nikola site object. Since in Nikola you can add more task-creators through plugins, LateTask was created to make sure the Nikola site object was set up before executing task-creators from plugins...

So LateTask means late-generated-task, not late-executed-task. LateTask is a Nikola concept and doit has no idea about it.

task-execution ordering must be handled by doit using the task property task_dep.

A "priority system" may make sense for Nikola, since the global Nikola site object is modified by many plugins, but it doesn't make sense for doit itself.
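
The two phases can be seen in a small dodo.py-style sketch. The task names and the file name (scan, render, posts.txt) are made up for illustration, and create_tasks is a hypothetical stand-in for the creation phase so the snippet runs without doit. Calling the task_* functions only produces metadata; task_dep is what declares that render must execute after scan.

```python
def task_scan():
    """Task creation: this function only returns metadata, it runs nothing."""
    return {
        'actions': ['echo scanning > posts.txt'],
        'targets': ['posts.txt'],
    }


def task_render():
    """Execution order is declared with task_dep, not by definition order."""
    return {
        'actions': ['echo rendering'],
        'task_dep': ['scan'],
    }


def create_tasks(creators):
    """Creation phase, simulated: call each task_* function, keep the dicts."""
    return {func.__name__[len('task_'):]: func() for func in creators}


created = create_tasks([task_scan, task_render])
for name, meta in created.items():
    print(name, 'task_dep =', meta.get('task_dep', []))
```

Nothing has been executed at this point; the actions would only run later, in the execution phase.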

@felixfontein
Contributor Author

What do you think? Would it be hard to modify DoitNikola (which inherits from DoitMain) and NikolaTaskLoader (inherited from TaskLoader) to get the following behavior:

  • The set of tasks is split up into three (or more) categories: early, usual, late.
  • For all commands (except build), the task loader will behave as it does now, i.e. generate tasks for early, then for usual, then for late, and return one big list containing all tasks.
  • The build command behaves differently: it first creates only the early tasks and runs them all, then creates the usual tasks and runs them all, and finally creates the late tasks and runs them all.
    • For this, it might be possible to call the inherited run method of DoitMain three times, each time with a different task loader (one giving only the early tasks, one giving only the usual tasks, etc. -- it could also be the same task loader with an enum ALL, EARLY, USUAL, LATE).
    • Maybe it would also be better to create three DoitMain objects, one for each set of tasks.
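
The behaviour in the list above can be sketched without doit. Everything here is hypothetical: Stage, load_all, staged_build, create_tasks and run_tasks are placeholders for the task loader and for DoitMain's run method, not real Nikola or doit API. The point is only the difference between "create everything, then return one list" and "create, run, create, run".

```python
from enum import Enum


class Stage(Enum):
    EARLY = 1
    USUAL = 2
    LATE = 3


def load_all(create_tasks):
    """Non-build commands: generate every stage up front, return one list."""
    tasks = []
    for stage in Stage:
        tasks.extend(create_tasks(stage))
    return tasks


def staged_build(create_tasks, run_tasks):
    """Build: create and run one stage before creating the next."""
    for stage in Stage:
        run_tasks(create_tasks(stage))


# Toy demonstration: the USUAL stage can only see posts that EARLY produced.
posts = []
log = []


def create_tasks(stage):
    log.append(f"create {stage.name} (posts visible: {len(posts)})")
    if stage is Stage.EARLY:
        return [lambda: posts.append("generated-post")]
    # Snapshot the posts known at creation time, as a task creator would.
    return [lambda p=tuple(posts): log.append(f"render {len(p)} post(s)")]


def run_tasks(tasks):
    for task in tasks:
        task()


staged_build(create_tasks, run_tasks)
print("\n".join(log))
```

With load_all instead of staged_build, the USUAL creator would run while posts is still empty, which is the failure described at the top of this issue.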

What do you think?

Cheers,
Felix

@ralsina
Member

ralsina commented Jan 4, 2015

@felixfontein it's perhaps cleaner and less work to create three "metatasks": have mt1 have all EarlyTasks in its task_dep, then have mt2 have all Tasks and mt1 in its task_dep, and then have mt3 have all LateTasks and mt2 in its task_dep.

It can be trivially extended to an enumeration etc.
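
A minimal sketch of that wiring, in dodo.py dictionary form. The names mt1, mt2 and mt3 come from the suggestion above; the task names in the three lists and the helper make_metatasks are made up for illustration.

```python
def make_metatasks(early, usual, late):
    """Build three action-less group tasks chained through task_dep.

    Each metatask lists all tasks of its own stage plus the previous
    metatask, so asking for mt3 pulls in every stage.
    """
    metatasks = []
    previous = None
    for index, names in enumerate([early, usual, late], start=1):
        deps = list(names)
        if previous is not None:
            deps.append(previous)
        name = f"mt{index}"
        metatasks.append({'basename': name, 'actions': None, 'task_dep': deps})
        previous = name
    return metatasks


for task in make_metatasks(['generate_posts'],
                           ['render_pages', 'render_tags'],
                           ['sitemap']):
    print(task['basename'], task['task_dep'])
```

Extending this to more stages only means passing a longer list of stages to the loop.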

Kwpolska added a commit that referenced this issue Jan 4, 2015
Issues #1553, #1562

This reverts commits:
 * 629726e
 * 0c11952
 * e748644
@Kwpolska
Member

Kwpolska commented Jan 4, 2015

I’m reverting the current failed implementation of EarlyTask in 0ce8d72.

@Kwpolska Kwpolska reopened this Jan 4, 2015
@felixfontein
Contributor Author

@ralsina: That does not solve the problem that during creation of the usual tasks, the generated posts haven't been created yet, and so the tasks to handle them (i.e. to render their pages, include them in indices and tag pages, etc.) either cannot be created or are created incorrectly. For that, the tasks can only be created once the previous tasks have been completely processed.

@schettino72
Member

@schettino72: Actually, I like the first idea a lot. I mean, it won't be 'worse' than it is now (concerning the order of execution and task generation).

@felixfontein created pydoit/doit#20 with some further thoughts about it. Hopefully you can assign yourself to implement it :)

@felixfontein
Contributor Author

I'll try :) Though not today anymore...

@felixfontein
Contributor Author

Ok, I have now rewritten parts of the code so that each task plugin's tasks are generated by one delayed task loader. Also, the delayed task's name equals the task plugin's name, so nikola build <task_name> works again.

One thing I noticed: since all tasks of stage 2 (for example) depend on the waiting task of stage 1, and that waiting task depends on all tasks of stage 1, building one specific task in stage 2 via nikola build <task_name> triggers a build of all tasks of stage 1. This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?

@schettino72
Member

> This could be helped by adding a modified version of task_dep to doit, which is only used to determine the order of execution, but not which tasks also have to be built before a specified task can be built. @schettino72: what do you think about this?

Do you mean a setup-task?
uhmm. The docs need an example without a teardown.

Maybe a delayed task should create an implicit setup-task instead of a task_dep...
It is a trivial change, can you try it?

@felixfontein
Contributor Author

No, a setup-task will be executed when this task is executed. A wait-for dependency should not be executed (unless, of course, it is manually specified on the command line, or it also appears as a proper dependency of another task to be executed); it should only participate in determining the execution order, that is, when to start executing a task.

@schettino72
Member

@felixfontein give me an example please: a dodo.py and what happens when you run it. Better to create an issue on the doit tracker, or we're going to hijack this issue (again).

@felixfontein
Contributor Author

Take the following dodo.py file:

def task_a_start():
    return {
        'basename': 'a_start',
        'actions': None,
    }

def task_a1():
    return {
        'basename': 'a1',
        'task_dep': ['a_start'],
        'actions': ['echo A1'],
    }

def task_a2():
    return {
        'basename': 'a2',
        'task_dep': ['a_start'],
        'actions': ['echo A2'],
    }

def task_a_wait():
    return {
        'basename': 'a_wait',
        'task_dep': ['a1', 'a2'],
        'actions': None,
    }

def task_b_start():
    return {
        'basename': 'b_start',
        'task_dep': ['a_wait'],
        'actions': None,
    }

def task_b1():
    return {
        'basename': 'b1',
        'task_dep': ['b_start'],
        'actions': ['echo B1'],
    }

def task_b2():
    return {
        'basename': 'b2',
        'task_dep': ['b_start', 'a2'],
        'actions': ['echo B2'],
    }

def task_b_wait():
    return {
        'basename': 'b_wait',
        'task_dep': ['b1', 'b2'],
        'actions': None,
    }

There are two stages, a and b. To ensure that b is executed when a is done, a_wait depends on all a tasks, all b tasks depend on b_start, and b_start depends on a_wait. There's also a dependence between b2 and a2.

I would like this last dependence (of b_start on a_wait) to be a wait-for dependence, so that if I run doit b1, only b1 (and b_start) are executed. (And if I run doit b2, only a2 and b2 and the corresponding _start tasks are executed.)
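
To check which tasks get pulled in for a given target, the following sketch walks the task_dep lists of the dodo.py above. It does not use doit: TASK_DEP just copies those lists, pulled_in is a made-up helper, and WAIT_FOR models the proposed edge type (which is not an existing doit feature) simply by leaving those edges out when collecting dependencies.

```python
TASK_DEP = {
    'a_start': [],
    'a1': ['a_start'],
    'a2': ['a_start'],
    'a_wait': ['a1', 'a2'],
    'b_start': ['a_wait'],
    'b1': ['b_start'],
    'b2': ['b_start', 'a2'],
    'b_wait': ['b1', 'b2'],
}

# Hypothetical: edges that should only order execution, not pull tasks in.
WAIT_FOR = {('b_start', 'a_wait')}


def pulled_in(target, ignore=frozenset()):
    """Return the set of tasks reachable from `target` via task_dep edges."""
    seen = set()
    stack = [target]
    while stack:
        task = stack.pop()
        if task in seen:
            continue
        seen.add(task)
        stack.extend(dep for dep in TASK_DEP[task]
                     if (task, dep) not in ignore)
    return seen


print(sorted(pulled_in('b1')))                    # plain task_dep
print(sorted(pulled_in('b1', ignore=WAIT_FOR)))   # with the wait-for edge
print(sorted(pulled_in('b2', ignore=WAIT_FOR)))
```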

@felixfontein
Contributor Author

I think it's ok to first discuss here how to do this (because it has a lot to do with this feature), and as soon as we know what we want, we can continue the discussion in the doit tracker. Hope that's ok for you :)

@schettino72
Member

@felixfontein thanks for the example. I guess I understand your problem.

In my opinion this problem only arises when using "phases", which doit really has no support for, so maybe a patch on Nikola is more appropriate.

Can you better define what triggers the change of behaviour of this wait-for dependency? Is it when any task is specified on the command line? Sounds too tricky to me...

And how can you test/trigger this before pydoit/doit#20 is implemented?

Anyway I gave it a try here:
https://github.com/schettino72/nikola/compare/getnikola:earlytask_impl...earlytask?expand=1
Luckily I added pos_args to the signature of load_tasks, even though I didn't know of any use for it up to now :)

@felixfontein
Contributor Author

Hmm, a wait-for instead of task_dep could also be of interest if you want to process tasks in parallel, but some tasks need a resource which cannot be used in parallel (maybe some external device, like a DVD writer). For such a setup, you need a mechanism that keeps a second task from executing until a first task is done, but you don't want an explicit dependency, so that you can build each one individually.

Yes, I know that this sounds a bit far fetched, but at least it shows such a feature could in theory be used in a more general setting.

Anyway, wait-for does not behave differently in special situations; it should always behave the same way: if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done. So if b is specified as a task to be executed (either via the command line or as a default task), this does not trigger a check of a's dependencies (to determine whether it should be executed) like a task_dep does. It only ensures that if a is actually executed, b will only be executed when a is done.
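
As a sketch of these semantics (hypothetical, doit has no such feature): a wait-for edge never adds a task to the run, it only constrains the order among tasks that are already scheduled. The example uses Python's graphlib (3.9+) for the ordering; order_scheduled and WAIT_FOR are made-up names.

```python
from graphlib import TopologicalSorter


def order_scheduled(scheduled, wait_for):
    """Order `scheduled` tasks, honouring wait-for edges between them.

    `wait_for` maps a task to the tasks it must wait for. Edges that point
    at tasks outside `scheduled` are dropped: they never schedule anything.
    """
    graph = {
        task: [dep for dep in wait_for.get(task, ()) if dep in scheduled]
        for task in scheduled
    }
    return list(TopologicalSorter(graph).static_order())


WAIT_FOR = {'b': ['a']}  # b waits for a

print(order_scheduled(['b'], WAIT_FOR))        # a is not pulled in
print(order_scheduled(['b', 'a'], WAIT_FOR))   # a is ordered before b
```

This assumes the full set of scheduled tasks is known before ordering starts.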

@felixfontein
Contributor Author

(Your try is a hack which works fine if all tasks specified on the command line are within one stage, but if they are not, tasks from a later stage might be generated before an earlier stage finished execution.)

@felixfontein
Contributor Author

Ok, I got an idea where this could be quite useful. Assume that you want to record audio samples, maybe for a study. Every sample (recorded as a .wav file) should be converted to different formats (say .ogg and .mp4) afterwards. So you create a recording task for every sample to record, and tasks to create .ogg and .mp4 files (which depend on the recording task). Since the encoding can be done in parallel, you want to run doit with -n2. But you cannot record two things at the same time, so you need to introduce dependencies between the sample recordings.

If you have three recordings, a, b and c, you could use task_dep to get a chain a -> b -> c. But now, if you only want to do recording b (for example because you noticed the recording has too much background noise), you want to run doit run b -n2. But since there's a task dependence, doit will by default also execute task a. So you end up doing two recordings, even though you needed only one.

Here you would prefer to use a wait-for dependency between a, b and c, and not a task_dep.

(Even if you don't want to do the encoding part and thus don't need parallel execution, having such a wait-for dependency makes sense as protection against failures if the tasks are ever executed in parallel -- which might happen if your dodo.py is included in a larger dodo.py which does a lot more.)

@schettino72
Member

@felixfontein the audio sample example is a different case... this is an example where you need contention based on resource utilization for parallel scheduling. This has been raised before, and it is feasible to implement in doit. But using wait-for would be a poor solution, because it can handle just 1 resource being shared.
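
To illustrate what resource-based contention means here, a minimal sketch using plain Python threads rather than doit. This is not how doit works or would implement it; record, encode, recorder and stats are made-up names. The recording device is guarded by a lock, so recordings never overlap even with two workers, while encoding is free to run in parallel.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

recorder = threading.Lock()      # the one device that cannot be shared
stats_lock = threading.Lock()    # protects the counters below
stats = {'active': 0, 'max_active': 0}


def record(sample):
    """Only one recording may run at a time, whatever the worker count."""
    with recorder:
        with stats_lock:
            stats['active'] += 1
            stats['max_active'] = max(stats['max_active'], stats['active'])
        # ... talk to the recording device here ...
        with stats_lock:
            stats['active'] -= 1
    return f"{sample}.wav"


def encode(wav, fmt):
    """Encoding needs no shared device, so it can run in parallel."""
    return wav.replace(".wav", f".{fmt}")


with ThreadPoolExecutor(max_workers=2) as pool:
    wavs = list(pool.map(record, ["a", "b", "c"]))
    oggs = list(pool.map(lambda w: encode(w, "ogg"), wavs))

print(wavs, oggs, stats['max_active'])
```

Unlike a chain of wait-for edges, a lock per resource extends naturally to several shared resources.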

> if two tasks a and b are scheduled to be executed, and b wait-for a, then b is not executed before a is done.

I guess you need to understand a bit more about how doit works internally. A task is scheduled to be executed in 2 situations:

  1. the task is specified on the command line or in default_tasks
  2. the task is a dependency of a previously scheduled task

The problem is that 2) happens at run time, while tasks are being executed. In other words, doit does not pre-compute the whole task dependency tree before it starts execution. This has some advantages: being fast (it doesn't compute parts of the tree that are not used), and allowing some dynamic modification of the "tree" (like calc_dep and delayed task creation).

To implement A wait-for B, doit would have to finish all other scheduled tasks to make sure that none of them has a real task_dep on B. But since you might have multiple uses of wait-for, even that would not guarantee that no further scheduled task has a task_dep on B. That's kind of the same problem as the one you pointed out in my hacky patch.

The other option would be to pre-compute the whole DAG, but again given the very dynamic nature of doit you would still have no guarantee that a "skipped" wait-for task would not be scheduled later by a third task.

So I guess this wait-for could only be implemented if there was no support for dynamic changes in the task dependency-tree. Or do you have an idea on how an implementation would work?

I guess it does not solve your problem but doit has a "--single" flag for the run command (or build in Nikola) that ignores task_dep. Sometimes useful to avoid rebuilding a lot of stuff when trying some changes.

@felixfontein
Contributor Author

Having --single is probably enough for most use-cases in Nikola (the need to keep earlier stages from checking for tasks to run is somewhat special anyway, I think). So maybe we can just ignore this. If I get a good idea of how this could be solved/implemented while working on #20 in the doit tracker, I'll try it out; if not, let's leave it as it is. (Or does anyone object to this?)

@ralsina ralsina modified the milestones: v8.0.0, 7.8.1 Aug 29, 2016
@Kwpolska Kwpolska modified the milestones: v7.8.1, v7.8.2 Oct 13, 2016
@Kwpolska Kwpolska modified the milestones: v7.8.2, v7.8.3 Jan 8, 2017
@Kwpolska
Member

That’s not going to happen any time soon.
