
delay creation of tasks until after another task is executed #14

Closed
5 tasks done
schettino72 opened this issue Jan 5, 2015 · 23 comments

@schettino72
Member

This is required when, at the time dodo.py is loaded, you don't have enough information to create some of your tasks' metadata. See getnikola/nikola#1562

WIP branch: https://github.com/pydoit/doit/tree/delayed-task-creation

Sample testing code: https://gist.github.com/schettino72/9868c27526a6c5ea554c

TODO:

  • support delayed task creation for run command
  • make sure commands list, clean, etc create all tasks
  • fix multi-processing
  • make sure implicit dependencies (file_dep -> another_task.target) is respected
  • tests
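The core idea can be sketched without doit: enumeration of some tasks is deferred until a designated task has executed. The toy scheduler below is an illustration with invented names (`run`, `scan`, `copy:*`), not doit's runner.

```python
# Minimal pure-Python sketch of delayed task creation: a creator callback
# is only invoked once the task it depends on has executed.

def run(tasks, delayed):
    """Execute tasks in order; `delayed` maps a task name to a callable
    that yields further task names once that task has run."""
    done = []
    pending = list(tasks)
    while pending:
        name = pending.pop(0)
        done.append(name)
        # after `name` executes, materialize any creators waiting on it
        if name in delayed:
            pending[0:0] = delayed.pop(name)()  # run new tasks next
    return done

# `scan` must run before the copy tasks can even be enumerated
delayed = {"scan": lambda: ["copy:a", "copy:b"]}
order = run(["scan", "other"], delayed)
print(order)  # ['scan', 'copy:a', 'copy:b', 'other']
```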
@felixfontein
Contributor

Parallel processing seems to work fine with threads; it only fails with processes. As far as I understood, the problem is that the updated self.tasks from the main process isn't available in the child processes, which results in a KeyError when trying to do task = self.tasks[recv_task.name].

(Which can result in another problem: if task was never assigned before, the exception handler in execute_task_subprocess tries to access task.name and bails out. This can be solved by assigning task = None at the beginning of the function, checking task for None in the exception handler, and if it is None, emitting a dict which doesn't contain name.)
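The defensive pattern described in the parenthesis can be sketched in isolation. The function below mimics the shape of doit's runner code with simplified, invented stand-ins (plain dicts instead of Task objects); it is not the real implementation.

```python
# Sketch: assign `task = None` up front so the exception handler can tell
# whether the lookup ever succeeded, and only emit "name" when it did.

def execute_task_subprocess(tasks, recv_name):
    task = None  # assigned up front so the handler can test for it
    try:
        task = tasks[recv_name]          # KeyError if the child never saw it
        return {"name": task["name"], "status": "ok"}
    except Exception as exc:
        result = {"status": "fail", "error": repr(exc)}
        if task is not None:
            result["name"] = task["name"]  # only include name when we have it
        return result

print(execute_task_subprocess({}, "late_task"))
# the failure dict carries the error but, deliberately, no "name" key
```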

@felixfontein
Contributor

I guess using shared memory to store self.tasks could be useful (maybe via https://docs.python.org/library/multiprocessing.html#module-multiprocessing.sharedctypes).

(Maybe it would also be better to refactor MRunner and MThreadRunner so that such implementation details result in less clutter?)
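The underlying KeyError can be reproduced outside doit: a plain dict is copied (or pickled) into the child when the Process starts, so parent-side updates made afterwards are invisible there. A minimal demo with invented names, not doit code:

```python
from multiprocessing import Process, Queue

def child_sees(tasks_snapshot, q):
    # the child works on the copy made when the Process started
    q.put("late_task" in tasks_snapshot)

def demo():
    tasks = {"early_task": None}
    q = Queue()
    p = Process(target=child_sees, args=(tasks, q))
    p.start()                  # `tasks` is copied/pickled for the child here
    tasks["late_task"] = None  # parent-side update: the child never sees it
    seen = q.get()
    p.join()
    return seen

if __name__ == "__main__":
    print(demo())  # False: a plain dict is not shared across processes
```

A shared structure such as a `multiprocessing.Manager().dict()` would make the late entry visible, at the cost of proxying every access through a server process.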

@schettino72
Member Author

I updated my branch. The only problem left is multiprocessing... I just figured out what to do :)

Actually, the main process sends the task to the executing process, but it sends an incomplete task, without actions or any other attribute that contains a callable. This was done because it is not possible to pickle closures. I don't think sharedctypes can handle closures either.

So I will change the code to send the full task if it is a late-created task, and raise an error if it has a closure. So closures will be forbidden only in this specific case...
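The pickling limitation mentioned above is easy to reproduce: CPython pickles functions by qualified name, and a closure defined inside another function cannot be looked up that way at load time. A minimal sketch (the names are illustrative):

```python
import pickle

def make_action(path):
    def action():           # closure: captures `path` from the enclosing scope
        return "copy " + path
    return action

action = make_action("dest/1.a")
try:
    pickle.dumps(action)
    picklable = True
except (AttributeError, pickle.PicklingError):
    # CPython refuses: `make_action.<locals>.action` is a local object
    # that cannot be re-imported by name when unpickling
    picklable = False

print(picklable)  # False
```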

@felixfontein
Contributor

Aren't most tasks in nikola using closures (such as in actions), or did I understand something incorrectly?

@felixfontein
Contributor

Please only forbid closures when using processes, not threads. After all, they work fine there :)

@schettino72 schettino72 added this to the 0.27 milestone Jan 11, 2015
@felixfontein
Contributor

@schettino72: is it possible to create another delayed task during delayed task generation? Delayed task creation wants me to return dicts, and it seems I cannot return a Task(..., loader=DelayedLoader(...)) object.

@schettino72
Member Author

@felixfontein ok, I will add support for that... please check 4e32bd3

Note that the semantics of creating a task directly are not the same as passing dicts (mainly regarding task_dep and group_task).

When I have time I'll create a proper patch and merge this.

@felixfontein
Contributor

It still doesn't work. I think the reason is this line:
5cca99c#diff-4804aaab12636cb475f8a46e607c1ce7R398
It sets the loader of newly created tasks to DelayedLoaded (which equals False). Replacing that with

                if nt.name == this_task.name:
                    nt.loader = DelayedLoaded

seems to work better.

@felixfontein
Contributor

With that change, I still have a problem: I get an exception in line 103 of runner.py, when doit tries to access node.task (at that point, node equals the "hold on" sentinel). This is reproduced by the newest version of my test program: https://github.com/getnikola/nikola/blob/earlytask_experiments/test.py

@felixfontein
Contributor

I don't have time to debug this further right now, I'll try again tomorrow...

@felixfontein
Contributor

Ok, I think I found the source -- though I don't yet know how to fix it. The problem seems to be that the node for the task creating more tasks is destroyed while it still holds the information about which nodes are waiting for it. This information is not recovered anywhere and gets lost, so the nodes waiting for the task generation wait forever.
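The bookkeeping bug described above can be sketched abstractly: if the node that others wait on is discarded without handing its waiters back to the scheduler, they are never released. A pure-Python toy with made-up names (`Node`, `waiting_me`, `finish`), loosely modeled on the idea rather than on doit's actual runner:

```python
# Toy dependency node: each node records who waits on it in `waiting_me`.
class Node:
    def __init__(self, name):
        self.name = name
        self.waiting_me = set()   # nodes blocked until this one finishes

def finish(node, ready):
    """Correct teardown: release the waiters BEFORE discarding the node.
    Dropping `node` without this step would leave its waiters blocked
    forever -- the hang described above."""
    ready.extend(node.waiting_me)
    node.waiting_me.clear()

gen = Node("generate_tasks")
waiter = Node("level_1_wait")
gen.waiting_me.add(waiter)

ready = []
finish(gen, ready)              # waiters recovered before gen is dropped
print([n.name for n in ready])  # ['level_1_wait']
```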

@schettino72
Member Author

@felixfontein i will take a look in your example...

@schettino72
Member Author

Are you using the doit master branch? You should be using that. I ran your example there and got no hanging or exceptions...

What is the problem with it? Is there any missing task? I still didn't follow the logic of your example...

@schettino72
Member Author

Yes, if a task uses the loader attribute, all other arguments (except the name) will be discarded when the "real" task is created. But your code seems ok in this respect.

@schettino72
Member Author

And you should revert your change that sets the loader to DelayedLoaded (False) only for the task that matches the name. This probably breaks some stuff.

@felixfontein
Contributor

I was using the create-task-instance branch. And without the change, only some tasks are executed, since half of them aren't generated.

The proper output should be:

GENERATE 1
.  copy_1:dest/1.a
.  copy_1:dest/2.a
.  copy_1:dest/2.b
.  level_1_wait

GENERATE 2
.  copy_2:dest/1.b
.  copy_2:dest/2.e
.  level_2_wait

GENERATE 3
.  copy_3:dest/1.c
.  copy_3:dest/2.c
.  level_3_wait

GENERATE 4
.  copy_4:dest/1.d
.  copy_4:dest/2.d
.  copy_4:dest/2.f
.  level_4_wait

.  level_4_done

.  level_3_done

.  level_2_done

.  level_1_done

(I never got further than level_4_done; it throws an exception with the change.)

@schettino72
Member Author

Thanks. I am taking a look...

@schettino72
Member Author

I got a fix :) your example works for me now...
223ff87

@felixfontein
Contributor

Cool! Works great now :) I also added code (remotely) similar to loader.load_tasks() so that list and clean work again.

@felixfontein
Contributor

The only thing which isn't working at the moment is specifying default tasks. If I add

            DOIT_CONFIG['default_tasks'] = ['level_1_done', 'level_2_done', 'level_3_done', 'level_4_done']

(or similar) in case cmd.execute_tasks is True (Nikola uses ['render_site', 'post_render']), doit complains that it doesn't know level_3_done (and level_4_done, if it got that far; neither is a surprise, since they aren't generated yet, not even as an empty hull). Any idea how to get around that?

@schettino72
Member Author

@felixfontein please give me details about your problems with list and clean and point me to your code changes... never mind, I saw your changes already.

doit complains that it doesn't know level_3_done (and level_4_done if it would come so far; both aren't a surprise since they aren't generated yet, not even as an empty hull). Any idea how to get around that?

When I first implemented this feature I didn't think people would use it to create completely dynamic tasks; now it can... but I would recommend you always use an "empty hull" for tasks. I don't know how to solve your problem other than creating an empty hull.
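A hedged sketch of the "empty hull" workaround: register a name-only task up front so it can be listed in default_tasks, and let it depend on the generator that later produces the real work. The dict fields follow doit's task-dict convention; the surrounding names (`generate_3`, `level_3_done`) are invented for illustration.

```python
# dodo.py-style sketch; `generate_3` stands for the delayed creator that
# will later yield the real level-3 tasks.
DOIT_CONFIG = {"default_tasks": ["level_1_done", "level_2_done",
                                 "level_3_done", "level_4_done"]}

def task_level_3_done():
    # empty hull: no actions of its own, just a dependency on the
    # generator task, so the name exists before generation runs
    return {"actions": None, "task_dep": ["generate_3"]}
```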

@schettino72
Member Author

All changes are now merged into master. Please use master instead of create-task-instance.

I am ready to make a new release but I will wait for further feedback from you...
so let me know when you think it is good enough for a release (no hurry).

@felixfontein
Contributor

Thanks a lot for implementing that feature and merging it into master!

I'll try to find out what precisely is needed in Nikola and how I can implement that. That'll take a bit of time (in particular because I don't have too much spare time at the moment), but if I find something on the way I'll tell you :)
