Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Processors #424

Merged
merged 13 commits into from
Sep 21, 2019
Merged

Processors #424

merged 13 commits into from
Sep 21, 2019

Conversation

dbarrosop
Copy link
Contributor

@dbarrosop dbarrosop commented Aug 8, 2019

Processors implement a set of functions that are called at different stages:

  1. When a task starts - before any of the hosts starts. The event would receive the Task object.
  2. When a task completes - after all of the hosts are completed. The event would receive the Task and the AggregatedResult objects.
  3. When a host starts - for each host, right before starting it. The event would receive the Task and the Host objects.
  4. When a host completes - for each host, right after completing it. The event would receive the Task, the Host and the MultiResult objects.

This allows you to process those events very easily and do things like:

  1. Your own logging/tracing
  2. To add observability
  3. Stream results
  4. Print results
  5. etc...

The idea is that you can have as many processors as you want and all of them will be called.

As an example I re-implmented print_results using a processor. As you can see the implementation is a bit cleaner.

Another benefit is that because the functions are called when the events happen you don't have to wait until all the hosts are completed to get feedback, you can start processing results as they are made available.

@ktbyers
Copy link
Collaborator

ktbyers commented Aug 8, 2019

@dbarrosop Can you expand a little bit on the meaning of when a host starts/host completes (that meaning is pretty obvious for a task, but I didn't really follow it for a host)?

@dbarrosop
Copy link
Contributor Author

Ok, added further clarifications to the description. Obviously all of that will be properly documented :)

@dbarrosop dbarrosop marked this pull request as ready for review August 17, 2019 14:08
@dbarrosop
Copy link
Contributor Author

This one is ready, if nobody objects I will merge it next week. I will also release a new version of nornir.

Copy link
Collaborator

@ktbyers ktbyers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

A few minor things (I am fine with the current PR though):

  1. The "Processor" term definitely had me scratching my head for a bit on what this was about and for a while I thought it was more oriented towards a task replacement pattern (instead of what it is).

You frequently used the term "Events" so I wonder if something like "Event Processor" or "Results Processor" might make sense for at least the documentation.

  1. Should we use an ABC for the Processor class?

"source": [
"The first thing you probably noticed is that we got all those messages on screen printed for us. That was done by our processor `PrintResult`. You probably also noticed we got the `AggregatedResult` back but we didn't even bother saving it into a variable as we don't needed it here.\n",
"\n",
"Now, let's see if `SaveResultToDict` did something to the dictionary `data`L"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extra trailing L at end of sentence here.

"cell_type": "markdown",
"metadata": {},
"source": [
"As you can see, performing various actions on the results becomes quite easy thanks to the processors. You still get the result back but thanks to this plugins you may not needed them anymore.\n",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor spelling should be "need" instead of "needed".

@ogenstad
Copy link
Collaborator

Nice work!

I think it would be nice to also support processors for sub-tasks, currently it's tied to the Nornir object and not for Task.run().

We talked on Slack about maybe moving what's currently the host processors to Task.start().

I'm struggling a bit with the names task_started, task_completed, host_started and host_completed.

I think I would like to replace the current names for host_X -> task_X and have the current task_X be something like run_X or execution_X.

For sub tasks I think it would be fine to only have the start/complete processors be for the main .run(), but then have a start/complete for each task and subtask (I hope this makes sense :) )

What are your thoughts?

@ktbyers
Copy link
Collaborator

ktbyers commented Aug 24, 2019

@ogenstad Good points.

I wonder if we should just map it closer to Nornir concepts/naming i.e. something like nornir_start, nornir_complete, task_start, task_complete (where task here is task object so if you did subtasks you would have multiple of these as you referenced above).

@dbarrosop
Copy link
Contributor Author

dbarrosop commented Sep 6, 2019

Subtasks issue fixed.

Re naming, I like the current naming, let me explain why. I think the problem is that we are very ambiguous in our language when it comes to nornir, we call everything a task but we need to distinguish between a task (the concept of it, the instructions we want to perform) and an actual run/instance of a task on a given host. So I think we need to clarify that and follow suit. My suggestion is to cal them:

  1. task to the concept to the actual concept, i.e., the code/plugins. This aligns with how we call plugins already.
  2. task_instance (tai for short) for the individual execution of a task on a given host.

(2) has the problem that we will have to rename some parts of the code, for instance, task signatures will change to:

def my_task(tai):
     ....

Thoughts?

@ogenstad
Copy link
Collaborator

ogenstad commented Sep 6, 2019

That sounds good to me, my main issue was that the word host was used for what is above described as a task_instance. I think task_instance is much more clear and also works better with sub-tasks.

@ktbyers
Copy link
Collaborator

ktbyers commented Sep 6, 2019

I am still worried this naming is going to be confusing.

Say for example, we have:

nr.run(task=foo)
def foo(task):
    task.run(whatever)

And here nr contains host1 and host2, then we would have:

Task Start Event shortly after nr.run is executed.
Task Instance Start Event (host1-foo)
Task Instance Start Event (host2-foo)
Task Instance Start Event (host1-whatever)
Task Instance Start Event (host2-whatever)
Task Instance Completed Event (host1-whatever)
Task Instance Completed Event (host2-whatever)
Task Instance Completed Event (host1-foo)
Task Instance Completed Event (host2-foo)
Task Completed Event i.e. all hosts are done executing

And here by our new terminology we have two Tasks i.e. foo and whatever, but we only have one Task Start processor event.

Just let me know if I misunderstood this, however. It is definitely possible :-)

I think in practice when explaining this to other people...I would pretty much say, Task Start actually means execution starting or nr run starting.

This stems in part from the dual run methods i.e. the one at the nr.run level and the one at the task instance level.

I am pretty much fine with any of the proposed names, however, as long as the changes don't break existing code...David confirmed to me earlier they don't so we should be good on that front.

I also don't want to drag on too long so we should probably just decide and move on.

@dbarrosop
Copy link
Contributor Author

And here by our new terminology we have two Tasks i.e. foo and whatever, but we only have one Task Start processor event.

That's a good point and it basically highlights we need a better definition of task so let me try again taking into consideration the point you just raised:

  1. A task plugin is a function that takes a task instance as first argument and any extra arguments the task requires, for instance my_task(tai, my_arg1, my_arg2) and returns a Result. A task plugin can call other task plugins via tai.run, these calls are referred as subtasks.
  2. A task is an instruction given to a Nornir object to run a task plugin. This is done via the the Nornir.run method. Note that, as mentioned in the previous point, a task plugin can call other task plugins via the tai argument, however, these are not considered a task, they are considered a subtask.
  3. A task instance is the actual execution of a task on a given host. If a task plugin contains subtasks, they don't spawn a new task instance and they are considered to be part of the parent task plugin.

This leads to the following observation though, to make this consistent with the processors we need to either revert to the previous behavior when we didn't call any processors on subtasks or add a new event specifically for the subtasks. I am going to implement the latter in the meantime but feel free to keep with the discussion, I think the lack of definitions plus the will to make things simpler via utility functions (like task.run) have lead to a few poor decisions that makes things a bit confusing and I'd love to make things clearer so we can decide what we need to change for a nicer API (thinking about nornir 3).

@dbarrosop
Copy link
Contributor Author

Ok, check the latest version. Needs some cleaning on the docs side (waiting to confirm we like this), you can see in test_processor.py how it works and how you can build a datastructure describing the execution of the task.

@ktbyers
Copy link
Collaborator

ktbyers commented Sep 10, 2019

This terminology if fine with me.

I will probably end up referring to item 2 as the "primary task" (or some similar adjective to try to make the distinction clearer) as I think there will be quite a bit of confusion on the term "task" (at least until "task instance" and "tai.run()" becomes a more common pattern).

@dbarrosop dbarrosop merged commit 200caca into develop Sep 21, 2019
@dbarrosop dbarrosop deleted the processors branch September 21, 2019 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants