Tutorial 5. Actors vs Nodes

tmcphillips edited this page Feb 15, 2013 · 13 revisions

###Workflows use nodes

Here is a summary of what we have learned in previous tutorials. RestFlow lets you specify workflows by listing and wiring together the nodes that comprise the workflow. Each node represents a step in the computation you wish to perform. Data flows through the nodes when the workflow is run. Data enters nodes through inflows, and exits nodes through outflows. Flow expressions associated with inflows and outflows tell RestFlow how to route data from node to node. RestFlow triggers nodes, and nodes step when triggered if they have sufficient data to compute a new set of outputs.

But how do the nodes in a workflow know how to perform the particular step in a computation they are responsible for? So far, all workflow nodes in our examples have been of type Node. So what makes the nodes do different things? The answer is that workflow nodes delegate the work of computing new data items to actors.

###Nodes use actors

Below is the definition of the EmphasizeGreeting component from the hello5.yaml example at the end of Chapter 3. Note the line that assigns a reference to a StringConcatenator component to the actor property of the node.

  - id: EmphasizeGreeting
    type: Node
    properties:
      actor: !ref StringConcatenator 
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
      outflows:
        concatenatedString: /messages/emphasizedGreeting/

As with the list of nodes in the Workflow component that refer to node components using the !ref operator, this property also uses !ref. So StringConcatenator is another component. It is defined in the actors.yaml file referenced in the imports section of hello5.yaml. Here is the definition of StringConcatenator in common/groovy/actors.yaml:

- id: StringConcatenator
  type: GroovyActor
  properties:
    step: |
      concatenatedString = stringOne + stringTwo; 
    inputs:
      stringOne:
        type: String
      stringTwo:
        type: String
    outputs: 
      concatenatedString:

Note first that this component is of type GroovyActor. The specification of what computation a GroovyActor performs is included within the actor component definition itself, and that specification is written in the Groovy programming language.

Take a look at the properties given for StringConcatenator. They are step, inputs, and outputs. The step property provides the code (written in Groovy) that is executed by the actor each time the node steps. The inputs property lists the variables that hold values provided by the node to the actor. Finally, the outputs property lists the variables that the actor uses to return the values it computes to the invoking node. Here, the inputs listed are stringOne and stringTwo, and the only output is concatenatedString. The step property provides Groovy code that appends stringTwo to stringOne and stores the result in concatenatedString.

The next thing to notice is that the names of the variables listed under inputs and outputs match the names of the inflows and outflows of the EmphasizeGreeting component, respectively. This is how RestFlow knows what (node) inflows and outflows to associate with which (actor) input and output variables, and why it can correctly route data from a stepping node to the actor underlying it, and from the actor back to the node.

###Writing your own actor

One difference between nodes and actors is that each node plays a role in no more than one workflow. If you want to reuse a node in a different workflow, you must copy its definition. In contrast, actors can be used by as many nodes as you please, in the same or in different workflows. Actors are reusable, and for this reason it often is useful to define them in files outside the specification of particular workflows.
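As a rough sketch of this kind of reuse, the fragment below shows two nodes in the same workflow that both refer to the StringConcatenator actor. EmphasizeGreeting is the node shown earlier; the AddSignature node and the /messages/signature/ and /messages/signedGreeting/ flows are hypothetical names introduced only for illustration.

```yaml
  # Node from hello5.yaml: delegates its computation to StringConcatenator.
  - id: EmphasizeGreeting
    type: Node
    properties:
      actor: !ref StringConcatenator
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
      outflows:
        concatenatedString: /messages/emphasizedGreeting/

  # Hypothetical second node: same actor, different role in the workflow.
  - id: AddSignature
    type: Node
    properties:
      actor: !ref StringConcatenator
      inflows:
        stringOne: /messages/emphasizedGreeting/
        stringTwo: /messages/signature/          # hypothetical flow
      outflows:
        concatenatedString: /messages/signedGreeting/   # hypothetical flow
```

Each node maps its own flows onto the same actor variable names (stringOne, stringTwo, concatenatedString), so the actor definition itself does not change.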

Create a new file named myActors.yaml. In it, paste the following text:

  components:

  - id: MultipleConcatenator
    type: GroovyActor
    properties:
      step: |
        concatenatedString = stringOne;
        for (i in 1..count) {
          concatenatedString += stringTwo;
        }
      inputs:
        stringOne:
        stringTwo:
        count: 1
      outputs:
        concatenatedString:

Note that this definition of a new MultipleConcatenator actor closely follows that of the StringConcatenator actor. The one line of Groovy code that makes up the step property of StringConcatenator has been replaced with four lines of Groovy in MultipleConcatenator, and there is a new input variable, count, with a default value of 1. Default values are used by actors when no node parameters or inflows supply values for particular input variables at run time. The input variables stringOne and stringTwo are not given default values; these values must be provided by the node.

To use the new actor in your workflow, edit hello5.yaml and add a line referring to myActors.yaml to the imports section:

imports:

- classpath:/common/groovy/actors.yaml
- classpath:/common/directors.yaml
- file:myActors.yaml

Then change the definition of the EmphasizeGreeting node to use MultipleConcatenator instead of StringConcatenator:

  - id: EmphasizeGreeting
    type: Node
    properties:
      actor: !ref MultipleConcatenator
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
      outflows:
        concatenatedString: /messages/emphasizedGreeting/

Save the modified workflow spec as hello6.yaml and then run it. It should give the same output as hello5.yaml:

$ restflow -f hello6.yaml 
Hello World!
Good Afternoon, Cosmos!!
Good night, and good luck!!!
$

However, if you add a constant to EmphasizeGreeting that overrides the default value given to count in the definition of MultipleConcatenator, like so:

- id: EmphasizeGreeting
  type: Node
  properties:
    actor: !ref MultipleConcatenator
    constants:
      count: 3 
    inflows:
      stringOne: /messages/greeting/
      stringTwo: /messages/emphasis/
    outflows:
      concatenatedString: /messages/emphasizedGreeting/

and save the new workflow as hello7.yaml, then you should see this output instead:

$ restflow -f hello7.yaml
Hello World!!!
Good Afternoon, Cosmos!!!!!!
Good night, and good luck!!!!!!!!!
$

Each of the emphasizers provided to MultipleConcatenator has been applied three times, rather than once. You have created a custom actor that is compatible with StringConcatenator (when the default count value of 1 is applied), but that can append the second string to the first any specified number of times.

###Actors vs Nodes

Nodes and actors are both essential to carrying out the steps of a workflow, and each node is associated with an actor. How then are actors different from nodes, and why distinguish between them?

The short answer is that actors encapsulate the algorithms or automate the external programs required to carry out a step in a workflow. Actors do this without any concern for how these computational capabilities will be used in the context of a workflow. Nodes, on the other hand, specify the specific roles the steps implemented in actors play in particular workflows while delegating the specific computations performed to actors.

The following table highlights these and other key differences between nodes and actors in RestFlow.

| Nodes | Actors |
| --- | --- |
| Unique. A particular node can be employed in only one place in one workflow.\* | Reusable. Multiple instances of the same actor can be used by many nodes in the same or different workflows. |
| Specific. Signifies the role played by an actor in a particular workflow. | General. Is completely unaware of the role played in a particular workflow. |
| Connected. Interacts directly with data flowing through the workflow, exchanging data with upstream and downstream nodes. | Isolated. Each running instance of an actor exchanges data directly only with the one node that refers to it. It is unaware of other nodes, actors, or the workflow as a whole. |
| Abstract. Independent of how the computational step it represents is performed. | Concrete. Directly or indirectly defines the implementation of the computational step performed. |
| Technology-neutral. Works independently of the technology used to implement the computation it represents. | Technology-specific. Depends on specific technologies to implement a computation. |
| Flow-based. Employs data flows to interact with other nodes in the workflow. | Variable-based. Employs variables to interact with code implementing computation. |
| Data-source configurable. Configurable to accept data via parameters or inflows depending on the role of the node in the workflow. | Data-source independent. Receives data in a uniform fashion regardless of whether the corresponding node receives data from parameters, inflows, etc. |
| Pluggable. Different actors with the same signature (input and output variable names) can be swapped in easily. | Node-independent. Can be plugged into any node that provides compatible inputs. |
| Customizable. Parameters and inflows can be used to override all or just a few of the default values provided by an actor. | Standardizable. Default values for input variables can be used to standardize the use of algorithms or external programs. |

\* _The exception is when workflows are employed as actors (subworkflows) in other workflows. In such cases, a single node definition can correspond to multiple nodes in a running workflow._

The reason RestFlow distinguishes between nodes and actors is that while the two attributes in each pair listed above are contradictory (e.g., abstract vs concrete, flow-based vs variable-based), all of these attributes are extremely useful when automating scientific workflows. Distinguishing nodes and actors provides you with both sets of features. For example, it is very useful to provide input data to an actor via constants in one workflow, and via inflows in another. On the other hand, implementing a particular actor is much simpler if it does not itself need to explicitly support both constants and inflows as sources for every input. Decoupling the concrete implementation of computations (actor code) from how data is provided to them (node configuration) gives you the best of both worlds.
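To illustrate the constants-versus-inflows point, here is a sketch of an EmphasizeGreeting node that supplies count to MultipleConcatenator through an inflow rather than a constant. The /messages/repetitions/ flow is a hypothetical name; some upstream node, not shown, would have to send values to it.

```yaml
  - id: EmphasizeGreeting
    type: Node
    properties:
      actor: !ref MultipleConcatenator
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
        count: /messages/repetitions/    # hypothetical flow replacing the constant
      outflows:
        concatenatedString: /messages/emphasizedGreeting/
```

The actor definition is untouched in either case: it simply reads the count variable, whichever source the node uses to fill it.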

###Inlining actors

Despite the advantages of distinguishing between nodes and actors conceptually, there are times when it is useful to declare a workflow node and corresponding actor at the same time. This can be done by embedding, or inlining, the actor definition within the node declaration.

The node below declares a node and the actor implementing its functionality simultaneously:

  - id: MultiplyByThree
    type: Node
    properties: 
      actor: !inline
        type: GroovyActor
        properties:
          step: product=a*3
          inputs: 
            a:
          outputs: 
            product:
      inflows:
        a: /values/
      outflows: 
        product: /products/three/

The MultiplyByThree node accepts values from the /values/ inflow, multiplies each by three, and emits the tripled values to the /products/three/ outflow. Rather than referring to an actor defined elsewhere using the !ref operator, this node includes the actor definition in the value of the actor property using the !inline operator. Note that unlike actors defined separately from a node, this actor has no id. As a result, there is no way to refer to this actor from other nodes, and consequently the actor cannot be reused (except by copying and pasting the definition). Nevertheless, inlining is especially useful when the function implemented by the actor is either trivial or unlikely to be of use in other workflows.

Moreover, inlined actors need not declare their input and output variables. The labels of the parameters, inflows, and outflows of the containing node can serve this function. The preceding example can be reduced accordingly:

  - id: MultiplyByThree
    type: Node
    properties: 
      actor: !inline
        type: GroovyActor
        properties:
          step: product=a*3
      inflows:
        a: /values/
      outflows: 
        product: /products/three/

The use of the !inline operator is not limited to embedding terse actor definitions within nodes. It can be used anywhere !ref can be used, i.e. to define any component in place rather than to refer to one defined elsewhere. This allows you to structure your workflows as you see fit, and is especially useful for rapid development and experimentation with new actors and workflows. Inlined components can be factored out and defined separately once they have been finalized and tested.
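As one possible illustration, a node could be inlined directly in the list of nodes of a Workflow component instead of being referred to with !ref. The fragment below is only a sketch: the HelloWorld id and the signature-related flows are hypothetical, the layout of the Workflow component is assumed from earlier tutorials, and its other properties are omitted.

```yaml
  - id: HelloWorld                   # hypothetical workflow id
    type: Workflow
    properties:
      nodes:
      - !ref EmphasizeGreeting       # node defined elsewhere and referred to by id
      - !inline                      # node defined in place, with no id of its own
        type: Node
        properties:
          actor: !ref StringConcatenator
          inflows:
            stringOne: /messages/emphasizedGreeting/
            stringTwo: /messages/signature/             # hypothetical flow
          outflows:
            concatenatedString: /messages/signedGreeting/   # hypothetical flow
```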

###The GroovyActorNode type

Joint declaration of nodes and Groovy actors can be simplified even further. The GroovyActorNode type is defined in common/types.yaml, which can be referenced in the imports section of a workflow:

  imports:
  - classpath:/common/groovy/actors.yaml
  - classpath:/common/directors.yaml
  - classpath:/common/types.yaml

GroovyActorNode allows a Node declaration to set properties on an embedded GroovyActor using Java's nested bean notation. For example, the MultiplyByThree node above can be reduced to the following:

  - id: MultiplyByThree
    type: GroovyActorNode
    properties: 
      actor.step: product=a*3
      inflows:
        a: /values/
      outflows: 
        product: /products/three/

Similarly, the EmphasizeGreeting node in hello5.yaml and the Groovy StringConcatenator actor it refers to (see the first two listings on this page) can be declared together as follows:

  - id: EmphasizeGreeting
    type: GroovyActorNode
    properties:
      actor.step: concatenatedString = stringOne + stringTwo
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
      outflows:
        concatenatedString: /messages/emphasizedGreeting/

###Making your own types for simplified inlining

The GroovyActorNode type declaration in the types.yaml file consists of just five lines:

  - id: GroovyActorNode
    type: Node
    properties:
      actor: !inline
        type: GroovyActor

This simply says that components of type GroovyActorNode are instances of Node that embed a GroovyActor. When a component is declared to be of type GroovyActorNode, the Spring engine instantiates a GroovyActor and assigns it to the Node before the properties of the component are set. This simple yet powerful capability of Spring allows the workflow designer to easily declare other types to simplify workflow design.
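Following the same pattern, you might declare a type of your own that embeds a GroovyActor with its step already filled in. The sketch below is untested and rests on the assumption that properties set in a type declaration carry over to components of that type; ConcatenatorNode is a hypothetical name.

```yaml
  # Hypothetical type: a Node that embeds a GroovyActor with a preset step.
  - id: ConcatenatorNode
    type: Node
    properties:
      actor: !inline
        type: GroovyActor
        properties:
          step: concatenatedString = stringOne + stringTwo

  # A node of the new type then only needs to declare its flows.
  - id: EmphasizeGreeting
    type: ConcatenatorNode
    properties:
      inflows:
        stringOne: /messages/greeting/
        stringTwo: /messages/emphasis/
      outflows:
        concatenatedString: /messages/emphasizedGreeting/
```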
