Tutorial 4. Workflow Traces

tmcphillips edited this page Feb 14, 2013 · 2 revisions

One advantage of constructing scientific software from data flows is that the system can easily access all of the data that passes between workflow nodes during a run. Similarly, because the system explicitly triggers workflow nodes, it knows the order in which nodes stepped during a run, as well as what data were operated on during each step. Records of the events that occur during a workflow run are collectively referred to as the workflow trace.

To see what RestFlow records, include the -t option to restflow when you run a workflow. RestFlow then prints the trace at the end of the run, after any terminal output the run produced. Running the hello5 workflow listed at the end of Tutorial 3. Extended Example with the -t option gives the following:

$ restflow -t -f hello5.yaml 
Hello World!
Good Afternoon, Cosmos!!
Good night, and good luck!!!
*** Node step counts ***
HelloWorld: 1
HelloWorld.CreateEmphasis: 3
HelloWorld.CreateGreeting: 3
HelloWorld.EmphasizeGreeting: 3
HelloWorld.RenderGreeting: 3
*** Published resources ***
/messages/emphasis/1: !
/messages/emphasis/2: !!
/messages/emphasis/3: !!!
/messages/emphasizedGreeting/1: Hello World!
/messages/emphasizedGreeting/2: Good Afternoon,
/messages/emphasizedGreeting/3: Good night, and
/messages/greeting/1: Hello World
/messages/greeting/2: Good Afternoon, Cosmos
/messages/greeting/3: Good night, and good luck
$

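The trace dump, which follows the output of the workflow, is divided into two sections. The first lists the workflow nodes and indicates how many times each stepped during the run. There are probably no surprises here: HelloWorld itself stepped once, and each of the four nodes under it stepped three times, once for each greeting printed.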
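If you want to inspect a trace dump programmatically, the two sections can be separated with a few lines of Python. This is only a sketch and not part of RestFlow: the function name parse_trace is hypothetical, and the section headers and the "name: value" line layout are assumed from the sample output above rather than from any documented format.

```python
def parse_trace(text):
    """Split a trace dump into (step_counts, published_resources).

    Assumes the layout seen in the sample run: a '*** Node step counts ***'
    header, then a '*** Published resources ***' header, each followed by
    'key: value' lines. Values that span several lines are not handled.
    """
    step_counts = {}
    resources = {}
    section = None
    for line in text.splitlines():
        line = line.rstrip()
        if line == "*** Node step counts ***":
            section = "steps"
        elif line == "*** Published resources ***":
            section = "resources"
        elif section and ": " in line:
            # Split on the first ': ' only, so values may contain colons.
            key, value = line.split(": ", 1)
            if section == "steps":
                step_counts[key] = int(value)
            else:
                resources[key] = value
    return step_counts, resources


# A shortened copy of the sample output, including ordinary terminal output.
sample = """Hello World!
*** Node step counts ***
HelloWorld: 1
HelloWorld.CreateGreeting: 3
*** Published resources ***
/messages/greeting/1: Hello World
/messages/greeting/2: Good Afternoon, Cosmos
"""

steps, published = parse_trace(sample)
print(steps)
print(published)
```

Lines printed before the first header, such as the workflow's own terminal output, are ignored because no section has been selected yet.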

The second section of the trace dump lists every data item that was published (output) to a data flow. Note that each published data item has a unique id, formed by appending an integer to the associated outflow expression. This highlights the difference between variables and flows. If /messages/greeting/ were associated with a variable, that variable would take on three different values during the workflow run. Instead, three immutable data items are created in the /messages/greeting/ flow, with the ids /messages/greeting/1, /messages/greeting/2, and /messages/greeting/3, and each is given its own permanent value. This not only makes it easier to write concurrent applications (see Dataflow Programming Concepts for further discussion of this point), but also makes it possible to identify and refer to every piece of data the workflow operated on during a run.
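The contrast between a variable and a flow can be sketched in Python. The Flow class below is a toy model written for this tutorial, not RestFlow's implementation; its name and methods are hypothetical, and the only behavior taken from the trace is that each published item gets an id made of the outflow expression plus an integer.

```python
class Flow:
    """Toy data flow: each publish adds a new item and never overwrites one."""

    def __init__(self, expression):
        # Store the expression without a trailing slash, e.g. '/messages/greeting'.
        self.expression = expression.rstrip("/")
        self._items = {}

    def publish(self, value):
        # The id is the outflow expression followed by a running integer.
        item_id = f"{self.expression}/{len(self._items) + 1}"
        self._items[item_id] = value
        return item_id

    def items(self):
        # Return a copy so callers cannot alter what was published.
        return dict(self._items)


# A variable keeps only the most recent value.
greeting = "Hello World"
greeting = "Good Afternoon, Cosmos"
greeting = "Good night, and good luck"

# A flow keeps every value, each under its own id.
greetings = Flow("/messages/greeting/")
ids = [
    greetings.publish(text)
    for text in ("Hello World", "Good Afternoon, Cosmos", "Good night, and good luck")
]

print(greeting)
print(ids)
print(greetings.items())
```

After the three assignments the variable holds only the last greeting, while the flow still holds all three, and any one of them can be referred to by its id.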

Note also that the published data items are not listed in the order in which they were produced during the run. Instead, they are listed alphabetically by id. This is because nodes can generally run concurrently. Ordering this part of the trace by creation time stamp, for example, could easily give the trace contents a different order each time the workflow is run, even when operating on the same data, and would make it harder overall to find data in the trace.
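The effect of ordering by id can be illustrated with a small Python sketch. The two lists below stand for two hypothetical runs in which concurrent nodes happened to publish the same items in different orders; the function name trace_listing is an assumption made for this example, not a RestFlow API.

```python
# The same published items, in two different (hypothetical) production orders.
run_a = [
    ("/messages/greeting/1", "Hello World"),
    ("/messages/emphasis/1", "!"),
    ("/messages/greeting/2", "Good Afternoon, Cosmos"),
    ("/messages/emphasis/2", "!!"),
]
run_b = [
    ("/messages/emphasis/1", "!"),
    ("/messages/emphasis/2", "!!"),
    ("/messages/greeting/2", "Good Afternoon, Cosmos"),
    ("/messages/greeting/1", "Hello World"),
]


def trace_listing(published):
    """Format items as 'id: value' lines, sorted by id as plain strings."""
    ordered = sorted(published, key=lambda item: item[0])
    return [f"{item_id}: {value}" for item_id, value in ordered]


for line in trace_listing(run_a):
    print(line)

# Both runs give the same listing, whatever order the items were produced in.
print(trace_listing(run_a) == trace_listing(run_b))
```

One caveat of this sketch: Python compares the ids as plain strings, so an id ending in /10 would sort before one ending in /2. How RestFlow orders flows with ten or more items is not shown in the sample trace.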

We suggest you enable the trace dump feature while working through the tutorials.
