# Execution of SoS workflows

The content of this chapter largely applies to batch mode, but we present it in a Jupyter notebook so that the results are easy to reproduce. If you are not familiar with Jupyter notebook, please refer to the chapter [Notebook Interface](../documentation/Notebook_Interface.html) for details. For the impatient, the magics used in this chapter are:

* `%sandbox` executes the cell in a temporary directory
* `!cmd` executes the shell command `cmd`
* `%run` runs the cell as if from the command line with the specified options
* `%rerun` reruns the last executed script (cell content without magics)
* `%set` sets options that are included in each `%run` command

## Logging level

SoS uses a logging system to output all sorts of information during the execution of workflows. The amount of output can be controlled by logging level, which can be `error` (0), `warning` (1), `info` (2), `debug` (3), and `trace` (4). The default logging level for SoS is `info` in batch mode and `warning` in interactive mode.

For example, logging at the `info` level produces messages that indicate the steps executed and their input and output files, whereas the `warning` level shows only warnings and errors.

In [1]:
%sandbox
!touch a.txt

%run -v2
[10]
[20]
input: 'a.txt'
[30]
[40]

INFO: Running default_20:
INFO: Running default_10:
INFO: Running default_30:
INFO: Running default_40:


In [2]:
%sandbox
!touch a.txt

%run -v1
[10]
[20]
input: 'a.txt'
[30]
[40]
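
The filtering behaviour described above can be sketched with Python's standard `logging` levels. This is only an illustration: the mapping table, the `visible_messages` helper, and the sample messages are assumptions made for this example, and SoS's internal logger may be organized differently.

```python
import logging

# Hypothetical mapping from the numeric -v values to Python logging levels.
# "trace" has no standard-library equivalent, so an arbitrary level below
# DEBUG is used here.
VERBOSITY_TO_LEVEL = {
    0: logging.ERROR,
    1: logging.WARNING,
    2: logging.INFO,
    3: logging.DEBUG,
    4: 5,
}


def visible_messages(verbosity, messages):
    """Return the message texts that pass the threshold for this verbosity."""
    threshold = VERBOSITY_TO_LEVEL[verbosity]
    return [text for level, text in messages if level >= threshold]


# Made-up sample messages, one at INFO and one at WARNING level.
messages = [
    (logging.INFO, "Running default_10"),
    (logging.WARNING, "sample warning"),
]

print(visible_messages(2, messages))  # both messages pass at -v2
print(visible_messages(1, messages))  # only the warning passes at -v1
```

A message is shown only if its level is at least as severe as the threshold, which is why the `-v1` run above prints nothing when a workflow produces no warnings.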

## `dryrun` mode

The `dryrun` mode is used to check a SoS script for syntax errors without actually executing any of the actions. It can be specified with option `-n`. For example, running the following script in dryrun mode produces an error message because the section header uses a comma instead of a colon before the option:

In [3]:
%sandbox --expect-error
%run -n
[10, skip=False]
sh:
   echo "I am command echo"

File contains parsing errors: <string>
	[line  2]: [10, skip=False]
sh:
   echo "I am command echo"
Invalid statements: SyntaxError('invalid syntax', ('<string>', 1, 10, '[10, skip=False]\n'))
Sandbox execution failed.

Of course, the action `sh` is not executed in dryrun mode even after the script is fixed:

In [4]:
%run -n
[10: skip=False]
sh:
   echo "I am command echo"
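
As a loose analogy only, the idea of checking syntax without running anything can be shown in plain Python with `ast.parse`. SoS scripts are not pure Python and SoS has its own parser, so the `check_syntax` helper below is a hypothetical stand-in rather than what `-n` does internally.

```python
import ast


def check_syntax(source):
    """Parse Python source without executing it.

    Returns None when the source is syntactically valid, otherwise a short
    description of the first syntax error.
    """
    try:
        ast.parse(source)
    except SyntaxError as err:
        return f"line {err.lineno}: {err.msg}"
    return None


# The print call is parsed but never executed.
print(check_syntax("print('never executed')"))
# An unclosed parenthesis is reported as a syntax error.
print(check_syntax("x = (1, 2"))
```

The exact wording of the error message depends on the Python version, so it is not reproduced here.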

## Change system $PATH

There are cases where you would like to use a specific version of a program for your workflow but do not want to change the system `$PATH` because of its global effect. In this case you can prepend the paths to these executables to `$PATH` using option `-b`.

The following example first creates an executable `ls` in `tmp` that contains an `echo` command. With the option `-b tmp`, the `tmp` directory is prepended to the system `$PATH` before the workflow is executed. As a consequence, this fake `ls` supersedes the system `ls` when `ls` is called in `step_10` of the workflow.

In [5]:
%sandbox
!mkdir tmp
!echo "#!/bin/bash" > tmp/ls
!echo "echo This is fake ls" >> tmp/ls
!chmod +x tmp/ls

%run -b tmp
[10]
sh:
    ls

This is fake ls
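
The effect of prepending a directory to the search path can be reproduced outside SoS with the standard library. The sketch below assumes a POSIX system; `which_with_prefix` is a hypothetical helper written for this example and is not part of SoS.

```python
import os
import shutil
import stat
import tempfile


def which_with_prefix(command, extra_dir):
    """Resolve `command` with `extra_dir` searched before the existing PATH.

    The process environment is left untouched, so the change has no global
    effect.
    """
    search_path = os.pathsep.join([extra_dir, os.environ.get("PATH", "")])
    return shutil.which(command, path=search_path)


with tempfile.TemporaryDirectory() as tmp:
    # Create a fake, executable `ls` in the temporary directory.
    fake = os.path.join(tmp, "ls")
    with open(fake, "w") as script:
        script.write("#!/bin/bash\necho This is fake ls\n")
    os.chmod(fake, os.stat(fake).st_mode | stat.S_IXUSR)

    # The fake `ls` is found first because its directory leads the path.
    resolved = which_with_prefix("ls", tmp)
    print(resolved == fake)
```

Because the lookup walks the search path from left to right, the first matching executable wins, which is the same rule that lets the fake `ls` shadow the system one in the workflow above.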


The `-b` option has a default value of `~/.sos/bin`, so any command under `~/.sos/bin` is executed (before a system command with the same name) even if the executable is not under the system `$PATH`. This feature allows you to create commands that are only used inside SoS scripts and, more interestingly, to create executables or install programs on the fly.

For example, step 20 of the following workflow depends on an executable `lls` that is not a system executable.

In [6]:
%sandbox
!rm -f ~/.sos/bin/lls

[install_lls: provides=executable('lls')]
run:
    echo "#!/bin/bash" > ~/.sos/bin/lls
    echo "echo This is lls" >> ~/.sos/bin/lls
    chmod +x ~/.sos/bin/lls

[20]
depends: executable('lls')
run:
    lls

This is lls


Because `lls` is created under `~/.sos/bin`, it is immediately available to SoS after the `install_lls` step. This works for any program as long as you can create a symbolic link to it under `~/.sos/bin` after its installation.

You can always disable this behavior by setting option `-b` without value.

## Runtime signature

One of the most annoying problems with the development and execution of workflows is that they can take a very long time to execute. What makes things worse is that we frequently need to re-run the workflow with different parameters and even different tools. It can be really time-consuming to re-execute the whole workflow repeatedly, but it is also very error-prone to repeat only selected steps of a workflow.

SoS addresses this problem by using <font color='red'>runtime signatures</font> to keep track of <font color='red'>execution units</font>, namely the input, output, and dependent targets, and the related SoS variables of a piece of workflow. SoS tracks the execution of statements at the step level for each [input group](../documentation/SoS_Step.html) and saves runtime signatures in a folder called `.sos` under the project directory. The runtime signatures are used to

1. Avoid repeated execution of identical units, and
2. Keep track of workflow related files for project management

This tutorial focuses on the first usage. The second one would be described in detail in [Project Management](Project_Management.html).
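
To make the first usage concrete, here is a minimal sketch of signature-based skipping in plain Python. It is not SoS's implementation: the signature layout, the JSON file, and the `make_signature` and `run_step` helpers are assumptions made for illustration, and the sketch is stricter than SoS about missing output files.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path):
    """Return a content hash of a file."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def make_signature(inputs, outputs, params):
    """Hypothetical signature: digests of all files plus the parameters."""
    return {
        "inputs": {p: file_digest(p) for p in inputs},
        "outputs": {p: file_digest(p) for p in outputs},
        "params": params,
    }


def run_step(sig_file, inputs, outputs, params, action):
    """Run `action` unless a saved signature matches; return True if it ran."""
    if os.path.isfile(sig_file) and all(os.path.isfile(p) for p in outputs):
        with open(sig_file) as handle:
            saved = json.load(handle)
        if saved == make_signature(inputs, outputs, params):
            return False  # identical unit, skip it
    action()
    with open(sig_file, "w") as handle:
        json.dump(make_signature(inputs, outputs, params), handle)
    return True


with tempfile.TemporaryDirectory() as work:
    out = os.path.join(work, "result.txt")
    sig = os.path.join(work, "step_10.sig")

    def write_output(size):
        with open(out, "wb") as handle:
            handle.write(os.urandom(size))

    first = run_step(sig, [], [out], {"size": 1000}, lambda: write_output(1000))
    second = run_step(sig, [], [out], {"size": 1000}, lambda: write_output(1000))
    third = run_step(sig, [], [out], {"size": 2000}, lambda: write_output(2000))
    print(first, second, third)  # True False True
```

The second call is skipped because neither the files nor the parameters changed, while the third call runs again because the `size` parameter differs.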

### `ignore` mode

SoS workflows can be executed in batch mode and in interactive mode using the SoS kernel in Jupyter notebook or qtconsole. Because the SoS kernel is mostly used to execute short statements in SoS and other kernels, runtime signatures are by default set to `ignore` in interactive mode (and to `default` in batch mode).

Let us create a temporary directory and execute a workflow that takes a bit of time to execute. This is done in the default `ignore` signature mode of the Jupyter notebook:

In [7]:
%sandbox --dir tmp

!rm -rf .sos/.runtime
![ -d temp ] || mkdir temp

In [8]:
%sandbox --dir tmp

parameter: size=1000
[10]
output:  "temp/result.txt"
sh:
    dd if=/dev/urandom of=${output} count=${size}

[20]
output:  'temp/size.txt'
with open(output[0], 'w') as sz:
    sz.write("${input}: ${os.path.getsize(input[0])}\n")

1000+0 records in
1000+0 records out
512000 bytes transferred in 0.035019 secs (14620568 bytes/sec)


temp/result.txt: 512000


Now, if we re-run the last script, both steps are executed again and the script takes as long as before.

In [9]:
%sandbox --dir tmp
%rerun

1000+0 records in
1000+0 records out
512000 bytes transferred in 0.034253 secs (14947543 bytes/sec)


temp/result.txt: 512000


### `default` mode

Now let us switch to the `default` signature mode by running the script with option `-s default`. When you run the script for the first time, it executes normally and saves the runtime signatures of the steps.

In [10]:
%sandbox --dir tmp
%rerun -s default

1000+0 records in
1000+0 records out
512000 bytes transferred in 0.034355 secs (14903249 bytes/sec)


temp/result.txt: 512000


When the script is run again in the same signature mode, however, both steps are ignored. Here we use `-v2` to show the `ignored` message. This time we use the magic `%set` to make option `-s default` persistent so that we do not have to specify it each time.

In [11]:
%sandbox --dir tmp
%set -s default
%rerun -v2

sos options is set to "-s default"


INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored due to saved signature
INFO: Running default_20:
INFO: Step default_20 (index=0) is ignored due to saved signature
INFO: Workflow default (ID=e9251d43927ed624) is executed successfully.


temp/result.txt: 512000


However, if you use a different parameter value (not the default `size=1000`), the steps are rerun:

In [12]:
%sandbox --dir tmp
%rerun -v2 --size 2000

INFO: Running default_10:
2000+0 records in
2000+0 records out
1024000 bytes transferred in 0.068127 secs (15030770 bytes/sec)
INFO: Running default_20:
INFO: Workflow default (ID=96aa8738ded2179e) is executed successfully.


temp/result.txt: 1024000


The signature is at the step level so if you change the second step of the script, the first step would still be skipped. Note that the step is independent of the script executed so a step would be skipped even if its signature was saved by the execution of another workflow. The signature is clever enough to allow minor changes such as addition of spaces and comments.

In [13]:
%sandbox --dir tmp
%run --size 2000 -v2
parameter: size=1000
[10]
output:  "temp/result.txt"
# added comment
sh:
    dd if=/dev/urandom of=${output} count=${size}

[20]
output:  'temp/size.txt'
with open(output[0], 'w') as sz:
    sz.write("Modified ${input}: ${os.path.getsize(input[0])}\n")

INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored due to saved signature
INFO: Running default_20:
INFO: Workflow default (ID=e53cef26d0601b1b) is executed successfully.


Modified temp/result.txt: 1024000
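
How SoS normalizes a step before comparing signatures is not described here. One possible way to make a fingerprint insensitive to spacing and comments, shown below for pure Python code only, is to hash the parsed syntax tree instead of the raw text. The `code_fingerprint` helper and the sample statements are hypothetical.

```python
import ast
import hashlib


def code_fingerprint(source):
    """Hash the syntax tree so that layout and comments do not matter."""
    tree = ast.parse(source)
    # ast.dump omits line and column numbers by default.
    return hashlib.sha256(ast.dump(tree).encode()).hexdigest()


original = "total = compute(size)\n"
reformatted = "# added comment\ntotal  =  compute( size )\n"
changed = "total = compute(size * 2)\n"

print(code_fingerprint(original) == code_fingerprint(reformatted))  # True
print(code_fingerprint(original) == code_fingerprint(changed))      # False
```

Comments and extra spaces never reach the syntax tree, so only a change in what the code does alters the fingerprint.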


### `assert` mode

The `assert` mode is used to detect if anything has been changed after the execution of a workflow. For example,

In [14]:
%sandbox --dir tmp
%set -s assert
%rerun --size 2000 -v2

sos options is set to "-s assert"


INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored with matching signature
INFO: Running default_20:
INFO: Step default_20 (index=0) is ignored with matching signature
INFO: Workflow default (ID=e53cef26d0601b1b) is executed successfully.


Modified temp/result.txt: 1024000


Now if you change one of the output files, SoS fails with an error message.

In [15]:
%sandbox --expect-error --dir tmp
!echo "aaa" >> temp/result.txt
%rerun --size 2000 -v2

INFO: Running default_10:
Failed to process step output: "temp/result.txt" (Signature mismatch: File has changed temp/result.txt)
Sandbox execution failed.

Interestingly, SoS would not complain if you completely remove the intermediate file.

In [16]:
%sandbox --dir tmp
!rm temp/result.txt
%rerun --size 2000 -v2

INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored with matching signature
INFO: Running default_20:
INFO: Step default_20 (index=0) is ignored with matching signature
INFO: Workflow default (ID=e53cef26d0601b1b) is executed successfully.


Modified temp/result.txt: 1024000


This is a particular feature of SoS, in that you can remove any intermediate file without affecting the re-execution of the workflow, as long as the intermediate file is not needed to re-generate a later output.

### `force` mode

The `force` signature mode ignores existing signatures and re-runs the workflow. This is needed when you would like to forcefully re-run all the steps to generate another set of output because the outcome of some steps is random, or to re-run the workflow because of changes that are not tracked by SoS, for example after you have installed a new version of a program.

In [17]:
%sandbox --dir tmp
%set
%rerun --size 2000 -s force

sos options "-s assert" reset to ""


2000+0 records in
2000+0 records out
1024000 bytes transferred in 0.068262 secs (15001056 bytes/sec)


Modified temp/result.txt: 1024000


### `build` mode

The `build` mode is somewhat opposite to the `force` mode in that it creates signatures (or overwrites existing ones) from existing output files without executing the steps. It is useful, for example, if you are adding a step to a workflow that you have tested outside of SoS (without signature) but do not want to rerun it, or if for some reason you have lost your signature files and would like to reconstruct them from existing outputs.

In [18]:
%sandbox --dir tmp
%rerun --size 2000 -s build -v2

INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored with signature constructed
INFO: Running default_20:
INFO: Step default_20 (index=0) is ignored with signature constructed
INFO: Workflow default (ID=e53cef26d0601b1b) is executed successfully.


Modified temp/result.txt: 1024000


This mode can introduce erroneous files into the signatures because it does not check the validity of the incorporated files. For example, SoS would not complain if you change the parameter and replace `temp/result.txt` with something else.

In [19]:
%sandbox --dir tmp
!echo "something else" > temp/result.txt
%rerun -s build -v2

INFO: Running default_10:
INFO: Step default_10 (index=0) is ignored with signature constructed
INFO: Running default_20:
INFO: Step default_20 (index=0) is ignored with signature constructed
INFO: Workflow default (ID=2b077801357ee5a0) is executed successfully.


Modified temp/result.txt: 1024000


In [20]:
# cleanup
!rm -rf tmp