# SoS Syntax

## Overview of SoS scripts

A SoS **script** defines one or more **workflows**, and each workflow consists of one or more **steps**. 

![workflow](../media/workflow.png)

Although input and output can be more general, each step typically has **input**, **output**, and **dependent** files, and executes a **step process** that consists of one or more Python statements and SoS actions (special Python functions). Part or all of the step process, called a **task**, can be executed and monitored externally.

![sos_step](../media/sos_step.png)

A SoS script contains **comments**, **statements**, and one or more SoS **steps**. A SoS **step** consists of a **header**
with one or more step names and optional step options. The body of a SoS step consists of optional **comments**,
**statements**, **input**, **output**, and **depends** files, and **parameter** definitions, followed by the step **process**. The following figure
shows a sample script that defines a workflow with two steps:

![sample_script](../media/sample_script.jpg)

## Formal definitions of terminology & grammar

* **Script**: A SoS script that defines one or more workflows.
* **Workflow**: A sequence of processes that can be executed to complete a certain task.
* **Step**: A step of a workflow that performs one piece of the workflow.
* **Target**: An object that is the input or result of a SoS step. Targets are usually files, but can also be other objects such as an executable command (with variable locations) or a SoS variable.
* **Step options**: Options of the step that assist the definition of the workflow.
* **Step input**: Specifies the input files of the step.
* **Step output**: Specifies the output files and targets of the step.
* **Step dependencies**: Specifies the files and targets that are required by the step.
* **Step process**: The process that a step executes to complete its work, written as one or more Python statements.
* **Task**: Part or all of a step process that is executed and monitored outside of SoS. Tasks are usually resource-intensive jobs that take a long time to complete.
* **Action**: A SoS or user-defined Python function. Actions differ from regular Python functions in that they may behave differently in different running modes of SoS (e.g. they may be ignored when executed in dryrun mode).

More formally defined, the SoS syntax obeys the following grammar, given in extended Backus-Naur form (EBNF):

```
Script         = {comment}, {statement}, {step}
comment        = "#", text, NEWLINE
assignment     = name, "=", expression, NEWLINE
```

with SoS steps defined as

```
step           = step_header,
                 {comment}, {{statement}, [input | output | depends ]},
                 [process, NEWLINE, {script} ]
step_header    = "[", section_names, [":", names | options], "]", NEWLINE
parameter      = "parameter", ":", assignment
input          = "input", ":", [expressions], [",", options], NEWLINE
output         = "output", ":", [expressions], [",", options], NEWLINE
depends        = "depends", ":", [expressions], [",", options], NEWLINE
task           = "task", ":",  [options]
action         = func_format | script_format
func_format    = name, "(", [options], ")"
script_format  = name, ":", [options], NEWLINE, script 
section_names  = section_name, {",", section_name}
section_name   = name, ["(", text, ")"]
names          = name, {",", name}
workflow       = name, ['_', steps], {"+", name, ['_', steps]}
assignment     = name, "=", expression, NEWLINE
expressions    = expression, {",", expression}
options        = option, {",", option}
option         = name, "=", expression
```

Here `name`, `expression`, and `statement` are arbitrary [Python](http://www.python.org) names, expressions, and statements with added SoS features. **SoS requires Python 3 and does not support Python 2.x-specific syntax.**

## File structure

A complete SoS script has a **header**, followed by a **global section** (without a section header) and one or more SoS **sections** with headers. SoS **preprocessors** can be used to include other scripts or to exclude parts of a script conditionally. None of these parts is required, so an empty script is a valid SoS script.

### File header

A SoS script usually starts with lines

```python
#!/usr/bin/env sos-runner
#fileformat=SOS1.0
```

The first line allows the script to be run by `sos-runner` when it is executed directly as an executable script. The second line tells SoS the format of the script. The `#fileformat` line does not have to be the first or second line, but it should be in the first comment block. SOS format 1.0 is assumed if no format line is present.
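The rule for locating the format line can be sketched in Python. `detect_format` is a hypothetical helper, not part of SoS, and it assumes that the first comment block ends at the first line that does not start with `#` (including a blank line).

```python
import re

# Hypothetical pattern for a format line such as "#fileformat=SOS1.0".
FORMAT_RE = re.compile(r"^#fileformat=SOS(\d+\.\d+)\s*$")

def detect_format(script: str) -> str:
    """Return the format version declared in the first comment block, or '1.0'."""
    for line in script.splitlines():
        if not line.startswith("#"):
            # Assumption: the first comment block ends at the first non-comment line.
            break
        match = FORMAT_RE.match(line)
        if match:
            return match.group(1)
    return "1.0"  # SOS format 1.0 is assumed when no format line is present

print(detect_format("#!/usr/bin/env sos-runner\n#fileformat=SOS1.0\n\nx = 1\n"))
```

A format line placed after the first statement is ignored by this sketch, which mirrors the "first comment block" requirement.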

### Global definitions

Python functions, classes, and variables can be defined or imported (using the Python `import` statement) before any SoS step is defined. These definitions usually contain variables such as the version and date of the script, paths to various resources, and utility functions that will be used by later steps. **These definitions are visible to all steps of workflows and are assumed to be read-only** (except for [`parameters`](Command_Line_Options.html) defined by the `parameter:` keyword).

SoS defines the following variables before any user-defined variables:

* **`SOS_VERSION`**: version of SoS command.
* **`CONFIG`**: A dictionary of configurations specified by the global sos configuration file (`~/.sos/config.yaml`), local configuration file (`./config.yaml`) and command line option `-c config_file`. 

Please refer to sections [command line options](Command_Line_Options.html) and [configuration files](Configuration_Files.html) for the use of `parameter` keyword and `CONFIG` variable.

### SoS Sections

A **step** refers to a step of a SoS workflow and is defined by a **section** in a SoS script. A SoS script can define multiple workflows from multiple sections. A section can define multiple steps of one or more workflows.

A section starts with a header in the format

```
[names: options]
```

The header should start with a `[` at the beginning of a line and end with a `]`. It can contain one or more names with optional section options. Please refer to [workflow specification](Workflow_Specification.html) for how workflows are specified from sections.
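As an illustration of the header layout, the hypothetical `parse_header` function below splits a header line into a list of step names and a raw options string. It is a simplified sketch, not the SoS parser: it assumes that names contain no `:` or `]` and that options contain no `]`, and `my_option` is a made-up option name.

```python
import re

# Assumption: names contain no ':' or ']' and options contain no ']'.
HEADER_RE = re.compile(r"^\[(?P<names>[^:\]]+)(?::(?P<options>[^\]]*))?\]\s*$")

def parse_header(line: str):
    """Split '[names: options]' into a list of names and a raw options string."""
    match = HEADER_RE.match(line)
    if match is None:
        # The '[' must be the first character of the line.
        raise ValueError(f"not a section header: {line!r}")
    names = [name.strip() for name in match.group("names").split(",")]
    options = (match.group("options") or "").strip()
    return names, options

print(parse_header("[align_10, align_20: my_option=1]"))
```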

SoS uses [Python 3](http://www.python.org) expressions and statements. If you are unfamiliar with Python, you can learn some basics of Python, usually in less than half a day, by reading some Python tutorials (e.g. [the official python tutorial](https://docs.python.org/3/tutorial/)). This [short introduction](https://docs.python.org/3/tutorial/introduction.html) is good enough for you to get started.

A section can have arbitrary Python statements and SoS-specific statements that define the input, output, and dependent targets, and the external tasks of the step. These statements start with the keywords `input:`, `output:`, `depends:`, and `task:`. Please refer to [SoS step](SoS_Step.html) for more details about these statements.
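One way to picture how these keywords separate SoS directives from ordinary Python statements is a line classifier. The `classify` function and its regular expression are hypothetical; the keyword list comes from the grammar (`input`, `output`, `depends`, `task`, and `parameter`), and real SoS parsing also handles multi-line statements, which this sketch ignores.

```python
import re

# Hypothetical: keywords that begin SoS step directives, taken from the grammar.
DIRECTIVE_RE = re.compile(r"^(input|output|depends|task|parameter)\s*:")

def classify(line: str) -> str:
    """Return the directive keyword of a step-body line, 'comment', or 'statement'."""
    stripped = line.strip()
    if stripped.startswith("#"):
        return "comment"
    match = DIRECTIVE_RE.match(stripped)
    return match.group(1) if match else "statement"

for line in ["input: 'a.txt'", "x = 1", "# note", "output: 'b.txt'"]:
    print(f"{line!r:20} -> {classify(line)}")
```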

### Script style function

SoS allows you to write a SoS `action` (basically a Python function) that accepts a script (string) as its first parameter in a special script format. For example,

```sos
R('''
pdf('${_input}')
plot(0, 0)
dev.off()
''', workdir='result')
```

can be written as

```sos
R:     workdir='result'
pdf('${_input}')
plot(0, 0)
dev.off()
```

**The script is a string without quotation marks** and the normal string interpolation will take place. You can also indent the script (add leading white spaces to all lines) and write the action as

```sos
R:  workdir='result'
   pdf('${_input}')
   plot(0, 0)
   dev.off()
```

The indented form is preferred because it avoids problems when your script contains strings such as `[1]` or `option:`, which could otherwise be treated as SoS directives. More importantly, it allows you to start a new statement on a non-indented line. For example, `check_command('dot')` would be considered part of the R script in

```sos
R:  workdir='result'
pdf('${_input}')
plot(0, 0)
dev.off()

check_command('dot')
```

but a separate action in 

```sos
R:  workdir='result'
   pdf('${_input}')
   plot(0, 0)
   dev.off()

check_command('dot')
```
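The indentation rule for the indented form can be sketched as follows. `split_script_block` is a hypothetical helper, not SoS code; it assumes that the script body ends at the first non-blank line that starts in column 0, and it uses `textwrap.dedent` to strip the common indentation from the script.

```python
import textwrap

def split_script_block(lines):
    """Split the lines after an indented script-format action header into (script, rest)."""
    end = len(lines)
    for i, line in enumerate(lines):
        # Assumption: a non-blank line starting in column 0 begins a new statement.
        if line.strip() and not line[0].isspace():
            end = i
            break
    # Remove the indentation shared by all script lines.
    script = textwrap.dedent("\n".join(lines[:end]) + "\n")
    return script, lines[end:]

body = ["   pdf('${_input}')", "   plot(0, 0)", "   dev.off()", "", "check_command('dot')"]
script, rest = split_script_block(body)
print(script)
print(rest)
```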

Although the script format is more concise and easier to read, it is limited to actions that accept a string as their first parameter, and such actions cannot return a value or be used within `try/except` or `if/else` statements.

One final difference between SoS and regular Python 3 syntax is that SoS is more lenient about mixing tabs and spaces for indentation. Although it is highly recommended that you use only spaces for indentation, SoS will give a warning and treat each tab as 4 spaces during execution.
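A literal reading of this rule, in which each tab in the leading whitespace becomes four spaces, is sketched below. `normalize_indent` is hypothetical and SoS's actual handling may differ; note that Python's `str.expandtabs` is not used here because it aligns to tab stops rather than replacing each tab with a fixed number of spaces.

```python
import warnings

def normalize_indent(line: str, tab_width: int = 4) -> str:
    """Replace each tab in the leading whitespace with tab_width spaces, with a warning."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    if "\t" in indent:
        warnings.warn(f"tab used for indentation: {line!r}")
        indent = indent.replace("\t", " " * tab_width)
    return indent + body

print(repr(normalize_indent("  \tx = 1")))
```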

## SoS preprocessors

### Preprocessor `%include`

SoS allows you to include variables and steps of another script into the current script using the preprocessor `%include`. The `%include` statement should be used before any other SoS statements and follows the same syntax as the Python `import` statement. For example, the following statements are allowed:

```
%include alignment
%include alignment as ag
%include alignment, calling
%include alignment as ag, calling as ca

%from alignment include *
%from alignment include var1, workflow1
%from alignment include workflow1 as wf
```

Similar to the Python `import` statement, variables and workflows included using the `%include ...` syntax (with or without `as`) are accessed as `module.name` (e.g. `alignment.var1`, or `ag.var1` for `%include alignment as ag`), whereas variables and workflows included using the `%from ... include ...` syntax are available directly (e.g. `var1`, `workflow1`, and `wf` for `workflow1 as wf`).
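The binding rules can be summarized with a small parser. The hypothetical `included_names` function maps each name that becomes available in the including script to what it refers to: a bare module name means its members are reached as `module.name`, while a `module.name` value is a directly bound variable or workflow. It handles only the single-line forms listed earlier and is not the SoS implementation.

```python
import re

def _name_and_alias(item: str):
    """Split 'name' or 'name as alias' into (name, alias)."""
    parts = item.split()
    if len(parts) == 3 and parts[1] == "as":
        return parts[0], parts[2]
    return parts[0], parts[0]

def included_names(statement: str) -> dict:
    """Map each name bound by a hypothetical include statement to its source."""
    statement = statement.strip()
    match = re.fullmatch(r"%from\s+(\w+)\s+include\s+(.+)", statement)
    if match:
        module, items = match.groups()
        bound = {}
        for item in items.split(","):
            name, alias = _name_and_alias(item)
            # '*' stands for every variable and workflow in the module.
            bound[alias] = f"{module}.{name}"
        return bound
    match = re.fullmatch(r"%include\s+(.+)", statement)
    if match:
        # Members of an included module are accessed as alias.name.
        return {alias: name for name, alias in map(_name_and_alias, match.group(1).split(","))}
    raise ValueError(f"not an include statement: {statement!r}")

print(included_names("%include alignment as ag, calling as ca"))
print(included_names("%from alignment include workflow1 as wf"))
```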

For example

```
#!/usr/bin/env sos-runner
#fileformat=SOS1.0

%include alignment
%include call
[10]
sos_run('alignment.default + call.default')
```

would execute the two workflows defined in `alignment.sos` and `call.sos`. The files should be in the SoS search path to be included.

### Preprocessors `%if .. %elif ..%else .. %endif`

SoS allows you to conditionally include part of the script using the preprocessor directives `%if`, `%elif`, `%else`, and `%endif`. The condition in `%if cond` and `%elif cond` should be a valid Python expression. The condition should not span multiple lines and should not have a trailing `:`. The conditions are evaluated at parse time, so no SoS variables are allowed. For example, you can use

```
%if sys.platform == 'darwin'

[step]
Mac OSX step

%else

[step]
Linux step

%endif
```

to define platform-specific steps.
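Parse-time evaluation of these directives can be sketched as follows. `preprocess` is a hypothetical function, not SoS's implementation: it supports a single level of `%if` (no nesting) and evaluates conditions with Python's `eval` in a namespace that contains only the `sys` module, which is an assumption about what conditions may reference.

```python
import sys

def preprocess(lines, namespace=None):
    """Keep only lines in the branches whose conditions are true (no nesting)."""
    # Assumption: conditions may reference the sys module and nothing else.
    namespace = {"sys": sys} if namespace is None else namespace
    kept = []
    active = True         # is the current branch kept?
    branch_taken = False  # has an earlier branch of this %if block been kept?
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("%if "):
            active = bool(eval(stripped[len("%if "):], namespace))
            branch_taken = active
        elif stripped.startswith("%elif "):
            # Only evaluate the condition if no earlier branch was taken.
            active = not branch_taken and bool(eval(stripped[len("%elif "):], namespace))
            branch_taken = branch_taken or active
        elif stripped == "%else":
            active = not branch_taken
            branch_taken = True
        elif stripped == "%endif":
            active = True
        elif active:
            kept.append(line)
    return kept

script = """%if sys.platform == 'darwin'
[step]
Mac OSX step
%else
[step]
Linux step
%endif""".splitlines()
print(preprocess(script))
```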

### Preprocessor `%set_options`

You can set global SoS options using the preprocessor `%set_options`. Currently SoS supports only the option `sigil`, which sets the global default sigil. You can set it either to a valid sigil (e.g. `'{ }'`, `'[ ]'`) or to `None` to disable string interpolation. Please refer to section [string interpolation](String_Interpolation.html) for details of this option.
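To show what the sigil controls, the sketch below interpolates expressions between a configurable pair of delimiters. `interpolate` is hypothetical and much simpler than SoS's string interpolation: it assumes a sigil is two delimiters separated by a single space (as in `'{ }'`, `'[ ]'`, or the `${ }` form used in the examples above), evaluates each enclosed expression with `eval`, and returns the text unchanged when the sigil is `None`.

```python
import re

def interpolate(text: str, sigil, namespace: dict) -> str:
    """Replace each 'left expr right' span with str(eval(expr)); sigil=None disables it."""
    if sigil is None:
        return text  # interpolation disabled
    # Assumption: a sigil is two delimiters separated by a single space, e.g. '{ }'.
    left, right = sigil.split(" ")
    pattern = re.compile(re.escape(left) + r"(.*?)" + re.escape(right))
    return pattern.sub(lambda m: str(eval(m.group(1), {}, namespace)), text)

print(interpolate("pdf('${_input}')", "${ }", {"_input": "a.pdf"}))
print(interpolate("pdf('[_input]')", "[ ]", {"_input": "a.pdf"}))
```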