# String Interpolation

## Introduction

Unlike Python format strings, SoS string interpolation **does not require any prefix** and is **applied only to double-quoted strings** (`" "`, `""" """`, `r" "`, and `r""" """`). Single-quoted strings (`' '`, `''' '''`, `r' '`, and `r''' '''`) are not interpolated.

Although configurable, the default sigil for SoS string interpolation is `'${ }'`, which means that by default any expression between `${` and `}` is evaluated by SoS. For example, the expressions `resource_path`, `sample_names[0]`, and `sample_names` are replaced by their values in the variables `ref_genome`, `title`, and `all_names`, but not in `single_quoted`, because that string literal is enclosed in single quotes. For convenience, we use the `%preview` magic of the SoS kernel to display the values of variables after the cell content is evaluated.

In [1]:
%preview ref_genome title all_names single_quoted

resource_path = '~/.sos/resources'
ref_genome    = "${resource_path}/hg19/refGenome.fasta"

sample_names  = ['A', 'B', 'C']
title         = "Sample ${sample_names[0]} results"
all_names     = "Samples ${sample_names}"

single_quoted = '${sample_names} is not interpolated'

'~/.sos/resources/hg19/refGenome.fasta'

'Sample A results'

'Samples A B C'

'${sample_names} is not interpolated'
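
To make the substitution rule concrete, here is a minimal plain-Python sketch of what happens to a double-quoted string. It is not SoS's implementation: the names `SIGIL`, `to_text`, and `interpolate` are hypothetical, and the regular expression assumes that expressions contain no `}` character.

```python
import re

# Hypothetical pattern: ${ expr } where expr contains no closing brace.
SIGIL = re.compile(r"\$\{([^}]*)\}")

def to_text(value):
    """Render a value: strings as-is, lists and tuples joined by spaces."""
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)):
        return " ".join(to_text(item) for item in value)
    return str(value)

def interpolate(text, namespace):
    """Replace each ${expr} in text with the rendered value of expr."""
    return SIGIL.sub(lambda m: to_text(eval(m.group(1), {}, namespace)), text)

variables = {"resource_path": "~/.sos/resources", "sample_names": ["A", "B", "C"]}
ref_genome = interpolate("${resource_path}/hg19/refGenome.fasta", variables)
title = interpolate("Sample ${sample_names[0]} results", variables)
all_names = interpolate("Samples ${sample_names}", variables)
```

A single-quoted string would simply never be passed to `interpolate` in this sketch, which is why `single_quoted` keeps its `${sample_names}` text.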

SoS actions specified in **script format** are assumed to be raw triple-quoted strings and will be interpolated. For example, the variable `num` is passed from SoS to a shell script in the following example

In [2]:
import random
num = random.randint(1, 6)
run:
    echo "Random number is ${num}"

Random number is 2


because the code is equivalent to

In [3]:
import random
num = random.randint(1, 6)
run(r"""
echo "Random number is ${num}"
""")

Random number is 4


## String representation of Objects

SoS evaluates an expression and returns the string representation of the value.

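If the value is of a simple Python type such as a string, a boolean, or a number, the standard Python representation of the value (`repr(obj)`) is returned. Note that a string is inserted as-is, without the surrounding quotes that `repr()` would add, as the second example below shows.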

In [4]:
"${2**10}"

'1024'

In [5]:
user = 'Bob'
"${\"Hi, \" + user}"

'Hi, Bob'

For objects with an iterator interface (e.g. Python `list`, `tuple`, `dict`, and `set`), SoS joins the string representation of each item by a space (or by a comma with the `,` conversion flag). For example,

  * A list of strings is converted to a string by joining the strings with a space or a comma.
  * A dictionary of strings is converted to a string by joining the dictionary keys, with no guarantee on the order of the keys.

In [6]:
names = ['James', 'Bob', 'Kathy']
"${names}"

'James Bob Kathy'

In [7]:
salary = {'James': 20, 'Bob': 25, 'Kathy': 18}
"Employees: ${salary}"

'Employees: Bob James Kathy'

It is worth noting that the step input and output variables (`input`, `output`, `depends`, and their looped versions `_input`, `_output`, and `_depends`) are always lists of targets. However, if the list contains only one filename, `"${input}"` is the same as `"${input[0]}"`.
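
The joining rule can be sketched in plain Python as follows. This is an illustration under stated assumptions, not SoS's code: `render` is a hypothetical helper, and it relies on the fact that iterating over a Python `dict` yields its keys.

```python
def render(value, sep=" "):
    """Hypothetical stand-in for the joining rule described above."""
    if isinstance(value, str):
        return value                  # strings are kept as they are
    try:
        items = iter(value)           # list, tuple, set; a dict yields its keys
    except TypeError:
        return str(value)             # not iterable: plain representation
    return sep.join(render(item, sep) for item in items)

names = ["James", "Bob", "Kathy"]
salary = {"James": 20, "Bob": 25, "Kathy": 18}

joined_names = render(names)              # items separated by spaces
joined_keys = render(salary)              # keys only, the values are dropped
comma_names = render(names, sep=",")      # what the `,` flag would produce
single = render(["a.txt"])                # a one-item list renders like its item
```

Because a one-item list joins to just that item, `render(["a.txt"])` and `render("a.txt")` give the same text, which mirrors the equivalence of `"${input}"` and `"${input[0]}"` for a single file.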

SoS string interpolation supports all format and conversion specifications of the [Python string format specifier](https://docs.python.org/3/library/string.html#formatspec). That is to say, you can use `: specifier` at the end of the expression to control the format of the output. For example,

In [8]:
"${1/3. :.2f}"

'0.33'

In [9]:
filename = 'test.sos'
"${filename:>20}"

'            test.sos'
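
Since the specifier follows Python's format mini-language, the same results can be reproduced with Python's built-in `format()` function. The snippet below only illustrates that correspondence; it says nothing about how SoS applies the specifier internally.

```python
# The part after ':' in "${1/3. :.2f}" is an ordinary Python format spec.
ratio = format(1 / 3.0, ".2f")

# '>20' right-aligns the text in a field that is 20 characters wide.
padded = format("test.sos", ">20")

# str.format accepts the same specifier after a colon.
same_padding = "{:>20}".format("test.sos")
```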

SoS also extends the conversion operators of the standard Python format string to give you more control over the string representation of objects, particularly file and directory names. The conversion operators should be specified after a `!` character.

SoS currently supports the following convertors:


| convertor | effect | input | output |
| :----------| :----- | :----- | :-------|
| `s`         | `str()`  | `${file1!s}` | `file 1.txt` |
| `r`         | `repr()`  | `${file1!r}` | `'file 1.txt'` |
| `q`         | `quoted()` | `${file1!q}` | `'file 1.txt'`|
| `e`         | `replace(' ', '\\ ')` | `${file1!e}` | `file\ 1.txt`|
| `a`         | `abspath(expanduser())` |  `${file2!a}` | `/path/to/user/SoS/test.sos` |
| `b`         | `basename()` |  `${file2!b}` | `test.sos` |
| `d`         | `dirname()` |  `${file2!d}` | `/path/to/user/SoS/` |
| `n`         | `splitext()[0]` | `${file2!n}` | `~/SoS/test` |
| `u`         | `expanduser()` | `${file2!u}` | `/path/to/user/SoS/test.sos`|
| `,`         | `','.join()` | `${files!,}` | `a.txt,b.txt`|


Here we assume

```
file1='file 1.txt'
file2='~/SoS/test.sos'
files=['a.txt', 'b.txt']
```
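
For readers who think in terms of the Python standard library, the table can be approximated by the mapping below. This is a sketch under assumptions: the `CONVERTERS` dictionary is hypothetical, `shlex.quote` is used as a stand-in for `quoted()`, and the plain `os.path` calls may not match SoS in every row (for example, `os.path.dirname` does not expand `~`).

```python
import os.path
import shlex

# Hypothetical mapping from converter letter to a standard-library equivalent.
CONVERTERS = {
    "s": str,
    "r": repr,
    "q": shlex.quote,                        # assumed stand-in for quoted()
    "e": lambda p: p.replace(" ", "\\ "),    # escape spaces with a backslash
    "a": lambda p: os.path.abspath(os.path.expanduser(p)),
    "b": os.path.basename,
    "d": os.path.dirname,
    "n": lambda p: os.path.splitext(p)[0],   # drop the extension
    "u": os.path.expanduser,
}

file1 = "file 1.txt"
file2 = "~/SoS/test.sos"
files = ["a.txt", "b.txt"]

quoted = CONVERTERS["q"](file1)
escaped = CONVERTERS["e"](file1)
base = CONVERTERS["b"](file2)
stem = CONVERTERS["n"](file2)
absolute = CONVERTERS["a"](file2)
joined = ",".join(files)                     # the `,` converter joins items
```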

For example, suppose we need to create a file called `Bon Jovi.txt` and run

In [10]:
%sandbox --expect-error
filename = 'Bon Jovi.txt'
run:
    echo "test" > ${filename}
    cat ${filename}

test Jovi.txt


cat: Jovi.txt: No such file or directory
Failed to process statement run(r"""echo "test" > ${filena...name}""")\n (RuntimeError): Failed to execute script (ret=1).
Please use command
	``/bin/bash \
	  /var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp0q6rm6bx/.sos/interactive_0_0_2b5a234c``
under "/private/var/folders/ys/gnzk0qbx5wbdgm531v82xxljv5yqy8/T/tmp0q6rm6bx" to test it.


We would get a single file `Bon` containing the text `test Jovi.txt`, and `cat` would fail to find `Jovi.txt`, because the command executed was actually

```
    echo "test" > Bon Jovi.txt
    cat Bon Jovi.txt
```

To avoid such problems, you can quote the filename using the `q` (quoted) convertor

In [11]:
run:
   echo "test" > ${filename!q}
   cat ${filename!q}

test
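
The effect of quoting can be checked without running a shell. The sketch below uses Python's `shlex` module, which follows POSIX-style word splitting, to show how the command line is broken into words with and without quoting. Using `shlex.quote` as an equivalent of the `q` convertor is an assumption made for illustration.

```python
import shlex

filename = "Bon Jovi.txt"

# Without quoting, the space splits the filename into two separate words.
unquoted_words = shlex.split("cat " + filename)

# With quoting, the filename reaches the command as a single argument.
quoted_words = shlex.split("cat " + shlex.quote(filename))
```

Here `unquoted_words` has three entries (`cat`, `Bon`, `Jovi.txt`), whereas `quoted_words` has two (`cat`, `Bon Jovi.txt`).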


Depending on how your scripts handle filenames, it can be handy to pass filenames to scripts in expanded format. For example, it would be perfectly OK to pass `~/a.txt` to a shell script, but a `u` convertor should be added if you are passing the filename to a script that does not understand `~` in filenames. SoS makes it easy for you to pass filenames in different forms to underlying scripts. For example,

In [12]:
%preview name filename basefilename expanded parparname
file = '~/sos/examples/update_toc.sos'
name = "${file!n}"
filename = "${file!b}"
basefilename = "${file!bn}"
expanded = "${file!u}"
parparname = "${file!ddb}"

'~/sos/examples/update_toc'

'update_toc.sos'

'update_toc'

'/Users/bpeng1/sos/examples/update_toc.sos'

'sos'

The last example is interesting because it applies three converters in sequence and obtains the name of the grandparent directory using the equivalent of `basename(dirname(dirname(file)))`.
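
Chained converters can be pictured as function composition applied from left to right. The following sketch uses a hypothetical `convert` helper and a small `STEPS` table built on `os.path`; it is meant to illustrate the order of application, not to reproduce SoS internals.

```python
import os.path
from functools import reduce

# Hypothetical subset of converters, expressed with os.path functions.
STEPS = {
    "b": os.path.basename,
    "d": os.path.dirname,
    "n": lambda p: os.path.splitext(p)[0],
}

def convert(path, flags):
    """Apply each single-letter converter in flags from left to right."""
    return reduce(lambda value, flag: STEPS[flag](value), flags, path)

file = "~/sos/examples/update_toc.sos"
basefilename = convert(file, "bn")     # basename first, then strip extension
parparname = convert(file, "ddb")      # dirname twice, then basename
nested = os.path.basename(os.path.dirname(os.path.dirname(file)))
```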

Finally, the `,` converter can be used to output Python sequences with items separated by comma instead of space. For example, if you are passing a Python list as R literals, you can pass them as follows:

In [13]:
salary = {'James': 20, 'Bob': 25, 'Kathy': 18}
R:
    employee = c(${salary!,r})
    print(employee)

[1] "Bob"   "James" "Kathy"


Here the `r` converter quotes the strings, and the `,` converter joins the strings with `,`.

In [14]:
"${salary!,r}"

"'Bob','James','Kathy'"

Although the SoS format specifiers are convenient to use, you are not limited to these rules and can define your own ways to present objects. For example

In [15]:
def r_list(obj):
    return 'c(' + ','.join('{!r}'.format(x) for x in obj) + ')'

"${r_list(salary)}"

"c('James','Kathy','Bob')"

## Inclusion of sigil in string

If you need to stop SoS from interpolating some expressions in a string, you can

1. Use single quotes, or
2. Precede the SoS sigil with a backslash (`\`)

For example, the following script includes a shell script with shell variable `${file}` that is not interpolated by SoS:

In [16]:
[10]
title = "Sample ${sample_names[0]} results"
run:
    echo ${title}
    for file in a.txt b.txt c.txt
    do
        echo Processing \${file} ...
    done


Sample A results
Processing a.txt ...
Processing b.txt ...
Processing c.txt ...
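
One way to picture the backslash rule is a pattern that ignores any sigil preceded by a backslash and then removes that backslash. The sketch below is an assumption about the mechanics, written in plain Python with hypothetical names (`UNESCAPED`, `interpolate`); SoS may implement escaping differently.

```python
import re

# Match ${...} only when the dollar sign is not preceded by a backslash.
UNESCAPED = re.compile(r"(?<!\\)\$\{([^}]*)\}")

def interpolate(text, namespace):
    """Substitute unescaped sigils, then unescape the remaining ones."""
    replaced = UNESCAPED.sub(lambda m: str(eval(m.group(1), {}, namespace)), text)
    # Drop the protecting backslash so the shell sees a plain ${...}.
    return replaced.replace("\\${", "${")

script = "echo ${title}\necho Processing \\${file} ..."
result = interpolate(script, {"title": "Sample A results"})
```

After the call, the first line of `result` carries the value of `title`, while the second line still contains `${file}` for the shell to expand.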


## Alternative sigil

If your SoS script contains long bash, perl, or other scripts in which `${ }` is frequently used, it can be tedious and error-prone to escape all sigils in these scripts with backslashes. In this case you can assign an alternative sigil to the steps using a `sigil` section option.

For example, the example above could be written as 

In [17]:
[10: sigil='%( )']
title = "Sample %(sample_names[0]) results"
run:
    echo %(title)
    for file in a.txt b.txt c.txt
    do
        echo Processing ${file} ...
    done

Sample A results
Processing a.txt ...
Processing b.txt ...
Processing c.txt ...


so that the alternative sigil applies only to this particular step, through the step option `sigil`.

You can define any sigil as long as it consists of a left sigil and a right sigil separated by a space. You can even use identical left and right sigils (e.g. `# #`), although such sigils are more prone to errors.
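
The "left sigil, space, right sigil" convention lends itself to a simple pattern builder. The following is a minimal sketch, assuming that a sigil string is split at its single space and that expressions do not span lines; `sigil_pattern` and `interpolate` are hypothetical names, not part of SoS.

```python
import re

def sigil_pattern(sigil):
    """Build a regex from a 'left right' sigil such as '${ }' or '%( )'."""
    left, right = sigil.split(" ")
    # Non-greedy match so that two expressions on one line stay separate.
    return re.compile(re.escape(left) + r"(.*?)" + re.escape(right))

def interpolate(text, namespace, sigil="${ }"):
    """Replace every left...right expression with its evaluated value."""
    pattern = sigil_pattern(sigil)
    return pattern.sub(lambda m: str(eval(m.group(1), {}, namespace)), text)

values = {"title": "Sample A results"}
line = interpolate("echo %(title); echo ${file}", values, sigil="%( )")
hashed = interpolate("echo #title#", values, sigil="# #")
```

With `sigil='%( )'` the shell's `${file}` passes through untouched. With `# #`, any line that happens to contain two literal `#` characters would be treated as an expression in this sketch, which illustrates why identical left and right sigils are easier to get wrong.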