# Shell-task design

## Command-line template

Define a shell-task specification using a command template string. Input and output fields are both specified by placing the name of the field within enclosing `<` and `>`. Outputs are differentiated by the `out|` prefix.

In [5]:
from pydra.design import shell
from pydra.engine.helpers import list_fields

test_file = "./in.txt"
with open(test_file, "w") as f:
    f.write("this is a test file\n")

# Define the shell-command task specification
Cp = shell.define("cp <in_file> <out|out_file>")

# Parameterise the task spec
cp = Cp(in_file=test_file, out_file="./out.txt")

# Print the cmdline to be run to double check
print(cp.cmdline)

# Run the shell-command task
cp()


If paths to output files are not provided in the parameterisation, they default to the name of the field.

In [None]:
cp = Cp(in_file=test_file)
print(cp.cmdline)

By default, shell-command fields are considered to be of `fileformats.generic.FsObject` type. However, more specific file formats or built-in Python types can be specified by appending the type to the field name after a `:`.

File formats are specified by their MIME type or "MIME-like" strings (see the [FileFormats docs](https://arcanaframework.github.io/fileformats/mime.html) for details).

In [None]:
from fileformats.image import Png

TrimPng = shell.define("trim-png <in_image:image/png> <out|out_image:image/png>")

trim_png = TrimPng(in_image=Png.mock())

print(trim_png.cmdline)
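
To make the template syntax concrete, the following is a minimal sketch of how the `<name>`, `<out|name>` and `<name:type>` forms could be picked apart with a regular expression. It is an illustration only, not Pydra's actual parser: the pattern `FIELD_RE` and the helper `parse_fields` are hypothetical names introduced here, and the fallback type is just the string label of the default mentioned above.

```python
import re

# Hypothetical pattern: optional "out|" prefix, a field name, optional ":type" suffix
FIELD_RE = re.compile(r"<(?P<out>out\|)?(?P<name>\w+)(?::(?P<type>[^>]+))?>")


def parse_fields(template: str) -> list[dict]:
    """Return one dict per `<...>` field found in the template."""
    fields = []
    for match in FIELD_RE.finditer(template):
        fields.append(
            {
                "name": match["name"],
                "is_output": match["out"] is not None,
                # Fall back to the default type label when no type is given
                "type": match["type"] or "fileformats.generic.FsObject",
            }
        )
    return fields


fields = parse_fields("trim-png <in_image:image/png> <out|out_image:image/png>")
for field in fields:
    print(field)
```

Each field is reported with its name, whether it carries the `out|` prefix, and the type string that follows the `:`.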

## Adding options

Command-line flags, in either single- or double-hyphen form, can also be added to the shell template. The field template immediately following a flag is associated with that flag.

If there is no space between the flag and the field template, the field is assumed to be a boolean; otherwise it is assumed to be a string unless a type is specified.

If a field is optional, the field template should end with a `?`. Tuple fields are specified by comma-separated types.

Varargs are specified by the type followed by a comma and an ellipsis, e.g. `<my_varargs:generic/file,...>`.

In [None]:
Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
        "-R<recursive> "
        "--text-arg <text_arg?> "
        "--int-arg <int_arg:int?> "
        "--tuple-arg <tuple_arg:int,str?> "
    ),
)
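
The rules above (a flag directly attached to a field means boolean, a flag separated by a space means string unless typed, a trailing `?` means optional, `,...` means varargs, comma-separated types mean a tuple) can be sketched in plain Python. This is a hedged illustration of the conventions as described in this section, not Pydra's implementation; `TOKEN_RE` and `describe_fields` are hypothetical names, and the type labels are plain strings.

```python
import re

# Hypothetical pattern: an optional flag (preceded by whitespace), any spacing,
# then a "<...>" field template
TOKEN_RE = re.compile(
    r"(?:(?<!\S)(?P<flag>--?[A-Za-z][\w-]*))?(?P<space>\s*)<(?P<body>[^>]+)>"
)


def describe_fields(template: str) -> dict[str, dict]:
    """Classify each field in the template according to the rules above."""
    described = {}
    for match in TOKEN_RE.finditer(template):
        flag, body = match["flag"], match["body"]
        optional = body.endswith("?")  # trailing "?" marks an optional field
        body = body.removesuffix("?").removeprefix("out|")
        name, _, type_str = body.partition(":")
        varargs = type_str.endswith(",...")  # ",..." marks varargs
        type_str = type_str.removesuffix(",...")
        if not type_str:
            if flag and not match["space"]:
                type_str = "bool"  # flag attached directly to the field
            elif flag:
                type_str = "str"  # flag followed by a space
            else:
                type_str = "fs-object"  # untyped positional field
        described[name] = {
            "flag": flag,
            "types": type_str.split(","),  # more than one entry means a tuple
            "optional": optional,
            "varargs": varargs,
        }
    return described


described = describe_fields(
    "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
    "-R<recursive> "
    "--text-arg <text_arg?> "
    "--int-arg <int_arg:int?> "
    "--tuple-arg <tuple_arg:int,str?> "
)
for name, info in described.items():
    print(name, info)
```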

## Specifying defaults

Defaults can be specified by appending them to the field template after `=`.

In [7]:
Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
        "-R<recursive=True> "
        "--text-arg <text_arg='foo'> "
        "--int-arg <int_arg:int=99> "
        "--tuple-arg <tuple_arg:int,str=(1,'bar')> "
    ),
)

fields = {f.name: f for f in list_fields(Cp)}
print(f"'--int-arg' default: {fields['int_arg'].default}")

'--int-arg' default: 99
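
The defaults in the template above are written as Python literals. As a rough sketch of how such a suffix could be turned into a value, the snippet below splits a field body on `=` and evaluates the right-hand side with the standard library's `ast.literal_eval`. Whether Pydra evaluates defaults this way is an assumption; `split_default` and `NO_DEFAULT` are hypothetical names used only for illustration.

```python
import ast

# Hypothetical sentinel for "no default given", distinct from a default of None
NO_DEFAULT = object()


def split_default(field_body: str) -> tuple[str, object]:
    """Split 'name:type=default' into the declaration and the evaluated default."""
    declaration, sep, default_src = field_body.partition("=")
    if not sep:
        return declaration, NO_DEFAULT
    # literal_eval only accepts Python literals, so arbitrary code is rejected
    return declaration, ast.literal_eval(default_src)


examples = [
    "recursive=True",
    "text_arg='foo'",
    "int_arg:int=99",
    "tuple_arg:int,str=(1,'bar')",
]
for body in examples:
    print(split_default(body))
```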


## Specifying other field attributes

Additional attributes of the fields in the template can be specified by passing `shell.arg` or `shell.outarg` fields to the `inputs` and `outputs` keyword arguments of `shell.define`.

In [3]:
Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> <out|out_file:file?> "
        "-R<recursive> "
        "--text-arg <text_arg> "
        "--int-arg <int_arg:int?> "
        "--tuple-arg <tuple_arg:int,str> "
    ),
    inputs={
        "recursive": shell.arg(
            help_string=(
                "If source_file designates a directory, cp copies the directory and "
                "the entire subtree connected at that point."
            )
        )
    },
    outputs={
        "out_dir": shell.outarg(position=-2),
        "out_file": shell.outarg(position=-1),
    },
)
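
The example above uses negative `position` values for the two outputs. The sketch below shows one plausible reading of such positions: non-negative positions are ordered from the start of the command, negative positions from the end, and fields without a position fall in between. This ordering rule is an assumption made for illustration rather than a statement of Pydra's exact behaviour, and `order_by_position` and the position values are hypothetical.

```python
from __future__ import annotations


def order_by_position(positions: dict[str, int | None]) -> list[str]:
    """Order field names: non-negative first, unpositioned next, negative last."""
    front = sorted((pos, name) for name, pos in positions.items() if pos is not None and pos >= 0)
    back = sorted((pos, name) for name, pos in positions.items() if pos is not None and pos < 0)
    middle = [name for name, pos in positions.items() if pos is None]
    return [name for _, name in front] + middle + [name for _, name in back]


# Hypothetical positions loosely modelled on the Cp example
ordered = order_by_position(
    {
        "out_file": -1,
        "recursive": None,
        "executable": 0,
        "out_dir": -2,
        "in_fs_objects": 1,
    }
)
print(ordered)
```

Under this reading, `out_dir` (position `-2`) lands just before `out_file` (position `-1`) at the end of the command.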

## Callable outputs

In addition to outputs whose paths are passed to the tool on the command line, outputs can be derived from what the tool produces by providing a Python function that can take the output directory and inputs as arguments and returns the output value.

In [None]:
import os
from pathlib import Path
from fileformats.generic import File


def get_file_size(out_file: Path) -> int:
    result = os.stat(out_file)
    return result.st_size


ACommand = shell.define(
    # The template is passed positionally, as in the earlier examples
    "a-command <in_file:file> <out|out_file:file>",
    outputs=[
        shell.out(
            name="out_file_size",
            type=int,
            help_string="size of the output file",
            callable=get_file_size,
        )
    ],
)
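
Because the callable is an ordinary Python function, it can be exercised on its own before being attached to a task. The following sketch repeats the `get_file_size` definition from above and checks it against a temporary file; the file name and contents are arbitrary choices for the illustration.

```python
import os
import tempfile
from pathlib import Path


def get_file_size(out_file: Path) -> int:
    """Return the size of the given file in bytes."""
    result = os.stat(out_file)
    return result.st_size


with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = Path(tmp_dir) / "out.txt"
    # Write bytes so the size does not depend on platform newline handling
    out_file.write_bytes(b"this is a test file\n")
    size = get_file_size(out_file)

print(size)  # 20 bytes: 19 characters plus the newline
```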

## Dataclass form

As with Python tasks, shell tasks can also be specified in dataclass form by using `shell.define` as a decorator.

In [None]:
from fileformats.generic import FsObject, Directory
from pydra.utils.typing import MultiInputObj

@shell.define
class Cp:

    executable = "cp"

    in_fs_objects: MultiInputObj[FsObject]
    recursive: bool = False
    text_arg: str
    int_arg: int | None = None
    tuple_arg: tuple[int, str] | None = None

    class Outputs:
        out_dir: Directory 

Alternatively, the task can be written in its canonical form, which is preferred when developing tool packages because it is type-checkable.

In [None]:
@shell.define
class Cp(shell.Spec["Cp.Outputs"]):

    executable = "cp"

    in_fs_objects: MultiInputObj[FsObject] = shell.arg()
    recursive: bool = shell.arg(default=False)
    text_arg: str = shell.arg()
    int_arg: int | None = shell.arg(default=None)
    tuple_arg: tuple[int, str] | None = shell.arg(default=None)

    @shell.outputs
    class Outputs(shell.Outputs):
        out_dir: Directory = shell.outarg(path_template="{out_dir}")


## Dynamic form

In some cases, the specification for a task needs to be generated dynamically. This can be done by providing just the executable to `shell.define` and specifying all inputs and outputs explicitly.

In [None]:
ACommand = shell.define(
    # Only the executable is given; all fields are declared explicitly below
    "a-command",
    inputs={
        # Placed before the output file so the two positions do not clash
        "in_file": shell.arg(type=File, help_string="input file", argstr="", position=-2)
    },
    outputs={
        "out_file": shell.outarg(
            type=File, help_string="output file", argstr="", position=-1
        ),
        "out_file_size": {
            "type": int,
            "help_string": "size of the output file",
            "callable": get_file_size,
        },
    },
)