# Shell-tasks

## Command-line templates

Shell task specs can be defined from string templates that resemble the command-line usage examples typically found in in-line help. They are therefore a quick and intuitive way to specify a shell task. For example, here is a simple spec for the copy command `cp` that omits optional flags:

In [1]:
from pydra.design import shell

Cp = shell.define("cp <in_file> <out|destination>")

Input and output fields are both specified by placing the name of the field within enclosing `<` and `>`. Outputs are differentiated by the `out|` prefix.
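
To make the bracket convention concrete, here is a minimal standard-library sketch that splits a simple template into its executable, input names and output names. It is a toy illustration only, not Pydra's parser; `FIELD_RE` and `split_template` are hypothetical names introduced for this example.

```python
from __future__ import annotations

import re

# Matches one "<...>" field and captures the text between the brackets.
FIELD_RE = re.compile(r"<([^<>]+)>")


def split_template(template: str) -> tuple[str, list[str], list[str]]:
    """Return (executable, input names, output names) for a simple template."""
    executable = template.split()[0]
    inputs, outputs = [], []
    for body in FIELD_RE.findall(template):
        if body.startswith("out|"):
            # The "out|" prefix marks the field as an output.
            outputs.append(body[len("out|"):])
        else:
            inputs.append(body)
    return executable, inputs, outputs


executable, inputs, outputs = split_template("cp <in_file> <out|destination>")
print(executable, inputs, outputs)
```

The sketch keeps inputs and outputs in separate lists for clarity; in the real task an output path such as `destination` can also be supplied when parameterising, as the next cell shows.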

This shell task can then be run just as a Python task would be: first parameterise it, then execute it.

In [2]:
from pathlib import Path
from tempfile import mkdtemp

# Make a test file to copy
test_dir = Path(mkdtemp())
test_file = test_dir / "in.txt"
with open(test_file, "w") as f:
    f.write("Contents to be copied")

# Parameterise the task definition
cp = Cp(in_file=test_file, destination=test_dir / "out.txt")

# Print the cmdline to be run to double check
print(f"Command-line to be run: {cp.cmdline}")

# Run the shell-command task
outputs = cp()

print(
    f"Contents of copied file ('{outputs.destination}'): "
    f"'{Path(outputs.destination).read_text()}'"
)

Command-line to be run: cp /tmp/tmpg8i7_zop/in.txt /tmp/tmpg8i7_zop/out.txt


Contents of copied file ('/tmp/tmpg8i7_zop/out.txt'): 'Contents to be copied'


If the path to an output file is not provided in the parameterisation, it defaults to the name of the field, as the resolved command line shows:

In [3]:
cp = Cp(in_file=test_file)
print(cp.cmdline)

cp /tmp/tmpg8i7_zop/in.txt /home/runner/work/pydra/pydra/new-docs/source/tutorial/destination
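
The fallback seen in the output above can be pictured with a small sketch. It assumes the directory in the printed path is simply the current working directory of the process; `default_output_path` is a hypothetical helper for illustration and not part of Pydra.

```python
from __future__ import annotations

from pathlib import Path


def default_output_path(field_name: str, provided: str | None = None) -> Path:
    """Use the provided path if there is one, else a path named after the field."""
    if provided is not None:
        return Path(provided)
    # Assumption: the default is placed in the current working directory.
    return Path.cwd() / field_name


print(default_output_path("destination"))
print(default_output_path("destination", "/tmp/out.txt"))
```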


### Specifying types

By default, shell-command fields are considered to be of `fileformats.generic.FsObject` type. However, more specific file formats or built-in Python types can be specified by appending the type to the field name after a `:`.

File formats are specified by their MIME type or "MIME-like" strings (see the [FileFormats docs](https://arcanaframework.github.io/fileformats/mime.html) for details).

In [4]:
from fileformats.image import Png

TrimPng = shell.define("trim-png <in_image:image/png> <out|out_image:image/png>")

trim_png = TrimPng(in_image=Png.mock(), out_image="/path/to/output.png")

print(trim_png.cmdline)

trim-png /mock/png.png /path/to/output.png
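
The pairs that appear in this tutorial (`image/png` with `fileformats.image.Png`, `fs-object` with `FsObject`) suggest a naming pattern, sketched below with the standard library only. Both helpers are hypothetical, and the real lookup rules are those described in the FileFormats docs; in particular, falling back to the `generic` namespace when no namespace is given is an assumption.

```python
from __future__ import annotations


def split_name_and_type(field_body: str, default_type: str = "fs-object") -> tuple[str, str]:
    """Split 'name:type' into its parts, using a default type when none is given."""
    name, has_type, type_str = field_body.partition(":")
    return name, (type_str if has_type else default_type)


def guess_class_path(mime_like: str) -> str:
    """Guess a dotted class path from a MIME-like string (illustrative only)."""
    namespace, _, kebab = mime_like.rpartition("/")
    namespace = namespace or "generic"  # assumption: bare names are "generic"
    # Convert kebab-case to PascalCase, e.g. "fs-object" -> "FsObject".
    class_name = "".join(part.capitalize() for part in kebab.split("-"))
    return f"fileformats.{namespace}.{class_name}"


name, type_str = split_name_and_type("in_image:image/png")
print(name, guess_class_path(type_str))
print(guess_class_path(split_name_and_type("in_file")[1]))
```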


### Flags and options

Command-line flags can also be added to the shell template, in either the single- or double-hyphen form. The field template immediately following a flag is associated with that flag.

If there is no space between the flag and the field template, then the field is assumed to be a boolean, otherwise it is assumed to be of type string unless otherwise specified.

If a field is optional, the field template should end with a `?`. Tuple fields are specified by comma-separated types.

Varargs are specified by the type followed by an ellipsis, e.g. `<my_varargs:generic/file,...>`

In [5]:
from pprint import pprint
from pydra.engine.helpers import fields_dict

Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
        "-R<recursive> "
        "--text-arg <text_arg?> "
        "--int-arg <int_arg:int?> "
        "--tuple-arg <tuple_arg:int,str?> "
    ),
)

pprint(fields_dict(Cp))
pprint(fields_dict(Cp.Outputs))

ValueError: sep (' ') can only be provided when type is iterable <class 'fileformats.generic.fsobject.FsObject'> for field 'in_fs_objects'
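
The rules in this section (attached flag means boolean, separated flag means string, trailing `?` means optional, comma-separated types form a tuple, and `,...` marks varargs) can be summarised in a short standard-library sketch. This is a toy reading of the rules as stated above, not Pydra's parser; `FIELD_RE`, `describe_fields` and the dictionary keys are hypothetical.

```python
import re

# An optional flag (only at the start of a whitespace-separated token),
# optional whitespace, then one "<...>" field.
FIELD_RE = re.compile(
    r"(?:(?<!\S)(?P<flag>--?[A-Za-z][\w-]*))?(?P<gap>\s*)<(?P<body>[^<>]+)>"
)


def describe_fields(template: str) -> dict:
    """Summarise each field in a template as a plain dictionary."""
    fields = {}
    for match in FIELD_RE.finditer(template):
        flag, gap, body = match["flag"], match["gap"], match["body"]
        is_output = body.startswith("out|")
        body = body[len("out|"):] if is_output else body
        optional = body.endswith("?")
        body = body[:-1] if optional else body
        name, has_type, type_str = body.partition(":")
        varargs = type_str.endswith(",...")
        type_str = type_str[: -len(",...")] if varargs else type_str
        if not has_type:
            # No explicit type: attached flag -> bool, separated flag -> str,
            # positional field -> the generic file-system object default.
            if flag is None:
                type_str = "fs-object"
            else:
                type_str = "str" if gap else "bool"
        fields[name] = {
            "flag": flag,
            "types": type_str.split(","),
            "optional": optional,
            "varargs": varargs,
            "output": is_output,
        }
    return fields


TEMPLATE = (
    "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
    "-R<recursive> --text-arg <text_arg?> "
    "--int-arg <int_arg:int?> --tuple-arg <tuple_arg:int,str?>"
)
fields = describe_fields(TEMPLATE)
for field_name, info in fields.items():
    print(field_name, info)
```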

### Defaults

Defaults can be specified by appending them to the field template after `=`:

In [6]:
Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> "
        "-R<recursive=True> "
        "--text-arg <text_arg='foo'> "
        "--int-arg <int_arg:int=99> "
        "--tuple-arg <tuple_arg:int,str=(1,'bar')> "
    ),
)

print(f"'--int-arg' default: {fields_dict(Cp)['int_arg'].default}")

ValueError: sep (' ') can only be provided when type is iterable <class 'fileformats.generic.fsobject.FsObject'> for field 'in_fs_objects'
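
Because the defaults in the template above are written as Python literals, one way to picture how they could be read is with `ast.literal_eval` from the standard library. This is only a sketch under that assumption; how Pydra actually converts defaults is not shown here, and `split_default` and `NO_DEFAULT` are hypothetical names.

```python
import ast

# Sentinel distinguishing "no default given" from a default of None.
NO_DEFAULT = object()


def split_default(field_body: str):
    """Split 'name[:type]=default' into (declaration, default value)."""
    declaration, has_default, default_src = field_body.partition("=")
    if not has_default:
        return declaration, NO_DEFAULT
    # literal_eval only accepts Python literals, so arbitrary code is rejected.
    return declaration, ast.literal_eval(default_src)


for body in ("recursive=True", "text_arg='foo'", "int_arg:int=99",
             "tuple_arg:int,str=(1,'bar')", "in_file"):
    print(split_default(body))
```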

### Additional field attributes

Additional attributes of the fields in the template can be specified by passing `shell.arg` or `shell.outarg` fields to the `inputs` and `outputs` keyword arguments of `shell.define`:

In [7]:
Cp = shell.define(
    (
        "cp <in_fs_objects:fs-object,...> <out|out_dir:directory> <out|out_file:file?> "
        "-R<recursive> "
        "--text-arg <text_arg> "
        "--int-arg <int_arg:int?> "
        "--tuple-arg <tuple_arg:int,str> "
    ),
    inputs={
        "recursive": shell.arg(
            help=(
                "If source_file designates a directory, cp copies the directory and "
                "the entire subtree connected at that point."
            )
        )
    },
    outputs={
        "out_dir": shell.outarg(position=-2),
        "out_file": shell.outarg(position=-1),
    },
)


pprint(fields_dict(Cp))
pprint(fields_dict(Cp.Outputs))

ValueError: sep (' ') can only be provided when type is iterable <class 'fileformats.generic.fsobject.FsObject'> for field 'in_fs_objects'
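
The `position=-2` and `position=-1` values above read naturally if negative positions count from the end of the command line, as negative indices do in Python. The sketch below shows one way such an ordering could be resolved. It assumes that convention and that fields without a position sit in the middle; it is not taken from Pydra's implementation, and `order_args` is a hypothetical helper.

```python
from __future__ import annotations


def order_args(positions: dict[str, int | None]) -> list[str]:
    """Order names: non-negative positions, then unpositioned, then negative."""
    front = sorted(
        (n for n, p in positions.items() if p is not None and p >= 0),
        key=positions.get,
    )
    middle = [n for n, p in positions.items() if p is None]
    back = sorted(
        (n for n, p in positions.items() if p is not None and p < 0),
        key=positions.get,
    )
    return front + middle + back


ordered = order_args(
    {"recursive": None, "out_file": -1, "in_fs_objects": 1, "out_dir": -2, "text_arg": None}
)
print(ordered)
```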

### Callable outputs

In addition to outputs that are specified to the tool on the command line, outputs can be derived from the results of the tool by providing a Python function that takes the output directory and inputs as arguments and returns the output value. Callables can be specified either in the `callable` attribute of the `shell.out` field or in a dictionary mapping the output name to the callable:

In [8]:
import os
from pydra.design import shell
from pathlib import Path
from fileformats.generic import File

# The callable's argument names determine what is passed to it; here `out_file`
# is the output file path that forms part of the command line
def get_file_size(out_file: Path) -> int:
    """Calculate the file size"""
    result = os.stat(out_file)
    return result.st_size


CpWithSize = shell.define(
    "cp <in_file:file> <out|out_file:file>",
    outputs={"out_file_size": get_file_size},
)

# Parameterise the task definition
cp_with_size = CpWithSize(in_file=File.sample())

# Run the command
outputs = cp_with_size()


print(f"Size of the output file is: {outputs.out_file_size}")

Size of the output file is: 256


The callable can take any combination of the following arguments, which will be passed
to it when it is called:

* field: the `Field` object to be provided a value, useful when writing generic callables
* output_dir: a `Path` object referencing the working directory the command was run within
* inputs: a dictionary containing all the resolved inputs to the task
* stdout: the standard output stream produced by the command
* stderr: the standard error stream produced by the command
* *name of an input*: the name of any of the input arguments to the task, including output args that are part of the command line (i.e. output files)
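
Matching arguments by name, as the list above describes, can be sketched with `inspect.signature` from the standard library. This is an illustration of the general technique rather than Pydra's own dispatch code; `call_with_matching_args`, `count_stdout_lines` and the sample values are hypothetical.

```python
import inspect
from pathlib import Path


def call_with_matching_args(func, available: dict):
    """Call func, passing only the available values its signature names."""
    params = inspect.signature(func).parameters
    missing = [name for name in params if name not in available]
    if missing:
        raise TypeError(f"No value available for argument(s): {missing}")
    return func(**{name: available[name] for name in params})


def count_stdout_lines(stdout: str) -> int:
    """Example callable that only asks for the standard output."""
    return len(stdout.splitlines())


# Stand-ins for the values a finished shell task could offer to a callable.
available = {
    "output_dir": Path("."),
    "inputs": {"in_file": "a.txt"},
    "stdout": "one\ntwo\n",
    "stderr": "",
    "in_file": "a.txt",
}
n_lines = call_with_matching_args(count_stdout_lines, available)
print(n_lines)
```

A callable that names an argument outside the available set fails early with a `TypeError` in this sketch, which keeps misspelt argument names from passing silently.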

To make workflows that use the interface type-checkable, the canonical form of a shell
task dataclass should inherit from `shell.Def` parameterized by its nested Outputs class,
and the `Outputs` nested class should inherit from `shell.Outputs`.

In [9]:
from fileformats.generic import Directory, FsObject
from pydra.design import shell
from pydra.engine.specs import ShellDef, ShellOutputs
from pydra.utils.typing import MultiInputObj  # import location may differ between Pydra versions

@shell.define
class Cp(ShellDef["Cp.Outputs"]):

    executable = "cp"

    in_fs_objects: MultiInputObj[FsObject]
    recursive: bool = shell.arg(argstr="-R", default=False)
    text_arg: str = shell.arg(argstr="--text-arg")
    int_arg: int | None = shell.arg(argstr="--int-arg")
    tuple_arg: tuple[int, str] | None = shell.arg(argstr="--tuple-arg")

    @shell.outputs
    class Outputs(ShellOutputs):
        out_dir: Directory = shell.outarg(path_template="{out_dir}")


## Dynamic definitions

In some cases the definition for a task needs to be generated dynamically. This can be done by providing just the executable to `shell.define` and specifying all inputs and outputs explicitly:

In [10]:
from fileformats.generic import File
from pydra.engine.helpers import list_fields

ACommand = shell.define(
    "a-command",
    inputs={
        "in_file": shell.arg(type=File, help="input file", argstr="", position=-2)
    },
    outputs={
        "out_file": shell.outarg(
            type=File, help="output file", argstr="", position=-1
        ),
        "out_file_size": {
            "type": int,
            "help": "size of the output file",
            "callable": get_file_size,
        }
    },
)


print(f"ACommand input fields: {list_fields(ACommand)}")
print(f"ACommand output fields: {list_fields(ACommand.Outputs)}")


ACommand input fields: [arg(name='in_file', type=<class 'fileformats.generic.file.File'>, default=EMPTY, help='input file', requires=[], converter=None, validator=None, hash_eq=False, xor=(), copy_mode=<CopyMode.any: 15>, copy_collation=<CopyCollation.any: 0>, copy_ext_decomp=<ExtensionDecomposition.single: 1>, readonly=False, argstr='', position=-2, sep=None, allowed_values=None, container_path=False, formatter=None), arg(name='additional_args', type=list[str], default=Factory(factory=<class 'list'>, takes_self=False), help='Additional free-form arguments to append to the end of the command.', requires=[], converter=None, validator=None, hash_eq=False, xor=(), copy_mode=<CopyMode.any: 15>, copy_collation=<CopyCollation.any: 0>, copy_ext_decomp=<ExtensionDecomposition.single: 1>, readonly=False, argstr='', position=None, sep=' ', allowed_values=None, container_path=False, formatter=None), outarg(name='out_file', type=<class 'fileformats.generic.file.File'>, default=EMPTY, help='output