# Pipelime Command Line Interface

The pipelime command line interface is a powerful tool to automate data processing.
To get help, simply type `pipelime`, `pipelime help`, `pipelime --help` or
even `pipelime -h`:

In [None]:
!pipelime

The CLI is built around the concept of a `Pipelime Command`, which encapsulates an
operation and makes it available both to the CLI and to regular Python scripts. Such
commands are dynamically loaded at runtime, so you can always run a third-party command
just by setting its full class path, eg, `my_package.my_module.MyCommand` or
`path/to/my_module.py:MyCommand`. Alternatively, let pipelime find and load your command
by setting `--module my_package.my_module` or `--module path/to/my_module.py`, then
refer to it by its pydantic title (see **#TODO REF**).
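
To make the two class path forms concrete, here is a minimal sketch of how a loader could resolve them with the standard `importlib` machinery. This is not pipelime's actual implementation: the `load_symbol` helper is a hypothetical name introduced only for illustration.

```python
import importlib
import importlib.util
from pathlib import Path


def load_symbol(class_path: str):
    """Resolve 'pkg.module.Name' or 'path/to/file.py:Name' to a Python object.

    Hypothetical helper for illustration, not part of pipelime.
    """
    if ":" in class_path:
        # File-based form: split on the last colon into file path and symbol name.
        file_part, _, name = class_path.rpartition(":")
        file_path = Path(file_part)
        spec = importlib.util.spec_from_file_location(file_path.stem, file_path)
        if spec is None or spec.loader is None:
            raise ImportError(f"Cannot load a module from {file_path}")
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    else:
        # Dotted form: everything before the last dot is the module to import.
        module_name, _, name = class_path.rpartition(".")
        module = importlib.import_module(module_name)
    return getattr(module, name)


# Usage with a standard library class, since `my_package` is only a placeholder.
ordered_dict_cls = load_symbol("collections.OrderedDict")
```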

The list of available commands and sequence operators (more on this later in this doc)
can be retrieved with:

In [None]:
!pipelime list

Now, to get help for a specific command or sequence operator, just type
`pipelime help <cmd>`, `pipelime <cmd> help`, `pipelime --help <cmd>`, etc.
For example (best viewed in a *real* terminal window):

In [None]:
!pipelime --help clone

The same help can also be printed during an interactive session by calling the printer explicitly:

In [None]:
from pipelime.cli.utils import print_command_op_stage_info

print_command_op_stage_info("clone")

## Running A Command

As you can see above, the *clone* command:
* takes three arguments: **input** (required), **output** (required) and **grabber** (optional)
* treats each argument as an *interface* that encapsulates a full range of options in a tree-like structure

When you call *clone* through the pipelime CLI, you can set all those options in different ways:
* pydash-like key paths prefixed with "+", where "." separates nested keys and "[]"
  indexes a list, eg, `+input.folder path/to/folder`.
* a json/yaml configuration file specified as `--config path/to/cfg.yaml`. Note that command line
  options update and override the definitions in the config file.
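
The effect of a key path on a nested configuration can be sketched in plain Python. The snippet below is only an illustration of the idea, not pipelime's own parser: `parse_key_path` and `set_by_path` are hypothetical helpers, and the starting `config` dictionary stands in for the content of a config file.

```python
import re


def parse_key_path(path: str) -> list:
    """Split a key path such as 'a.b[0].c' into ['a', 'b', 0, 'c']."""
    tokens = []
    for name, index in re.findall(r"([^.\[\]]+)|\[(\d+)\]", path):
        tokens.append(int(index) if index else name)
    return tokens


def _child(node, token, default):
    """Return node[token], creating it (and padding lists) when missing."""
    if isinstance(token, int):
        while len(node) <= token:
            node.append(None)
        if node[token] is None:
            node[token] = default
        return node[token]
    return node.setdefault(token, default)


def set_by_path(cfg: dict, path: str, value) -> None:
    """Set `value` at `path`, creating intermediate dicts and lists as needed."""
    tokens = parse_key_path(path)
    node = cfg
    for token, next_token in zip(tokens[:-1], tokens[1:]):
        # The type of the next token decides whether a list or a dict is needed.
        node = _child(node, token, [] if isinstance(next_token, int) else {})
    last = tokens[-1]
    if isinstance(last, int):
        while len(node) <= last:
            node.append(None)
    node[last] = value


# Stand-in for values read from a config file.
config = {"input": {"folder": "from/config"}, "output": {"folder": "./clone_out"}}

# Command line options are applied afterwards, so they override the file.
set_by_path(config, "input.folder", "path/to/folder")
set_by_path(config, "output.exists_ok", True)
```

After these calls, `config["input"]["folder"]` holds the command line value, while `config["output"]` keeps its `folder` and gains the new `exists_ok` key.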

In [None]:
!pipelime clone +input.folder ../../tests/sample_data/datasets/underfolder_minimnist +output.folder ./clone_out +output.exists_ok=True

Likewise, the `CloneCommand` can be created and run in a python script as well:

In [None]:
from pipelime.commands import CloneCommand
from pipelime.cli.pretty_print import print_command_outputs

cmd = CloneCommand(
    input={"folder": "../../tests/sample_data/datasets/underfolder_minimnist"},  # type: ignore
    output={"folder": "./clone_out", "exists_ok": True},  # type: ignore
)
cmd()
print_command_outputs(cmd)

## Executing A Graph Of Commands

Multiple commands can be chained and executed as a Directed Acyclic Graph (DAG) by the *run* command (`RunCommand`):

In [None]:
print_command_op_stage_info("run")
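
To see what executing commands as a DAG implies, the following sketch orders a few nodes so that each one runs only after the nodes it depends on. It is not how `RunCommand` is implemented: the node names, the `inputs`/`outputs` fields and the rule of linking a node to whichever node produces its inputs are all assumptions made for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical nodes, each declaring the datasets it reads and writes.
nodes = {
    "merge": {"inputs": ["train_aug", "test"], "outputs": ["final"]},
    "augment": {"inputs": ["train"], "outputs": ["train_aug"]},
    "split": {"inputs": ["raw"], "outputs": ["train", "test"]},
}

# Map every produced dataset to the node that writes it.
producers = {out: name for name, node in nodes.items() for out in node["outputs"]}

# A node depends on the producers of its inputs (assumption of this sketch).
# Inputs with no producer, like "raw", are treated as external data.
dependencies = {
    name: {producers[inp] for inp in node["inputs"] if inp in producers}
    for name, node in nodes.items()
}

# A valid execution order: every node appears after its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)
```

Note that the order in which the nodes are declared does not matter: only the dependencies between them determine the execution order.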

The `nodes` attribute is a mapping of nodes, where the keys are the nodes' names and the values the actual commands to execute. As a practical example, look at the *complex_dag.yaml* file. It may seem intimidating at first, but we can easily understand the data flow by drawing it:

In [None]:
!pipelime draw --config complex_dag.yaml

Oops! Something went wrong...
As the error message says, we need to specify some variables. To get a full list,
just audit the configuration file:

In [None]:
!pipelime audit --config complex_dag.yaml

The `complex_params.yaml` file defines such variables, except for `params.root_folder`, which is defined by the user on the command line using the special `!` prefix:

In [None]:
!pipelime audit --config complex_dag.yaml --context complex_params.yaml !params.root_folder=./output
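
The two prefixes can be told apart with very little logic. The sketch below separates "+" options (configuration) from "!" options (context variables) and accepts both the `key=value` and the `key value` forms seen in this document. It is a simplified illustration, not pipelime's real argument parser: `split_cli_options` is a hypothetical helper and all values are kept as plain strings.

```python
def split_cli_options(tokens):
    """Separate '+' config options from '!' context options.

    Hypothetical helper for illustration, not part of pipelime.
    """
    config, context = {}, {}
    it = iter(tokens)
    for token in it:
        prefix = token[:1]
        if prefix not in ("+", "!"):
            raise ValueError(f"Unexpected token: {token!r}")
        target = config if prefix == "+" else context
        key, sep, value = token[1:].partition("=")
        if not sep:
            # No '=' in the token: the value is the next token on the line.
            value = next(it, None)
            if value is None:
                raise ValueError(f"Missing value for {key!r}")
        target[key] = value
    return config, context


config_opts, context_opts = split_cli_options(
    [
        "+input.folder", "path/to/folder",
        "+output.exists_ok=True",
        "!params.root_folder=./output",
    ]
)
```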

Now we are ready to inspect and run the computation graph:

In [None]:
!pipelime draw --config complex_dag.yaml --context complex_params.yaml !params.root_folder=./output

In [None]:
!pipelime run --config complex_dag.yaml --context complex_params.yaml !params.root_folder=./output

And now run again!

In [None]:
!pipelime run --config complex_dag.yaml --context complex_params.yaml !params.root_folder=./output

Ahaa! We got an error:
```
FileExistsError: Trying to overwrite an existing dataset. Please use `exists_ok=True` to overwrite.
```
Looking at `complex_dag.yaml`, we can see that the `exists_ok` option is not set for
`nodes.sum_1.$args.output`. We can fix this by adding it on the command line (best viewed in a *real* terminal window):

In [None]:
!pipelime run --config complex_dag.yaml --context complex_params.yaml !params.root_folder=./output "+nodes.sum_1.$args.output.exists_ok" True