# Sample Stages

A stage is a pydantic model derived from `pipelime.stages.SampleStage` that processes samples
while they are extracted from a sequence. A stage is applied to a dataset through the
operation `map`:

In [None]:
from pipelime.cli.utils import print_command_op_stage_info

print_command_op_stage_info("map")

The available stages can be listed by running `pipelime list` or by calling the printer function:

In [None]:
from pipelime.cli.utils import print_commands_ops_stages_list

print_commands_ops_stages_list(show_cmds=False, show_ops=False, show_stages=True)

## Custom Stages

To write your own sample stage, create a class derived from
`pipelime.stages.SampleStage` and implement the
`__call__(self, x: Sample) -> Sample` method.
When manipulating samples, never modify the original sample. Instead, use the methods of
`Sample` that return an updated instance. Here is a minimal list:
* `shallow_copy`: returns a sample with a new internal mapping object, but the same item
instances
* `deep_copy`: duplicates the whole sample, including the items
* `set_item`: extends the sample with a new item or changes the item assigned to an
existing key
* `set_value`: changes the value of an existing item
* `deep_set`/`deep_get`: sets/gets the value of a nested structure, such as `MetadataItem`, using a
pydash-like address
* `match`: returns the result of a `dictquery` match
* `rename_key`: changes the name of a key
* `duplicate_key`: creates a new key and assigns to it a reference to another item
* `remove_keys`/`extract_keys`: creates a new sample with a subset of the original keys
* `merge`/`update`: creates a new sample in which the items of another sample are added to, or overwrite, those of the original sample
* `to_dict`: converts the sample to a dictionary of item values
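
The difference between a shallow and a deep copy can be illustrated with plain Python objects. The sketch below is only an analogy: it uses the standard `copy` module on a dictionary of lists, where the dictionary stands in for a sample and each list stands in for an item. It does not call pipelime's own `shallow_copy` or `deep_copy`.

```python
import copy

# Analogy only: a plain dict plays the role of a sample,
# and each list plays the role of an item.
original = {"image": [1, 2, 3], "label": [7]}

# Shallow copy: a new mapping object, but the same "item" instances.
shallow = copy.copy(original)

# Deep copy: a new mapping object and new "item" instances.
deep = copy.deepcopy(original)

# Adding a key to the shallow copy leaves the original mapping untouched.
shallow["extra"] = [0]

print("extra" in original)                    # False
print(shallow["image"] is original["image"])  # True
print(deep["image"] is original["image"])     # False
print(deep == original)                       # True
```

A shallow copy is enough when keys are added, removed or reassigned, while a deep copy is the safer choice when the items themselves may be changed afterwards.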

As an example, review the class `MyStage` in the *my_stage.py* module:
1. it needs a source key and a target key
1. if the current sample has the source key and it is a numpy array, the value is read
1. the value is multiplied by 2.5
1. a new numpy item is initialized with this value and assigned to the target key
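
The four steps above can be sketched with the standard library alone. This is not the content of *my_stage.py*: `TinySample`, `ArrayItem` and `ScaleStage` are hypothetical stand-ins introduced here so that the logic runs without pipelime installed, and a tuple of floats takes the place of a numpy array. Only the shape of `__call__` and the rule of returning an updated sample rather than modifying the input are taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayItem:
    """Hypothetical stand-in for a numpy item (holds a tuple of floats)."""

    values: tuple


class TinySample(dict):
    """Hypothetical stand-in for a sample: a mapping from keys to items."""

    def set_item(self, key, item):
        # Return a new mapping with the extra item; `self` is left unchanged.
        updated = TinySample(self)
        updated[key] = item
        return updated


@dataclass
class ScaleStage:
    """Hypothetical stand-in for the stage described in the steps above."""

    source_key: str  # step 1: the stage needs a source key...
    target_key: str  # ...and a target key
    factor: float = 2.5

    def __call__(self, x: TinySample) -> TinySample:
        # Step 2: read the value only if the key exists and has the right type.
        item = x.get(self.source_key)
        if not isinstance(item, ArrayItem):
            return x
        # Step 3: multiply the value by 2.5.
        scaled = tuple(v * self.factor for v in item.values)
        # Step 4: wrap the value in a new item and assign it to the target key.
        return x.set_item(self.target_key, ArrayItem(scaled))


sample = TinySample(label=ArrayItem((4.0,)))
stage = ScaleStage(source_key="label", target_key="double_half_label")
out = stage(sample)

print(sorted(out))                      # ['double_half_label', 'label']
print(out["double_half_label"].values)  # (10.0,)
print("double_half_label" in sample)    # False: the input sample is unchanged
```

The actual `MyStage` is a pydantic model derived from `pipelime.stages.SampleStage` and works on real samples and items.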

Let's see what it looks like in the pipelime shell!

In [None]:
!pipelime help my_stage.py:MyStage

In [None]:
!pipelime -m my_stage.py help MyStage

And now apply the stage within a custom data pipe (beware of the different shell
escaping patterns):

In [None]:
### windows cmd ###
# !pipelime pipe +input.folder "../../tests/sample_data/datasets/underfolder_minimnist" +output.folder "./my_stage_output" +output.exists_ok "+operations.map.$model" my_stage.py:MyStage "+operations.map.$args.source_key" label "+operations.map.$args.target_key" double_half_label

### bash/zsh (single quotes to escape $) ###
# !pipelime pipe +input.folder "../../tests/sample_data/datasets/underfolder_minimnist" +output.folder "./my_stage_output" +output.exists_ok '+operations.map.$model' my_stage.py:MyStage '+operations.map.$args.source_key' label '+operations.map.$args.target_key' double_half_label