# Writing Your Own Sequence Operator

A sequence operator is a pydantic model derived from
`pipelime.sequences.SamplesSequence`. Although it is meant to process samples, it
should not be confused with a sample stage:
* a sample stage is a function that takes a sample and returns a sample, while a
sequence operator returns a sample at a given index
* a sample stage has access only to the sample it processes, while a sequence
operator has complete access to the source data
* a sample stage *receives* the sample to process, while a sequence operator
*may or may not pick* the sample from another source
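
The distinction above can be sketched in plain Python. This is a toy model only: the `Sample` alias, `double_label` and `EveryOther` are hypothetical names made up for illustration, and none of them is part of the pipelime API. The stage-like function only ever sees the sample it is given, while the operator-like object holds the whole source and decides both how many samples it exposes and which source sample answers each index.

```python
from typing import Dict, List

# Stand-in for a real sample; this is NOT pipelime's Sample class.
Sample = Dict[str, int]


def double_label(sample: Sample) -> Sample:
    """Stage-like function: receives one sample, returns one sample."""
    return {**sample, "label": sample["label"] * 2}


class EveryOther:
    """Operator-like object: sees the whole source and answers by index."""

    def __init__(self, source: List[Sample]) -> None:
        self.source = source

    def size(self) -> int:
        # The operator decides how many samples it exposes.
        return (len(self.source) + 1) // 2

    def get_sample(self, idx: int) -> Sample:
        if not 0 <= idx < self.size():
            raise IndexError(f"index {idx} out of range")
        # The operator picks which source sample to return.
        return self.source[2 * idx]


source = [{"label": i} for i in range(5)]

# A stage is applied to every sample it receives.
staged_labels = [double_label(s)["label"] for s in source]

# An operator chooses what to return for each of its own indices.
picked = EveryOther(source)
picked_labels = [picked.get_sample(i)["label"] for i in range(picked.size())]
```

Here the stage cannot skip or reorder anything, whereas the operator exposes only three of the five source samples.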

There are two types of sequence operators:
* *generators*: they do not have a source samples sequence
* *pipes*: they get another samples sequence as source input

Therefore, generators are called as class methods on the `SamplesSequence` class, while
piped operators are called as instance methods, which create a new `SamplesSequence`
with `self` as the input source:

In [None]:
from pipelime.sequences import SamplesSequence

seq = SamplesSequence.toy_dataset(length=10)  # type: ignore
seq = seq.shuffle()

## Generators

To define your own generator you can directly derive from
`pipelime.sequences.SamplesSequence`, then:
* put the decorator `@pipelime.sequences.source_sequence` on top of your class
* set a `title` as a metaclass keyword argument, e.g.,
`class Foo(SamplesSequence, title="foo")`: this will be the name of the class method to
call
* describe what the generator does in the class docstring
* use `pydantic.Field` for each parameter:
  * always set a default, or `...` for required parameters (CAVEAT: use
  `default_factory` to create mutable objects, e.g., `dict` or `list`)
  * insert a descriptive help with `description=...`
* define a `pydantic.validator` to report clearer errors when inputs are invalid
* implement `def size(self) -> int` and `def get_sample(self, idx: int) -> Sample`

## Pipes

To define your own pipe you can derive from
`pipelime.sequences.pipes.PipedSequenceBase` to get a reasonable base implementation,
then:
* put the decorator `@pipelime.sequences.piped_sequence` on top of your class
* set a `title` as a metaclass keyword argument, e.g.,
`class Foo(PipedSequenceBase, title="foo")`: this will be the name of the method to
call
* describe what the pipe does in the class docstring
* use `pydantic.Field` for each parameter:
  * always set a default, or `...` for required parameters (CAVEAT: use
  `default_factory` to create mutable objects, e.g., `dict` or `list`)
  * insert a descriptive help with `description=...`
* define a `pydantic.validator` to report clearer errors when inputs are invalid
* *(only if needed)* implement `def size(self) -> int` and/or
`def get_sample(self, idx: int) -> Sample`
* to get access to the source samples sequence, simply use `self.source`
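
To get a feeling for how a `title` keyword plus a class decorator can turn a class into a callable method, here is a dependency-free toy model. It is an assumption-laden sketch, not pipelime's actual implementation: `ToySequence`, `toy_source`, `toy_piped`, `Counting` and `Repeat` are hypothetical names, and plain `__init__` methods stand in for pydantic fields. The generator ends up callable on the base class, while the pipe ends up callable on an instance and receives that instance as `source`.

```python
from typing import Any, List, Optional


class ToySequence:
    """Toy base class; this is NOT pipelime's SamplesSequence."""

    def __init_subclass__(cls, title: Optional[str] = None, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        cls._title = title  # remembered for the decorators below

    def size(self) -> int:
        raise NotImplementedError

    def get_sample(self, idx: int) -> Any:
        raise NotImplementedError

    def to_list(self) -> List[Any]:
        return [self.get_sample(i) for i in range(self.size())]


def toy_source(cls):
    """Register a generator so it is callable as ToySequence.<title>(...)."""
    setattr(ToySequence, cls._title, staticmethod(lambda **kw: cls(**kw)))
    return cls


def toy_piped(cls):
    """Register a pipe so it is callable as <instance>.<title>(...)."""
    setattr(ToySequence, cls._title, lambda self, **kw: cls(source=self, **kw))
    return cls


@toy_source
class Counting(ToySequence, title="counting"):
    """Generator: no source, it produces the indices themselves."""

    def __init__(self, length: int) -> None:
        self.length = length

    def size(self) -> int:
        return self.length

    def get_sample(self, idx: int) -> int:
        return idx


@toy_piped
class Repeat(ToySequence, title="repeat"):
    """Pipe: repeats the source sequence `times` times."""

    def __init__(self, source: ToySequence, times: int = 2) -> None:
        self.source = source
        self.times = times

    def size(self) -> int:
        return self.source.size() * self.times

    def get_sample(self, idx: int) -> int:
        return self.source.get_sample(idx % self.source.size())


seq = ToySequence.counting(length=3).repeat(times=2)
result = seq.to_list()
```

The chained call mirrors the `SamplesSequence.toy_dataset(...).shuffle()` pattern: the first call needs no source, the second one wraps the object it is called on.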

## A Pipe Example

`my_pipe.py` contains a piped sequence operator that reverses the order of the first
*n* samples of the source sequence. To see its help from the command line, we first
specify the module path and then the operator name, since the operator has to be
registered as an attribute on `SamplesSequence`:

In [None]:
!pipelime -m my_pipe.py reversed help

In [None]:
import my_pipe  # noqa: F401 (imported so that the `reversed` operator is registered)
from pipelime.cli.utils import print_command_op_stage_info

print_command_op_stage_info("reversed")

In [None]:
import my_pipe  # noqa: F401 (imported so that the `reversed` operator is registered)
from pipelime.sequences import SamplesSequence

seq = SamplesSequence.from_underfolder(  # type: ignore
    "../../tests/sample_data/datasets/underfolder_minimnist"
)

print("IDX | Original Sequence | Reversed Sequence")
for idx, (s1, s2) in enumerate(zip(seq, seq.reversed(num=10))):
    print(f"#{idx:0>2d}", "|", int(s1["label"]()), "|", int(s2["label"]()), sep="\t")
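
The contents of `my_pipe.py` are not shown here, but the index remapping behind "reverse the first *n* samples" can be sketched in plain Python. The helpers `reversed_index` and `reverse_first` are hypothetical names, and clamping `num` to the sequence length is an assumption of this sketch; the actual operator may handle that case differently.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reversed_index(idx: int, num: int, size: int) -> int:
    """Map an output index to a source index when the first `num` items are reversed."""
    if not 0 <= idx < size:
        raise IndexError(f"index {idx} out of range for size {size}")
    num = min(num, size)  # assumption: clamp `num` to the sequence length
    # Indices inside the reversed window are mirrored, the others are untouched.
    return num - 1 - idx if idx < num else idx


def reverse_first(source: Sequence[T], num: int) -> List[T]:
    """Materialize the remapped sequence (a real pipe would do this lazily per index)."""
    return [source[reversed_index(i, num, len(source))] for i in range(len(source))]


labels = list(range(6))
out = reverse_first(labels, 4)
```

In a piped operator, the same mapping would sit inside `get_sample`, returning `self.source[...]` at the remapped index, while `size` could be left to the base implementation.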