# Optional Pipeline Steps

A recurring requirement, for example in data analysis workflows, is to make individual steps of a pipeline optional.
A typical case is a chain of corrections in which the user can choose to apply or skip each one.
This brings two challenges.
Firstly, Sciline does not provide a way to skip steps in a pipeline, since doing so conflicts with the idea that a pipeline is a directed acyclic graph (DAG).
Secondly, attempts to work around this limitation, e.g., by using providers that perform no operation instead of a correction depending on a flag, are hampered by cumbersome and misleading domain type naming.
Furthermore, adding new corrections requires access to the source code of the pipeline, which is not always practical when the pipeline is part of a library.
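
To make the naming problem concrete, the following is a minimal sketch of the flag-based workaround, written as plain Python functions that are called by hand rather than through Sciline. The types `DataAfterA` and `DataAfterB` and the flags `use_a` and `use_b` are hypothetical names chosen for illustration; in an actual pipeline each flag would additionally need its own domain type.

```python
from typing import NewType

RawData = NewType('RawData', float)
CorrectionA = NewType('CorrectionA', float)
CorrectionB = NewType('CorrectionB', float)
# Hypothetical intermediate types, needed so the two steps can be chained.
# Their names claim that a correction happened even when it was skipped.
DataAfterA = NewType('DataAfterA', float)
DataAfterB = NewType('DataAfterB', float)


def maybe_apply_a(
    raw_data: RawData, correction_a: CorrectionA, use_a: bool
) -> DataAfterA:
    if not use_a:
        # No-op branch: the data is passed through unchanged.
        return DataAfterA(raw_data)
    return DataAfterA(raw_data * correction_a)


def maybe_apply_b(
    data: DataAfterA, correction_b: CorrectionB, use_b: bool
) -> DataAfterB:
    if not use_b:
        # No-op branch: the data is passed through unchanged.
        return DataAfterB(data)
    return DataAfterB(data - correction_b)


# Skip correction A, apply correction B.
raw = RawData(10.0)
after_a = maybe_apply_a(raw, CorrectionA(1.0), use_a=False)
result = maybe_apply_b(after_a, CorrectionB(2.0), use_b=True)
```

Here `after_a` holds uncorrected data although its type suggests otherwise, and every further correction would need another intermediate type and another flag.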

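The example below takes a different route: each variant of the correction step is an ordinary provider that turns `RawData` into `CorrectedData`, and the pipeline is first built with the variant that applies both corrections.

In [None]: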
from typing import NewType
import sciline

RawData = NewType('RawData', float)
CorrectedData = NewType('CorrectedData', float)
CorrectionA = NewType('CorrectionA', float)
CorrectionB = NewType('CorrectionB', float)


def compute_correction_a() -> CorrectionA:
    # Placeholder for actual computation logic
    return 1.0


def compute_correction_b() -> CorrectionB:
    # Placeholder for actual computation logic
    return 2.0


def _do_correction_a(raw_data: float, correction_a: CorrectionA) -> float:
    return raw_data * correction_a


def _do_correction_b(raw_data: float, correction_b: CorrectionB) -> float:
    return raw_data - correction_b


def apply_a(raw_data: RawData, correction_a: CorrectionA) -> CorrectedData:
    corrected_data = _do_correction_a(raw_data, correction_a)
    return CorrectedData(corrected_data)


def apply_b(raw_data: RawData, correction_b: CorrectionB) -> CorrectedData:
    corrected_data = _do_correction_b(raw_data, correction_b)
    return CorrectedData(corrected_data)


def apply_a_and_b(
    raw_data: RawData, correction_a: CorrectionA, correction_b: CorrectionB
) -> CorrectedData:
    corrected_data = _do_correction_a(raw_data, correction_a)
    corrected_data = _do_correction_b(corrected_data, correction_b)
    return CorrectedData(corrected_data)


pl = sciline.Pipeline((compute_correction_a, compute_correction_b, apply_a_and_b))
pl.visualize(mode='both')

Above, we used `apply_a_and_b` to apply both corrections `a` and `b`.
To control which corrections to apply, we can insert the desired `apply` function into the pipeline:

In [None]:
pl.insert(apply_a)
pl.visualize(mode='both')

In [None]:
pl.insert(apply_b)
pl.visualize(mode='both')
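
Inserting `apply_a` or `apply_b` replaces the previous `apply` function because all of them provide the same output type, `CorrectedData`. The toy model below illustrates that idea in plain Python. `ToyRegistry` and the simplified providers `apply_both` and `apply_only_a` are hypothetical stand-ins for illustration, not Sciline's implementation.

```python
from typing import Callable, NewType, get_type_hints

RawData = NewType('RawData', float)
CorrectedData = NewType('CorrectedData', float)


def apply_both(raw_data: RawData) -> CorrectedData:
    # Stand-in for apply_a_and_b with fixed corrections a=1.0 and b=2.0.
    return CorrectedData(raw_data * 1.0 - 2.0)


def apply_only_a(raw_data: RawData) -> CorrectedData:
    # Stand-in for apply_a with fixed correction a=1.0.
    return CorrectedData(raw_data * 1.0)


class ToyRegistry:
    """Hypothetical stand-in for a pipeline: one provider per output type."""

    def __init__(self) -> None:
        self._providers: dict[object, Callable] = {}

    def insert(self, provider: Callable) -> None:
        # Key the provider by its annotated return type. A later insert
        # for the same type overwrites the earlier provider.
        output_type = get_type_hints(provider)['return']
        self._providers[output_type] = provider

    def provider_for(self, output_type: object) -> Callable:
        return self._providers[output_type]


registry = ToyRegistry()
registry.insert(apply_both)
first = registry.provider_for(CorrectedData)

registry.insert(apply_only_a)
second = registry.provider_for(CorrectedData)
result = second(RawData(10.0))
```

After the second insert, only `apply_only_a` is registered for `CorrectedData`, which mirrors how the inserted `apply` function takes over in the visualized graphs.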

While swapping the `apply` provider does not solve all problems, it allows us to create a pipeline that can be easily modified to include or exclude corrections.
The pipeline author will need to foresee this in the pipeline design, and thin wrapper functions need to be maintained for each combination of corrections.
While this has an upfront cost, it lets pipeline users not only select between the predefined corrections but also add new corrections or combinations of corrections to the pipeline.
The limitation is that all of these corrections are applied at the same point in the pipeline, unless the pipeline author has anticipated this by providing separate correction stages.
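
As a rough sketch of what separate correction stages could look like, the example below splits the correction step in two and lets a user supply an additional correction for the second stage. The names `IntermediateData`, `CorrectionC`, `stage_one_a`, `stage_two_b`, and `stage_two_b_and_c` are assumptions made for illustration, and the functions are called by hand so that the snippet runs without Sciline.

```python
from typing import NewType

RawData = NewType('RawData', float)
# Hypothetical type marking the boundary between the two stages. The name
# describes the stage, not which corrections have been applied.
IntermediateData = NewType('IntermediateData', float)
CorrectedData = NewType('CorrectedData', float)
CorrectionA = NewType('CorrectionA', float)
CorrectionB = NewType('CorrectionB', float)
# Hypothetical correction defined by a pipeline user, not the author.
CorrectionC = NewType('CorrectionC', float)


# Provided by the pipeline author: default for stage one.
def stage_one_a(raw_data: RawData, correction_a: CorrectionA) -> IntermediateData:
    return IntermediateData(raw_data * correction_a)


# Provided by the pipeline author: default for stage two.
def stage_two_b(data: IntermediateData, correction_b: CorrectionB) -> CorrectedData:
    return CorrectedData(data - correction_b)


# Written by a user: replaces stage two without touching stage one.
def stage_two_b_and_c(
    data: IntermediateData, correction_b: CorrectionB, correction_c: CorrectionC
) -> CorrectedData:
    return CorrectedData((data - correction_b) / correction_c)


intermediate = stage_one_a(RawData(10.0), CorrectionA(1.0))
default_result = stage_two_b(intermediate, CorrectionB(2.0))
custom_result = stage_two_b_and_c(intermediate, CorrectionB(2.0), CorrectionC(4.0))
```

In a Sciline pipeline built from `stage_one_a` and `stage_two_b`, the user would presumably add the new variant with `pl.insert(stage_two_b_and_c)`, as with the `apply` functions above, and would also have to supply a value or provider for `CorrectionC`.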