# Recipes

## Avoiding side effects

We strongly discourage [side effects](https://en.wikipedia.org/wiki/Side_effect_%28computer_science%29) in code that runs as part of a pipeline.
This includes, among other things, writing files, setting global variables, and communicating over a network.
The reason is that side effects rely on code running in a specific order.
Pipelines in Sciline, however, have a relaxed notion of time: the scheduler determines when, and whether, a provider runs.
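
The order dependence can be illustrated with a small sketch in plain Python. It does not use Sciline or its scheduler; the names `provide_a`, `provide_b`, and `run` are hypothetical and only serve to show that two execution orders give the same return values but different side effects.

```python
# Toy illustration in plain Python; this is NOT Sciline's scheduler.
log = []


def provide_a() -> int:
    log.append('a')  # side effect: mutates global state
    return 1


def provide_b() -> int:
    log.append('b')  # side effect: mutates global state
    return 2


def run(order):
    """Call the providers in the given order; return their sum and the log."""
    log.clear()
    total = sum(provider() for provider in order)
    return total, list(log)


result_ab = run([provide_a, provide_b])
result_ba = run([provide_b, provide_a])
print(result_ab)  # (3, ['a', 'b'])
print(result_ba)  # (3, ['b', 'a'])
```

The computed value is the same in both cases, but anything that reads `log` would observe a different state depending on the order chosen.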

### File output

Files typically only need to be written at the end of a pipeline.
We can use [Pipeline.bind_and_call](../generated/classes/sciline.Pipeline.rst#sciline.Pipeline.bind_and_call) to call a function which writes the file:

In [None]:
from typing import NewType

import sciline

_fake_filesystem = {}

Param = NewType('Param', float)
Data = NewType('Data', float)
Filename = NewType('Filename', str)


def foo(p: Param) -> Data:
    return Data(2 * p)


def write_file(d: Data, filename: Filename) -> None:
    _fake_filesystem[filename] = d


pipeline = sciline.Pipeline([foo], params={Param: 3.1, Filename: 'output.dat'})

pipeline.bind_and_call(write_file)

In [None]:
_fake_filesystem

We could also write the file by calling the function directly:

In [None]:
write_file(pipeline.compute(Data), 'output.dat')

But `bind_and_call` allows us to request additional parameters like the file name from the pipeline.
This is especially useful in combination with [generic providers](../user-guide/generic-providers.ipynb) or [parameter tables](../user-guide/parameter-tables.ipynb).

**Why is this better than writing a file in a provider?**
Using `bind_and_call` guarantees that the file gets written and that it gets written only after the pipeline has computed the required results.
The latter prevents providers from accidentally relying on the file.
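
To see why a write hidden inside a provider offers no such guarantee, consider the following sketch. It uses a toy on-demand resolver written in plain Python, not Sciline itself; `providers`, `compute`, and `write_in_provider` are hypothetical names. The assumption is only that a provider runs when something requests its output.

```python
# Toy resolver in plain Python; it only mimics on-demand evaluation.
fake_fs = {}


def make_data() -> float:
    return 6.2


def write_in_provider(data: float) -> str:
    fake_fs['output.dat'] = data  # side effect hidden inside a provider
    return 'output.dat'


# Map each output name to (provider, names of its inputs).
providers = {
    'data': (make_data, ()),
    'written': (write_in_provider, ('data',)),
}


def compute(target: str):
    """Run only the providers needed to produce `target`."""
    func, deps = providers[target]
    return func(*(compute(dep) for dep in deps))


value = compute('data')
print(value, fake_fs)  # 6.2 {}
```

Because nothing requested `'written'`, the writing provider never ran and the fake file system stayed empty.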

## Continue from intermediate results

A common need is to continue a pipeline from an intermediate result that was computed earlier.



### Setup

Let's look at a situation where we have some "raw" data files and the workflow consists of three steps:
  * loading the raw data
  * cleaning the raw data
  * computing a sum of the cleaned data.

In [None]:
from typing import NewType

Filename = NewType('Filename', str)
RawData = NewType('RawData', list)
CleanData = NewType('CleanData', list)
Result = NewType('Result', float)  # the sum of the cleaned data is a number

filesystem = {'raw.txt': list(map(str, range(10)))}

def load(filename: Filename) -> RawData:
    """Load the data from the filename."""
    data = filesystem[filename]
    return RawData(data)

def clean(raw_data: RawData) -> CleanData:
    """Clean the data, convert from str."""
    return CleanData(list(map(float, raw_data)))

def process(clean_data: CleanData) -> Result:
    """Compute the sum of the clean data."""
    return Result(sum(clean_data))


In [None]:
import sciline

pipeline = sciline.Pipeline(
    [load, clean, process],
    params={Filename: 'raw.txt'},
)
pipeline

### Setting intermediate results

If we select `Result`, the task graph uses the `Filename` input because it needs to read the raw data from the file system:

In [None]:
pipeline.get(Result)

But if the cleaned data has already been produced, it is unnecessary to "re-clean" it.
In that case, we can proceed directly from the clean data to the step that computes the sum.
To do this, we replace the `CleanData` provider with the data that was loaded and cleaned:

In [None]:
data = pipeline.compute(CleanData)
pipeline[CleanData] = data
pipeline

If we now select `Result`, the task graph no longer uses the `Filename` input and instead starts directly from `CleanData`:

In [None]:
pipeline.get(Result)

In [None]:
pipeline.compute(Result)
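
The effect of fixing an intermediate value can be sketched with a toy resolver in plain Python. This is not how Sciline is implemented; `providers`, `fixed`, `compute`, and the `calls` log are hypothetical names used to show that upstream steps are skipped once a value is set.

```python
# Toy resolver in plain Python (hypothetical names; not Sciline's implementation).
calls = []


def load() -> list:
    calls.append('load')
    return ['0', '1', '2']


def clean(raw: list) -> list:
    calls.append('clean')
    return [float(x) for x in raw]


def process(cleaned: list) -> float:
    calls.append('process')
    return sum(cleaned)


# Map each output name to (provider, names of its inputs).
providers = {
    'raw': (load, ()),
    'clean': (clean, ('raw',)),
    'result': (process, ('clean',)),
}
fixed = {}  # precomputed values take precedence over providers


def compute(target: str):
    """Return a fixed value if one is set, otherwise run the provider."""
    if target in fixed:
        return fixed[target]
    func, deps = providers[target]
    return func(*(compute(dep) for dep in deps))


fixed['clean'] = compute('clean')  # runs load and clean once
calls.clear()
result = compute('result')
print(result, calls)  # 3.0 ['process']
```

Only `process` runs in the second computation, because the cleaned data is taken from `fixed` instead of being recomputed.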

## Replacing providers

This example shows how to replace a provider in the pipeline using the `Pipeline.insert` method.

### Setup

The setup is the same as in [Continue from intermediate results](#continue-from-intermediate-results).

In [None]:
pipeline = sciline.Pipeline(
    [load, clean, process],
    params={Filename: 'raw.txt'},
)
pipeline

### Replacing a provider using `Pipeline.insert`

Let's say the `clean` provider doesn't do all the preprocessing that we want. We also want to remove either the odd or the even numbers before processing:

In [None]:
from typing import NewType

Target = NewType('Target', str)


def clean_and_remove_some(raw_data: RawData, target: Target) -> CleanData:
    """Convert from str and keep only the odd or only the even numbers."""
    if target == 'odd':
        return CleanData([n for n in map(float, raw_data) if n % 2 == 1])
    if target == 'even':
        return CleanData([n for n in map(float, raw_data) if n % 2 == 0])
    raise ValueError(f"target must be 'odd' or 'even', got {target!r}")

To replace the old `CleanData` provider we need to use `Pipeline.insert`:

In [None]:
pipeline.insert(clean_and_remove_some)
pipeline[Target] = 'odd'

In [None]:
pipeline

Now, if we select `Result`, we see that the new provider is used in the computation:

In [None]:
pipeline.get(Result)

In [None]:
pipeline.compute(Result)
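
As a sanity check that is independent of Sciline, the expected sums can be worked out directly in plain Python. The sketch assumes the same file content as `'raw.txt'` in the setup (the strings `'0'` to `'9'`) and the filtering conditions used in `clean_and_remove_some`; the variable names are only illustrative.

```python
# Plain-Python check of the expected sums; does not use Sciline.
raw = [str(n) for n in range(10)]  # same content as 'raw.txt' in the setup
values = [float(s) for s in raw]

# 'odd' keeps values with n % 2 == 1, 'even' keeps values with n % 2 == 0.
odd_sum = sum(v for v in values if v % 2 == 1)
even_sum = sum(v for v in values if v % 2 == 0)
print(odd_sum, even_sum)  # 25.0 20.0
```

With `Target` set to `'odd'`, the provider as written keeps the odd values, so the result should match the first number.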