# SDK Flows

This tutorial explains the concept of `Flows`.
Flows bundle or chain a set of Tasks, for example: start a Training, perform a Test, and run predictions on a dataset using an Inference Task.
Tasks are low-level building blocks that require more in-depth configuration in exchange for flexibility, whereas Flows are high-level and easier to use. If you often use the same chain of Tasks, you can write your own Flow to structure your code better.

Flows are also used under the hood of the RVAI platform to run a customer's whole training flow or to perform k-fold validation.
In addition, flows are used in algorithm CI/CD as a quick way to test an algorithm fully end to end.

## Installation

First, install the required SDK Python packages.
The simplest way is to install the `rvai` meta package, which installs all SDK-related packages. Access to our devpi server is required.



In [None]:
!pip install -qqq rvai==1.1.0rc51 pygraphviz rvai.pipelines.dummy_sdk_pipeline==1.1.0rc13

## Creating a Dataset

The definition of Cells and Pipelines is already covered in other tutorials, so this one uses a dummy SDK pipeline. Since the dummy pipeline does not really train, we create a dataset with random data.


In [None]:
# required RVAI base class
from rvai.base.data import Dataset

# used for typing
from rvai.types import Image, BoundingBox, Point
from typing import Sequence, Tuple
import numpy as np

from rvai.pipelines.dummy_sdk_pipeline.dummy_cells import DummySamples, DummyAnnotations

In [None]:
class DummyDataset(
    Dataset[DummySamples, DummyAnnotations]
):
    def __init__(
        self, length: int
    ):
        # create random data
        self.images = [Image(np.random.random((100, 50, 3))) for _ in range(length)]
        self.bounding_boxes = [BoundingBox(p1=Point(x=0, y=0), p2=Point(x=1, y=1)) for _ in range(length)]

    def __getitem__(
        self, index
    ) -> Tuple[DummySamples, DummyAnnotations]:
        return (
            DummySamples(image=self.images[index]),
            DummyAnnotations(bbox=self.bounding_boxes[index]),
        )

    def __len__(self):
        return len(self.images)


train_dataset = DummyDataset(80)
validation_dataset = DummyDataset(10)
test_dataset = DummyDataset(1)

# fetch the first sample and print its annotation
samples, annotations = train_dataset[0]
print(annotations)

## Train - Evaluate - Test Flow

The `TrainTestEvalFlow` executes a Training and a Test, and additionally runs a prediction on each sample in the optional `test_dataset`. The results of all tasks are passed on through the `.updates()` method. In the RVAI platform, all prediction results are saved to the database so they can be visualized in the UI later.

In [None]:
from rvai.base.runtime import init
from rvai.base.inference import PredictionResult

from rvai.base.flows import KFoldTrainTestEvalFlow, TrainTestEvalFlow
from rvai.pipelines.dummy_sdk_pipeline.dummy_sdk_pipeline import DummySDKPipeline
from rvai.pipelines.dummy_sdk_pipeline.dummy_cells import DummyTrainingParameters

In [None]:
rt = init('ray')

# Get the training pipeline
inference_pipeline = DummySDKPipeline.build()
cell_ref, cell_base = inference_pipeline.trainable_cells[0]

training_pipeline = inference_pipeline.get_training_pipeline(cell_base)
trainable_cell_ref = training_pipeline.get_cell_ref(training_pipeline.trainable_cell)
parameters = DummyTrainingParameters(epochs=5)

# define the flow
flow = TrainTestEvalFlow(
    runtime=rt,
    pipeline=training_pipeline,
    models={},
    parameters={trainable_cell_ref: parameters},
    train_dataset=train_dataset,
    validation_dataset=validation_dataset,
    test_dataset=test_dataset,
)

# Start the flow & follow up on updates
flow.start()
async for update in flow.updates():
    print(update)

# Finally get the result, containing metrics and model_path
result = await flow
print(result)
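
The cell above relies on the notebook's support for top-level `async for` and `await`. In a plain Python script the same start / updates / await pattern has to be wrapped in a coroutine. The sketch below illustrates only that pattern: `FakeFlow` and `run_flow` are hypothetical stand-ins written for this example, not part of the RVAI SDK, and whether a real flow can be driven by `asyncio.run` in this way is an assumption.

```python
import asyncio
from typing import AsyncIterator, List


class FakeFlow:
    """Hypothetical stand-in that only mimics the start/updates/await shape."""

    def __init__(self, steps: List[str]) -> None:
        self._steps = steps
        self._started = False

    def start(self) -> None:
        self._started = True

    async def updates(self) -> AsyncIterator[str]:
        for step in self._steps:
            # give control back to the event loop between updates
            await asyncio.sleep(0)
            yield step

    def __await__(self):
        async def _result() -> dict:
            return {"steps_done": len(self._steps)}

        return _result().__await__()


async def run_flow(flow: FakeFlow):
    # same order as in the notebook cell: start, consume updates, await the result
    flow.start()
    seen = []
    async for update in flow.updates():
        seen.append(update)
    result = await flow
    return seen, result


seen, result = asyncio.run(run_flow(FakeFlow(["train", "test", "predict"])))
print(seen, result)
```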

## K-fold cross validation setup

K-fold cross validation is used for a very limited set of customers who want to validate their models on 100% of their data.
The k-fold cross validation flow runs $k$ trainings, each time using a fraction $\frac{1}{k}$ of the data as the test set and the remaining $1-\frac{1}{k}$ as the training set.
In this way, after $k$ runs, every sample has been in the test set exactly once.
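
The splitting idea can be sketched in plain Python. This is only an illustration of how $k$ contiguous folds partition a set of sample indices; the helper `k_fold_indices` is hypothetical, and how `KFoldTrainTestEvalFlow` actually splits the data (for example, whether it shuffles) is not shown here.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # fold sizes differ by at most one when n_samples is not divisible by k
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        folds.append((train, test))
        start = stop
    return folds


splits = k_fold_indices(n_samples=10, k=3)
for train, test in splits:
    print(len(train), len(test))
```

Each index appears in exactly one test fold, which is the property the paragraph above describes.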

In [None]:
# define the flow
flow = KFoldTrainTestEvalFlow(
    runtime=rt,
    pipeline=training_pipeline,
    models={},
    parameters={trainable_cell_ref: parameters},
    train_dataset=train_dataset,
    validation_dataset=validation_dataset,
    test_dataset=test_dataset,
    folds=3,
)

# Start the flow & follow up on updates
flow.start()
async for update in flow.updates():
    # filter out prediction results so the output is not flooded
    if not isinstance(update, PredictionResult):
        print(update)

# Finally get the result, containing metrics and model_path
result = await flow
print(result)

In [None]:
rt.stop()