# Advanced Tutorial 2: Pipeline

## Overview

In this tutorial, we will discuss the following topics:

* [Iterating Through Pipeline](#ta02itp)
    * [Basic Concept](#ta02bc)
    * [Example](#ta02example)
* [Dropping Last Batch](#ta02dlb)
* [Padding Batch Data](#ta02pbd)
* [Benchmark Pipeline Speed](#ta02bps)

In the [beginner tutorial 4](../beginner/t04_pipeline.ipynb), we learned how to build data pipeline that handles data loading and preprocessing tasks efficiently. Now that you have understood some basic operations in the `Pipeline`, we will demonstrate some advanced concepts and how to leverage them to create efficient `Pipeline` in this tutorial.

<a id='ta02itp'></a>

## Iterating Through Pipeline

In many deep learning tasks, the parameter of preprocessing task is precomputed by looping through the dataset. For example, in `ImageNet` dataset, people usually use a precomputed global pixel average for each channel to normalize the image. 

<a id='ta02bc'></a>

### Basic Concept

In this section, we will see how to iterate through the pipeline in FastEstimator. First we will create sample NumpyDataset from the data dictionary and load it into `Pipeline`.

In [None]:
import numpy as np
from fastestimator.dataset.data import cifar10
    
# sample numpy array to later create datasets from them
x_train, y_train = (np.random.sample((10, 2)), np.random.sample((10, 1)))
train_data = {"x": x_train, "y": y_train}

In [2]:
import fastestimator as fe
from fastestimator.dataset.numpy_dataset import NumpyDataset

# create NumpyDataset from the sample data
dataset_fe = NumpyDataset(train_data)

pipeline_fe = fe.Pipeline(train_data=dataset_fe, batch_size=3)

Let's get the loader object for the `Pipeline`, then iterate loader with a for loop:

In [3]:
loader_fe = pipeline_fe.get_loader(mode="train")

for batch in loader_fe:
    print(batch)

{'x': tensor([[0.7191, 0.1700],
        [0.1704, 0.3014],
        [0.5974, 0.5004]], dtype=torch.float64), 'y': tensor([[0.5230],
        [0.6890],
        [0.9531]], dtype=torch.float64)}
{'x': tensor([[0.8949, 0.5802],
        [0.7846, 0.9987],
        [0.7752, 0.2345]], dtype=torch.float64), 'y': tensor([[0.4932],
        [0.6442],
        [0.9859]], dtype=torch.float64)}
{'x': tensor([[0.2825, 0.3795],
        [0.3616, 0.9058],
        [0.9917, 0.2027]], dtype=torch.float64), 'y': tensor([[0.8371],
        [0.7936],
        [0.3906]], dtype=torch.float64)}
{'x': tensor([[0.4236, 0.3067]], dtype=torch.float64), 'y': tensor([[0.2536]], dtype=torch.float64)}


<a id='ta02example'></a>

### Example

Let's say we have CIFAR-10 dataset and we want to find global average pixel value over three channels:

In [4]:
from fastestimator.dataset.data import cifar10

cifar_train, _ = cifar10.load_data()

We will take the `batch_size` 64 and load the data into `Pipeline`

In [5]:
pipeline_cifar = fe.Pipeline(train_data=cifar_train, batch_size=64)

Now we will iterate through batch data and compute the mean pixel values for all three channels of the dataset. 

In [6]:
loader_fe = pipeline_cifar.get_loader(mode="train", shuffle=False)
mean_arr = np.zeros((3))
for i, batch in enumerate(loader_fe):
    mean_arr = mean_arr + np.mean(batch["x"].numpy(), axis=(0, 1, 2))
mean_arr = mean_arr / (i+1)

In [7]:
print("Mean pixel value over the channels are: ", mean_arr)

Mean pixel value over the channels are:  [125.32287898 122.96682199 113.8856495 ]


<a id='ta02dlb'></a>

## Dropping Last Batch

If `total number of data` is not dividable by `batch_size`, by default, the last batch will have less data than other batches.  To drop the last batch we can set `drop_last` to `True`. Therefore, if the last batch is incomplete it will be dropped.

In [8]:
pipeline_fe = fe.Pipeline(train_data=dataset_fe, batch_size=3, drop_last=True)

<a id='ta02pbd'></a>

## Padding Batch Data

There might be scenario where the input tensors have different dimensions within a batch. For example, in Natural language processing, we have input strings with different lengths. For that we need to pad the data to the maximum length within the batch.


To further illustrate in code, we will take numpy array that contains different shapes of array elements and load it into the `Pipeline`.

In [9]:
# define numpy arrays with different shapes
elem1 = np.array([4, 5])
elem2 = np.array([1, 2, 6])
elem3 = np.array([3])

# create train dataset
x_train = np.array([elem1, elem2, elem3])
train_data = {"x": x_train}
dataset_fe = NumpyDataset(train_data)

We will set any `pad_value` that we want to append at the end of the tensor data. `pad_value` can be either `int` or `float`

In [10]:
pipeline_fe = fe.Pipeline(train_data=dataset_fe, batch_size=3, pad_value=0)

Now let's print the batch data after padding:

In [11]:
for elem in iter(pipeline_fe.get_loader(mode='train', shuffle=False)):
    print(elem)

{'x': tensor([[4, 5, 0],
        [1, 2, 6],
        [3, 0, 0]])}


<a id='ta02bps'></a>

## Benchmark Pipeline Speed

It is often the case that the bottleneck of deep learning training is data pipeline, as a result, GPU is underutilized. A speed diagnostic tool of `Pipeline` can help users checking the speed of pipeline and diagnose training speed problems.  The way to benchmark pipeline speed in FastEstimator is very simple: simply call `Pipeline.benchmark`.

For illustration, we will create `Pipeline` for the CIFAR-10 dataset with list of numpy operators that expand dimensions, apply `minmax` and finally rotate the input images. 

In [12]:
from fastestimator.op.numpyop.univariate import Minmax, ExpandDims
from fastestimator.op.numpyop.multivariate import Rotate

pipeline = fe.Pipeline(train_data=cifar_train,
                       ops=[ExpandDims(inputs="x", outputs="x"),
                            Minmax(inputs="x", outputs="x_out"),
                            Rotate(image_in="x_out", image_out="x_out", limit=180)],
                      batch_size=64)

Let's benchmark the processing speed in the training mode.

In [13]:
pipeline_cifar.benchmark(mode="train")

FastEstimator: Step: 100, Epoch: 1, Steps/sec: 341.9135233103771
FastEstimator: Step: 200, Epoch: 1, Steps/sec: 563.0254899674627
FastEstimator: Step: 300, Epoch: 1, Steps/sec: 671.5856149290763
FastEstimator: Step: 400, Epoch: 1, Steps/sec: 856.025916954988
FastEstimator: Step: 500, Epoch: 1, Steps/sec: 835.0631736638796
FastEstimator: Step: 600, Epoch: 1, Steps/sec: 780.2817676962413
FastEstimator: Step: 700, Epoch: 1, Steps/sec: 656.1022093672684
