# Better Performance with the tf.data API

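GPUs and TPUs can radically reduce the time required to execute a single training step. To benefit from that speed, the input pipeline must deliver the data for the next step before the current step finishes; the `tf.data` API provides the tools for building such a pipeline.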

In [1]:
import tensorflow as tf
import time

# artificial dataset
class ArtificialDataset(tf.data.Dataset):
    
    def _generator(num_samples):
        # simulate opening the file
        time.sleep(0.03)
        
        for sample_idx in range(num_samples):
            # simulate reading a line from file
            time.sleep(0.015)
            yield (sample_idx,)
            
    def __new__(cls, num_samples=3):
        return tf.data.Dataset.from_generator(
            cls._generator,
            output_signature=tf.TensorSpec(shape=(1,), dtype=tf.int64),
            args=(num_samples,),
        )

# dummy training loop
def benchmark(dataset, num_of_epochs=2):
    start_time = time.perf_counter()
    for epoch_num in range(num_of_epochs):
        for sample in dataset:
            # simulate a training step
            time.sleep(0.01)
    print("Execution time :", time.perf_counter() - start_time)

## Naive approach

In the naive approach, execution is **synchronous**. While the pipeline fetches data, the model sits idle; while the model trains, the pipeline sits idle. The training time is therefore the sum of the file-opening, reading, and training times, as shown below:

![SequentialExecution](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/Appendix/API/tf.data/tf.data.sequential.jpg)

In [2]:
# naive approach.
benchmark(ArtificialDataset())

Execution time : 0.40446649999999984
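
As a rough sanity check, the sum described above can be worked out from the `time.sleep` values used in the artificial dataset and the dummy training loop. This is a minimal sketch, not part of the original notebook: the helper name `naive_epoch_time` and its parameter names are made up for illustration, and it only counts the simulated sleeps. It assumes the generator is re-run (and the file "re-opened") on every epoch.

```python
def naive_epoch_time(num_samples=3, open_s=0.03, read_s=0.015, train_s=0.01):
    """Lower bound for one synchronous epoch.

    The file is opened once, then every sample is read and trained on
    strictly one after the other, so the per-sample costs simply add up.
    """
    return open_s + num_samples * (read_s + train_s)


num_of_epochs = 2
ideal_total = num_of_epochs * naive_epoch_time()
print(f"Ideal naive time: {ideal_total:.3f} s")  # Ideal naive time: 0.210 s
```

The measured time is noticeably larger than this bound; the difference is presumably overhead that the model ignores, such as running the Python generator and starting the iterators.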


## Prefetching

Prefetching overlaps the preprocessing and model execution of a training step. While the model is executing training step `s`, the input pipeline is reading the data for step `s+1`.

The `prefetch` transformation decouples the time when data is produced from the time when it is consumed. The number of elements to prefetch should be equal to (or possibly greater than) the number of batches consumed by a single training step. You can tune this value manually or set it to **tf.data.AUTOTUNE**, which prompts the tf.data runtime to tune the value dynamically.

![Prefetching](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/Appendix/API/tf.data/tf.data.prefetching.jpg)

In [4]:
# prefetching
benchmark(ArtificialDataset().prefetch(tf.data.AUTOTUNE))

Execution time : 0.30005549999999914
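
The producer/consumer decoupling behind prefetching can be illustrated with a background thread and a bounded queue. This is only a conceptual sketch using the Python standard library; it is not how `tf.data` implements `prefetch`, and the function name `prefetch` and the `_DONE` sentinel below are hypothetical names chosen for this example.

```python
import queue
import threading

_DONE = object()  # sentinel that marks the end of the stream (illustrative)


def prefetch(iterable, buffer_size=1):
    """Yield items from `iterable` while a background thread reads ahead.

    The producer may run at most `buffer_size` items ahead of the consumer,
    because `Queue.put` blocks once the buffer is full.
    Note: errors raised inside the producer thread are not propagated here.
    """
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)
        buffer.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _DONE:
            return
        yield item


print(list(prefetch(range(5), buffer_size=2)))  # [0, 1, 2, 3, 4]
```

The order of the elements is unchanged; only the moment at which each element is produced moves earlier, which is what lets reading and training overlap.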


## Sequential Interleave

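With its default arguments, `interleave` pulls single samples from the nested datasets one after another, so data preparation happens in sequence and the read latencies add up.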

![Sequential Interleave](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/Appendix/API/tf.data/tf.data.performance.seq.interleave.jpg)

In [5]:
benchmark(
    tf.data.Dataset.range(2)
    .interleave(lambda _: ArtificialDataset())
)

Execution time : 0.6885343000000148
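
The order in which a sequential interleave visits its inputs can be sketched in plain Python. This is a simplified model, not the `tf.data` implementation: it assumes all inputs are open at once and one element is taken from each per turn, and the helper name `round_robin` is made up for this example.

```python
def round_robin(iterables):
    """Take one element from each input in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        still_open = []
        for it in iterators:
            try:
                item = next(it)
            except StopIteration:
                continue  # this input is finished, drop it from the cycle
            still_open.append(it)
            yield item
        iterators = still_open


# Two "files" with three samples each, tagged with the file they came from.
files = [[("file0", i) for i in range(3)], [("file1", i) for i in range(3)]]
print(list(round_robin(files)))
```

Because every `next(it)` call runs in the consumer's own thread, nothing overlaps: each read has to finish before the following one starts.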


## Parallel Interleave

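Setting `num_parallel_calls` lets `interleave` load the nested datasets in parallel, which reduces the time spent waiting for files to be opened and read.

![Parallel Interleave](https://raw.githubusercontent.com/rkumar-bengaluru/data-science/main/20-Tensorflow/Appendix/API/tf.data/tf.data.performance.prallel.interleave.jpg)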

In [7]:
benchmark(
    tf.data.Dataset.range(2)
    .interleave(
        lambda _: ArtificialDataset(),
        num_parallel_calls=tf.data.AUTOTUNE
    )
)

Execution time : 0.367986099999996
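
The idea of reading several inputs at the same time can be sketched with a thread pool from the standard library. This is only an analogy for what `num_parallel_calls` enables, not the `tf.data` implementation; `read_file` and `read_all_parallel` are hypothetical helpers that reuse the sleep durations of the artificial dataset above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_file(file_idx, num_samples=3):
    """Simulate opening one file and reading its samples."""
    time.sleep(0.03)  # simulated file opening
    samples = []
    for sample_idx in range(num_samples):
        time.sleep(0.015)  # simulated read of one line
        samples.append((file_idx, sample_idx))
    return samples


def read_all_parallel(num_files=2):
    """Read every file in its own worker thread; results keep file order."""
    with ThreadPoolExecutor(max_workers=num_files) as pool:
        return list(pool.map(read_file, range(num_files)))


start = time.perf_counter()
result = read_all_parallel(2)
print("Parallel read time :", time.perf_counter() - start)
```

Since the simulated waits overlap, reading two files this way should take roughly as long as reading one, whereas a sequential loop over `read_file` would take about twice as long.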


In [8]:
tf.__version__

'2.10.1'