## 1. Introduction
**tf.data: Build TensorFlow input pipelines**

The *tf.data* API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training. The pipeline for a text model might involve extracting symbols from raw text data, converting them to embedding identifiers with a lookup table, and batching together sequences of different lengths. 

The *tf.data* API makes it possible to handle large amounts of data, read from different data formats, and perform complex transformations.

The *tf.data* API introduces a *tf.data.Dataset* abstraction that represents a sequence of elements, in which each element consists of one or more components. For example, in an image pipeline, an element might be a single training example, with a pair of tensor components representing the image and its label.

There are two distinct ways to create a dataset:
1. A data source constructs a Dataset from data stored in memory or in one or more files.
2. A data transformation constructs a dataset from one or more *tf.data.Dataset* objects.
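
The two routes above can be sketched as follows: a source builds a dataset from values held in memory, and transformations derive new datasets from it. The sample values and the names `source`, `doubled`, and `batched` are illustrative assumptions, not part of the original notebook.

```python
import tensorflow as tf

# Data source: build a dataset from values held in memory.
source = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4])

# Data transformations: each call returns a new Dataset derived from an existing one.
doubled = source.map(lambda x: x * 2)
batched = doubled.batch(2)

for batch in batched:
    print(batch.numpy())
```

Because every transformation returns a new dataset and leaves its input untouched, `source` can still be iterated on its own after `doubled` and `batched` are defined.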





In [1]:
import tensorflow as tf

In [4]:
import pathlib
import os
import matplotlib.pyplot as plt
#import pandas as pd
import numpy as np
np.set_printoptions(precision=4)

## 2. Basic Mechanism

In [47]:
dataset = tf.data.Dataset.from_tensor_slices([1984, 8, 5, 6, 8, 10])
dataset

<TensorSliceDataset shapes: (), types: tf.int32>

In [18]:
# A Dataset is a Python iterable, so a plain for loop visits each element
for elem in dataset:
    print(elem.numpy())

1984
8
5
6
8
10
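
When the elements are needed as Python values rather than printed output, `as_numpy_iterator` may be more convenient than calling `.numpy()` on each tensor. The sketch below uses a small stand-in dataset (the values are an assumption for illustration) and also shows `take` for inspecting only the first few elements.

```python
import tensorflow as tf

# Small illustrative dataset; the values are arbitrary.
dataset = tf.data.Dataset.from_tensor_slices([3, 1, 2])

# Collect every element as a NumPy scalar.
values = list(dataset.as_numpy_iterator())
print(values)

# Limit the dataset to its first two elements before collecting.
first_two = list(dataset.take(2).as_numpy_iterator())
print(first_two)
```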

In [38]:
# Or explicitly create a Python iterator with `iter` and consume its elements with `next`
it = iter(dataset)
print(next(it).numpy())  # first element, fetched with next
for elem in it:          # the same iterator yields the remaining elements
    print(elem.numpy())

1984
8
5
6
8
10

In [48]:
print(dataset.reduce(0, lambda state, value: state+value).numpy())

2021
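
`reduce` folds the whole dataset into a single result: it starts from the initial state (`0` above) and applies the function to the running state and each element in turn. As a minimal sketch of the same pattern with a structured state, the example below tracks a sum and a count together. The dataset values and the names `initial`, `total`, and `count` are assumptions chosen for illustration.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([10, 20, 30, 40])

# The state is a (sum, count) pair; the function must return the same
# structure and dtypes as the initial state.
initial = (tf.constant(0, dtype=tf.int32), tf.constant(0, dtype=tf.int32))

def sum_and_count(state, value):
    running_sum, running_count = state
    return running_sum + value, running_count + 1

total, count = dataset.reduce(initial, sum_and_count)
print(total.numpy(), count.numpy())
```

Dividing `total` by `count` then gives the mean without materializing the dataset as a Python list.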


## 3. Reading input data

### 3.1 Consuming NumPy arrays

In [49]:
train, test = tf.keras.datasets.fashion_mnist.load_data()

In [67]:
type(train)

tuple

In [57]:
print(len(train))

2


In [62]:
len(train[0])

60000

In [64]:
len(train[1])

60000

In [65]:
images, labels = train

In [69]:
type(labels)

numpy.ndarray

In [70]:
images = images/255
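
Dividing a `uint8` array by a Python integer promotes the result to `float64`. If single precision is preferred, for instance to halve memory use, one option is to cast before dividing. This is a sketch on a tiny stand-in array, not the Fashion-MNIST data itself; the names `scaled64` and `scaled32` are hypothetical.

```python
import numpy as np

# Stand-in for the image array: uint8 pixel values in [0, 255].
images = np.array([[0, 128, 255]], dtype=np.uint8)

scaled64 = images / 255                       # true division promotes uint8 to float64
scaled32 = images.astype(np.float32) / 255.0  # cast first, so the result stays float32

print(scaled64.dtype, scaled32.dtype)
```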

In [71]:
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset

<TensorSliceDataset shapes: ((28, 28), ()), types: (tf.float64, tf.uint8)>
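
Each element of this dataset is an `(image, label)` pair, and `element_spec` reports the shape and dtype of each component. The sketch below uses small synthetic arrays in place of the downloaded data (an assumption made to keep the example self-contained) and adds `shuffle` and `batch`, one common way to prepare such a dataset for training.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins: 8 images of 28x28 pixels and 8 integer labels.
images = np.random.rand(8, 28, 28).astype(np.float32)
labels = np.arange(8, dtype=np.uint8)

dataset = tf.data.Dataset.from_tensor_slices((images, labels))
print(dataset.element_spec)

# Shuffle across the whole (tiny) dataset, then group elements into batches of 4.
batched = dataset.shuffle(buffer_size=8).batch(4)
image_batch, label_batch = next(iter(batched))
print(image_batch.shape, label_batch.shape)
```

Slicing happens along the first axis, so `images` and `labels` must have the same length in that dimension.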