## 1. Introduction
**tf.data: Build TensorFlow input pipelines**

The *tf.data* API enables you to build complex input pipelines from simple, reusable pieces. For example, the pipeline for an image model might aggregate data from files in a distributed file system, apply random perturbations to each image, and merge randomly selected images into a batch for training. The pipeline for a text model might involve extracting symbols from raw text data, converting them to embedding identifiers with a lookup table, and batching together sequences of different lengths. 

The *tf.data* API makes it possible to handle large amounts of data, read from different data formats, and perform complex transformations.

The *tf.data* API introduces a *tf.data.Dataset* abstraction that represents a sequence of elements, in which each element consists of one or more components. For example, in an image pipeline, an element might be a single training example, with a pair of tensor components representing the image and its label.
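As a minimal sketch of the element/component idea, the snippet below builds a dataset whose elements are (image, label) pairs and inspects the structure of one element through `element_spec`. The zero-filled arrays and the names `images`, `labels`, and `pairs` are hypothetical stand-ins, not data used elsewhere in this notebook.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: 4 grayscale "images" of 28x28 and 4 integer labels.
images = np.zeros((4, 28, 28), dtype=np.float32)
labels = np.array([0, 1, 2, 3], dtype=np.int64)

# Each element is an (image, label) pair, i.e. two tensor components.
pairs = tf.data.Dataset.from_tensor_slices((images, labels))

# element_spec describes the components of a single element.
image_spec, label_spec = pairs.element_spec
print(image_spec.shape, image_spec.dtype)
print(label_spec.shape, label_spec.dtype)
```

The leading dimension of the input arrays (here 4) becomes the number of elements, so it does not appear in the per-element shapes.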

Using a Dataset takes three steps:
1. **Importing data**: Create a Dataset instance from some data.
2. **Creating an iterator**: Use the dataset to create an iterator that walks through its elements.
3. **Consuming data**: Use the iterator to retrieve elements from the dataset and feed them to the model.
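The three steps can be sketched as follows, assuming TensorFlow 2.x with eager execution, where a dataset is a Python iterable and `iter()` is enough to create the iterator. The values and the names `numbers`, `iterator`, `first`, and `remaining` are illustrative only.

```python
import tensorflow as tf

# 1. Importing data: build a Dataset from an in-memory list.
numbers = tf.data.Dataset.from_tensor_slices([10, 20, 30])

# 2. Creating an iterator: in eager mode, iter() returns a Python iterator.
iterator = iter(numbers)

# 3. Consuming data: next() yields one element (a scalar tensor) at a time.
first = next(iterator).numpy()
remaining = [value.numpy() for value in iterator]
print(first, remaining)
```

In practice a plain `for` loop over the dataset performs steps 2 and 3 implicitly.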

There are two distinct ways to create a dataset:
1. A data source constructs a Dataset from data stored in memory or in one or more files.
2. A data transformation constructs a Dataset from one or more *tf.data.Dataset* objects.
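A small sketch of the difference, using only in-memory data: `from_tensor_slices` acts as the data source, while `map` and `batch` are transformations that each return a new dataset. The names `source`, `doubled`, and `batched` are illustrative.

```python
import tensorflow as tf

# Data source: a Dataset built directly from values held in memory.
source = tf.data.Dataset.from_tensor_slices([1, 2, 3, 4, 5, 6])

# Data transformations: each call returns a new Dataset derived from another one.
doubled = source.map(lambda value: value * 2)
batched = doubled.batch(4)

batches = [batch.numpy().tolist() for batch in batched]
print(batches)  # [[2, 4, 6, 8], [10, 12]]
```

Transformations do not modify the dataset they are called on, so `source` still yields the original six values.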

Reference:
1. [How to use Dataset in TensorFlow](https://towardsdatascience.com/how-to-use-dataset-in-tensorflow-c758ef9e4428)



In [2]:
import tensorflow as tf

In [3]:
import pathlib
import os
import matplotlib.pyplot as plt
#import pandas as pd
import numpy as np
np.set_printoptions(precision=4)

## 2. Importing Data
We first need some data to put inside our dataset.

### 2.1 From NumPy
This is a common case: we have a NumPy array (or a plain Python list) and want to pass it to TensorFlow.

In [59]:
x = [1984, 2, 3, 4, 7, 23]
dataset = tf.data.Dataset.from_tensor_slices(x)
dataset

<TensorSliceDataset shapes: (), types: tf.int32>

In [48]:
# Iterate with a plain for loop; a list comprehension would also
# return a useless list of None values.
for elem in dataset:
    print(elem.numpy())

1984
2
3
4
7
23

In [49]:
# Or explicitly create a Python iterator with iter(); its elements
# can also be consumed one at a time with next(it).
it = iter(dataset)
for j in it:
    print(j.numpy())

1984
2
3
4
7
23

In [52]:
print(dataset.reduce(0, lambda state, value: state+value).numpy())

2023


In [30]:
y = np.array([1984, 2, 3, 4, 7, 23])
dataset_ = tf.data.Dataset.from_tensor_slices(y)
dataset_

<TensorSliceDataset shapes: (), types: tf.int64>
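The two outputs above differ in element type: a list of Python integers is converted to `tf.int32`, whereas a NumPy array keeps its own dtype (`int64` is NumPy's default integer on most Linux and macOS installations). The sketch below makes this explicit by fixing the array dtype; the names `from_list` and `from_array` are illustrative.

```python
import numpy as np
import tensorflow as tf

# Python ints are inferred as tf.int32 by TensorFlow.
from_list = tf.data.Dataset.from_tensor_slices([1984, 2, 3])

# A NumPy array carries its dtype into the dataset; it is set explicitly here.
from_array = tf.data.Dataset.from_tensor_slices(
    np.array([1984, 2, 3], dtype=np.int64)
)

print(from_list.element_spec.dtype, from_array.element_spec.dtype)
```

This matters for calls such as `reduce`, where the initial state must have the same dtype as the values being accumulated.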

In [36]:
# Iterate with a plain for loop; a list comprehension would also
# return a useless list of None values.
for elem in dataset_:
    print(elem.numpy())

1984
2
3
4
7
23

In [32]:
# Or explicitly create a Python iterator with iter(); its elements
# can also be consumed one at a time with next(it).
it = iter(dataset_)
for j in it:
    print(j.numpy())

1984
2
3
4
7
23

In [53]:
print(dataset_.reduce(np.array(0), lambda state, value: state+value).numpy())

2023


In [55]:
x = np.random.sample((200,4))
dataset_2 = tf.data.Dataset.from_tensor_slices(x)
dataset_2

<TensorSliceDataset shapes: (4,), types: tf.float64>

In [56]:
features, labels = (np.random.sample((100,2)), np.random.sample((100,1)))
dataset_3 = tf.data.Dataset.from_tensor_slices((features,labels))
dataset_3

<TensorSliceDataset shapes: ((2,), (1,)), types: (tf.float64, tf.float64)>
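When a tuple of arrays is passed to `from_tensor_slices`, each element is itself a tuple and can be unpacked during iteration. The sketch below rebuilds a dataset like `dataset_3` from random data and reads a few (features, label) pairs; the names `feature_row`, `label_row`, `first_features`, and `first_label` are illustrative.

```python
import numpy as np
import tensorflow as tf

features = np.random.sample((100, 2))
labels = np.random.sample((100, 1))
dataset_3 = tf.data.Dataset.from_tensor_slices((features, labels))

# Take the first three elements; each one unpacks into a (features, label) pair.
for feature_row, label_row in dataset_3.take(3):
    print(feature_row.numpy(), label_row.numpy())

# The first element corresponds to row 0 of both input arrays.
first_features, first_label = next(iter(dataset_3))
```

Both arrays must share the same first dimension (100 here), since that dimension is what gets sliced into elements.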

## 3. Reading input data

### 3.1 Consuming NumPy arrays

In [49]:
train, test = tf.keras.datasets.fashion_mnist.load_data()

In [67]:
type(train)

tuple

In [57]:
print(len(train))

2


In [62]:
len(train[0])

60000

In [64]:
len(train[1])

60000

In [65]:
images, labels = train

In [69]:
type(labels)

numpy.ndarray

In [70]:
images = images/255

In [71]:
dataset = tf.data.Dataset.from_tensor_slices((images, labels))
dataset

<TensorSliceDataset shapes: ((28, 28), ()), types: (tf.float64, tf.uint8)>
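A dataset like the one above would typically be shuffled and batched before being fed to a model. The following is a minimal sketch under stated assumptions: it uses random arrays with the same layout as Fashion-MNIST instead of downloading the real data, casts the images to `float32` (an optional choice, since `images/255` yields `float64`), and picks a buffer size of 64 and a batch size of 16 purely for illustration.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in with the same layout as Fashion-MNIST (uint8 images and labels).
images = np.random.randint(0, 256, size=(64, 28, 28), dtype=np.uint8)
labels = np.random.randint(0, 10, size=(64,), dtype=np.uint8)

# Scale to [0, 1] as float32 rather than the float64 that `images / 255` produces.
images = images.astype(np.float32) / 255.0

dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Shuffle the elements, then group them into batches of 16.
train_batches = dataset.shuffle(buffer_size=64).batch(16)

image_batch, label_batch = next(iter(train_batches))
print(image_batch.shape, image_batch.dtype, label_batch.shape)
```

A shuffle buffer at least as large as the dataset gives a uniform shuffle; smaller buffers trade randomness for memory.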