The `tf.data` module contains a collection of classes that allow you to easily load data, manipulate it, and pipe it into your model.

In [1]:
import tensorflow as tf

In [3]:
import iris_data

In [18]:
train, test = iris_data.load_data()
features, labels = train

In [28]:
batch_size = 100
iris_data.train_input_fn(features, labels, batch_size)

<BatchDataset shapes: ({SepalLength: (?,), SepalWidth: (?,), PetalLength: (?,), PetalWidth: (?,)}, (?,)), types: ({SepalLength: tf.float64, SepalWidth: tf.float64, PetalLength: tf.float64, PetalWidth: tf.float64}, tf.int64)>

`train_input_fn` looks like this:

In [12]:
def train_input_fn(features, labels, batch_size):
    """An input function for training"""
    # Convert the inputs to a Dataset.
    dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))

    # Shuffle, repeat, and batch the examples.
    dataset = dataset.shuffle(1000).repeat().batch(batch_size)

    # Return the dataset.
    return dataset

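A dataset represents slices of the input arrays: `from_tensor_slices` splits them along the first dimension, so each element of the dataset is one example. In general, a dataset does not know how many elements it contains.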

A dataset can be seen as a data structure capable of handling a large set of arrays. 
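To make the slicing concrete, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution (the output shown earlier looks like TensorFlow 1.x, where you would read elements through an iterator instead). The toy values are made up for illustration and are not taken from the Iris data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: 3 examples, 2 feature columns.
features = {
    "SepalLength": np.array([5.1, 4.9, 6.2]),
    "SepalWidth": np.array([3.5, 3.0, 2.9]),
}
labels = np.array([0, 0, 2])

dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Each element is one slice along the first axis:
# a (dict of scalar features, scalar label) pair.
elements = list(dataset.as_numpy_iterator())
for feats, label in elements:
    print(feats, label)
```

Three rows go in and three elements come out, each pairing one row of features with its label.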

#### Manipulation
The `shuffle` method uses a fixed-size buffer to shuffle the items as they pass through. In this case the `buffer_size` (1000) is greater than the number of examples in the dataset, so the data is completely shuffled (the Iris data set contains only 150 examples).
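The effect of the buffer size can be illustrated with a small pure-Python sketch. This is not TensorFlow's implementation; `buffered_shuffle` is a hypothetical helper that only mimics the idea of filling a buffer, emitting a random element from it, and refilling the freed slot from the input.

```python
import itertools
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in shuffled order using a fixed-size buffer (illustration only)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    it = iter(items)
    # Fill the buffer with the first buffer_size items (or fewer if input is short).
    buffer = list(itertools.islice(it, buffer_size))
    for item in it:
        # Emit a random buffered item and put the next input item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer


small = list(buffered_shuffle(range(10), buffer_size=3, seed=1))
full = list(buffered_shuffle(range(10), buffer_size=1000, seed=1))
print(small)  # early outputs can only come from the first few inputs
print(full)   # the buffer held everything, so any order is possible
```

With a buffer of 1 the order is unchanged, and with a small buffer an item can only move a limited distance forward. Only a buffer at least as large as the data gives every ordering a chance, which is why 1000 is enough for 150 examples.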

The `repeat` method restarts the dataset when it reaches the end. To limit the number of epochs, set the `count` argument.
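A short sketch of both forms of `repeat`, again assuming TensorFlow 2.x with eager execution. The three-element dataset is a stand-in chosen to keep the output readable.

```python
import tensorflow as tf

dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# With count=2 the data is produced twice, then the dataset ends.
two_epochs = [int(x) for x in dataset.repeat(2).as_numpy_iterator()]
print(two_epochs)

# Without a count the dataset repeats indefinitely,
# so take() is used here to read only a bounded prefix.
endless_prefix = [int(x) for x in dataset.repeat().take(7).as_numpy_iterator()]
print(endless_prefix)
```

Because `train_input_fn` calls `repeat()` with no argument, the training loop itself has to decide when to stop, for example by a fixed number of steps.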

The `batch` method collects a number of examples and stacks them to create batches. This adds a new first dimension to their shape.
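The added dimension is easiest to see by printing batch shapes. The sketch below assumes TensorFlow 2.x with eager execution and uses a made-up array of 5 examples with 2 values each.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 5 examples, each a vector of 2 values.
data = np.arange(10, dtype=np.float64).reshape(5, 2)
dataset = tf.data.Dataset.from_tensor_slices(data)

# Each element has shape (2,); batching stacks them along a new first axis.
batches = list(dataset.batch(2).as_numpy_iterator())
shapes = [b.shape for b in batches]
print(shapes)
```

The last batch is smaller because 5 is not a multiple of 2. Since the batch dimension is not fixed in general, it is reported as `?` in the dataset shapes printed earlier; passing `drop_remainder=True` to `batch` discards the short final batch instead.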