# Using Data Pipelines

We can **fit** a model directly on training data that is wrapped in a **tf Dataset** iterator (`tf.data.Dataset`).

In [2]:
import tensorflow as tf

For the model, let's use a sub-classed model, as in the preceding notebook.

In [14]:
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()

        self.x0 = tf.keras.layers.Flatten()
        self.x1 = tf.keras.layers.Dense(512, activation='relu')
        self.x2 = tf.keras.layers.Dropout(0.4)
        
        self.output_pred = tf.keras.layers.Dense(10,
                                            activation='softmax')
    def call(self, inputs):
        x = self.x0(inputs)
        # Apply the remaining two layers, x1 (Dense) and x2 (Dropout), in order
        for i in range(1, 3): 
            x = getattr(self, f'x{i}')(x)

        return self.output_pred(x) 
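
A quick way to check a sub-classed model like this one is to call it on a dummy batch and inspect the output. The sketch below redefines a stand-in with the same layer layout (the name `TinyModel` and the random input are assumptions made for illustration) so that it runs on its own.

```python
import tensorflow as tf


class TinyModel(tf.keras.Model):
    """Stand-in with the same layout: Flatten -> Dense -> Dropout -> softmax."""

    def __init__(self):
        super().__init__()
        self.x0 = tf.keras.layers.Flatten()
        self.x1 = tf.keras.layers.Dense(512, activation='relu')
        self.x2 = tf.keras.layers.Dropout(0.4)
        self.output_pred = tf.keras.layers.Dense(10, activation='softmax')

    def call(self, inputs):
        x = self.x0(inputs)
        # Apply x1 and x2 in order, looked up by attribute name
        for i in range(1, 3):
            x = getattr(self, f'x{i}')(x)
        return self.output_pred(x)


dummy_batch = tf.random.uniform((4, 28, 28))   # 4 fake 28x28 images
probs = TinyModel()(dummy_batch)

print(probs.shape)                    # one row of 10 class probabilities per image
print(tf.reduce_sum(probs, axis=1))   # each row sums to approximately 1
```

Because the output layer uses softmax, each row can be read as a probability distribution over the 10 digit classes.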

### Load the data

In [4]:
mnist = tf.keras.datasets.mnist

In [5]:
mnist_data = mnist.load_data()
(x_train, y_train), (x_test, y_test) = mnist_data

In [6]:
# Scale pixel values from [0, 255] to [0, 1]; labels stay integer class ids
x_train, x_test = tf.cast(x_train/255., dtype=tf.float32), tf.cast(x_test/255., dtype=tf.float32)
y_train, y_test = tf.cast(y_train, dtype=tf.int64), tf.cast(y_test, dtype=tf.int64)

### Apply pipelines to data

- The `tf.data.Dataset` iterator does much of the work that goes into preparing the data: shuffling, batching, transforming, and repeating it.
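
To see what a `Dataset` yields, here is a minimal sketch that builds a pipeline over a toy array and iterates over its batches. The values and the name `toy_dataset` are made up for illustration.

```python
import tensorflow as tf

features = tf.range(10)            # toy "samples": 0..9
labels = features * 2              # toy "labels", paired with the samples

toy_dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=10)       # buffer covers all 10 samples
    .batch(4)                      # batches of 4, 4 and 2 samples
)

batch_sizes = []
for x_batch, y_batch in toy_dataset:
    batch_sizes.append(int(x_batch.shape[0]))
    print(x_batch.numpy(), y_batch.numpy())   # sample order varies from run to run

print(batch_sizes)   # [4, 4, 2]
```

Shuffling reorders the samples, but each feature stays paired with its own label because both were sliced together.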

In [20]:
batch_size = 32
buffer_size = 10000
n_epochs = 10

In [33]:
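# Shuffle samples, flip each image (tf.image ops need a channel axis, added then dropped), then batch
train_dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).shuffle(buffer_size)
train_dataset = train_dataset.map(lambda x, y: (tf.image.random_flip_left_right(x[..., tf.newaxis])[..., 0], y))
train_dataset = train_dataset.batch(batch_size).repeat()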

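The **map** method applies a function to the elements of the dataset. Here it is used to flip images left to right (about the vertical axis) at random, with probability 0.5.
- This augments the training data: the number of samples stays the same, but the model sees randomly flipped variants of the images across epochs.

**repeat** restarts the dataset from the beginning each time the end is reached, so the iterator never runs out.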
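
Both behaviours are easy to check on tiny inputs. The sketch below is illustrative only: it uses the deterministic `tf.image.flip_left_right` in place of the random variant so that the result is reproducible, and the 2x2 "image" is made up.

```python
import tensorflow as tf

# repeat(): a 3-element dataset restarts when exhausted, so it never ends;
# take(7) cuts the infinite stream off after 7 elements.
repeated = tf.data.Dataset.range(3).repeat()
first_seven = [int(v) for v in repeated.take(7)]
print(first_seven)   # [0, 1, 2, 0, 1, 2, 0]

# A left-right flip reverses the columns of an image.
# tf.image functions expect a channel axis, so the image is [height, width, 1].
image = tf.reshape(tf.constant([1, 2, 3, 4]), (2, 2, 1))
flipped = tf.image.flip_left_right(image)
print(flipped[..., 0].numpy().tolist())   # [[2, 1], [4, 3]]
```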

In [34]:
# Evaluation needs neither shuffling nor repetition, so batching is enough
test_dataset = tf.data.Dataset.from_tensor_slices((x_test, y_test)).batch(batch_size)

### Fitting the model

In [35]:
model = MyModel()

model.compile(optimizer=tf.keras.optimizers.Adam(),
             loss='sparse_categorical_crossentropy',
             metrics=['accuracy'])

In [36]:
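# `repeat` makes the dataset infinite, so `fit` needs an explicit step count.
# Note: steps are counted in batches, while this value is the number of samples.
steps = x_train.numpy().shape[0]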
print('steps:', steps)

steps: 60000
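
Since `steps_per_epoch` is counted in batches, a step count equal to the number of samples makes each "epoch" cover the training set `batch_size` times. A minimal sketch of the arithmetic for one pass per epoch, assuming the standard MNIST split sizes and the batch size used above:

```python
import math

n_train, n_test = 60000, 10000   # MNIST split sizes
batch_size = 32

# Batches needed for exactly one pass over each split
train_steps = math.ceil(n_train / batch_size)
test_steps = math.ceil(n_test / batch_size)   # the last batch holds only 16 samples

print(train_steps, test_steps)   # 1875 313
```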


In [None]:
model.fit(train_dataset, epochs=n_epochs, steps_per_epoch=steps)

Epoch 1/10
 7033/60000 [==>...........................] - ETA: 6:22 - loss: 0.0779 - accuracy: 0.9755
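
The same pattern can be tried in miniature before committing to a long run. The sketch below is an illustration under stated assumptions: it uses random synthetic data and a small `Sequential` stand-in rather than `MyModel`, so its loss values carry no meaning.

```python
import tensorflow as tf

# Synthetic stand-in data: 64 random "images" with random labels in 0..9
x = tf.random.uniform((64, 28, 28))
y = tf.random.uniform((64,), maxval=10, dtype=tf.int64)

batch_size = 16
dataset = (
    tf.data.Dataset.from_tensor_slices((x, y))
    .shuffle(64)
    .batch(batch_size)
    .repeat()                      # infinite, so fit needs steps_per_epoch
)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

steps = 64 // batch_size           # 4 batches = one pass over the data
history = model.fit(dataset, epochs=2, steps_per_epoch=steps, verbose=0)

print(len(history.history['loss']))   # one loss value per epoch
```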

In [None]:
model.evaluate(x_test, y_test)