## Building a data pipeline

tf.data

Goals of this tutorial
1. Learn how to use tf.data and its best practices
2. Build an efficient pipeline for loading images and preprocessing them
3. Build an efficient pipeline for text, including how to build a vocabulary

In [None]:
import os
print(os.getcwd())
import tensorflow as tf

In [None]:
dataset = tf.data.TextLineDataset("file.txt")

In [None]:
for line in dataset:
    print(line)

#In TensorFlow 1.x graph mode (the default), we get an error:
#RuntimeError: dataset.__iter__() is only supported when eager execution is enabled.

What’s really happening is that dataset is a node of the TensorFlow graph that contains instructions to read the file. We need to initialize the graph and evaluate this node in a Session if we want to read it. While this may sound awfully complicated, it is quite the opposite: the dataset object is now part of the graph, so you don’t need to worry about how to feed the data into your model!

We need to add a few things to make it work. First, let’s create an iterator object over the dataset:

In [None]:
iterator = dataset.make_one_shot_iterator()

#Then you need to call get_next() to get the tensor that will contain your data
next_element = iterator.get_next()

#The make_one_shot_iterator method creates an iterator that can iterate over the dataset only once.
#In other words, once we reach the end of the dataset, it stops yielding elements and raises tf.errors.OutOfRangeError.

#next_element is now a graph node that yields the next element of the dataset each time it is evaluated.
#Now, let’s run it

In [None]:
with tf.Session() as sess:
    for line in range(3):
        print(sess.run(next_element))

You can easily apply transformations to your dataset.

For instance, splitting each line into words on whitespace is as easy as adding one line:

In [None]:
dataset = dataset.map(lambda string: tf.string_split([string]).values)

In [None]:
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for line in range(3):
        print(sess.run(next_element))

You can even shuffle the dataset 

In [None]:
dataset = dataset.shuffle(buffer_size=3)
#shuffle keeps a buffer of 3 elements and draws each output element at random from that buffer,
#refilling the freed slot with the next input element.

In [None]:
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    for line in range(3):
        print(sess.run(next_element))
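
To make the effect of buffer_size more concrete, here is a minimal pure-Python sketch of a fixed-size shuffle buffer. It is only an illustration of the idea, not TensorFlow's implementation, and the name `buffered_shuffle` is hypothetical.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order produced by a fixed-size shuffle buffer."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)       # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]             # emit a random buffered element
        buffer[idx] = item            # refill the slot with the incoming element
    rng.shuffle(buffer)               # input exhausted: drain what is left
    yield from buffer


# The exact order depends on the seed, so no expected output is shown.
print(list(buffered_shuffle(range(10), buffer_size=3, seed=0)))
```

With a small buffer an element can only move a few positions forward, so the result is only lightly shuffled. A buffer at least as large as the dataset gives a uniform shuffle, at the cost of holding everything in memory.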

You can even create batches

In [None]:
dataset = dataset.batch(2)

and pre-fetch the data (in other words, it will always have one batch ready to be loaded)

In [None]:
dataset = dataset.prefetch(1)
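
The idea behind prefetching can be sketched in plain Python with a background thread and a bounded queue. This is an assumption-laden illustration rather than how tf.data implements it; the `prefetch` function and the `_DONE` sentinel are hypothetical names, and errors raised by the producer are not propagated in this sketch.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def prefetch(items, buffer_size=1):
    """Yield items while a background thread prepares up to buffer_size ahead."""
    q = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in items:
            q.put(item)   # blocks while the buffer is full
        q.put(_DONE)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = q.get()    # blocks until the producer has something ready
        if item is _DONE:
            return
        yield item


print(list(prefetch(range(5), buffer_size=1)))  # [0, 1, 2, 3, 4]
```

The order of the elements is unchanged; only the timing differs, because the next element is prepared while the consumer is still busy with the current one.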

In [None]:
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))

As you can see, we now have a batch created from the shuffled Dataset! From here on, the nodes in the graph that consume this data are assumed to be batched: every Tensor has shape [None, ...], where None corresponds to the (unspecified) batch dimension.
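
One reason the batch dimension is left unspecified is that the last batch may be smaller than the others. The following pure-Python sketch (the `batch` helper is a hypothetical stand-in, not the tf.data implementation) shows this behaviour:

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:
        yield current  # final, smaller batch


batches = list(batch(range(5), 2))
print(batches)                    # [[0, 1], [2, 3], [4]]
print([len(b) for b in batches])  # [2, 2, 1]
```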

## Why we use initializable iterators

In [None]:
dataset = tf.data.TextLineDataset("file.txt")
iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

Thanks to init_op we can choose to “restart” from the beginning. This becomes quite handy when we want to perform multiple epochs!

In [None]:
with tf.Session() as sess:
    #Initialize the iterator
    sess.run(init_op)
    print(sess.run(next_element))
    print(sess.run(next_element))
    sess.run(init_op) #Iterator starts from the beginning
    print(sess.run(next_element))

Because we use a single session across all epochs, we need to be able to restart the iterator. Some other approaches (like tf.estimator.Estimator) remove the need for initializable iterators by creating a new session at each epoch. But this comes at a cost: the weights and the graph must be re-loaded and re-initialized with each call to estimator.train() or estimator.evaluate().
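
The restart-per-epoch pattern can be illustrated without TensorFlow. The sketch below is a toy analogue under stated assumptions: `RestartableIterator`, `EndOfData` and `run_epochs` are hypothetical names, with `initialize()` playing the role of `sess.run(init_op)` and `EndOfData` standing in for tf.errors.OutOfRangeError.

```python
class EndOfData(Exception):
    """Hypothetical stand-in for tf.errors.OutOfRangeError."""


class RestartableIterator:
    """Toy iterator that must be initialized before use and can be restarted."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = None  # None means "not initialized yet"

    def initialize(self):
        # Plays the role of sess.run(init_op): rewind to the first element.
        self._pos = 0

    def get_next(self):
        if self._pos is None:
            raise RuntimeError("iterator has not been initialized")
        if self._pos >= len(self._items):
            raise EndOfData("end of sequence")
        item = self._items[self._pos]
        self._pos += 1
        return item


def run_epochs(iterator, epochs):
    """Consume the whole iterator once per epoch and return what was seen."""
    seen = []
    for _ in range(epochs):
        iterator.initialize()  # restart from the beginning
        epoch_items = []
        try:
            while True:
                epoch_items.append(iterator.get_next())
        except EndOfData:
            pass  # this epoch is finished
        seen.append(epoch_items)
    return seen


print(run_epochs(RestartableIterator(["a", "b", "c"]), epochs=2))
# [['a', 'b', 'c'], ['a', 'b', 'c']]
```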

# Importing Data

In order to use a Dataset we need three steps:

1. **Importing Data** Create a Dataset instance from some data
2. **Create an Iterator** Use the created dataset to make an Iterator instance to iterate through the dataset
3. **Consuming Data** Use the created iterator to get the elements from the dataset and feed them to the model

Regardless of the type of iterator, its get_next function creates an operation in your TensorFlow graph which, when run in a session, returns the next values from the Dataset behind the iterator. An iterator does not keep track of how many elements the Dataset contains, so the usual pattern is to keep running the get_next operation until TensorFlow raises tf.errors.OutOfRangeError. The skeleton of a typical Dataset-plus-iterator program looks like this:

In [None]:
# Create dataset and perform transformations on it
# (the two lines below are concrete stand-ins; substitute your own source and transformations)
dataset = tf.data.Dataset.range(10)
dataset = dataset.batch(4)

# Create iterator
iterator = dataset.make_one_shot_iterator()
next_batch = iterator.get_next()

# Create session
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    try: 
        # Keep running next_batch till the Dataset is exhausted
        while True:
            sess.run(next_batch)
            
    except tf.errors.OutOfRangeError:
        pass

In [None]:
#we have a numpy array and we want to pass it to tensorflow.

import numpy as np
import tensorflow as tf

x = np.random.sample((100,2))

dataset = tf.data.Dataset.from_tensor_slices(x)
print(dataset.output_types)  # <dtype: 'float64'>
print(dataset.output_shapes) #(2,)

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))

In [None]:
#We can also pass more than one numpy array, one classic example is when we have a couple of data divided into features 
#and labels

import numpy as np
import tensorflow as tf

features, labels = (np.random.sample((100,2)), np.random.sample((100,1)))

dataset = tf.data.Dataset.from_tensor_slices((features,labels))
print(dataset.output_types)  # (tf.float64, tf.float64)
print(dataset.output_shapes) #(TensorShape([Dimension(2)]), TensorShape([Dimension(1)]))

iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))

In [None]:
#We can also pass a tensor
#Do not forget to use initializable iterator.

import numpy as np
import tensorflow as tf

dataset1 = tf.data.Dataset.from_tensor_slices(tf.random_uniform([100,2]))
print(dataset1.output_types)  # <dtype: 'float32'>
print(dataset1.output_shapes) #(2,)

iterator = dataset1.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(next_element))

In [None]:
#We can also pass a tuple of tensors

import numpy as np
import tensorflow as tf

dataset2 = tf.data.Dataset.from_tensor_slices((tf.random_uniform([4]),
    tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)) )
print(dataset2.output_types)  # (tf.float32, tf.int32)
print(dataset2.output_shapes) # (TensorShape([]), TensorShape([Dimension(100)]))

iterator = dataset2.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(next_element))

In [None]:
#We can also pass a nested tuple of tensors
import numpy as np
import tensorflow as tf

dataset3 = tf.data.Dataset.zip((dataset1,dataset2))
print(dataset3.output_types)  # (tf.float32, (tf.float32, tf.int32))
print(dataset3.output_shapes) #(TensorShape([Dimension(2)]), (TensorShape([]), TensorShape([Dimension(100)])))

iterator = dataset3.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(next_element))

In [None]:
#It is often convenient to give names to each component of an element, for example if they represent different features 
#of a training example.

dataset = tf.data.Dataset.from_tensor_slices(
   {"a": tf.random_uniform([4]),
    "b": tf.random_uniform([4, 100], maxval=100, dtype=tf.int32)})
print(dataset.output_types)  # ==> "{'a': tf.float32, 'b': tf.int32}"
print(dataset.output_shapes)  # ==> "{'a': (), 'b': (100,)}"

In [None]:
#We can also pass a placeholder
#This is useful when we want to dynamically change the data inside the Dataset

x=tf.placeholder(tf.float32, shape=[None,2])
dataset = tf.data.Dataset.from_tensor_slices(x)

data = np.random.sample((100,2))

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op, feed_dict={x:data})
    print(sess.run(next_element))

In [None]:
#We can also initialise a Dataset from a generator. This is useful when the elements have different lengths
#(e.g. sequences)

# A plain nested list: the three sequences have lengths 1, 2 and 3
sequence = [[[1]], [[2], [3]], [[3], [4], [5]]]

def generator():
    for next_element in sequence:
        yield next_element

# from_generator is a static method, so it is called on the Dataset class itself
dataset = tf.data.Dataset.from_generator(generator, output_types=tf.int64, output_shapes=tf.TensorShape([None, 1]))

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op)
    for i in range(3):
        print(sess.run(next_element))

#In this case you also need specify the types and the shapes of your data that will be used to create the correct tensors.

# Reading from a CSV file

In [None]:
import numpy as np
import tensorflow as tf

CSV_path = '/Users/mustafamuratarat/data.csv'
dataset = tf.contrib.data.make_csv_dataset(CSV_path, batch_size = 15)

iterator = dataset.make_one_shot_iterator()
next_element =  iterator.get_next()
print(next_element)
inputs, labels = next_element['Estimation'], next_element['Algorithm']

with tf.Session() as sess:
    print(sess.run([inputs, labels]))

## Datasets Transformations

In [None]:
dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
# Create a dataset with data of [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
import tensorflow as tf
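# Eager execution must be enabled at program startup, before any graph is built;
# in a notebook, restart the kernel before running this cell.
tf.enable_eager_execution()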

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.repeat(2)
# Duplicate the dataset
# Data will be [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

iterator = dataset.make_one_shot_iterator()

for i in iterator:
    print(i)

In [None]:
import tensorflow as tf
tf.enable_eager_execution()

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.shuffle(5)
# Shuffle the dataset
# One possible order: [3, 0, 1, 4, 2, 7, 5, 6, 9, 8]
iterator = dataset.make_one_shot_iterator()

for i in iterator:
    print(i)

In [None]:
#In Map transformation, you can apply some operations to all the individual data elements in your dataset. 
#Use this particular transformation to apply various types of data augmentation

import tensorflow as tf
tf.enable_eager_execution()

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))

def map_fn(x):
    return x * 3

dataset = dataset.map(map_fn)
# Same as dataset = dataset.map(lambda x: x * 3)
# Multiply each element with 3 using map transformation
# Dataset: [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
iterator = dataset.make_one_shot_iterator()

for i in iterator:
    print(i)

In [None]:
#During the course of training, if you wish to filter out some elements from Dataset, use filter function.

import tensorflow as tf
tf.enable_eager_execution()

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))

def filter_fn(x):
    return tf.reshape(tf.not_equal(x % 5, 1), [])

dataset = dataset.filter(filter_fn)
# Same as dataset = dataset.filter(lambda x: tf.reshape(tf.not_equal(x % 5, 1), []))
# Filter out all those elements whose modulus 5 returns 1
# Dataset: [0, 2, 3, 4, 5, 7, 8, 9]

iterator = dataset.make_one_shot_iterator()

for i in iterator:
    print(i)

In [None]:
import tensorflow as tf
tf.enable_eager_execution()

dataset = tf.data.Dataset.from_tensor_slices(tf.range(10))
dataset = dataset.batch(4)


iterator = dataset.make_one_shot_iterator()

for i in iterator:
    print(i)

## Ordering of Transformation

The order in which transformations are applied is very important. The same Dataset can yield different data, and so train your model differently, when the same transformations are applied in a different order. The code sample below shows two orderings that produce different sets of batches.

In [None]:
# Ordering #1
dataset1 = tf.data.Dataset.from_tensor_slices(tf.range(10))
# Dataset: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

dataset1 = dataset1.batch(4)
# Dataset: [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]

dataset1 = dataset1.repeat(2)
# Dataset: [0, 1, 2, 3], [4, 5, 6, 7], [8, 9], [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]
# Notice a 2 element batch in between

dataset1 = dataset1.shuffle(4)
# Shuffles at batch level.
# Dataset: [0, 1, 2, 3], [4, 5, 6, 7], [8, 9], [8, 9], [0, 1, 2, 3], [4, 5, 6, 7]



# Ordering #2
dataset2 = tf.data.Dataset.from_tensor_slices(tf.range(10))
# Dataset: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

dataset2 = dataset2.shuffle(4)
# Dataset: [3, 1, 0, 4, 5, 8, 6, 9, 7, 2]

dataset2 = dataset2.repeat(2)
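# Dataset: [3, 1, 0, 4, 5, 8, 6, 9, 7, 2, 3, 1, 0, 4, 5, 8, 6, 9, 7, 2]
# (the same order is shown twice for simplicity; by default shuffle reshuffles on every repetition)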

dataset2 = dataset2.batch(4)
# Dataset: [3, 1, 0, 4], [5, 8, 6, 9], [7, 2, 3, 1], [0, 4, 5, 8], [6, 9, 7, 2]
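
The effect of ordering can be checked deterministically in plain Python by leaving the shuffle out and comparing batch-then-repeat with repeat-then-batch. This is a sketch with a hypothetical `batch` helper, not the tf.data implementation:

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    items = list(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


data = list(range(10))

# Ordering #1: batch first, then repeat twice.
batch_then_repeat = batch(data, 4) * 2
# Ordering #2: repeat twice first, then batch.
repeat_then_batch = batch(data * 2, 4)

print(batch_then_repeat)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9], [0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(repeat_then_batch)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 0, 1], [2, 3, 4, 5], [6, 7, 8, 9]]
```

Batching before repeating keeps epoch boundaries intact but produces a short batch in every epoch. Repeating before batching fills every batch but lets a batch straddle two epochs.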

## Create an Iterator

There are four types of iterators:
1. **One shot**. It can iterate once through a dataset; you cannot feed any value to it.
2. **Initializable**. You can dynamically change the data by calling its initializer operation and passing the new data with feed_dict. It is basically a bucket that you can fill with stuff.
3. **Reinitializable**. It can be initialised from different Datasets. Very useful when you have a training dataset that needs some additional transformation and a testing dataset. It is like using a tower crane to select a different container.
4. **Feedable**. It can be used to select which iterator to use. Following the previous example, it is like a tower crane that selects which tower crane to use to select which container to take.

## One Shot Iterator

In [None]:
import numpy as np
import tensorflow as tf

x=np.random.sample((100,2))
dataset = tf.data.Dataset.from_tensor_slices(x)
iterator = dataset.make_one_shot_iterator()
next_element = iterator.get_next()

with tf.Session() as sess:
    print(sess.run(next_element))

## Initializable Iterator

In [None]:
x = tf.placeholder(tf.float32, shape =[None,2])
dataset = tf.data.Dataset.from_tensor_slices(x)

data = np.random.sample((100,2))

iterator = dataset.make_initializable_iterator()
next_element = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    sess.run(init_op, feed_dict ={x:data})
    print(sess.run(next_element))

Imagine that we now have a train set and a test set, a very common scenario:

In [None]:
import numpy as np
import tensorflow as tf
#initializable iterator to switch between dataset
epochs = 10

x, y = tf.placeholder(tf.float32, shape =[None,2]), tf.placeholder(tf.float32, shape =[None,1])
dataset = tf.data.Dataset.from_tensor_slices((x,y))

train_data = (np.random.sample((100,2)), np.random.sample((100,1)))
test_data = (np.array([[1,2]]), np.array([[0]]))

iterator = dataset.make_initializable_iterator()
features, labels = iterator.get_next()
init_op = iterator.initializer

with tf.Session() as sess:
    #initialise iterator with train data
    sess.run(init_op, feed_dict ={x:train_data[0], y: train_data[1]})
    for _ in range(epochs):
        print(sess.run([features, labels]))
    #switch to test data
    sess.run(init_op, feed_dict ={x:test_data[0], y: test_data[1]})
    print(sess.run([features, labels]))

As can be seen, by re-running the initializer operation we have switched between training and test data using the same Dataset object.

This iterator is ideal when you have to train your model on data that is split across multiple places and cannot be gathered into one.

## Reinitializable Iterator

The concept is similar to before: we want to switch dynamically between data. But instead of feeding new data to the same dataset, we switch datasets. As before, we want to have a train dataset and a test dataset.

In [None]:
import numpy as np
import tensorflow as tf

epochs = 10

#create some data
training_data = (np.random.sample((100,2)), np.random.sample((100,1)))
testing_data = (np.random.sample((10,2)), np.random.sample((10,1)))

#create two datasets, one for training and one for testing
training_dataset = tf.data.Dataset.from_tensor_slices(training_data)
testing_dataset = tf.data.Dataset.from_tensor_slices(testing_data)

#this is the trick, we create a generic iterator
iterator = tf.data.Iterator.from_structure(training_dataset.output_types, training_dataset.output_shapes)

#create initialization operations
training_init_op = iterator.make_initializer(training_dataset)
testing_init_op = iterator.make_initializer(testing_dataset)

features, labels = iterator.get_next()

with tf.Session() as sess:
    sess.run(training_init_op) #switch to train dataset
    for _ in range(epochs):
        print(sess.run([features, labels]))
    sess.run(testing_init_op) #switch to testing dataset
    print(sess.run([features, labels]))

## Feedable Iterator
This is very similar to the reinitializable iterator, but instead of switching between datasets, it switches between iterators.


In [None]:
import numpy as np
import tensorflow as tf

epochs = 10

#create some data
training_data = (np.random.sample((100,2)), np.random.sample((100,1)))
testing_data = (np.random.sample((10,2)), np.random.sample((10,1)))

#create placeholder
x, y = tf.placeholder(tf.float32, shape =[None,2]), tf.placeholder(tf.float32, shape =[None,1])

#create two datasets, one for training and one for testing
training_dataset = tf.data.Dataset.from_tensor_slices((x,y))
testing_dataset = tf.data.Dataset.from_tensor_slices((x,y))

#Then, we can create our iterator, in this case we use the initializable iterator, 
#but you can also use a one shot iterator
training_iterator = training_dataset.make_initializable_iterator()
testing_iterator = testing_dataset.make_initializable_iterator()

#Now, we need to define a handle, which will be our placeholder that can be dynamically changed.
handle = tf.placeholder(tf.string, shape=[])

#Then, similar to before, we define a generic iterator using the shape of the dataset
iterator = tf.data.Iterator.from_string_handle(handle, training_dataset.output_types, training_dataset.output_shapes)

#Then, we get the next elements
next_elements = iterator.get_next()

#create initialization operations
training_init_op = training_iterator.initializer
testing_init_op = testing_iterator.initializer

with tf.Session() as sess:
    train_handle = sess.run(training_iterator.string_handle())
    test_handle = sess.run(testing_iterator.string_handle())
    #initialize iterators
    sess.run(training_init_op, feed_dict ={x:training_data[0], y: training_data[1]})
    sess.run(testing_init_op, feed_dict ={x:testing_data[0], y: testing_data[1]})
    
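    for _ in range(epochs):
        features, labels = sess.run(next_elements, feed_dict = {handle: train_handle})
        print(features, labels)
    #use new names for the results so the x and y placeholders are not overwritten
    features, labels = sess.run(next_elements, feed_dict = {handle: test_handle})
    print([features, labels])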

# Consuming Data

To pass the data to a model, we just pass it the tensors generated by get_next().

In [None]:
import numpy as np
import tensorflow as tf

epochs = 10
batch_size = 16

#using two numpy arrays
#(wrapping each array in np.array([...]) would add a leading dimension of 1 and make the dataset a single element)
features, labels = (np.random.sample((100,2)), np.random.sample((100,1)))
dataset = tf.data.Dataset.from_tensor_slices((features,labels)).repeat().batch(batch_size)
iterator = dataset.make_one_shot_iterator()
x, y = iterator.get_next()

#make a simple neural network model
net = tf.layers.dense(x, 8, activation = tf.tanh)
net = tf.layers.dense(net, 8, activation = tf.tanh)
prediction = tf.layers.dense(net, 1, activation = tf.tanh)
loss = tf.losses.mean_squared_error(labels = y, predictions=prediction)
train_op = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for i in range(epochs):
        _, loss_value = sess.run([train_op, loss])
        #note: this extra run pulls a new batch from the iterator; it is not the batch used in the training step above
        print(sess.run([x,y]))
        print("Iter: {}, Loss: {:.4f}".format(i, loss_value))

In [None]:
import numpy as np
import tensorflow as tf

#Switch between train and test set using Initializable iterator
EPOCHS = 10
#create a placeholder to dynamically switch between batch sizes
batch_size = tf.placeholder(tf.int64)
BATCH_SIZE = 32

x, y = tf.placeholder(tf.float32, shape=[None,2]), tf.placeholder(tf.float32, shape=[None,1])
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(batch_size).repeat()

# using two numpy arrays
train_data = (np.random.sample((100,2)), np.random.sample((100,1)))
test_data = (np.random.sample((20,2)), np.random.sample((20,1)))

iterator = dataset.make_initializable_iterator()
features, labels = iterator.get_next()
init_op = iterator.initializer
# make a simple model
net = tf.layers.dense(features, 8, activation=tf.tanh) # pass the first value from iter.get_next() as input
net = tf.layers.dense(net, 8, activation=tf.tanh)
prediction = tf.layers.dense(net, 1, activation=tf.tanh)
loss = tf.losses.mean_squared_error(labels=labels, predictions=prediction) # pass the second value from iterator.get_next() as label
train_op = tf.train.AdamOptimizer().minimize(loss)
n_batches = train_data[0].shape[0] // BATCH_SIZE

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    #initialise iterator with train data
    sess.run(init_op, feed_dict={x: train_data[0], y: train_data[1], batch_size: BATCH_SIZE})
    print('Training...')
    for i in range(EPOCHS):
        total_loss=0
        for _ in range(n_batches):
            _, loss_value = sess.run([train_op, loss])
            #print(sess.run([features, labels]))
            total_loss += loss_value
        print("Iter: {}, Loss: {:.4f}".format(i, total_loss / n_batches))
    #initialise iterator with test data
    sess.run(init_op, feed_dict={x: test_data[0], y:test_data[1], batch_size: test_data[0].shape[0]})
    print("Test: {:.4f}".format(sess.run(loss)))
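
The value of n_batches in the cell above deserves a quick check. With 100 training samples and a batch size of 32, integer division leaves a remainder, so three steps do not cover one full pass over the data. The arithmetic, as a small worked example (variable names here are illustrative only):

```python
import math
from itertools import cycle, islice

n_samples = 100   # size of train_data in the cell above
batch_size = 32   # BATCH_SIZE in the cell above

full_batches = n_samples // batch_size              # what n_batches computes
remainder = n_samples % batch_size                  # samples in the final, smaller batch
steps_per_pass = math.ceil(n_samples / batch_size)  # batches needed to see every sample once

print(full_batches, remainder, steps_per_pass)  # 3 4 4

# Because the dataset is built as batch(...).repeat(), batch sizes cycle like this:
sizes_in_one_pass = [batch_size] * full_batches + ([remainder] if remainder else [])
print(list(islice(cycle(sizes_in_one_pass), 6)))  # [32, 32, 32, 4, 32, 32]
```

So each outer loop iteration of three steps is slightly less than one pass over the data, and the small 4-sample batch shows up inside later iterations rather than at their end.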

In [None]:
# Wrapping all together -> Switch between train and test set using Reinitializable iterator
EPOCHS = 10
# create a placeholder to dynamically switch between batch sizes
batch_size = tf.placeholder(tf.int64)

x, y = tf.placeholder(tf.float32, shape=[None,2]), tf.placeholder(tf.float32, shape=[None,1])
train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(batch_size).repeat()
test_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(batch_size) # always batch even if you want to one shot it
# using two numpy arrays
train_data = (np.random.sample((100,2)), np.random.sample((100,1)))
test_data = (np.random.sample((20,2)), np.random.sample((20,1)))

# create a iterator of the correct shape and type
iter = tf.data.Iterator.from_structure(train_dataset.output_types,
                                           train_dataset.output_shapes)
features, labels = iter.get_next()
# create the initialisation operations
train_init_op = iter.make_initializer(train_dataset)
test_init_op = iter.make_initializer(test_dataset)

# make a simple model
net = tf.layers.dense(features, 8, activation=tf.tanh) # pass the first value from iter.get_next() as input
net = tf.layers.dense(net, 8, activation=tf.tanh)
prediction = tf.layers.dense(net, 1, activation=tf.tanh)

loss = tf.losses.mean_squared_error(labels=labels, predictions=prediction) # pass the second value from iter.get_next() as label
train_op = tf.train.AdamOptimizer().minimize(loss)
n_batches = train_data[0].shape[0] // 16
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # initialise iterator with train data
    sess.run(train_init_op, feed_dict = {x : train_data[0], y: train_data[1], batch_size: 16})
    print('Training...')
    for i in range(EPOCHS):
        tot_loss = 0
        for _ in range(n_batches):
            _, loss_value = sess.run([train_op, loss])
            tot_loss += loss_value
        print("Iter: {}, Loss: {:.4f}".format(i, tot_loss / n_batches))
    # initialise iterator with test data
    sess.run(test_init_op, feed_dict = {x : test_data[0], y: test_data[1], batch_size:len(test_data[0])})
    print('Test Loss: {:.4f}'.format(sess.run(loss)))

# Benchmark

In [None]:
import time  # needed by the decorator below

log_time = {}
# copied from https://medium.com/pythonhive/python-decorator-to-measure-the-execution-time-of-methods-fa04cb6bb36d
def how_much(method):
    def timed(*args, **kw):
        ts = time.time()
        result = method(*args, **kw)
        te = time.time()
        
        if 'log_time' in kw:
            name = kw.get('log_name', method.__name__)
            kw['log_time'][name] = (te - ts)
            
        return result
    return timed
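
As a usage illustration, here is a hedged variant of the same decorator together with a call. It is not the code from the linked article: the name `how_much_perf` and the `busy_loop` function are hypothetical, and the variant swaps in `time.perf_counter` (a monotonic clock intended for measuring intervals) and `functools.wraps` (to preserve the wrapped function's name).

```python
import functools
import time


def how_much_perf(method):
    """Record the wall-clock duration of each call in kw['log_time'], if given."""
    @functools.wraps(method)
    def timed(*args, **kw):
        start = time.perf_counter()
        result = method(*args, **kw)
        elapsed = time.perf_counter() - start
        if 'log_time' in kw:
            name = kw.get('log_name', method.__name__)
            kw['log_time'][name] = elapsed
        return result
    return timed


@how_much_perf
def busy_loop(n, **kwargs):
    # **kwargs absorbs log_time / log_name, which are also passed to the function.
    return sum(range(n))


timings = {}
total = busy_loop(100000, log_time=timings)
print(total, timings)
```

Note that the decorated function must accept `**kwargs`, because the `log_time` dictionary is forwarded to it along with the other arguments.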

In [None]:
# benchmark
import time
DATA_SIZE = 5000
DATA_SHAPE = ((32,32),(20,))
BATCH_SIZE = 64 
N_BATCHES = DATA_SIZE // BATCH_SIZE
EPOCHS = 10

test_size = (DATA_SIZE//100)*20 

train_shape = ((DATA_SIZE, *DATA_SHAPE[0]),(DATA_SIZE, *DATA_SHAPE[1]))
test_shape = ((test_size, *DATA_SHAPE[0]),(test_size, *DATA_SHAPE[1]))
print(train_shape, test_shape)
train_data = (np.random.sample(train_shape[0]), np.random.sample(train_shape[1]))
test_data = (np.random.sample(test_shape[0]), np.random.sample(test_shape[1]))

In [None]:
# used to keep track of the timing of each method
log_time = {}

tf.reset_default_graph()
sess = tf.InteractiveSession()

input_shape = [None, *DATA_SHAPE[0]] # [None, 32, 32]
output_shape = [None,*DATA_SHAPE[1]] # [None, 20]
print(input_shape, output_shape)

x, y = tf.placeholder(tf.float32, shape=input_shape), tf.placeholder(tf.float32, shape=output_shape)

@how_much
def one_shot(**kwargs):
    print('one_shot')
    train_dataset = tf.data.Dataset.from_tensor_slices(train_data).batch(BATCH_SIZE).repeat()
    train_el = train_dataset.make_one_shot_iterator().get_next()
    
    test_dataset = tf.data.Dataset.from_tensor_slices(test_data).batch(BATCH_SIZE).repeat()
    test_el = test_dataset.make_one_shot_iterator().get_next()
    for i in range(EPOCHS):
        print(i)
        for _ in range(N_BATCHES):
            sess.run(train_el)
        for _ in range(N_BATCHES):
            sess.run(test_el)
            
@how_much
def initialisable(**kwargs):
    print('initialisable')
    dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(BATCH_SIZE).repeat()

    iter = dataset.make_initializable_iterator()
    elements = iter.get_next()
    
    for i in range(EPOCHS):
        print(i)
        sess.run(iter.initializer, feed_dict={ x: train_data[0], y: train_data[1]})
        for _ in range(N_BATCHES):
            sess.run(elements)
        sess.run(iter.initializer, feed_dict={ x: test_data[0], y: test_data[1]})
        for _ in range(N_BATCHES):
            sess.run(elements)
@how_much            
def reinitializable(**kwargs):
    print('reinitializable')
    # create two datasets, one for training and one for test
    train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE).repeat()
    test_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE).repeat()
    # create a iterator of the correct shape and type
    iter = tf.data.Iterator.from_structure(train_dataset.output_types,
                                               train_dataset.output_shapes)
    elements = iter.get_next()
    # create the initialisation operations
    train_init_op = iter.make_initializer(train_dataset)
    test_init_op = iter.make_initializer(test_dataset)
    
    for i in range(EPOCHS):
        print(i)
        sess.run(train_init_op, feed_dict={ x: train_data[0], y: train_data[1]})
        for _ in range(N_BATCHES):
            sess.run(elements)
        sess.run(test_init_op, feed_dict={ x: test_data[0], y: test_data[1]})
        for _ in range(N_BATCHES):
            sess.run(elements)
            
@how_much            
def feedable(**kwargs):
    print('feedable')
    # create two datasets, one for training and one for test
    train_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE).repeat()
    test_dataset = tf.data.Dataset.from_tensor_slices((x,y)).batch(BATCH_SIZE).repeat()
    # create the iterators from the dataset
    train_iterator = train_dataset.make_initializable_iterator()
    test_iterator = test_dataset.make_initializable_iterator()

    handle = tf.placeholder(tf.string, shape=[])
    iter = tf.data.Iterator.from_string_handle(
        handle, train_dataset.output_types, train_dataset.output_shapes)
    elements = iter.get_next()

    train_handle = sess.run(train_iterator.string_handle())
    test_handle = sess.run(test_iterator.string_handle())

    sess.run(train_iterator.initializer, feed_dict={ x: train_data[0], y: train_data[1]})
    sess.run(test_iterator.initializer, feed_dict={ x: test_data[0], y: test_data[1]})

    for i in range(EPOCHS):
        print(i)
        for _ in range(N_BATCHES):
            sess.run(elements, feed_dict={handle: train_handle})
        for _ in range(N_BATCHES):
            sess.run(elements, feed_dict={handle: test_handle})
            
one_shot(log_time=log_time)
initialisable(log_time=log_time)
reinitializable(log_time=log_time)
feedable(log_time=log_time)

sorted((value,key) for (key,value) in log_time.items())