<h1 style="color:white;background-color:rgb(255, 108, 0);padding-top:1em;padding-bottom:0.7em;padding-left:1em;">3.2 Placeholders and Dataset API</h1>
<hr>

<h2>Introduction</h2>

In this lesson we are going to cover how to feed input data into TensorFlow,
<br>
first with the help of placeholders and then with the TensorFlow Dataset API.

First let's import the required modules:

In [None]:
import numpy as np
import tensorflow as tf

<h2>Placeholders</h2>

Placeholders in TensorFlow are tensors, similar to constants.
<br>
The difference is that a placeholder can be part of the computation graph
<br>
without a value being assigned to it. Its value only has to be
<br>
supplied when a tensor that depends on it is evaluated.

A placeholder in TensorFlow can be created with:

```python
tf.placeholder(
    dtype,
    shape=None,
    name=None
)
```

Unlike a constant, a placeholder does not have a 'value' argument.
<br>
Its value has to be supplied through the optional 'feed_dict' argument of Session.run(). If 'shape' is left as None, a value of any shape can be fed.

Further information on placeholders in TensorFlow can be found at https://www.tensorflow.org/api_docs/python/tf/placeholder
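The optional shape argument lets TensorFlow check fed values before the graph is run. The following is a minimal sketch, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1` (an assumption made so that it also runs under TF 2.x); the names `points` and `norms` are illustrative. A placeholder of shape `[None, 3]` accepts any number of 3D rows, a value of the wrong shape is rejected, and evaluating a dependent tensor without feeding the placeholder fails.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

# Any number of rows (None), but every row must have exactly 3 components
points = tf.placeholder(tf.float32, shape=[None, 3], name='points')
norms = tf.norm(points, axis=1)  # Euclidean length of every row

with tf.Session() as sess:
    batch = np.array([[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]], dtype=np.float32)
    norm_values = sess.run(norms, feed_dict={points: batch})
    print(norm_values)

    # A value whose shape is not compatible with [None, 3] is rejected before running
    try:
        sess.run(norms, feed_dict={points: [[1.0, 2.0]]})
        shape_error = False
    except ValueError:
        shape_error = True
    print('wrong shape rejected:', shape_error)

    # Evaluating a tensor that depends on an unfed placeholder fails at run time
    try:
        sess.run(norms)
        missing_error = False
    except tf.errors.InvalidArgumentError:
        missing_error = True
    print('missing feed rejected:', missing_error)
```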

<p style="margin-top:2em;">Now let's see some examples of the usage of placeholders:</p>

In [None]:
#Create a placeholder
x = tf.placeholder(tf.int32, name='x')

#Perform operations on the placeholder (Build the computation graph)
y1 = x**2
y2 = 2*x + 1
y3 = y1 + x

#Define the feed_dict argument
xv_1 = [1,2,3] #First value for x
xv_2 = 5 #Second value for x
feed_dict_1 = {x: xv_1}
feed_dict_2 = {x: xv_2}

#Evaluate the tensors
with tf.Session() as sess:
    y1v,y2v,y3v = sess.run([y1,y2,y3], feed_dict=feed_dict_1)
    print('In case of x = ', xv_1, '\n')
    print('y1 is\n', y1v, '\n')
    print('y2 is\n', y2v, '\n')
    print('y3 is\n', y3v, '\n')
    
    y1v,y2v,y3v = sess.run([y1,y2,y3], feed_dict=feed_dict_2)
    print('In case of x = ', xv_2, '\n')
    print('y1 is\n', y1v, '\n')
    print('y2 is\n', y2v, '\n')
    print('y3 is\n', y3v, '\n')
    

The example above shows that a placeholder acts as an input to the computation graph:
<br>
the same graph can be executed on different values
<br>
by feeding them to the placeholder.
<br>
However, feeding through 'feed_dict' passes the data from Python on every call to Session.run(), which can become a bottleneck when
<br>
handling large amounts of input data. The Dataset API was introduced for this purpose.

<h2>TensorFlow Dataset API</h2>

The Dataset class of TensorFlow in 'tf.data' provides methods for handling potentially large datasets.
<br>
With the Dataset API one can build an input pipeline for a large number of elements and define transformations
<br>
on those elements as well.

We are going to cover the basic methods to create a Dataset object
<br>
and to perform some fundamental transformations.
<br>
Detailed information on the Dataset class and its methods and properties can be found at
<br>
https://www.tensorflow.org/api_docs/python/tf/data/Dataset

Two basic methods to create a Dataset object from existing data are covered here.
<br>
The first one is the 'from_tensor_slices' method and the
<br>
second one is the 'from_generator' method.

The elements of a Dataset can be accessed via an iterator defined in the Iterator class of 'tf.data'.
<br>
The easiest way to create an iterator for a Dataset is to call its 'make_one_shot_iterator' method.

<p style="margin-top:2em;">Let's see how to create TensorFlow Datasets and how to iterate over their elements:</p>

In [None]:
#Create datasets with the from_tensor_slices method
x_np = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]]) #Create numpy array
ds_np = tf.data.Dataset.from_tensor_slices(x_np) #Create dataset directly from numpy array

x_tensor = tf.constant([[1,1,1],[2,2,2],[3,3,3],[4,4,4],[5,5,5]]) #Create a tensor
ds_tensor = tf.data.Dataset.from_tensor_slices(x_tensor) #Create dataset from tensor

#Create the iterators for the datasets
iter_np = ds_np.make_one_shot_iterator() #Iterator for the ds_np dataset
iter_tensor = ds_tensor.make_one_shot_iterator() #Iterator for the ds_tensor dataset

#Call the get_next method of the iterators to retrieve elements from the dataset
element_np = iter_np.get_next()
element_tensor = iter_tensor.get_next()

#Of course we also could get the elements in one go like 'element_np = ds_np.make_one_shot_iterator().get_next()'
#but this way the iterator would not be accessible

#Create a session and evaluate the elements
with tf.Session() as sess:
    for i in range(x_np.shape[0]):
        #If we run the for loop more times than the number of elements of the dataset
        #it will result in an OutOfRangeError
        e_np = sess.run(element_np)
        print('Number ', i, ' element of the dataset from the numpy array is ', e_np, '\n')
    
    for i in range(sess.run(tf.shape(x_tensor))[0]):
        e_ten = sess.run(element_tensor)
        print('Number ', i, ' element of the dataset from the constant tensor is ', e_ten, '\n')
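'from_tensor_slices' also accepts a tuple of arrays. In that case every component is sliced along its first dimension, so each element of the dataset is a tuple. Below is a minimal sketch with hypothetical toy features and labels, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

# Hypothetical toy data: 4 samples with 2 features each and one integer label per sample
features = np.array([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]], dtype=np.float32)
labels = np.array([0, 1, 0, 1], dtype=np.int32)

# Both arrays are sliced along their first dimension, so every element is a (row, label) pair
ds = tf.data.Dataset.from_tensor_slices((features, labels))
next_pair = ds.make_one_shot_iterator().get_next()

pairs = []
with tf.Session() as sess:
    for _ in range(len(labels)):
        f, l = sess.run(next_pair)
        pairs.append((f.tolist(), int(l)))

for p in pairs:
    print(p)
```

Keeping features and labels in one dataset guarantees that each get_next() call returns a matching pair.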

In [None]:
#Create datasets with the from_generator method

#Define a generator
def square_numbers(n):
    '''Yield the first n square numbers
    
    Args:
        n (int): the number of required square numbers
        
    Yields:
        int: the next square number, starting from 1 up to n**2
    '''
    for i in range(n):
        yield((i+1)**2)

#Create a callable that returns a new generator object
n = 10
generator = lambda: square_numbers(n)

#The generator passed to the from_generator method must be callable.
#By defining generator as a lambda function we can create Datasets from
#generators that also have arguments

#Create the dataset from the generator, create its iterator and retrieve its elements
dataset = tf.data.Dataset.from_generator(generator, output_types=tf.int32)
element = dataset.make_one_shot_iterator().get_next()

#Evaluate the elements of the dataset:
with tf.Session() as sess:
    for i in range(n):
        value = sess.run(element)
        print('The next square number is ', value, '\n')
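'from_generator' also works with generators that yield tuples: output_types then becomes a tuple of types, and the optional output_shapes argument gives the returned tensors a static shape. A minimal sketch, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`; `indexed_vectors` is a hypothetical generator yielding (index, vector) pairs.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

def indexed_vectors(n):
    '''Hypothetical generator: yields (i, [i, i, i]) for i in 0..n-1'''
    for i in range(n):
        yield i, np.full(3, i, dtype=np.float32)

n = 3
ds = tf.data.Dataset.from_generator(
    lambda: indexed_vectors(n),                               # still a callable
    output_types=(tf.int32, tf.float32),                      # one type per tuple component
    output_shapes=(tf.TensorShape([]), tf.TensorShape([3])))  # scalar index, 3D vector

index, vector = ds.make_one_shot_iterator().get_next()
print(vector.shape)  # static shape taken from output_shapes

results = []
with tf.Session() as sess:
    for _ in range(n):
        results.append(sess.run((index, vector)))
```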

<h3>Iterators</h3>

So far we have covered the one-shot iterator, which can iterate over the dataset only once.
<br>
However, there are two more important iterator types.
<br>

The initializable iterator can be re-initialized to iterate over the dataset over and over again.
<br>
For an initializable iterator we have to run its initializer before the first pass and run it again
<br>
whenever we want to return to the beginning of the dataset.

The other type of iterator is created from a structure, i.e. the types (and optionally the shapes) of the elements.
<br>
This kind of iterator is not bound to a particular dataset and can be reused with multiple datasets.

More information on iterators in TensorFlow can be found at https://www.tensorflow.org/api_docs/python/tf/data/Iterator

<p style="margin-top:2em;">Let's see how to re-initialize an iterator:</p>

In [None]:
#Create a dataset from a constant tensor and make an initializable iterator for it
tensor = tf.constant([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
ds = tf.data.Dataset.from_tensor_slices(tensor)

ds_iter = ds.make_initializable_iterator()
ds_iter_init = ds_iter.initializer #Running this op in a session (re)initializes the iterator

element = ds_iter.get_next()

n = 4

#Go through the dataset n times
with tf.Session() as sess:
    shape = sess.run(tf.shape(tensor))
    for i in range(n):
        sess.run(ds_iter_init)
        for j in range(shape[0]):
            val = sess.run(element)
            print(val)
        print('\n')
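Instead of computing the number of elements with tf.shape, a loop can also read from an initializable iterator until it raises tf.errors.OutOfRangeError, which signals the end of the dataset. A minimal sketch of two passes over a small dataset, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

ds = tf.data.Dataset.from_tensor_slices(tf.constant([10, 20, 30]))
ds_iter = ds.make_initializable_iterator()
element = ds_iter.get_next()

epochs = []
with tf.Session() as sess:
    for epoch in range(2):
        sess.run(ds_iter.initializer)  # rewind to the first element
        values = []
        while True:
            try:
                values.append(int(sess.run(element)))
            except tf.errors.OutOfRangeError:
                break  # this pass over the dataset is finished
        epochs.append(values)

print(epochs)  # [[10, 20, 30], [10, 20, 30]]
```

This form does not need to know the dataset size in advance, which also makes it usable for datasets built from generators.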

<p style="margin-top:2em;">Since placeholders are also tensors, we can create datasets from placeholders.
<br>
Their values only have to be fed when the iterator is initialized.
<br>
This way the contents of the dataset can be changed easily.
</p>

In [None]:
#Create a dataset from a placeholder
placeholder = tf.placeholder(tf.int32)

placeholder_ds = tf.data.Dataset.from_tensor_slices(placeholder)
placeholder_iter = placeholder_ds.make_initializable_iterator()

placeholder_iter_init = placeholder_iter.initializer

element = placeholder_iter.get_next()

with tf.Session() as sess:
    _,shape = sess.run([placeholder_iter_init,tf.shape(placeholder)], feed_dict={placeholder: [1,2,3,4]})
    for i in range(shape[0]):
        value = sess.run(element)
        print(value)
    _,shape = sess.run([placeholder_iter_init,tf.shape(placeholder)], feed_dict={placeholder: [[1,1],[2,2],[3,3]]})
    for i in range(shape[0]):
        value = sess.run(element)
        print(value)
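Placeholders can also parametrize the transformations of a dataset, not only its contents. A minimal sketch, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`, in which both the data and the batch size (a scalar tf.int64 tensor, as Dataset.batch expects) are fed at initialization; `read_all` is a hypothetical helper.

```python
import numpy as np
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

data = tf.placeholder(tf.float32, shape=[None])  # 1D input of any length
batch_size = tf.placeholder(tf.int64, shape=[])  # scalar batch size, fed at initialization

ds = tf.data.Dataset.from_tensor_slices(data).batch(batch_size)
ds_iter = ds.make_initializable_iterator()
next_batch = ds_iter.get_next()

def read_all(sess, values, size):
    '''Hypothetical helper: re-initialize with new contents and collect every batch'''
    sess.run(ds_iter.initializer, feed_dict={data: values, batch_size: size})
    batches = []
    while True:
        try:
            batches.append(sess.run(next_batch).tolist())
        except tf.errors.OutOfRangeError:
            return batches

with tf.Session() as sess:
    train_batches = read_all(sess, np.arange(5, dtype=np.float32), 2)
    test_batches = read_all(sess, np.array([7.0, 8.0, 9.0], dtype=np.float32), 3)

print(train_batches)  # [[0.0, 1.0], [2.0, 3.0], [4.0]]
print(test_batches)   # [[7.0, 8.0, 9.0]]
```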

<p style="margin-top:2em;">An iterator that can be shared by multiple datasets can be created with the
<br>
from_structure method of the Iterator class. Its signature is:
</p>

```python
@staticmethod
from_structure(
    output_types,
    output_shapes=None,
    shared_name=None,
    output_classes=None
)
```

To use the iterator with different datasets, an initializer has to be created for each dataset.
<br>
For this purpose, the following method can be used:
```python
make_initializer(
    dataset,
    name=None
)
```
Detailed info on this method can be found at https://www.tensorflow.org/api_docs/python/tf/data/Iterator
<p style="margin-top:2em;">Now, let's see how to use this method:</p>

In [None]:
#Create two datasets and use them with a single iterator
list1 = [5,4,3,2,1,0]
list2 = [[2,4],[3,5],[4,6],[5,7]]

ds1 = tf.data.Dataset.from_tensor_slices(list1)
ds2 = tf.data.Dataset.from_tensor_slices(list2)

iterator = tf.data.Iterator.from_structure(tf.int32)

ds1_init = iterator.make_initializer(ds1)
ds2_init = iterator.make_initializer(ds2)

element = iterator.get_next()

with tf.Session() as sess:
    sess.run(ds1_init)
    for i in range(len(list1)):
        val = sess.run(element)
        print(val)
    sess.run(ds2_init)
    for i in range(len(list2)):
        val = sess.run(element)
        print(val)
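'from_structure' also takes an output_shapes argument, and make_initializer only accepts datasets whose element types and shapes are compatible with the structure. A minimal sketch of switching one iterator between a training and a validation dataset, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`; the dataset names are illustrative.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

# Two batched int64 datasets whose elements are vectors of varying length
train_ds = tf.data.Dataset.range(6).batch(3)
valid_ds = tf.data.Dataset.range(100, 104).batch(2)

# Giving output_shapes as well as output_types lets get_next() carry a static rank
iterator = tf.data.Iterator.from_structure(tf.int64, tf.TensorShape([None]))
train_init = iterator.make_initializer(train_ds)
valid_init = iterator.make_initializer(valid_ds)
batch = iterator.get_next()

# A dataset whose element type (float32) does not match the structure is rejected
try:
    iterator.make_initializer(tf.data.Dataset.from_tensor_slices([1.5, 2.5]))
    type_mismatch = False
except TypeError:
    type_mismatch = True

with tf.Session() as sess:
    sess.run(train_init)
    first_train = sess.run(batch).tolist()
    sess.run(valid_init)
    first_valid = sess.run(batch).tolist()

print(first_train, first_valid, type_mismatch)
```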

<h3>Data preparation with the Dataset API</h3>

The Dataset API provides methods to process the data stream.
<br>
These methods can be useful to preprocess the data before feeding it
<br>
to the model.

A detailed description of these methods can be found at https://www.tensorflow.org/api_docs/python/tf/data/Dataset
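Each of these methods returns a new Dataset, so transformations can be chained, and the order of the chain matters. A minimal sketch comparing repeat-then-batch with batch-then-repeat, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`; `collect` is a hypothetical helper that reads a finite dataset into a list.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

def collect(ds):
    '''Hypothetical helper: read every element of a finite dataset into a list'''
    element = ds.make_one_shot_iterator().get_next()
    out = []
    with tf.Session() as sess:
        while True:
            try:
                out.append(sess.run(element).tolist())
            except tf.errors.OutOfRangeError:
                return out

base = tf.data.Dataset.range(5)
repeat_then_batch = collect(base.repeat(2).batch(4))
batch_then_repeat = collect(base.batch(4).repeat(2))

print(repeat_then_batch)  # [[0, 1, 2, 3], [4, 0, 1, 2], [3, 4]]
print(batch_then_repeat)  # [[0, 1, 2, 3], [4], [0, 1, 2, 3], [4]]
```

With repeat applied first, a batch can span the boundary between two passes over the data; with batch applied first, every pass ends with its own smaller final batch.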

<p style="margin-top:2em;">Let's see how some of these methods work:</p>

In [None]:
#Create a dataset and process its data
def squared(sample):
    return (sample**2)

datasets = []

base_ds = tf.data.Dataset.range(20)
datasets.append(base_ds)

repeat_ds = base_ds.repeat(count=None) #Base dataset repeated indefinitely
datasets.append(repeat_ds)

batch_ds = repeat_ds.batch(batch_size=4) #Creates batches of 4 elements of the repeated dataset
datasets.append(batch_ds)

shuffle_ds = base_ds.shuffle(buffer_size=20) #Buffers 20 samples and returns them in a shuffled order
datasets.append(shuffle_ds)

shuffle_batch_ds = base_ds.batch(4).shuffle(5) #Shuffle the order of batches
datasets.append(shuffle_batch_ds)

batch_shuffle_ds = base_ds.shuffle(20).batch(4) #The samples are shuffled before batching, so each batch holds a random selection of samples
datasets.append(batch_shuffle_ds)

map_ds = base_ds.map(squared) #Map the squared function for all samples in the dataset
datasets.append(map_ds)

zip_ds = tf.data.Dataset.zip((base_ds,map_ds)) #Zip the base and mapped datasets together

iterator = tf.data.Iterator.from_structure(tf.int64)

initializers=[]

for ds in datasets:
    initializers.append(iterator.make_initializer(ds))
    
zip_element = zip_ds.make_one_shot_iterator().get_next()
    
element = iterator.get_next()

with tf.Session() as sess:
    for init in initializers:
        sess.run(init)
        for i in range(30):
            try:
                val = sess.run(element)
                print(val)
            except tf.errors.OutOfRangeError:
                break
        print('Finished processing data\n')
    for i in range(20):
        val = sess.run(zip_element)
        print(val)
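When the elements of a dataset are tuples, as in the zipped dataset above, map passes each tuple component to the mapped function as a separate positional argument. A minimal sketch computing x**2 - x from zipped (number, square) pairs, assuming TensorFlow 1.15 or 2.x with the TF 1.x API imported through `tensorflow.compat.v1`.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.x API via compat.v1

tf.disable_eager_execution()  # graph/session mode; has no effect if already in graph mode

numbers = tf.data.Dataset.range(5)
squares = numbers.map(lambda x: x * x)

# Each zipped element is a (number, square) tuple; map unpacks it into two arguments
pairs = tf.data.Dataset.zip((numbers, squares))
differences = pairs.map(lambda x, x_sq: x_sq - x)

element = differences.make_one_shot_iterator().get_next()
values = []
with tf.Session() as sess:
    for _ in range(5):
        values.append(int(sess.run(element)))

print(values)  # [0, 0, 2, 6, 12]
```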

<h2>Exercise 3.2</h2>

 - Create a generator that yields random 3D vectors, $\mathbf{x}=[x_1,x_2,x_3]^T$, where the components $x_i\in \{1,\dots,100\}, i \in \{1,2,3\}$
<br>
<br>
 - Define a function that normalizes these vectors, $f(\mathbf{x})=\dfrac{\mathbf{x}}{\|\mathbf{x}\|}$, where $\|\mathbf{x}\|$ is the length of $\mathbf{x}$
<br>
<br>
 - Make a dataset from the generator and normalize each element of this dataset. Also make it repeat indefinitely in batches of 50
<br>
 - Create a placeholder that can hold a single 3D vector ($\mathbf{v}$)
<br>
 - Calculate the dot product $(s)$ of $\mathbf{v}$ and each normalized $\mathbf{x}$, $s=\mathbf{v}^T\cdot\mathbf{x}$
<br>
 - Find $\mathbf{x}$ for each batch for which $s$ is maximal within the batch ($\mathbf{x_m}$)
<br>
 - Calculate $\|\mathbf{v}\|\cdot\mathbf{x_m}$ for 20 batches and $\|\mathbf{v}\|\cdot\mathbf{x}$ for all $\mathbf{x}$ in a single batch
<br>
 - Plot the resulting points in a 3D graph with the provided function

In [None]:
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

def add_to_plot(ax, points, color='r'):
    '''Add list of points to the 3D plot
    
    Args:
        ax (Axes3D object): Feed the already created ax variable to this value
        points (list): 3D points, each an array-like of 3 floats
        color (str): color of the points to be visualized
    '''
    x_coords = []
    y_coords = []
    z_coords = []
    for point in points:
        x_coords.append(point[0])
        y_coords.append(point[1])
        z_coords.append(point[2])
    ax.scatter(x_coords, y_coords, z_coords, c=color)
    
#Write your code here:

fig = plt.figure()
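ax = fig.add_subplot(111, projection='3d') #Axes3D(fig) is not attached to the figure automatically in newer Matplotlib versions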
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
#Call add_to_plot function for your list of points here:

ax.view_init(30, 30)
plt.show()

See solution here: [Exercise 3.2 solution](Excersise_3_2.ipynb)

Continue: [3.3 Variables and Activation Functions](Variables_Activation.ipynb)