## Mapping function

After creating a dataset from NumPy arrays or files, we often want to transform its observations into a more useful form. For example, we might build a dataset of heights measured in inches but want to train a model on heights in centimeters. The dataset's `map` method handles this: it applies a function to every observation and returns a new dataset containing the results. In the example below, each observation is a tuple made of one row of `data1` and the matching entry of `data2`, and the mapped function adds the two together.

```python
import numpy as np
import tensorflow as tf

data1 = np.array([[1.2, 2.2],
                  [7.3, 0.0]])
data2 = np.array([0.1, 1.1])

# Each element is a tuple: (row of data1, matching entry of data2)
d1 = tf.data.Dataset.from_tensor_slices((data1, data2))

# map unpacks the tuple into x and y; the scalar y is broadcast over the row x
d2 = d1.map(lambda x, y: x + y)

for val2 in d2:
    print(val2)
```

```
tf.Tensor([1.3 2.3], shape=(2,), dtype=float64)
tf.Tensor([8.4 1.1], shape=(2,), dtype=float64)
```
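The inches-to-centimeters conversion mentioned above can be sketched the same way. This is a minimal illustration that assumes a made-up array `heights_in` of heights in inches; it relies only on the exact definition 1 inch = 2.54 cm.

```python
import numpy as np
import tensorflow as tf

# Hypothetical heights measured in inches
heights_in = np.array([60.0, 65.5, 72.0])
heights = tf.data.Dataset.from_tensor_slices(heights_in)

# Each element is a single scalar tensor, so the function takes one argument.
# 1 inch is defined as exactly 2.54 cm.
heights_cm = heights.map(lambda h: h * 2.54)

for h in heights_cm:
    print(h.numpy())
```

The original `heights` dataset is left unchanged; `map` returns a new dataset, so both versions remain available.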


## Wrapper functions

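Note that `map` calls its function with the contents of one dataset element and nothing else. A plain tensor element arrives as a single argument, while a tuple element is unpacked into one positional argument per component (which is why the lambda in the previous example takes both `x` and `y`). If the function we want to apply also needs arguments that are not stored in the dataset, we can wrap it in another function, such as a `lambda`, that takes only the element and supplies the extra arguments itself.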
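The structure of each element determines what the mapped function receives. The sketch below uses hypothetical arrays `features` and `labels` to contrast a tuple-structured dataset, whose components are unpacked into separate arguments, with a dict-structured dataset, whose elements are passed to the function as one dict argument.

```python
import numpy as np
import tensorflow as tf

# Hypothetical example data
features = np.array([[1.0, 2.0], [3.0, 4.0]])
labels = np.array([0.5, 1.5])

# Tuple elements: map unpacks the two components into two positional arguments
tuple_ds = tf.data.Dataset.from_tensor_slices((features, labels))
sums = tuple_ds.map(lambda x, y: x + y)

# Dict elements: map passes the whole dict as a single argument
dict_ds = tf.data.Dataset.from_tensor_slices({"x": features, "y": labels})
diffs = dict_ds.map(lambda d: d["x"] - d["y"])

for s, d in zip(sums, diffs):
    print(s.numpy(), d.numpy())
```

Passing a one-argument function to the tuple dataset would fail, because `map` would try to call it with two arguments.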

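```python
import numpy as np
import tensorflow as tf

def f(a, b):
    return a - b

data1 = np.array([[4.3, 2.7],
                  [1.3, 1.0]])
data2 = np.array([0.2, 0.5])
d1 = tf.data.Dataset.from_tensor_slices(data1)

# f takes two arguments, but map supplies only the element x;
# the lambda wrapper fills in data2 as the second argument
d2 = d1.map(lambda x: f(x, data2))

for val2 in d2:
    print(val2)
```

```
tf.Tensor([4.1 2.2], shape=(2,), dtype=float64)
tf.Tensor([1.1 0.5], shape=(2,), dtype=float64)
```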
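A `lambda` is not the only possible wrapper. As a sketch, a named closure can capture the extra arguments when a reusable, documented wrapper is preferable. The functions `scale_and_shift` and `make_mapper` below are hypothetical names chosen for illustration, not part of the TensorFlow API.

```python
import numpy as np
import tensorflow as tf

def scale_and_shift(x, scale, shift):
    """Hypothetical three-argument function; only x comes from the dataset."""
    return x * scale + shift

def make_mapper(scale, shift):
    """Hypothetical helper returning a one-argument function suitable for map."""
    def mapper(x):
        # scale and shift are captured from the enclosing call to make_mapper
        return scale_and_shift(x, scale, shift)
    return mapper

data = np.array([[1.0, 2.0], [3.0, 4.0]])
ds = tf.data.Dataset.from_tensor_slices(data)
ds_scaled = ds.map(make_mapper(scale=2.0, shift=0.5))

for val in ds_scaled:
    print(val.numpy())
```

Because `make_mapper` returns a fresh function each time, the same wrapper can be reused with different `scale` and `shift` values on different datasets.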
