# Apply function to dataset



## apply vs map

* [Difference between tf.data.Dataset.map() and tf.data.Dataset.apply()](https://stackoverflow.com/questions/47091726/difference-between-tf-data-dataset-map-and-tf-data-dataset-apply)

> The difference is that ```map``` will execute one **function on every element of the Dataset separately**, whereas ```apply``` will execute one **function on the whole Dataset at once** (such as ```group_by_window```, given as an example in the documentation).
> 
> * the argument of ```apply``` is a **function that takes a Dataset and returns a Dataset**
> ```
> # The function receives the whole Dataset and must return a Dataset.
> dataset.apply(lambda ds: ds.filter(lambda x: x < 10))
> ```
> * the argument of ```map``` is a **function that takes one element** and **returns one transformed element**.


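* The function passed to ```map``` is traced and executed as a TensorFlow graph, as if it were decorated with ```tf.function```, so it receives symbolic tensors. Python libraries that need concrete values, e.g. numpy, cannot be called on them directly inside the function.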
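
When NumPy code is needed inside ```map```, one option is to wrap it with ```tf.numpy_function```. The following is a minimal sketch under assumptions: the dataset values and the helper names ```numpy_cumsum``` and ```cumsum_element``` are hypothetical and not part of the original notebook.

```python
import numpy as np
import tensorflow as tf

# Hypothetical toy data: two elements of shape (3,).
dataset = tf.data.Dataset.from_tensor_slices(
    np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dtype=np.float32)
)

def numpy_cumsum(x):
    # Runs as plain Python: x arrives here as a NumPy array.
    return np.cumsum(x).astype(np.float32)

def cumsum_element(x):
    # Wrap the NumPy code so it can be called from the traced map function.
    y = tf.numpy_function(numpy_cumsum, [x], tf.float32)
    # The wrapper loses static shape information, so restore it.
    y.set_shape(x.shape)
    return y

results = [t.numpy() for t in dataset.map(cumsum_element)]
for r in results:
    print(r)
```

The wrapped function runs in the Python interpreter rather than in the graph, so it is usually slower than native TensorFlow ops and is best kept for logic that has no TensorFlow equivalent.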

In [2]:
import numpy as np
import tensorflow as tf

# Dataset

In [3]:
dataset = tf.data.Dataset.from_tensor_slices([
    tf.constant([0.14375, 0.0437018, 0.97083336], dtype=np.float32),
    tf.constant([0.14583333, 0.24164525, 0.57916665], dtype=np.float32),
    tf.constant([0.6, 0.5244216, 0.8541667], dtype=np.float32),
])
print(dataset)
print()
for d in dataset:
    print(d)

<TensorSliceDataset element_spec=TensorSpec(shape=(3,), dtype=tf.float32, name=None)>

tf.Tensor([0.14375    0.0437018  0.97083336], shape=(3,), dtype=float32)
tf.Tensor([0.14583333 0.24164525 0.57916665], shape=(3,), dtype=float32)
tf.Tensor([0.6       0.5244216 0.8541667], shape=(3,), dtype=float32)


---
# How to apply function and retain the same Tensor shape

Transform a dataset whose elements are **Tensors of shape ```(3,)```** so that each element keeps the same shape while its values are manipulated.

* [TensorFlow Dataset - how to make map function return multi columns as one tensor](https://stackoverflow.com/questions/75587284/tensorflow-dataset-how-to-make-map-function-return-multi-columns-as-one-tensor)


## flat_map(g)

Use [flat_map](https://www.tensorflow.org/api_docs/python/tf/data/Dataset#flat_map)(g) where ```g``` generates a Dataset. The function ```g``` must return a Dataset, not a Python data structure or a tf.Tensor; ```flat_map``` then flattens the returned Datasets into a single Dataset.


<img src="./image/tf_dataset_flat_map_for_multi_dimension_tensor.png" align="left" width=500/>

In [4]:
def g(x):
    return tf.data.Dataset.from_tensors([x[0]*1, x[1]*2, x[2] * 3])

for d in dataset.flat_map(g):
    print(d)

tf.Tensor([0.14375   0.0874036 2.9125001], shape=(3,), dtype=float32)
tf.Tensor([0.14583333 0.4832905  1.7375    ], shape=(3,), dtype=float32)
tf.Tensor([0.6       1.0488431 2.5625   ], shape=(3,), dtype=float32)


In [37]:
def g(x):
    return tf.data.Dataset.from_tensor_slices([[x[0]*1, x[1]*2, x[2] * 3]])

for d in dataset.flat_map(g):
    print(d)

tf.Tensor([0.14375   0.0874036 2.9125001], shape=(3,), dtype=float32)
tf.Tensor([0.14583333 0.4832905  1.7375    ], shape=(3,), dtype=float32)
tf.Tensor([0.6       1.0488431 2.5625   ], shape=(3,), dtype=float32)
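
The two ```flat_map``` cells above give the same result because ```from_tensors``` wraps its argument as a single element, while ```from_tensor_slices``` slices along the first axis, which is why the second version needs the extra pair of brackets. A small sketch of the difference, using a hypothetical constant ```x``` rather than the notebook's dataset:

```python
import tensorflow as tf

# Hypothetical example tensor of shape (3,).
x = tf.constant([1.0, 2.0, 3.0])

# from_tensors: the whole tensor becomes one element of shape (3,).
whole = list(tf.data.Dataset.from_tensors(x))

# from_tensor_slices: sliced along axis 0 into three scalar elements.
sliced = list(tf.data.Dataset.from_tensor_slices(x))

# Adding a leading axis of size 1 yields one element of shape (3,) again.
nested = list(tf.data.Dataset.from_tensor_slices(x[tf.newaxis, :]))

print(len(whole), whole[0].shape)
print(len(sliced), sliced[0].shape)
print(len(nested), nested[0].shape)
```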


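### Problem - ```map``` with a tuple return value

If the function passed to ```map``` returns a Python tuple, the ```(3,)``` shape is not retained. Each element becomes a tuple of three scalar Tensors instead, i.e. ```Tuple[(),(),()]```.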

In [56]:
def f(x):
    return x[0]*1, x[1]*2, x[2] * 3

for d in dataset.map(f):
    print(d)

(<tf.Tensor: shape=(), dtype=float32, numpy=0.14375>, <tf.Tensor: shape=(), dtype=float32, numpy=0.0874036>, <tf.Tensor: shape=(), dtype=float32, numpy=2.9125001>)
(<tf.Tensor: shape=(), dtype=float32, numpy=0.14583333>, <tf.Tensor: shape=(), dtype=float32, numpy=0.4832905>, <tf.Tensor: shape=(), dtype=float32, numpy=1.7375>)
(<tf.Tensor: shape=(), dtype=float32, numpy=0.6>, <tf.Tensor: shape=(), dtype=float32, numpy=1.0488431>, <tf.Tensor: shape=(), dtype=float32, numpy=2.5625>)
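
The tuple structure comes from the Python tuple returned by ```f```, not from ```map``` itself. One way to keep the ```(3,)``` shape with ```map``` is to combine the scalars into one Tensor with ```tf.stack``` before returning. This is a sketch with hypothetical toy values, not the notebook's dataset:

```python
import tensorflow as tf

# Hypothetical toy data: two elements of shape (3,).
dataset = tf.data.Dataset.from_tensor_slices(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
)

def f(x):
    # Stack the three scalars back into a single Tensor of shape (3,).
    return tf.stack([x[0] * 1, x[1] * 2, x[2] * 3])

stacked = list(dataset.map(f))
for d in stacked:
    print(d)
```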


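Alternatively, return a dictionary. A list used as a dictionary value is converted into a single Tensor, so the ```(3,)``` shape is kept.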

In [59]:
def f(x):
    return {"inputs": [x[0]*1, x[1]*2, x[2] * 3]}

for d in dataset.map(f):
    print(d)

{'inputs': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.14375  , 0.0874036, 2.9125001], dtype=float32)>}
{'inputs': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.14583333, 0.4832905 , 1.7375    ], dtype=float32)>}
{'inputs': <tf.Tensor: shape=(3,), dtype=float32, numpy=array([0.6      , 1.0488431, 2.5625   ], dtype=float32)>}
