# TensorFlow - Dataset API

---



In [0]:
import numpy as np
import tensorflow as tf
import pathlib

## Creating a TensorFlow Dataset
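Create a dataset from a **list**, a **NumPy array** or a **tensor** using `tf.data.Dataset.from_tensor_slices`. The input is sliced along its first dimension, so each element of the dataset is one entry of the input. The element type follows the input: a Python list of integers becomes `tf.int32`, while the NumPy array (whose default integer type is `int64` on most platforms) keeps `tf.int64`.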

In [3]:
a = [1, 2, 3]
dataset_a = tf.data.Dataset.from_tensor_slices(a)
print(dataset_a)

b = np.array([4, 5, 6])
dataset_b = tf.data.Dataset.from_tensor_slices(b)
print(dataset_b)

c = tf.constant([7, 8, 9])
dataset_c = tf.data.Dataset.from_tensor_slices(c)
print(dataset_c)

<TensorSliceDataset shapes: (), types: tf.int32>
<TensorSliceDataset shapes: (), types: tf.int64>
<TensorSliceDataset shapes: (), types: tf.int32>
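Inputs with more than one dimension are sliced along the first axis only. The sketch below, using a small hypothetical matrix, shows that a 3×2 array becomes a dataset of three elements of shape (2,), which can be checked through the `element_spec` attribute (TF 2.x).

```python
import numpy as np
import tensorflow as tf

# Hypothetical 3x2 input: slicing happens along axis 0 only
matrix = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.int32)
dataset_m = tf.data.Dataset.from_tensor_slices(matrix)

# Each element is one row of the matrix, with shape (2,) and dtype int32
print(dataset_m.element_spec)

rows = [row.numpy().tolist() for row in dataset_m]
print(rows)  # [[1, 2], [3, 4], [5, 6]]
```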


## Iterating through a dataset
Iterate through a dataset **element by element** with a `for ... in` loop; in eager mode each element is returned as a `tf.Tensor`.

In [6]:
a = [1, 2, 3]
dataset_a = tf.data.Dataset.from_tensor_slices(a)

for pos, item in enumerate(dataset_a):
  print('item {}'.format(pos), item)

item 0 tf.Tensor(1, shape=(), dtype=int32)
item 1 tf.Tensor(2, shape=(), dtype=int32)
item 2 tf.Tensor(3, shape=(), dtype=int32)
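Besides a plain loop, a dataset can be read as NumPy values with `as_numpy_iterator()` and truncated with `take(n)`. A short sketch on the same three-element list:

```python
import tensorflow as tf

dataset_a = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# as_numpy_iterator() yields NumPy values instead of tf.Tensor objects
values = [int(v) for v in dataset_a.as_numpy_iterator()]
print(values)  # [1, 2, 3]

# take(n) builds a new dataset with at most the first n elements
first_two = [int(item.numpy()) for item in dataset_a.take(2)]
print(first_two)  # [1, 2]
```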


## Combining two tensors into a joint dataset
Create a **joint dataset**, which pairs the elements of two tensors one-to-one, either by building two separate datasets and combining them with `tf.data.Dataset.zip`, or by passing a tuple of tensors directly to `tf.data.Dataset.from_tensor_slices`.


In [51]:
# First create two separate datasets, then join them (zip)
tensor_a = tf.random.uniform(shape=(4, 2), minval=0, maxval=1, dtype=tf.float64)
dataset_a = tf.data.Dataset.from_tensor_slices(tensor_a)

tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)
dataset_b = tf.data.Dataset.from_tensor_slices(tensor_b)

dataset_c = tf.data.Dataset.zip((dataset_a, dataset_b))

for item in dataset_c:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

x: [0.01112921 0.5495263 ] y: 1
x: [0.92323786 0.33196753] y: 0
x: [0.26064636 0.52114432] y: 1
x: [0.92387295 0.1034112 ] y: 2
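`tf.data.Dataset.zip` behaves like Python's built-in `zip`: if the inputs have different lengths, the joint dataset stops at the shortest one, so a mismatch is silently truncated rather than reported. A sketch with hypothetical integer features and labels:

```python
import tensorflow as tf

features = tf.data.Dataset.from_tensor_slices(tf.constant([[1, 2], [3, 4], [5, 6]]))  # 3 elements
labels = tf.data.Dataset.from_tensor_slices(tf.constant([0, 1]))  # only 2 elements

paired = tf.data.Dataset.zip((features, labels))

# The third feature row has no matching label, so it is dropped
pairs = [(x.numpy().tolist(), int(y.numpy())) for x, y in paired]
print(pairs)  # [([1, 2], 0), ([3, 4], 1)]
```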


In [52]:
# Directly create a joint dataset (from_tensor_slices)
tensor_a = tf.random.uniform(shape=(4, 2), minval=0, maxval=1, dtype=tf.float64)
tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)

dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

for item in dataset_c:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

x: [0.54302028 0.05592273] y: 0
x: [0.89410873 0.9297669 ] y: 1
x: [0.80005524 0.88403599] y: 3
x: [0.50147179 0.41842   ] y: 3
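`from_tensor_slices` also accepts a dictionary, which names each component instead of identifying it by tuple position; all values must share the same first dimension. A sketch with the hypothetical keys `'x'` and `'y'`:

```python
import tensorflow as tf

tensor_a = tf.reshape(tf.range(8, dtype=tf.float64), (4, 2))  # 4 rows, 2 features
tensor_b = tf.constant([0, 1, 2, 3], dtype=tf.int64)  # 4 labels

# Each element is a dict {'x': row of tensor_a, 'y': entry of tensor_b}
dataset_dict = tf.data.Dataset.from_tensor_slices({'x': tensor_a, 'y': tensor_b})

for item in dataset_dict.take(2):
  print('x:', item['x'].numpy(), 'y:', item['y'].numpy())
```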


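Apply a transformation to every element using `map`. Here it performs **feature scaling**: since the features are drawn from [0, 1), the expression `x*2 - 1.0` maps them to [-1, 1), while the labels `y` are passed through unchanged.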



In [54]:
tensor_a = tf.random.uniform(shape=(4, 2), minval=0, maxval=1, dtype=tf.float64)
tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)

dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

dataset_c2 = dataset_c.map(lambda x, y: (x*2 - 1.0, y))

for item in dataset_c2:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

x: [0.64641501 0.06121631] y: 2
x: [-0.22722884 -0.87722895] y: 0
x: [-0.42201164 -0.00991235] y: 2
x: [-0.30142208  0.95123297] y: 2
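The lambda above hard-codes the input range [0, 1). A more general min-max scaling, x' = 2(x - min)/(max - min) - 1, can be written as a named function; `map` unpacks each (x, y) tuple into separate arguments. The helper name `scale_to_unit_range` and its default bounds are hypothetical. `num_parallel_calls=tf.data.AUTOTUNE` (TF 2.4 and later) lets TensorFlow apply the function in parallel while preserving element order by default.

```python
import tensorflow as tf

def scale_to_unit_range(x, y, x_min=0.0, x_max=1.0):
  # Hypothetical helper: min-max scale x to [-1, 1], pass y through unchanged
  x_scaled = 2.0 * (x - x_min) / (x_max - x_min) - 1.0
  return x_scaled, y

tensor_a = tf.constant([[0.0, 0.5], [1.0, 0.25]], dtype=tf.float64)
tensor_b = tf.constant([3, 7], dtype=tf.int64)
dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

# map calls scale_to_unit_range(x, y) on every element
dataset_c2 = dataset_c.map(scale_to_unit_range, num_parallel_calls=tf.data.AUTOTUNE)

for x, y in dataset_c2:
  print('x:', x.numpy(), 'y:', y.numpy())
```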


## Shuffling the dataset, creating batches and repeating
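**Shuffle** the elements of a dataset using `shuffle`. Since each element is an (x, y) pair, the one-to-one correspondence between features and labels is preserved. `buffer_size` sets how many elements are held in the buffer that outputs are sampled from; a buffer at least as large as the dataset gives a full shuffle. With only four elements, a shuffled order can coincide with the original one (a 1-in-24 chance), as happened in the run below.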


In [56]:
tensor_a = tf.random.uniform(shape=(4, 2), minval=0, maxval=1, dtype=tf.float64)
tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)

dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

for item in dataset_c:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

print('-'*32)

dataset_c2 = dataset_c.shuffle(buffer_size=len(tensor_b))

for item in dataset_c2:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

x: [0.42108145 0.17728118] y: 0
x: [0.3176818  0.27524818] y: 3
x: [0.6453228  0.82277006] y: 0
x: [0.19365721 0.99243814] y: 0
--------------------------------
x: [0.42108145 0.17728118] y: 0
x: [0.3176818  0.27524818] y: 3
x: [0.6453228  0.82277006] y: 0
x: [0.19365721 0.99243814] y: 0
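Because `shuffle` samples from a buffer of `buffer_size` elements, `buffer_size=1` leaves the order unchanged. Passing `seed` together with `reshuffle_each_iteration=False` makes every pass over the dataset return the same order, which helps when results must be reproducible. A sketch using `tf.data.Dataset.range`:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(10)

# A buffer of size 1 can only emit the element it just read: no shuffling
no_shuffle = [int(v) for v in dataset.shuffle(buffer_size=1).as_numpy_iterator()]
print(no_shuffle)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Full-size buffer and fixed seed, with the same order on every pass
fixed = dataset.shuffle(buffer_size=10, seed=42, reshuffle_each_iteration=False)
pass_1 = [int(v) for v in fixed.as_numpy_iterator()]
pass_2 = [int(v) for v in fixed.as_numpy_iterator()]
print(pass_1 == pass_2)  # True
```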


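Create **batches** from a dataset using `batch`: consecutive elements are stacked along a new first dimension, so with `batch_size=2` each batch holds two feature rows in `X` and the two matching labels in `Y`.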


In [42]:
tensor_a = tf.random.uniform(shape=(4, 3), minval=0, maxval=1, dtype=tf.float64)
tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)

dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

for item in dataset_c:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

print('-'*42)

dataset_d = dataset_c.batch(batch_size=2)

for pos, batch in enumerate(dataset_d):
  print('{}) X:\n'.format(pos), batch[0].numpy(), '\n   Y:\n', batch[1].numpy())



x: [0.06993291 0.77727807 0.56011085] y: 1
x: [0.24436876 0.18717293 0.10313125] y: 1
x: [0.08745759 0.75249465 0.70536724] y: 0
x: [0.38577196 0.06446177 0.57855689] y: 0
------------------------------------------
0) X:
 [[0.06993291 0.77727807 0.56011085]
 [0.24436876 0.18717293 0.10313125]] 
   Y:
 [1 1]
1) X:
 [[0.08745759 0.75249465 0.70536724]
 [0.38577196 0.06446177 0.57855689]] 
   Y:
 [0 0]
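When the dataset size is not a multiple of `batch_size`, the last batch is smaller; `drop_remainder=True` discards it, which also gives the batch dimension a fixed size in `element_spec`. A sketch with five elements:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(5)

batches = [b.tolist() for b in dataset.batch(2).as_numpy_iterator()]
print(batches)  # [[0, 1], [2, 3], [4]]

full_batches = [b.tolist() for b in dataset.batch(2, drop_remainder=True).as_numpy_iterator()]
print(full_batches)  # [[0, 1], [2, 3]]

# The batch dimension is unknown (None) unless the remainder is dropped
print(dataset.batch(2).element_spec.shape)
print(dataset.batch(2, drop_remainder=True).element_spec.shape)
```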


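**Repeat** the dataset using `repeat`: `repeat(count=2)` iterates over the data twice, e.g. for two training epochs. Here `shuffle`, `batch` and `repeat` are chained. Because `shuffle` comes before `repeat` and reshuffles on each iteration by default (`reshuffle_each_iteration=True`), the two passes contain the same (x, y) pairs in different orders.
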
In [68]:
tensor_a = tf.random.uniform(shape=(4, 2), minval=0, maxval=1, dtype=tf.float64)
tensor_b = tf.random.uniform(shape=(4, ), minval=0, maxval=10, dtype=tf.int64)

dataset_c = tf.data.Dataset.from_tensor_slices((tensor_a, tensor_b))

for item in dataset_c:
  print('x:', item[0].numpy(), 'y:', item[1].numpy())

print('-'*42)

dataset_d = dataset_c.shuffle(buffer_size=len(tensor_b)).batch(batch_size=2).repeat(count=2)

for pos, batch in enumerate(dataset_d):
  print('{}) X:\n'.format(pos), batch[0].numpy(), '\n   Y:\n', batch[1].numpy())

x: [0.83327353 0.09604491] y: 6
x: [0.85598053 0.29396247] y: 8
x: [0.34457544 0.0239543 ] y: 6
x: [0.42568612 0.57547804] y: 2
------------------------------------------
0) X:
 [[0.34457544 0.0239543 ]
 [0.83327353 0.09604491]] 
   Y:
 [6 6]
1) X:
 [[0.85598053 0.29396247]
 [0.42568612 0.57547804]] 
   Y:
 [8 2]
2) X:
 [[0.83327353 0.09604491]
 [0.34457544 0.0239543 ]] 
   Y:
 [6 6]
3) X:
 [[0.42568612 0.57547804]
 [0.85598053 0.29396247]] 
   Y:
 [2 8]
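The position of `repeat` in the chain matters. Placing it after `batch`, as above, keeps the passes separate; placing it before `batch` concatenates the passes first, so a batch can straddle two of them. Calling `repeat()` without a count repeats indefinitely, so it is usually bounded with `take`. A sketch with three elements:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(3)

# batch, then repeat: each pass ends with its own (smaller) last batch
batch_then_repeat = [b.tolist() for b in dataset.batch(2).repeat(2).as_numpy_iterator()]
print(batch_then_repeat)  # [[0, 1], [2], [0, 1], [2]]

# repeat, then batch: batches cross the boundary between passes
repeat_then_batch = [b.tolist() for b in dataset.repeat(2).batch(2).as_numpy_iterator()]
print(repeat_then_batch)  # [[0, 1], [2, 0], [1, 2]]

# repeat() without a count never ends; take() bounds it
first_seven = [int(v) for v in dataset.repeat().take(7).as_numpy_iterator()]
print(first_seven)  # [0, 1, 2, 0, 1, 2, 0]
```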
