<a href="https://colab.research.google.com/github/martin-fabbri/colab-notebooks/blob/master/deeplearning.ai/tf/b4_public_datasets_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Using public datasets with TF Datasets

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds

from tensorflow.keras import layers

tfds.__version__

'4.0.1'

In [2]:
mnist_data = tfds.load('fashion_mnist')
type(mnist_data), mnist_data

Downloading and preparing dataset fashion_mnist/3.0.1 (download: 29.45 MiB, generated: 36.42 MiB, total: 65.87 MiB) to /root/tensorflow_datasets/fashion_mnist/3.0.1...



In [None]:
for item in mnist_data:
  print(type(item), item)

Called without a `split` argument, `tfds.load` returns a dictionary keyed by split name, which is why iterating over it above yields only the keys. To get a dataset that contains the actual records, pass the split you want to `tfds.load`:

In [None]:
mnist_train = tfds.load(name='fashion_mnist', split='train')
assert isinstance(mnist_train, tf.data.Dataset)
type(mnist_train)

In this instance, we get a `PrefetchDataset` object, which we can iterate through to inspect the data. One nice feature is that `take(1)` returns a dataset containing only the first record.

In [None]:
item = next(iter(mnist_train.take(1)))
print(type(item))
print(item.keys())
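
The same `take(1)` pattern can be tried without downloading anything. The following is a minimal sketch that assumes a synthetic dataset (`toy_ds`, a hypothetical name) built from blank arrays shaped like Fashion-MNIST records; it is a stand-in for illustration, not the real data.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for Fashion-MNIST: 4 blank 28x28x1 images with labels 0..3.
# (Hypothetical data; the real records come from tfds.load.)
features = {
    "image": np.zeros((4, 28, 28, 1), dtype=np.uint8),
    "label": np.arange(4, dtype=np.int64),
}
toy_ds = tf.data.Dataset.from_tensor_slices(features)

# take(1) returns a new dataset holding only the first element;
# next(iter(...)) pulls that element out as a dict of tensors.
first = next(iter(toy_ds.take(1)))

print(sorted(first.keys()))
print(first["image"].shape, first["label"].numpy())
```

Each element is a dictionary with `image` and `label` keys, which mirrors the structure inspected in the surrounding cells.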

In [None]:
image = item['image']
print(type(image))
print(image.shape)
print(image[0])  # first row of pixel values

In [None]:
label = item['label']
print(type(label))
print(label)
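
The label is an integer class id, which is hard to read on its own. As a small sketch, the helper below maps an id to a class name. The name list is taken from the Fashion-MNIST dataset description rather than from this notebook, and `FASHION_MNIST_CLASSES` and `label_to_name` are hypothetical names introduced for illustration.

```python
# Class names as listed in the Fashion-MNIST dataset description
# (list index = integer label). Not part of the original notebook.
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def label_to_name(label):
    """Map an integer label (Python int, NumPy scalar, or scalar eager tensor) to its class name."""
    return FASHION_MNIST_CLASSES[int(label)]


print(label_to_name(9))
```

With the record loaded above, `label_to_name(label)` should give the class of the first training image.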

In [None]:
mnist_test, info = tfds.load(name='fashion_mnist', split='test', with_info=True)
info

## Using TFDS with a Keras Model

In [None]:
mnist = tf.keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = mnist.load_data()
print(type(train_images))

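When using TFDS the code is very similar, with a few minor changes. The Keras datasets gave us NumPy `ndarray` objects that work natively with `model.fit`. With TFDS we need a little conversion work: `batch_size=-1` loads each split as a single batch, `as_supervised=True` returns `(image, label)` tuples instead of dictionaries, and `tfds.as_numpy` converts the resulting tensors to NumPy arrays.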

In [None]:
(train_images, train_labels), (test_images, test_labels) = \
  tfds.as_numpy(
      tfds.load('fashion_mnist',
                split=['train', 'test'],
                batch_size=-1,
                as_supervised=True))
print(type(train_images))
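
To make the conversion less opaque, here is a rough emulation of those three steps using plain `tf.data` on synthetic data. It is a sketch of the idea under stated assumptions, not how TFDS implements it internally; `toy_ds`, `toy_images`, and `toy_labels` are hypothetical names.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for a TFDS split (hypothetical data, 6 examples).
n = 6
toy_ds = tf.data.Dataset.from_tensor_slices({
    "image": np.zeros((n, 28, 28, 1), dtype=np.uint8),
    "label": np.arange(n, dtype=np.int64) % 10,
})

# Like as_supervised=True: elements become (image, label) tuples instead of dicts.
supervised = toy_ds.map(lambda ex: (ex["image"], ex["label"]))

# Like batch_size=-1: the whole split arrives as a single batch of tensors.
images_t, labels_t = next(iter(supervised.batch(n)))

# Like tfds.as_numpy: tensors become NumPy arrays that model.fit accepts directly.
toy_images, toy_labels = images_t.numpy(), labels_t.numpy()

print(type(toy_images), toy_images.shape, toy_labels.shape)
```

Loading a whole split into memory this way is convenient for a small dataset like Fashion-MNIST; for larger datasets it is usually preferable to keep the data as a `tf.data.Dataset`.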

In [None]:
# The images must be rescaled to [0, 1] before they reach the network:
# train_images = train_images * 1.0/255.0
# test_images = test_images * 1.0/255.0
# That manual step is skipped here in favor of a Rescaling layer placed
# directly in the model (see layers.experimental.preprocessing.Rescaling).

model = tf.keras.models.Sequential([
  # input_shape belongs on the first layer of a Sequential model
  layers.experimental.preprocessing.Rescaling(1.0/255.0, input_shape=(28, 28, 1)),
  layers.Flatten(),
  layers.Dense(128, activation='relu'),
  layers.Dropout(0.2),
  layers.Dense(10, activation='softmax')
])

model.compile(
    loss='sparse_categorical_crossentropy', 
    optimizer='adam', 
    metrics=['accuracy']
)

model.fit(
    train_images,
    train_labels,
    epochs=5
)
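
As a quick sanity check, the sketch below compares the `Rescaling` layer with the manual division it replaces. It assumes a tiny hand-made array (`pixels`, hypothetical) rather than real images, and it looks the layer up under both the newer `tf.keras.layers` path and the experimental path used in this notebook, since its location depends on the installed TensorFlow version.

```python
import numpy as np
import tensorflow as tf

# The layer lived under layers.experimental.preprocessing in older releases
# and was later promoted to tf.keras.layers; try the newer path first.
try:
    Rescaling = tf.keras.layers.Rescaling
except AttributeError:
    Rescaling = tf.keras.layers.experimental.preprocessing.Rescaling

# One tiny 2x2 single-channel "image" covering the full pixel range
# (hypothetical data). Cast to float32 up front so the comparison does
# not depend on how the layer handles integer inputs.
pixels = np.array([[0, 51], [102, 255]], dtype=np.float32).reshape(1, 2, 2, 1)

scaled_by_layer = Rescaling(1.0 / 255.0)(pixels).numpy()
scaled_by_hand = pixels / 255.0

print(scaled_by_layer.ravel())
print(np.allclose(scaled_by_layer, scaled_by_hand))
```

Keeping the rescaling inside the model means the same preprocessing is applied at training and at inference time without a separate step.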