# Keras Dataset API Basics in TensorFlow

In [1]:
import tensorflow as tf
print(tf.__version__)

2025-05-16 20:12:51.276334: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


2.18.1


## Loading a Dataset

In [2]:
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
print('Train shape:', x_train.shape, y_train.shape)
print('Test shape:', x_test.shape, y_test.shape)

Train shape: (60000, 28, 28) (60000,)
Test shape: (10000, 28, 28) (10000,)


## Creating a tf.data.Dataset

In [4]:
# Create a `tf.data.Dataset` from the NumPy arrays
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))

2025-05-16 20:13:33.012928: W external/local_xla/xla/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 47040000 exceeds 10% of free system memory.


In [5]:
train_ds

<_TensorSliceDataset element_spec=(TensorSpec(shape=(28, 28), dtype=tf.uint8, name=None), TensorSpec(shape=(), dtype=tf.uint8, name=None))>
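
The `element_spec` above reflects how `from_tensor_slices` slices its inputs along the first axis, so each element is one `(image, label)` pair. The following is a minimal sketch on small synthetic arrays; `images`, `labels`, and `ds` are hypothetical stand-ins for the MNIST arrays, chosen so the example runs without downloading data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-ins for x_train / y_train: 5 blank 28x28 images.
images = np.zeros((5, 28, 28), dtype=np.uint8)
labels = np.arange(5, dtype=np.uint8)

# Slicing happens along axis 0, so the dataset has 5 elements.
ds = tf.data.Dataset.from_tensor_slices((images, labels))

num_elements = int(ds.cardinality().numpy())
image_spec, label_spec = ds.element_spec

print(num_elements)                  # 5
print(image_spec.shape.as_list())    # [28, 28]
print(label_spec.shape.as_list())    # []
```

The leading dimension of the input arrays becomes the number of elements, which is why the spec reports shape `(28, 28)` rather than `(60000, 28, 28)`.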

## Batching

In [6]:
batch_size = 32
train_ds_batched = train_ds.batch(batch_size)

In [7]:
for images, labels in train_ds_batched.take(1):
    print('Batch images shape:', images.shape)
    print('Batch labels shape:', labels.shape)


Batch images shape: (32, 28, 28)
Batch labels shape: (32,)
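
MNIST's 60,000 training samples divide evenly by 32, so every batch here is full. When the dataset size is not a multiple of the batch size, the last batch is smaller unless `drop_remainder=True` is passed. A small sketch on a toy dataset (`toy_ds` is a hypothetical name, not part of the notebook):

```python
import tensorflow as tf

# Ten elements do not divide evenly into batches of four.
toy_ds = tf.data.Dataset.range(10)

# By default the final, smaller batch is kept.
batch_sizes = [int(b.shape[0]) for b in toy_ds.batch(4)]

# drop_remainder=True discards the incomplete final batch.
batch_sizes_dropped = [int(b.shape[0]) for b in toy_ds.batch(4, drop_remainder=True)]

print(batch_sizes)          # [4, 4, 2]
print(batch_sizes_dropped)  # [4, 4]
```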


2025-05-16 20:14:41.127062: I tensorflow/core/framework/local_rendezvous.cc:405] Local rendezvous is aborting with status: OUT_OF_RANGE: End of sequence


## Shuffling

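Shuffling randomizes the order in which elements are drawn from the dataset. The `buffer_size` argument sets how many elements `shuffle` holds in memory and samples from; a buffer smaller than the dataset gives only an approximate shuffle.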

In [8]:
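# Shuffle individual samples first, then batch, so each batch mixes samples
# (shuffling an already-batched dataset only reorders whole batches)
train_ds = train_ds.shuffle(buffer_size=10000).batch(batch_size)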

In [9]:
for images, labels in train_ds.take(1):
    print('Shuffled batch images shape:', images.shape)


Shuffled batch images shape: (32, 28, 28)
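
The order of `shuffle` and `batch` matters, and a toy dataset makes the difference visible. The sketch below is illustrative only; the names `toy_ds`, `shuffle_then_batch`, and `batch_then_shuffle` are hypothetical, and the printed order depends on the random seed.

```python
import tensorflow as tf

toy_ds = tf.data.Dataset.range(12)

# Shuffle individual elements first, then group them into batches.
shuffle_then_batch = toy_ds.shuffle(buffer_size=12, seed=0).batch(4)

# Batch first, then shuffle: only the order of whole batches changes,
# so each batch still holds four consecutive numbers.
batch_then_shuffle = toy_ds.batch(4).shuffle(buffer_size=3, seed=0)

elementwise = [b.numpy().tolist() for b in shuffle_then_batch]
batchwise = [b.numpy().tolist() for b in batch_then_shuffle]

print('shuffle -> batch:', elementwise)
print('batch -> shuffle:', batchwise)
```

In the second case every batch keeps its original contents, which is usually not what is wanted when the goal is to mix samples across batches.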



## Iterating Through Batches

A `tf.data.Dataset` is a Python iterable, so you can loop over it with a `for` loop. Here `take(3)` limits the loop to the first three batches.

In [10]:
for batch_num, (images, labels) in enumerate(train_ds_batched.take(3)):
    print(f'Batch {batch_num+1}: images shape {images.shape}, labels shape {labels.shape}')


Batch 1: images shape (32, 28, 28), labels shape (32,)
Batch 2: images shape (32, 28, 28), labels shape (32,)
Batch 3: images shape (32, 28, 28), labels shape (32,)
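
If NumPy arrays are more convenient than tensors inside the loop, `as_numpy_iterator()` yields each batch as a NumPy array. A minimal sketch on a toy dataset (`toy_batched` and `total` are hypothetical names used only for illustration):

```python
import tensorflow as tf

# Ten integers in batches of four; the last batch holds two elements.
toy_batched = tf.data.Dataset.range(10).batch(4)

total = 0
for batch_num, batch in enumerate(toy_batched.as_numpy_iterator()):
    # `batch` is a NumPy array here, not a tf.Tensor.
    total += int(batch.sum())
    print(f'Batch {batch_num + 1}: shape {batch.shape}')

print(total)  # 45
```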
