# How to count the number of records in a Dataset

* [What is the fastest method to count elements of a tensorflow.data.Datset?](https://stackoverflow.com/questions/59827628/what-is-the-fastest-method-to-count-elements-of-a-tensorflow-data-datset)

> Short answer "No".

> For in-memory datasets there's: tf.data.experimental.cardinality(dataset), but tf.data.Datasets are inherently lazy loaded, and can be infinite, so there's no knowing how many elements there are in a tf.data.Dataset without iterating through it.

* [tf.data.experimental.cardinality](https://www.tensorflow.org/api_docs/python/tf/data/experimental/cardinality)

> Returns the cardinality of dataset, if known.

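If the cardinality cannot be determined without iterating over the dataset (for example, after a `filter`), the function returns `tf.data.UNKNOWN_CARDINALITY` (-2). For an infinite dataset it returns `tf.data.INFINITE_CARDINALITY` (-1).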
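A minimal sketch of the three possible outcomes, assuming TensorFlow 2.3 or later (where `Dataset.cardinality()` and the `tf.data.*_CARDINALITY` constants are available). The helper name `describe_cardinality` and the toy datasets are made up for illustration and are not part of the original notebook.

```python
import tensorflow as tf


def describe_cardinality(ds):
    """Return the element count, or a label for the two sentinel values."""
    n = int(ds.cardinality().numpy())
    if n == tf.data.INFINITE_CARDINALITY:  # sentinel value -1
        return "infinite"
    if n == tf.data.UNKNOWN_CARDINALITY:   # sentinel value -2
        return "unknown"
    return n


# Hypothetical toy datasets, used only to trigger each case.
finite = tf.data.Dataset.range(5)                # size known statically
infinite = finite.repeat()                       # repeats forever
filtered = finite.filter(lambda x: x % 2 == 0)   # size depends on the data

print(describe_cardinality(finite))
print(describe_cardinality(infinite))
print(describe_cardinality(filtered))
```

The filtered dataset reports an unknown cardinality because TensorFlow cannot tell how many elements pass the predicate without actually running it.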

In [1]:
import tensorflow as tf
import tensorflow_datasets as tfds

In [3]:
mnist, info = tfds.load("mnist", split="train", as_supervised=True, with_info=True)
info

tfds.core.DatasetInfo(
    name='mnist',
    full_name='mnist/3.0.1',
    description="""
    The MNIST database of handwritten digits.
    """,
    homepage='http://yann.lecun.com/exdb/mnist/',
    data_path='/Users/oonisim/tensorflow_datasets/mnist/3.0.1',
    file_format=tfrecord,
    download_size=11.06 MiB,
    dataset_size=21.00 MiB,
    features=FeaturesDict({
        'image': Image(shape=(28, 28, 1), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'test': <SplitInfo num_examples=10000, num_shards=1>,
        'train': <SplitInfo num_examples=60000, num_shards=1>,
    },
    citation="""@article{lecun2010mnist,
      title={MNIST handwritten digit database},
      author={LeCun, Yann and Cortes, Corinna and Burges, CJ},
      journal={ATT Labs [Online]. Available: http://yann.lecun.com/exdb/mnist},
      volume={2},
      year={2010}
    }""",
)


In [9]:
for row in mnist.take(1):
    print("image", tf.shape(row[0]), "label", row[1].numpy())

image tf.Tensor([28 28  1], shape=(3,), dtype=int32) label 4


2023-03-05 15:39:14.671820: W tensorflow/core/kernels/data/cache_dataset_ops.cc:856] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset  will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.


In [11]:
tf.data.Dataset.cardinality(mnist).numpy()

60000
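MNIST reports its size directly, but when `cardinality()` returns the unknown sentinel, the fallback described in the quoted answer is to iterate through the dataset. The following is a sketch of one way to do that with `Dataset.reduce`. The function name `count_elements` is hypothetical, and the example again assumes TensorFlow 2.3 or later.

```python
import tensorflow as tf


def count_elements(ds):
    """Count elements, iterating only when the cardinality is unknown."""
    n = int(ds.cardinality().numpy())
    if n >= 0:
        return n  # known statically, no iteration needed
    if n == tf.data.INFINITE_CARDINALITY:
        raise ValueError("cannot count an infinite dataset")
    # Unknown cardinality: make one full pass, adding 1 per element.
    total = ds.reduce(tf.constant(0, dtype=tf.int64),
                      lambda count, _: count + 1)
    return int(total.numpy())


# Hypothetical example: filtering makes the cardinality unknown.
evens = tf.data.Dataset.range(10).filter(lambda x: x % 2 == 0)
print(count_elements(evens))
```

Note that the fallback reads the entire dataset once, so it can be slow for large or expensive pipelines.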