## Feature parsing

Once you have created an Example spec, you can use it to parse serialized protocol buffers read from a TFRecords file. Specifically, the Example spec is passed as the second argument to the tf.io.parse_single_example function, which converts a single serialized protocol buffer into a dictionary of usable features.

import tensorflow as tf

print(example_spec)
# {'student_id': FixedLenFeature(shape=[], dtype=tf.string, default_value='N/A'), 'yearly_gpa': FixedLenFeature(shape=4, dtype=tf.float32, default_value=None), 'majors': VarLenFeature(dtype=tf.string)}
print(repr(ex))
# features {
#   feature {
#     key: "majors"
#     value {
#       bytes_list {
#         value: "English"
#         value: "Psychology"
#       }
#     }
#   }
#   feature {
#     key: "student_id"
#     value {
#       bytes_list {
#         value: "leemaya"
#       }
#     }
#   }
#   feature {
#     key: "yearly_gpa"
#     value {
#       float_list {
#         value: 3.9600000381469727
#         value: 4.0
#         value: 3.880000114440918
#         value: 3.930000066757202
#       }
#     }
#   }
# }

parsed = tf.io.parse_single_example(
    ex.SerializeToString(), example_spec)
print(repr(parsed))
# {'majors': <tensorflow.python.framework.sparse_tensor.SparseTensor object at 0x7f4e60dfe490>, 'student_id': <tf.Tensor: shape=(), dtype=string, numpy=b'leemaya'>, 'yearly_gpa': <tf.Tensor: shape=(4,), dtype=float32, numpy=array([3.96, 4.  , 3.88, 3.93], dtype=float32)>}

The output of tf.io.parse_single_example is a dictionary mapping each feature name to either a tf.Tensor or a tf.sparse.SparseTensor. Each FixedLenFeature is converted to a tf.Tensor and each VarLenFeature is converted to a tf.sparse.SparseTensor.

A tf.Tensor is roughly TensorFlow’s counterpart to a NumPy array: a container for feature data with a fixed shape and a single data type. A tf.sparse.SparseTensor stores only the values that are present, along with their indices and the overall dense shape, which makes it useful for variable-length features and for data with many missing or empty values.
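To make the sparse representation more concrete, here is a minimal sketch that parses only the 'majors' feature and inspects the resulting tf.sparse.SparseTensor. The names `majors_ex`, `majors_spec`, and `dense` are illustrative assumptions, and the Example is rebuilt by hand so the snippet runs on its own.

```python
import tensorflow as tf

# Hypothetical stand-in for the Example above, holding only the 'majors' feature.
majors_ex = tf.train.Example(features=tf.train.Features(feature={
    'majors': tf.train.Feature(
        bytes_list=tf.train.BytesList(value=[b'English', b'Psychology'])),
}))
majors_spec = {'majors': tf.io.VarLenFeature(tf.string)}

parsed = tf.io.parse_single_example(
    majors_ex.SerializeToString(), majors_spec)
majors = parsed['majors']

# A SparseTensor is described by three components.
print(majors.indices.numpy())      # one index row per stored value
print(majors.values.numpy())       # the stored values themselves
print(majors.dense_shape.numpy())  # shape of the equivalent dense tensor

# Convert to a regular tf.Tensor; the empty string would fill any gaps.
dense = tf.sparse.to_dense(majors, default_value='')
print(dense)
```

Converting with tf.sparse.to_dense is convenient for inspection, but keeping the sparse form is usually preferable when the number of values varies from one example to the next.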

## Shapes: () vs. 1

A shape of () (or []) corresponds to a single data value, i.e. a scalar, while a shape of 1 (displayed as (1,) in a tf.Tensor) corresponds to a list containing a single data value.
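The difference is easiest to see by parsing the same serialized value with both shapes. The sketch below is illustrative only: the 'age' feature and the names `age_ex`, `scalar`, and `vector` are assumptions and not part of the student example above.

```python
import tensorflow as tf

# Hypothetical Example with a single int64 value under the key 'age'.
age_ex = tf.train.Example(features=tf.train.Features(feature={
    'age': tf.train.Feature(int64_list=tf.train.Int64List(value=[19])),
}))
serialized = age_ex.SerializeToString()

# shape=[] parses the value as a scalar tensor.
scalar = tf.io.parse_single_example(
    serialized, {'age': tf.io.FixedLenFeature([], tf.int64)})['age']

# shape=1 parses the same value as a one-element vector.
vector = tf.io.parse_single_example(
    serialized, {'age': tf.io.FixedLenFeature(1, tf.int64)})['age']

print(scalar.shape, scalar.numpy())
print(vector.shape, vector.numpy())
```

Both tensors hold the same number, but the second carries an extra dimension, which matters whenever downstream code expects a particular rank.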