# TFRecord files in Python


The [tf.io](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/io) module also contains pure-Python functions for reading and writing TFRecord files.

## Import all the necessary libraries

In [1]:
import numpy as np
import tensorflow as tf

np.random.seed(5)

tf.__version__

'2.0.0'

## Utils

In [2]:
# The following functions can be used to convert a value to a type compatible
# with tf.Example.

def _bytes_feature(value):
    """Returns a bytes_list from a string/byte."""
    if isinstance(value, type(tf.constant(0))):
        value = value.numpy() # BytesList won't unpack a string from an EagerTensor.
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _float_feature(value):
    """Return a float_list form a float/double."""
    return tf.train.Feature(float_list=tf.train.FloatList(value=[value]))

def _int64_feature(value):
    """Return a int64_list from a bool/enum/int/uint."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
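
To see what these helpers store, the sketch below builds three `tf.train.Feature` messages directly and reads the values back. The variable names are illustrative only. It assumes the standard `tf.train.Feature` proto, in which a boolean is stored as an int64 and `FloatList` holds 32-bit floats, so a Python float may lose precision on the way in.

```python
import struct

import tensorflow as tf

# Build one feature of each kind, mirroring what the helpers above return.
bool_feature = tf.train.Feature(int64_list=tf.train.Int64List(value=[True]))
float_feature = tf.train.Feature(float_list=tf.train.FloatList(value=[0.1]))
bytes_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=['chicken'.encode('utf-8')]))

# Read the stored values back out of the proto messages.
stored_bool = bool_feature.int64_list.value[0]
stored_float = float_feature.float_list.value[0]
stored_bytes = bytes_feature.bytes_list.value[0]

# Round 0.1 to 32-bit precision with the standard library for comparison.
float32_of_tenth = struct.unpack('<f', struct.pack('<f', 0.1))[0]

# A Feature is a oneof: exactly one of the three list fields is set.
kind = float_feature.WhichOneof('kind')

print(stored_bool, stored_float, stored_bytes, kind)
```

The 32-bit storage is why a float feature read back from a file generally differs slightly from the 64-bit NumPy value that was written.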

In [3]:
def serialize_example(feature0, feature1, feature2, feature3):
    """
    Creates a tf.Example message ready to be written to a file.
    """
    
    # Create a dictionary mapping the feature name to 
    # the tf.Example-compatible data type
    feature = {
        'feature0': _int64_feature(feature0),
        'feature1': _int64_feature(feature1),
        'feature2': _bytes_feature(feature2),
        'feature3': _float_feature(feature3),
    }
    
    # Create a Features message using tf.train.Example.
    example_proto = tf.train.Example(features=tf.train.Features(feature=feature))
    return example_proto.SerializeToString()
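
As a quick check of the serialization step, the following sketch builds a `tf.Example` with the same four feature names, serializes it, and decodes the bytes again with `tf.train.Example.FromString`. The values are made up for illustration, and the features are written out inline so the snippet runs on its own.

```python
import tensorflow as tf

# A single observation with the same four feature names as above.
example_proto = tf.train.Example(features=tf.train.Features(feature={
    'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[0])),
    'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[4])),
    'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[b'goat'])),
    'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[0.25])),
}))

# SerializeToString returns a plain Python bytes object.
serialized = example_proto.SerializeToString()

# Decode the bytes back into a tf.train.Example message.
decoded = tf.train.Example.FromString(serialized)
decoded_values = {
    'feature0': decoded.features.feature['feature0'].int64_list.value[0],
    'feature1': decoded.features.feature['feature1'].int64_list.value[0],
    'feature2': decoded.features.feature['feature2'].bytes_list.value[0],
    'feature3': decoded.features.feature['feature3'].float_list.value[0],
}
print(decoded_values)
```

The decoded message compares equal to the original, which is a convenient sanity check before writing many records to disk.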

## Writing a TFRecord file

1. Create the data for 10,000 observations.
2. Write the 10,000 observations to the file `test.tfrecord`.

Each observation is converted to a `tf.Example` message, 
then written to the file. 
You can then verify that the file `test.tfrecord` has been created.


### Create the observations data

In [4]:
# The number of observations in the dataset.
n_observations = int(1e4)

# Boolean feature, encoded as False or True.
feature0 = np.random.choice([False, True], n_observations)

# Integer feature, random from 0 to 4.
feature1 = np.random.randint(0, 5, n_observations)

# String feature
strings = np.array([b'cat', b'dog', b'chicken', b'horse', b'goat'])
feature2 = strings[feature1]

# Float feature, from a standard normal distribution
feature3 = np.random.randn(n_observations)

### Writing the `tf.Example` observations to the file

In [5]:
filename = 'test.tfrecord'

with tf.io.TFRecordWriter(filename) as writer:
    for i in range(n_observations):
        example = serialize_example(feature0[i], feature1[i], feature2[i], feature3[i])
        writer.write(example)

In [6]:
!du -sh {filename}

984K	test.tfrecord
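
The file size can be cross-checked without TensorFlow. In the documented TFRecord layout, each record is framed as a little-endian 8-byte length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload, which adds 16 bytes per record. The sketch below walks that framing using only the standard library. The function name `iter_record_lengths` is hypothetical, and the CRCs are skipped rather than verified, so this is an inspection aid, not a validating reader.

```python
import struct


def iter_record_lengths(path):
    """Yield the payload length in bytes of each record in a TFRecord file.

    Assumes the standard framing: uint64 length, uint32 CRC of the length,
    payload, uint32 CRC of the payload. The CRCs are NOT checked here.
    """
    with open(path, 'rb') as f:
        while True:
            header = f.read(8)
            if not header:
                return  # Clean end of file.
            if len(header) < 8:
                raise ValueError('truncated length header')
            (length,) = struct.unpack('<Q', header)
            # Length CRC (4 bytes) + payload + payload CRC (4 bytes).
            rest = f.read(4 + length + 4)
            if len(rest) < 4 + length + 4:
                raise ValueError('truncated record')
            yield length
```

For example, `lengths = list(iter_record_lengths('test.tfrecord'))` should give one entry per observation, and `sum(lengths) + 16 * len(lengths)` should match the file size in bytes.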


## Reading a TFRecord file

Each record read from the file is a scalar string tensor holding a serialized `tf.Example` message. It can be parsed using 
[tf.train.Example.ParseFromString](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/train/Example#ParseFromString).

In [7]:
filenames = [filename]

# Create a Dataset from a TFRecord file
raw_dataset = tf.data.TFRecordDataset(filenames)
raw_dataset

<TFRecordDatasetV2 shapes: (), types: tf.string>

In [8]:
for raw_record in raw_dataset.take(1):
    print("Serialized example: ")
    print("========================")
    print(repr(raw_record))
    
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print("\nProto message: ")
    print("========================")
    print(example)


Serialized example: 
<tf.Tensor: id=10020, shape=(), dtype=string, numpy=b'\nS\n\x11\n\x08feature0\x12\x05\x1a\x03\n\x01\x01\n\x11\n\x08feature1\x12\x05\x1a\x03\n\x01\x03\n\x15\n\x08feature2\x12\t\n\x07\n\x05horse\n\x14\n\x08feature3\x12\x08\x12\x06\n\x04\xa3\x89\xc8\xbf'>

Proto message: 
features {
  feature {
    key: "feature0"
    value {
      int64_list {
        value: 1
      }
    }
  }
  feature {
    key: "feature1"
    value {
      int64_list {
        value: 3
      }
    }
  }
  feature {
    key: "feature2"
    value {
      bytes_list {
        value: "horse"
      }
    }
  }
  feature {
    key: "feature3"
    value {
      float_list {
        value: -1.5667003393173218
      }
    }
  }
}
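
An alternative to decoding the proto by hand is to parse records into tensors with `tf.io.parse_single_example`, which can be mapped over the dataset. The sketch below is self-contained: it writes a two-record demo file to a temporary directory (the path, the `make_example` helper, and the values are illustrative, not part of the notebook above) and parses it with a feature description that assumes the same four feature names and types.

```python
import os
import tempfile

import tensorflow as tf

# Schema of the stored features; the default values are illustrative.
feature_description = {
    'feature0': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature1': tf.io.FixedLenFeature([], tf.int64, default_value=0),
    'feature2': tf.io.FixedLenFeature([], tf.string, default_value=''),
    'feature3': tf.io.FixedLenFeature([], tf.float32, default_value=0.0),
}


def parse_example(serialized):
    """Parse one serialized tf.Example into a dict of scalar tensors."""
    return tf.io.parse_single_example(serialized, feature_description)


def make_example(f0, f1, f2, f3):
    """Hypothetical helper: serialize one observation for the demo file."""
    feature = {
        'feature0': tf.train.Feature(int64_list=tf.train.Int64List(value=[f0])),
        'feature1': tf.train.Feature(int64_list=tf.train.Int64List(value=[f1])),
        'feature2': tf.train.Feature(bytes_list=tf.train.BytesList(value=[f2])),
        'feature3': tf.train.Feature(float_list=tf.train.FloatList(value=[f3])),
    }
    return tf.train.Example(
        features=tf.train.Features(feature=feature)).SerializeToString()


# Write a tiny demo file so the snippet does not depend on earlier cells.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo.tfrecord')
with tf.io.TFRecordWriter(demo_path) as writer:
    writer.write(make_example(1, 3, b'horse', -1.5))
    writer.write(make_example(0, 0, b'cat', 0.25))

# Parse every record and pull the results into plain NumPy/Python values.
parsed_dataset = tf.data.TFRecordDataset([demo_path]).map(parse_example)
parsed_records = [
    {name: tensor.numpy() for name, tensor in record.items()}
    for record in parsed_dataset
]
print(parsed_records)
```

Because the parsing runs inside `Dataset.map`, the same function can be reused in an input pipeline with batching and shuffling, whereas `ParseFromString` works on one record at a time in Python.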

