<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Structuring-TFRecords" data-toc-modified-id="Structuring-TFRecords-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Structuring TFRecords</a></span></li><li><span><a href="#Movie-recommendations-using-tf.train.Example" data-toc-modified-id="Movie-recommendations-using-tf.train.Example-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Movie recommendations using tf.train.Example</a></span><ul class="toc-item"><li><span><a href="#tf.train.Feature" data-toc-modified-id="tf.train.Feature-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>tf.train.Feature</a></span></li><li><span><a href="#tf.train.Features" data-toc-modified-id="tf.train.Features-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>tf.train.Features</a></span></li><li><span><a href="#tf.train.Example" data-toc-modified-id="tf.train.Example-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>tf.train.Example</a></span></li><li><span><a href="#tf.python_io.TFRecordWriter" data-toc-modified-id="tf.python_io.TFRecordWriter-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>tf.python_io.TFRecordWriter</a></span></li><li><span><a href="#Final" data-toc-modified-id="Final-2.5"><span class="toc-item-num">2.5&nbsp;&nbsp;</span>Final</a></span></li></ul></li><li><span><a href="#Movie-recommendations-using-tf.train.SequenceExample" data-toc-modified-id="Movie-recommendations-using-tf.train.SequenceExample-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Movie recommendations using tf.train.SequenceExample</a></span></li></ul></div>

# TFRecord 

Credit: https://medium.com/mostly-ai/tensorflow-records-what-they-are-and-how-to-use-them-c46bc4bbb564

I am re-doing the examples provided in the link as an exercise.

In [1]:
import tensorflow as tf



## Structuring TFRecords

A TFRecord file stores your data as a sequence of binary strings. This means you need to specify the structure of your data before you write it to the file. 

TensorFlow provides two components for this purpose:
1. `tf.train.Example`
2. `tf.train.SequenceExample`

You have to store each sample of your data in one of these structures, then serialize it and use a `tf.python_io.TFRecordWriter` to write it to disk.

## Movie recommendations using tf.train.Example

If your dataset consists of features, where each feature is a list of values of the same type, `tf.train.Example` is the right component to use.

1. `tf.train.BytesList`
2. `tf.train.FloatList`
3. `tf.train.Int64List` 

are at the core of a `tf.train.Feature`. All three have a single attribute, `value`, which expects a list of bytes, floats, or integers, respectively.

Python strings must be converted to bytes (e.g. `my_string.encode('utf-8')`) before they are stored in a `tf.train.BytesList`.

In [14]:
movie_name_list = tf.train.BytesList(value=[b'The Shawshank Redemption', b'Fight Club'])
movie_rating_list = tf.train.FloatList(value=[9.0, 9.7])

print(movie_name_list)
print(movie_rating_list)

value: "The Shawshank Redemption"
value: "Fight Club"

value: 9.0
value: 9.699999809265137
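
The rating 9.7 prints as 9.699999809265137 because `tf.train.FloatList` stores 32-bit floats. A minimal sketch of both conversions discussed above, encoding strings to bytes and the float32 round-off; the variable names are illustrative only:

```python
import tensorflow as tf

titles = ['The Shawshank Redemption', 'Fight Club']

# BytesList only accepts bytes, so encode each Python string first.
name_list = tf.train.BytesList(value=[t.encode('utf-8') for t in titles])

# Decode on the way back out to recover the original strings.
decoded = [b.decode('utf-8') for b in name_list.value]

# FloatList holds float32 values, so 9.7 comes back slightly rounded.
stored_rating = tf.train.FloatList(value=[9.7]).value[0]
rounding_error = abs(stored_rating - 9.7)

print(decoded)
print(stored_rating, rounding_error)
```

If exact decimal values matter (prices, for instance), one option is to store them as integers in an `Int64List`, e.g. as cents.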



### tf.train.Feature

`tf.train.Feature` wraps a list of data of a specific type so TensorFlow can understand it. It has a single attribute, which is a union of `bytes_list`, `float_list`, and `int64_list`: only one of the three can be set at a time.

In [15]:
movie_names = tf.train.Feature(bytes_list=movie_name_list)
movie_ratings = tf.train.Feature(float_list=movie_rating_list)

print(movie_names)
print(movie_ratings)

bytes_list {
  value: "The Shawshank Redemption"
  value: "Fight Club"
}

float_list {
  value: 9.0
  value: 9.699999809265137
}
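
The union behaviour can be checked directly on the message. The sketch below assumes the standard protobuf Python API (`WhichOneof`, `CopyFrom`) and that the union group in the `Feature` definition is named `kind`; the values are only for illustration:

```python
import tensorflow as tf

feature = tf.train.Feature(
    float_list=tf.train.FloatList(value=[9.0, 9.7]))

# Ask the message which member of the union is currently set.
kind_before = feature.WhichOneof('kind')

# Setting a different member of the union clears the previous one.
feature.int64_list.CopyFrom(tf.train.Int64List(value=[29]))
kind_after = feature.WhichOneof('kind')
remaining_floats = len(feature.float_list.value)

print(kind_before, kind_after, remaining_floats)
```

This is why a feature that mixes types has to be split into several named features.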



### tf.train.Features

`tf.train.Features` is a collection of named features. It has a single attribute, `feature`, that expects a **dictionary** where the **key is the name of the feature** and the value is a `tf.train.Feature`.

In [16]:
movie_dict = {
  'Movie Names': movie_names,
  'Movie Ratings': movie_ratings
}
movies = tf.train.Features(feature=movie_dict)

print(movies)

feature {
  key: "Movie Names"
  value {
    bytes_list {
      value: "The Shawshank Redemption"
      value: "Fight Club"
    }
  }
}
feature {
  key: "Movie Ratings"
  value {
    float_list {
      value: 9.0
      value: 9.699999809265137
    }
  }
}
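
Once built, the `feature` attribute can be read back like a dictionary. A small sketch, rebuilding the same two features so that it runs on its own:

```python
import tensorflow as tf

movies = tf.train.Features(feature={
    'Movie Names': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[b'The Shawshank Redemption', b'Fight Club'])),
    'Movie Ratings': tf.train.Feature(float_list=tf.train.FloatList(
        value=[9.0, 9.7])),
})

# The map supports the usual dictionary operations.
feature_names = sorted(movies.feature.keys())
has_ratings = 'Movie Ratings' in movies.feature

# Drill down: feature name -> typed list -> value.
first_movie = movies.feature['Movie Names'].bytes_list.value[0].decode('utf-8')

print(feature_names, has_ratings, first_movie)
```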



### tf.train.Example

`tf.train.Example` is one of the main components for structuring a TFRecord. A `tf.train.Example` **stores features in a single attribute, `features`**, of type `tf.train.Features`.


In [17]:
example = tf.train.Example(features=movies)
print(example)

features {
  feature {
    key: "Movie Names"
    value {
      bytes_list {
        value: "The Shawshank Redemption"
        value: "Fight Club"
      }
    }
  }
  feature {
    key: "Movie Ratings"
    value {
      float_list {
        value: 9.0
        value: 9.699999809265137
      }
    }
  }
}
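
Because an `Example` is what eventually gets serialized, a quick round trip is a cheap way to confirm that nothing is lost. The sketch below uses `SerializeToString` together with `FromString`, the standard protobuf class method for parsing; the single feature is only for illustration:

```python
import tensorflow as tf

example = tf.train.Example(features=tf.train.Features(feature={
    'Movie Names': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[b'The Shawshank Redemption', b'Fight Club'])),
}))

# Serialize to a binary string, then parse it back into a new message.
serialized = example.SerializeToString()
restored = tf.train.Example.FromString(serialized)

names = list(restored.features.feature['Movie Names'].bytes_list.value)
print(type(serialized).__name__, restored == example, names)
```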



### tf.python_io.TFRecordWriter

In contrast to the previous components, which are protocol buffer messages, `tf.python_io.TFRecordWriter` is an ordinary Python class. It accepts a file path in its `path` argument and creates a writer object that works much like a file object, offering `write`, `flush`, and `close` methods. The `write` method accepts a binary string and writes it to disk, so structured data must be serialized first. To this end, `tf.train.Example` and `tf.train.SequenceExample` provide a `SerializeToString` method.

In our example, each TFRecord represents the movie ratings and corresponding suggestions of a single user (a single sample).

In [18]:
# "example" is of type tf.train.Example.
with tf.python_io.TFRecordWriter('movie_ratings.tfrecord') as writer:
  writer.write(example.SerializeToString())
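
Since each record holds one user, a file with several users is just several `write` calls on the same writer. The sketch below assumes TensorFlow 2.x, where the writer is exposed as `tf.io.TFRecordWriter` and records can be read back eagerly with `tf.data.TFRecordDataset`; `make_example` and the two users are hypothetical:

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)


def make_example(age, movie):
    """Hypothetical helper: one user's age and a single watched movie."""
    return tf.train.Example(features=tf.train.Features(feature={
        'Age': tf.train.Feature(int64_list=tf.train.Int64List(value=[age])),
        'Movie': tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[movie])),
    }))


# Made-up users, purely for illustration.
users = [(29, b'Fight Club'), (34, b'Inception')]

path = os.path.join(tempfile.mkdtemp(), 'movie_ratings.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for age, movie in users:
        # One serialized Example per user, written back to back.
        writer.write(make_example(age, movie).SerializeToString())

# Read the raw records back and parse each one into an Example.
restored = [tf.train.Example.FromString(raw.numpy())
            for raw in tf.data.TFRecordDataset(path)]
ages = [ex.features.feature['Age'].int64_list.value[0] for ex in restored]
print(len(restored), ages)
```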

### Final

Here’s a complete example that writes the features to a TFRecord file, then reads the file back in and prints the parsed features.

In [8]:
# Create example data
data = {
    'Age': 29,
    'Movie': ['The Shawshank Redemption', 'Fight Club'],
    'Movie Ratings': [9.0, 9.7],
    'Suggestion': 'Inception',
    'Suggestion Purchased': 1.0,
    'Purchase Price': 9.99
}

print(data)

{'Age': 29, 'Movie': ['The Shawshank Redemption', 'Fight Club'], 'Movie Ratings': [9.0, 9.7], 'Suggestion': 'Inception', 'Suggestion Purchased': 1.0, 'Purchase Price': 9.99}


In [9]:
# Create the Example
example = tf.train.Example(features=tf.train.Features(feature={
    'Age': tf.train.Feature(
        int64_list=tf.train.Int64List(value=[data['Age']])),
    'Movie': tf.train.Feature(
        bytes_list=tf.train.BytesList(
            value=[m.encode('utf-8') for m in data['Movie']])),
    'Movie Ratings': tf.train.Feature(
        float_list=tf.train.FloatList(value=data['Movie Ratings'])),
    'Suggestion': tf.train.Feature(
        bytes_list=tf.train.BytesList(
            value=[data['Suggestion'].encode('utf-8')])),
    'Suggestion Purchased': tf.train.Feature(
        float_list=tf.train.FloatList(
            value=[data['Suggestion Purchased']])),
    'Purchase Price': tf.train.Feature(
        float_list=tf.train.FloatList(value=[data['Purchase Price']]))
}))

print(example)

features {
  feature {
    key: "Age"
    value {
      int64_list {
        value: 29
      }
    }
  }
  feature {
    key: "Movie"
    value {
      bytes_list {
        value: "The Shawshank Redemption"
        value: "Fight Club"
      }
    }
  }
  feature {
    key: "Movie Ratings"
    value {
      float_list {
        value: 9.0
        value: 9.699999809265137
      }
    }
  }
  feature {
    key: "Purchase Price"
    value {
      float_list {
        value: 9.989999771118164
      }
    }
  }
  feature {
    key: "Suggestion"
    value {
      bytes_list {
        value: "Inception"
      }
    }
  }
  feature {
    key: "Suggestion Purchased"
    value {
      float_list {
        value: 1.0
      }
    }
  }
}
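
Spelling out every `tf.train.Feature` by hand gets repetitive. One possible shortcut is a small dispatch function that picks the list type from the Python type of the value. `to_feature` below is a hypothetical helper, not part of TensorFlow, and it assumes each list is non-empty and holds values of a single type:

```python
import tensorflow as tf


def to_feature(value):
    """Hypothetical helper: wrap a scalar or non-empty list in a Feature."""
    values = list(value) if isinstance(value, (list, tuple)) else [value]
    if not values:
        raise ValueError('cannot infer a feature type from an empty list')
    first = values[0]
    if isinstance(first, str):
        encoded = [v.encode('utf-8') for v in values]
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=encoded))
    if isinstance(first, bytes):
        return tf.train.Feature(bytes_list=tf.train.BytesList(value=values))
    if isinstance(first, float):
        return tf.train.Feature(float_list=tf.train.FloatList(value=values))
    if isinstance(first, int):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
    raise TypeError('unsupported value type: {}'.format(type(first)))


data = {
    'Age': 29,
    'Movie': ['The Shawshank Redemption', 'Fight Club'],
    'Movie Ratings': [9.0, 9.7],
    'Suggestion': 'Inception',
    'Suggestion Purchased': 1.0,
    'Purchase Price': 9.99,
}

example = tf.train.Example(features=tf.train.Features(
    feature={name: to_feature(value) for name, value in data.items()}))

# Record which list type each feature ended up with.
kinds = {name: feature.WhichOneof('kind')
         for name, feature in example.features.feature.items()}
print(kinds)
```

The explicit version above stays preferable when a feature's type cannot be inferred reliably from the data, for example a ratings list that happens to contain only whole numbers written as integers.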



In [11]:
# Write TFRecord file
with tf.python_io.TFRecordWriter('customer_1.tfrecord') as writer:
    writer.write(example.SerializeToString())

In [12]:
# Read and print data:
sess = tf.InteractiveSession()

# Read TFRecord file
reader = tf.TFRecordReader()
filename_queue = tf.train.string_input_producer(['customer_1.tfrecord'])

_, serialized_example = reader.read(filename_queue)

# Define features
read_features = {
    'Age': tf.FixedLenFeature([], dtype=tf.int64),
    'Movie': tf.VarLenFeature(dtype=tf.string),
    'Movie Ratings': tf.VarLenFeature(dtype=tf.float32),
    'Suggestion': tf.FixedLenFeature([], dtype=tf.string),
    'Suggestion Purchased': tf.FixedLenFeature([], dtype=tf.float32),
    'Purchase Price': tf.FixedLenFeature([], dtype=tf.float32)}

# Extract features from serialized data
read_data = tf.parse_single_example(serialized=serialized_example,
                                    features=read_features)

# Many tf.train functions use tf.train.QueueRunner,
# so we need to start it before we read
tf.train.start_queue_runners(sess)

# Print features
for name, tensor in read_data.items():
    print('{}: {}'.format(name, tensor.eval()))

Movie: SparseTensorValue(indices=array([[0],
       [1]]), values=array([b'The Shawshank Redemption', b'Fight Club'], dtype=object), dense_shape=array([2]))
Movie Ratings: SparseTensorValue(indices=array([[0],
       [1]]), values=array([9. , 9.7], dtype=float32), dense_shape=array([2]))
Age: 29
Purchase Price: 9.989999771118164
Suggestion: b'Inception'
Suggestion Purchased: 1.0
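
The reader above relies on queue runners (`tf.TFRecordReader`, `tf.train.string_input_producer`, `tf.train.start_queue_runners`), which in TensorFlow 2.x are only available under `tf.compat.v1`. As a rough equivalent, the sketch below assumes TensorFlow 2.x with eager execution and reads the record through `tf.data.TFRecordDataset` and `tf.io.parse_single_example`. It writes a reduced version of the record to a temporary directory first so that it runs on its own:

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow 2.x (eager execution)

# Write a small record so the sketch is self-contained.
example = tf.train.Example(features=tf.train.Features(feature={
    'Age': tf.train.Feature(int64_list=tf.train.Int64List(value=[29])),
    'Movie': tf.train.Feature(bytes_list=tf.train.BytesList(
        value=[b'The Shawshank Redemption', b'Fight Club'])),
    'Movie Ratings': tf.train.Feature(
        float_list=tf.train.FloatList(value=[9.0, 9.7])),
}))
path = os.path.join(tempfile.mkdtemp(), 'customer_1.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    writer.write(example.SerializeToString())

# Same feature spec as above, using the tf.io names.
read_features = {
    'Age': tf.io.FixedLenFeature([], dtype=tf.int64),
    'Movie': tf.io.VarLenFeature(dtype=tf.string),
    'Movie Ratings': tf.io.VarLenFeature(dtype=tf.float32),
}


def parse(serialized):
    return tf.io.parse_single_example(serialized, read_features)


dataset = tf.data.TFRecordDataset(path).map(parse)

for record in dataset:
    age = int(record['Age'].numpy())
    # VarLenFeature yields a SparseTensor; .values holds the entries.
    movies = [m.decode('utf-8') for m in record['Movie'].values.numpy()]
    ratings = record['Movie Ratings'].values.numpy().tolist()
    print(age, movies, ratings)
```

No session or queue runner is needed here: iterating over the dataset drives the read.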


## Movie recommendations using tf.train.SequenceExample

TBC.