# Lab: TF Transform # 

**Learning Objectives**
1. Preprocess data and engineer new features using TF Transform
1. Create and deploy an Apache Beam pipeline
1. Use the processed data to train a model locally, then serve a prediction

## Introduction 
While Pandas is fine for experimenting, it is better to do preprocessing in Apache Beam when operationalizing your workflow. This also helps if you need to preprocess data in flight, since Apache Beam supports streaming. In this lab we will download a CSV dataset (the UCI Adult census data), then use Apache Beam and TF Transform to process it.

TF Transform supports only specific combinations of TensorFlow and Apache Beam versions, so make sure to install a combination that works. In this lab we will be using:
* TFT 0.24.0
* TF 2.3.0 
* Apache Beam [GCP] 2.24.0

## Setup ##

Before starting, we need to ensure the proper versions of packages are installed. 

**After running the following cell be sure to restart the kernel!**

In [None]:
%pip install tensorflow==2.3.0 tensorflow-transform==0.24.0 apache-beam[gcp]==2.24.0
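
After restarting the kernel, it can help to confirm that the pinned versions are the ones actually installed. The following is a minimal sketch, assuming Python 3.8 or later (for `importlib.metadata`); the helper names `installed_version` and `check_versions` are hypothetical and not part of any of the libraries used in this lab.

```python
from importlib import metadata  # Assumes Python 3.8+.

# Version pins copied from the install cell above.
EXPECTED_VERSIONS = {
    'tensorflow': '2.3.0',
    'tensorflow-transform': '0.24.0',
    'apache-beam': '2.24.0',
}


def installed_version(package):
    """Returns the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected):
    """Maps each package name to (expected, installed, matches)."""
    report = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        report[package] = (wanted, found, found == wanted)
    return report


for package, (wanted, found, ok) in check_versions(EXPECTED_VERSIONS).items():
    status = 'OK' if ok else 'MISMATCH'
    print(f'{package}: expected {wanted}, found {found} ({status})')
```

Any line reporting `MISMATCH` suggests the install cell did not take effect, most often because the kernel was not restarted.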

First we will import the packages we need for the notebook. Note that we will be using the implementation of Apache Beam in TensorFlow Transform for this lab. Ignore any errors referring to `tfx_bsl` for this notebook.

In [None]:
import argparse
import math
import os
import pprint
import tempfile

from absl import logging
import apache_beam as beam
import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils

Next we will download our data and place it into a folder. Also we will create a directory for our metadata from the TF Transform job.

In [None]:
%%bash

# Remove files from previous notebook runs if necessary
rm -rf input_data
rm -rf working_dir

mkdir -p ./input_data
mkdir -p ./working_dir

wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data 
wget https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test
    
mv adult.* ./input_data/

We will define lists of the CSV column names and split the columns based on whether they refer to categorical or numeric features. Note that `education-num` is an optional feature (i.e. it can have a value of `None`), so we will put it in a separate list because we need to treat it differently.

In [None]:
ORDERED_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race', 'sex',
    'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'label'
]

CATEGORICAL_FEATURE_KEYS = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
]
NUMERIC_FEATURE_KEYS = [
    'age',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
]
OPTIONAL_NUMERIC_FEATURE_KEYS = [
    'education-num',
]
LABEL_KEY = 'label'
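
Before going further, it is worth checking that the key lists are consistent with the CSV columns. The sketch below repeats the lists so that it runs on its own; `check_feature_keys` is a hypothetical helper written for this illustration. It verifies that every key is a real column, that no key appears in two groups, and reports which columns are not used at all.

```python
ORDERED_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race', 'sex',
    'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'label'
]
CATEGORICAL_FEATURE_KEYS = [
    'workclass', 'education', 'marital-status', 'occupation',
    'relationship', 'race', 'sex', 'native-country',
]
NUMERIC_FEATURE_KEYS = ['age', 'capital-gain', 'capital-loss', 'hours-per-week']
OPTIONAL_NUMERIC_FEATURE_KEYS = ['education-num']
LABEL_KEY = 'label'


def check_feature_keys(ordered_columns, *key_groups):
    """Returns the CSV columns not claimed by any group; raises on bad keys."""
    seen = set()
    for group in key_groups:
        for key in group:
            if key not in ordered_columns:
                raise ValueError(f'{key!r} is not a CSV column')
            if key in seen:
                raise ValueError(f'{key!r} appears in more than one group')
            seen.add(key)
    return [col for col in ordered_columns if col not in seen]


unused = check_feature_keys(
    ORDERED_CSV_COLUMNS,
    CATEGORICAL_FEATURE_KEYS,
    NUMERIC_FEATURE_KEYS,
    OPTIONAL_NUMERIC_FEATURE_KEYS,
    [LABEL_KEY])
print(unused)  # ['fnlwgt']
```

The only unused column is `fnlwgt`: it is listed so the CSV can be read in order, but it belongs to none of the feature lists.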

## Preprocessing pipeline in TF Transform ##

Now we're ready to create our pipeline. In a bit we will be creating a preprocessing function (`preprocessing_fn`) to apply our `tf.transform` operations to our data, then we will create an Apache Beam pipeline to carry out the operations and collect metadata to repeat the transformations at serving time. 

First though, let us create a custom `PTransform` that will allow us to filter out errors when applying `beam.Map`-like operations. This keeps the pipeline from crashing because of a single bad line of data, while letting us keep track of the number of bad rows. If this number is a large percentage of our number of instances, then we would want to explore our data more carefully.

In [None]:
class MapAndFilterErrors(beam.PTransform):
  """Like beam.Map but filters out errors in the map_fn."""

  class _MapAndFilterErrorsDoFn(beam.DoFn):
    """Count the bad examples using a beam metric."""

    def __init__(self, fn):
      self._fn = fn
      # Create a counter to measure number of bad elements.
      self._bad_elements_counter = beam.metrics.Metrics.counter(
          'census_example', 'bad_elements')

    def process(self, element):
      try:
        yield self._fn(element)
      except Exception:  # pylint: disable=broad-except
        # Catch any exception raised by the call above.
        self._bad_elements_counter.inc(1)

  def __init__(self, fn):
    self._fn = fn

  def expand(self, pcoll):
    return pcoll | beam.ParDo(self._MapAndFilterErrorsDoFn(self._fn))
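
The skip-and-count idea behind `MapAndFilterErrors` can be seen without Beam. The following plain-Python sketch is only an analogy (the function name `map_and_filter_errors` is made up for this illustration): it applies a function to each element, drops the elements that raise, and counts them.

```python
def map_and_filter_errors(fn, elements):
    """Applies fn to each element, skipping and counting those that raise."""
    results = []
    bad_elements = 0
    for element in elements:
        try:
            results.append(fn(element))
        except Exception:  # Broad on purpose, mirroring the DoFn above.
            bad_elements += 1
    return results, bad_elements


raw = ['1', '2', 'oops', '', '3']
rows, bad = map_and_filter_errors(int, raw)
print(rows, bad)        # [1, 2, 3] 2
print(bad / len(raw))   # 0.4
```

In the Beam version the count lives in a pipeline metric rather than a return value, but the decision rule is the same: a high bad-row fraction is a signal to inspect the data rather than to trust the filtered output.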

Next we need to define a `feature_spec` that tells our TF Transform pipeline what incoming data to expect; that is, we define a schema. This information will be saved as dataset metadata and shared with the model at serving time as well. We will also define some constants that we will need later in the process.

In [None]:
RAW_DATA_FEATURE_SPEC = dict([(name, tf.io.FixedLenFeature([], tf.string))
                              for name in CATEGORICAL_FEATURE_KEYS] +
                             [(name, tf.io.FixedLenFeature([], tf.float32))
                              for name in NUMERIC_FEATURE_KEYS] +
                             [(name, tf.io.VarLenFeature(tf.float32))  
                              for name in OPTIONAL_NUMERIC_FEATURE_KEYS] +
                             [(LABEL_KEY,
                               tf.io.FixedLenFeature([], tf.string))])

RAW_DATA_METADATA = dataset_metadata.DatasetMetadata(
    schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC))

# Constants used for training.  Note that the number of instances will be
# computed by tf.Transform in future versions, in which case it can be read from
# the metadata. 
TRAIN_BATCH_SIZE = 128
TRAIN_NUM_EPOCHS = 50
NUM_TRAIN_INSTANCES = 32561
NUM_TEST_INSTANCES = 16281

# Names of temp files
TRANSFORMED_TRAIN_DATA_FILEBASE = 'train_transformed'
TRANSFORMED_TEST_DATA_FILEBASE = 'test_transformed'
EXPORTED_MODEL_DIR = 'exported_model_dir'

Before we define our Apache Beam pipeline, we will create our preprocessing function. We will do the following:

1. Scale all numeric features to the range \[0,1\] using `tft.scale_to_0_1`.
2. Pack the optional numeric feature into a sparse tensor (since it can be missing), convert it to a dense tensor with a default value of 0, drop the extra dimension, and then scale it with `tft.scale_to_0_1` as above.
3. For the categorical features, use `tft.compute_and_apply_vocabulary` to run through the data, compile a vocabulary of feature values, and return a `Tensor` with the index of each instance's value in the vocabulary list.
4. Remove trailing periods from the labels, which is needed when the raw test data is read with `tf.data`. This is of course data specific, and you would need to explore the data to see that it is necessary.
5. Replace the original string labels with integer indices (0 or 1) and one-hot encode them.
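
The first and third steps share an analyze-then-transform structure: a full pass over the training data computes a statistic (min and max, or a vocabulary), and that statistic is then applied to each instance. The sketch below illustrates the idea in plain Python. It is not the TF Transform implementation: the function names are made up, the tie-breaking rule in the vocabulary and the handling of a constant column are assumptions of this sketch, and the sample values are invented.

```python
from collections import Counter


def analyze_min_max(values):
    """Analysis phase: one full pass to find the min and max."""
    return min(values), max(values)


def scale_to_0_1(value, min_value, max_value):
    """Transform phase: rescale one value with the analyzed constants."""
    if max_value == min_value:
        return 0.0  # Arbitrary choice for this sketch only.
    return (value - min_value) / (max_value - min_value)


def analyze_vocabulary(tokens):
    """Analysis phase: index tokens by descending frequency."""
    counts = Counter(token.strip() for token in tokens)
    # Ties are broken alphabetically here; that is an assumption of this sketch.
    ordered = sorted(counts, key=lambda token: (-counts[token], token))
    return {token: index for index, token in enumerate(ordered)}


def apply_vocabulary(token, vocabulary):
    """Transform phase: unseen tokens share one out-of-vocabulary index."""
    return vocabulary.get(token.strip(), len(vocabulary))


train_ages = [20.0, 30.0, 60.0]
lo, hi = analyze_min_max(train_ages)
print([scale_to_0_1(age, lo, hi) for age in train_ages])  # [0.0, 0.25, 1.0]
print(scale_to_0_1(80.0, lo, hi))                          # 1.5

vocab = analyze_vocabulary([' Private', 'Private', 'State-gov'])
print(vocab)                                  # {'Private': 0, 'State-gov': 1}
print(apply_vocabulary('Self-emp', vocab))    # 2
```

Note that the value 80.0, which lies outside the analyzed range, maps outside \[0,1\] in this sketch. The statistics come from the training data only and are reused unchanged for any later data, which is what keeps training and serving consistent.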

In [None]:
def preprocessing_fn(inputs):
    """Preprocess input columns into transformed columns."""
    # Since we are modifying some features and leaving others unchanged, we
    # start by setting `outputs` to a copy of `inputs`.
    outputs = inputs.copy()

    # Scale numeric columns to have range [0, 1].
    for key in NUMERIC_FEATURE_KEYS:
        outputs[key] = tft.scale_to_0_1(inputs[key])

    for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
        # This is a SparseTensor because it is optional. Here we fill in a default
        # value when it is missing.
        sparse = tf.sparse.SparseTensor(inputs[key].indices, inputs[key].values,
                                        [inputs[key].dense_shape[0], 1])
        dense = tf.sparse.to_dense(sp_input=sparse, default_value=0.)
        # Reshaping from a batch of vectors of size 1 to a batch of scalars.
        dense = tf.squeeze(dense, axis=1)
        outputs[key] = tft.scale_to_0_1(dense)

    # For all categorical columns except the label column, we generate a
    # vocabulary and replace each string value with its integer index in that
    # vocabulary.  Values outside the vocabulary map to a single extra
    # out-of-vocabulary bucket.
    for key in CATEGORICAL_FEATURE_KEYS:
        outputs[key] = tft.compute_and_apply_vocabulary(
                           tf.strings.strip(inputs[key]), num_oov_buckets=1, vocab_filename=key)

    # For the label column we provide the mapping from string to index.
    table_keys = ['>50K', '<=50K']
    initializer = tf.lookup.KeyValueTensorInitializer(
        keys=table_keys,
        values=tf.cast(tf.range(len(table_keys)), tf.int64),
        key_dtype=tf.string,
        value_dtype=tf.int64)
    table = tf.lookup.StaticHashTable(initializer, default_value=-1)

    # Remove trailing periods for test data when the data is read with tf.data.
    label_str = tf.strings.regex_replace(inputs[LABEL_KEY], r'\.', '')
    label_str = tf.strings.strip(label_str)
    data_labels = table.lookup(label_str)
    transformed_label = tf.one_hot(
        indices=data_labels, depth=len(table_keys), on_value=1.0, off_value=0.0)
    outputs[LABEL_KEY] = tf.reshape(transformed_label, [-1, len(table_keys)])

    return outputs
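
The label handling at the end of `preprocessing_fn` can be traced by hand. The sketch below mirrors it in plain Python for a single label string (the helper `encode_label` is hypothetical): periods are removed, whitespace is stripped, the string is looked up in `table_keys` with a default of -1, and the index is one-hot encoded. An index of -1 matches no position, so an unknown label becomes an all-zero vector, just as `tf.one_hot` produces for an out-of-range index.

```python
import re

TABLE_KEYS = ['>50K', '<=50K']


def encode_label(raw_label, table_keys=TABLE_KEYS):
    """Mirrors the label steps of preprocessing_fn for one string."""
    cleaned = re.sub(r'\.', '', raw_label).strip()
    index = table_keys.index(cleaned) if cleaned in table_keys else -1
    return [1.0 if i == index else 0.0 for i in range(len(table_keys))]


print(encode_label('>50K'))      # [1.0, 0.0]
print(encode_label(' <=50K.'))   # [0.0, 1.0]
print(encode_label('unknown'))   # [0.0, 0.0]
```

So the first output position corresponds to `>50K` and the second to `<=50K`, which matters later when reading the model's scores.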

Now we will define our Apache Beam pipeline. The `transform_data` function defines and runs it. We first read and parse the training data, using the custom `PTransform` defined earlier to drop rows that fail to decode, and then apply our `preprocessing_fn` using `AnalyzeAndTransformDataset`. Next we encode the transformed data in the TFRecord format and apply `TransformDataset` to transform our test dataset as well. Finally we write out our `transform_fn` and the dataset metadata so that we can also reference them at serving time.

In this notebook we will run the Apache Beam pipeline locally since this is a smaller dataset. If we were using a larger dataset we could pass the proper options to the pipeline to run the pipeline on Dataflow.
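
As a rough illustration of what "the proper options" might look like, the sketch below assembles command-line style flags for the Dataflow runner. The helper `dataflow_flags` is hypothetical, and the project, region, bucket, and job name are placeholders. A real Dataflow run would also need the input and output paths to be on Cloud Storage and the workers to have TF Transform installed, neither of which is covered here.

```python
def dataflow_flags(project, region, temp_location, job_name):
    """Builds command-line style flags for running a Beam pipeline on Dataflow."""
    return [
        '--runner=DataflowRunner',
        f'--project={project}',
        f'--region={region}',
        f'--temp_location={temp_location}',
        f'--job_name={job_name}',
    ]


# Placeholder values; replace them with your own project, region, and bucket.
flags = dataflow_flags(
    project='my-project',
    region='us-central1',
    temp_location='gs://my-bucket/tmp',
    job_name='tft-census-preprocessing')
print(flags)

# The pipeline in transform_data could then be created along these lines:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   with beam.Pipeline(options=PipelineOptions(flags=flags)) as pipeline:
#       ...
```

With no options at all, as in this notebook, `beam.Pipeline()` runs locally.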

In [None]:
def transform_data(train_data_file, test_data_file, working_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data using the CSV reader, and transform it using a
  preprocessing pipeline that scales numeric data and converts categorical data
  from strings to int64 values indices, by creating a vocabulary for each
  category.

  Args:
    train_data_file: File containing training data
    test_data_file: File containing test data
    working_dir: Directory to write transformed data and metadata to
  """

  # The "with" block will create a pipeline, and run that pipeline at the exit
  # of the block.
  with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
      # Create a coder to read the census data with the schema.  To do this we
      # need to list all columns in order since the schema doesn't specify the
      # order of columns in the csv.
      converter = tft.coders.CsvCoder(ORDERED_CSV_COLUMNS,
                                      RAW_DATA_METADATA.schema)

      # Read in raw data and convert using CSV converter.  Note that we apply
      # some Beam transformations here, which will not be encoded in the TF
      # graph since we don't do them from within tf.Transform's methods
      # (AnalyzeDataset, TransformDataset etc.).  These transformations are just
      # to get data into a format that the CSV converter can read, in particular
      # removing spaces after commas.
      #
      # We use MapAndFilterErrors instead of Map to filter out decode errors in
      # converter.decode, which should only occur for the trailing blank line.
    
      raw_data = (
          pipeline
          | 'ReadTrainData' >> beam.io.ReadFromText(train_data_file)
          | 'FixCommasTrainData' >> beam.Map(
              lambda line: line.replace(', ', ','))
          | 'DecodeTrainData' >> MapAndFilterErrors(converter.decode))

      # Combine data and schema into a dataset tuple.  Note that we already used
      # the schema to read the CSV data, but we also need it to interpret
      # raw_data.
      raw_dataset = (raw_data, RAW_DATA_METADATA)
        
      transformed_dataset, transform_fn = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))
    
      transformed_data, transformed_metadata = transformed_dataset
        
      transformed_data_coder = tft.coders.ExampleProtoCoder(
          transformed_metadata.schema)

      _ = (
          transformed_data
          | 'EncodeTrainData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTrainData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE)))

      # Now apply transform function to test data.  In this case we remove the
      # trailing period at the end of each line, and also ignore the header line
      # that is present in the test data file.
      raw_test_data = (
          pipeline
          | 'ReadTestData' >> beam.io.ReadFromText(test_data_file,
                                                   skip_header_lines=1)
          | 'FixCommasTestData' >> beam.Map(
              lambda line: line.replace(', ', ','))
          | 'RemoveTrailingPeriodsTestData' >> beam.Map(lambda line: line[:-1])
          | 'DecodeTestData' >> MapAndFilterErrors(converter.decode))

      raw_test_dataset = (raw_test_data, RAW_DATA_METADATA)

      transformed_test_dataset = (
          (raw_test_dataset, transform_fn) | tft_beam.TransformDataset())
      # Don't need transformed data schema, it's the same as before.
      transformed_test_data, _ = transformed_test_dataset

      _ = (
          transformed_test_data
          | 'EncodeTestData' >> beam.Map(transformed_data_coder.encode)
          | 'WriteTestData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE)))

      # Will write a SavedModel and metadata to working_dir, which can then
      # be read by the tft.TFTransformOutput class.
      _ = (
          transform_fn
          | 'WriteTransformFn' >> tft_beam.WriteTransformFn(working_dir))
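
The two text fixes applied before decoding (removing the space after each comma, and dropping the trailing period on test lines) are easy to check on a single line. The sketch below uses Python's `csv` module as a stand-in for the CSV coder, and the helper names are made up. The sample line is illustrative: it is assembled from the feature values of the prediction example later in this notebook, and its `fnlwgt` value is a placeholder.

```python
import csv


def clean_line(line, is_test_data=False):
    """Applies the same text fixes as the Beam Map steps above."""
    line = line.replace(', ', ',')
    if is_test_data:
        line = line[:-1]  # Drop the trailing period on test lines.
    return line


def parse_line(line):
    """Splits one CSV line into fields (a stand-in for the CSV coder)."""
    return next(csv.reader([line]))


test_line = ('25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, '
             'Own-child, Black, Male, 0, 0, 40, United-States, <=50K.')
fields = parse_line(clean_line(test_line, is_test_data=True))
print(len(fields), fields[0], fields[-1])  # 15 25 <=50K

# A blank line yields no fields at all, so it cannot match the 15 columns.
print(parse_line(clean_line('', is_test_data=True)))  # []
```

The blank-line case is the kind of row that fails to decode and is counted by `MapAndFilterErrors` instead of crashing the pipeline.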


After we have preprocessed our data, we're ready to ingest the transformed data for training. We will create our `input_fn` which will return a `tf.data` pipeline. Note that we are using the `transformed_feature_spec` function applied to our `tf.transform` output to build in the metadata from TF Transform into our `tf.data` pipeline.

In [None]:
def input_fn(tf_transform_output, transformed_examples_pattern, batch_size):
  """An input function reading from transformed data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples_pattern: Base filename of examples.
    batch_size: Batch size.

  Returns:
    A batched `tf.data.Dataset` of (features, label) tuples for training or eval.
  """
  return tf.data.experimental.make_batched_features_dataset(
      file_pattern=transformed_examples_pattern,
      batch_size=batch_size,
      features=tf_transform_output.transformed_feature_spec(),
      reader=tf.data.TFRecordDataset,
      label_key=LABEL_KEY,
      shuffle=True).prefetch(tf.data.experimental.AUTOTUNE)


But, what if the input data has not already been transformed? We don't want to rerun the entire pipeline necessarily, but rather apply the transformations from our TF Transform pipeline to the batches of data as they're being passed to the model. Here we will leverage a `transform_features_layer` to preprocess the features (using the metadata and TF graph generated by TF Transform) before passing data into the model.

In [None]:
def input_fn_raw(tf_transform_output, raw_examples_pattern, batch_size):
  """An input function reading from raw data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    raw_examples_pattern: Base filename of examples.
    batch_size: Batch size.

  Returns:
    A batched `tf.data.Dataset` of (features, label) tuples for training or eval.
  """
 
  # Order incoming features in expected order from the list defined earlier.
  def get_ordered_raw_data_dtypes():
    result = []
    for col in ORDERED_CSV_COLUMNS:
      if col not in RAW_DATA_FEATURE_SPEC:
        result.append(0.0)
        continue
      spec = RAW_DATA_FEATURE_SPEC[col]
      if isinstance(spec, tf.io.FixedLenFeature):
        result.append(spec.dtype)
      else:
        result.append(0.0)
    return result

  dataset = tf.data.experimental.make_csv_dataset(
      file_pattern=raw_examples_pattern,
      batch_size=batch_size,
      column_names=ORDERED_CSV_COLUMNS,
      column_defaults=get_ordered_raw_data_dtypes(),
      prefetch_buffer_size=0,
      ignore_errors=True)

  tft_layer = tf_transform_output.transform_features_layer()

  def transform_dataset(data):
    raw_features = {}
    for key, val in data.items():
      if key not in RAW_DATA_FEATURE_SPEC:
        continue
      if isinstance(RAW_DATA_FEATURE_SPEC[key], tf.io.VarLenFeature):
        raw_features[key] = tf.RaggedTensor.from_tensor(
            tf.expand_dims(val, -1)).to_sparse()
        continue
      raw_features[key] = val
    transformed_features = tft_layer(raw_features)
    data_labels = transformed_features.pop(LABEL_KEY)
    return (transformed_features, data_labels)

  return dataset.map(
      transform_dataset,
      num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(
          tf.data.experimental.AUTOTUNE)

Our training and test datasets have now been transformed! However, we need to avoid training-serving skew by applying the same transformations at prediction time as well. To ensure this, we need to create a function that includes our `transform_fn` as part of the model. We will do this by creating a new layer in our model via the `transform_features_layer` function. We define a function `serve_tf_examples_fn` and decorate it as a `tf.function` so that it can be compiled into the model graph. Finally we will create a concrete function via `get_concrete_function` so we can include it in the model signatures.

In [None]:
def export_serving_model(tf_transform_output, model, output_dir):
  """Exports a keras model for serving.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    model: A keras model to export for serving.
    output_dir: A directory where the model will be exported to.
  """
  # The layer has to be saved to the model for keras tracking purposes.
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Serving tf.function model wrapper."""
    feature_spec = RAW_DATA_FEATURE_SPEC.copy()
    feature_spec.pop(LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    outputs = model(transformed_features)
    classes_names = tf.constant([['0', '1']])
    classes = tf.tile(classes_names, [tf.shape(outputs)[0], 1])
    return {'classes': classes, 'scores': outputs}

  concrete_serving_fn = serve_tf_examples_fn.get_concrete_function(
      tf.TensorSpec(shape=[None], dtype=tf.string, name='inputs'))
  signatures = {'serving_default': concrete_serving_fn}

  # This is required in order to make this model servable with model_server.
  versioned_output_dir = os.path.join(output_dir, '1')
  model.save(versioned_output_dir, save_format='tf', signatures=signatures)

Now we're ready to define our training and evaluation loop! For the most part, this is the same standard function we've seen in previous examples. The biggest difference is that we include the output directory of the TF Transform pipeline so that we can transform the data as needed.

In [None]:
def train_and_evaluate(raw_train_eval_data_path_pattern,
                       transformed_train_eval_data_path_pattern,
                       output_dir,
                       transform_output_dir,
                       num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    raw_train_eval_data_path_pattern: A pair of patterns of raw
      (train data file paths, eval data file paths) in CSV format.
    transformed_train_eval_data_path_pattern: A pair of patterns of transformed
      (train data file paths, eval data file paths) in TFRecord format.
    output_dir: A directory where the output should be exported to.
    transform_output_dir: The location of the Transform output.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the Keras model's `evaluate` method
  """
  if not ((raw_train_eval_data_path_pattern is None) ^
          (transformed_train_eval_data_path_pattern is None)):
    raise ValueError(
        'Exactly one of raw_train_eval_data_path_pattern and '
        'transformed_train_eval_data_path_pattern should be provided')
  tf_transform_output = tft.TFTransformOutput(transform_output_dir)

  if raw_train_eval_data_path_pattern is not None:
    selected_input_fn = input_fn_raw
    (train_data_path_pattern,
     eval_data_path_pattern) = raw_train_eval_data_path_pattern
  else:
    selected_input_fn = input_fn
    (train_data_path_pattern,
     eval_data_path_pattern) = transformed_train_eval_data_path_pattern

  train_dataset = selected_input_fn(
      tf_transform_output, train_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)

  # Evaluate model on test dataset.
  validation_dataset = selected_input_fn(
      tf_transform_output, eval_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)

  feature_spec = tf_transform_output.transformed_feature_spec().copy()
  feature_spec.pop(LABEL_KEY)

  inputs = {}
  for key, spec in feature_spec.items():
    if isinstance(spec, tf.io.VarLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=[None], name=key, dtype=spec.dtype, sparse=True)
    elif isinstance(spec, tf.io.FixedLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=spec.shape, name=key, dtype=spec.dtype)
    else:
      raise ValueError('Spec type is not supported: ', key, spec)

  encoded_inputs = {}
  for key in inputs:
    feature = tf.expand_dims(inputs[key], -1)
    if key in CATEGORICAL_FEATURE_KEYS:
      num_buckets = tf_transform_output.num_buckets_for_transformed_feature(key)
      encoding_layer = (
          tf.keras.layers.experimental.preprocessing.CategoryEncoding(
              max_tokens=num_buckets, output_mode='binary', sparse=False))
      encoded_inputs[key] = encoding_layer(feature)
    else:
      encoded_inputs[key] = feature

  stacked_inputs = tf.concat(tf.nest.flatten(encoded_inputs), axis=1)
  output = tf.keras.layers.Dense(100, activation='relu')(stacked_inputs)
  output = tf.keras.layers.Dense(70, activation='relu')(output)
  output = tf.keras.layers.Dense(50, activation='relu')(output)
  output = tf.keras.layers.Dense(20, activation='relu')(output)
  output = tf.keras.layers.Dense(2, activation='sigmoid')(output)
  model = tf.keras.Model(inputs=inputs, outputs=output)

  model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])
  # model.summary() prints the architecture itself and returns None.
  model.summary()

  model.fit(train_dataset, validation_data=validation_dataset,
            epochs=TRAIN_NUM_EPOCHS,
            verbose=2,
            steps_per_epoch=math.ceil(num_train_instances / TRAIN_BATCH_SIZE),
            validation_steps=math.ceil(num_test_instances / TRAIN_BATCH_SIZE))

  # Export the model.
  export_serving_model(tf_transform_output, model, output_dir)

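  # Steps are counted in batches, so one pass over the test set needs
  # ceil(num_test_instances / batch size) steps.
  return model.evaluate(
      validation_dataset,
      steps=math.ceil(num_test_instances / TRAIN_BATCH_SIZE))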
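
The step counts passed to `model.fit` follow from the dataset sizes and the batch size. Assuming the input datasets repeat indefinitely (the default for the dataset builders used in the input functions), Keras needs to be told how many batches make up one pass. The short sketch below redoes that arithmetic with the constants defined earlier; `steps_for` is a hypothetical helper.

```python
import math

TRAIN_BATCH_SIZE = 128
NUM_TRAIN_INSTANCES = 32561
NUM_TEST_INSTANCES = 16281


def steps_for(num_instances, batch_size):
    """Number of batches needed to see every instance at least once."""
    return math.ceil(num_instances / batch_size)


print(steps_for(NUM_TRAIN_INSTANCES, TRAIN_BATCH_SIZE))  # 255
print(steps_for(NUM_TEST_INSTANCES, TRAIN_BATCH_SIZE))   # 128
```

Rounding up rather than down ensures the final, partially filled batch is still counted.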


In [None]:
def main(input_data_dir,
         working_dir,
         read_raw_data_for_training=True,
         num_train_instances=NUM_TRAIN_INSTANCES,
         num_test_instances=NUM_TEST_INSTANCES):
  if not working_dir:
    working_dir = tempfile.mkdtemp(dir=input_data_dir)

  train_data_file = os.path.join(input_data_dir, 'adult.data')
  test_data_file = os.path.join(input_data_dir, 'adult.test')

  transform_data(train_data_file, test_data_file, working_dir)

  if read_raw_data_for_training:
    raw_train_and_eval_patterns = (train_data_file, test_data_file)
    transformed_train_and_eval_patterns = None
  else:
    train_pattern = os.path.join(working_dir,
                                 TRANSFORMED_TRAIN_DATA_FILEBASE + '*')
    eval_pattern = os.path.join(working_dir,
                                TRANSFORMED_TEST_DATA_FILEBASE + '*')
    raw_train_and_eval_patterns = None
    transformed_train_and_eval_patterns = (train_pattern, eval_pattern)
  output_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  results = train_and_evaluate(
      raw_train_and_eval_patterns,
      transformed_train_and_eval_patterns,
      output_dir,
      working_dir,
      num_train_instances=num_train_instances,
      num_test_instances=num_test_instances)

  pprint.pprint(results)
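
`main` appends `'*'` to the transformed file bases because Beam writes sharded output: each file base gets a suffix, by default of the form `-SSSSS-of-NNNNN`. The sketch below imitates that naming to show why a glob pattern is needed. The helper `shard_names` and the shard counts are invented for the illustration; the real number of shards is chosen by the runner.

```python
import fnmatch
import os


def shard_names(file_base, num_shards):
    """File names in the -SSSSS-of-NNNNN layout (illustrative only)."""
    return [f'{file_base}-{i:05d}-of-{num_shards:05d}'
            for i in range(num_shards)]


working_dir = './working_dir'
written = [
    os.path.join(working_dir, name)
    for name in shard_names('train_transformed', 2)
    + shard_names('test_transformed', 1)
]

train_pattern = os.path.join(working_dir, 'train_transformed' + '*')
matches = fnmatch.filter(written, train_pattern)
print(matches)
```

The pattern picks up both training shards and leaves the test shard out, which is the behavior the input function relies on.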

We're ready to train our model. We will run our main function defined above to execute the training loop after providing the input directory containing our data and the working directory we wish to use.

In [None]:
INPUT_DATA_DIR="./input_data"
WORKING_DIR="./working_dir"

main(INPUT_DATA_DIR, WORKING_DIR)

We have now saved a trained model with a custom serving function that applies the transformations defined by TF Transform. Let's now make sure that everything works!

To see a little bit of information about what inputs are expected, we can use the `saved_model_cli show` command.

In [None]:
!saved_model_cli show --dir ./working_dir/exported_model_dir/1/ --tag_set serve --signature_def serving_default

We see that the model expects a one-dimensional tensor of strings as input. Why is this? Recall that we created a pipeline to import TFRecords as `tf.Example`s, and that our serving function parses serialized `tf.Example` protos. We therefore need to pass each instance along in the protobuf format to match what we saw at training time. Fortunately, the `text_format` module in the `protobuf` package can build the proto from a text representation for us.

In [None]:
from google.protobuf import text_format

_PREDICT_TF_EXAMPLE_TEXT_PB = """
    features {
      feature {
        key: "age"
        value { float_list: { value: 25 } }
      }
      feature {
        key: "workclass"
        value { bytes_list: { value: "Private" } }
      }
      feature {
        key: "education"
        value { bytes_list: { value: "11th" } }
      }
      feature {
        key: "education-num"
        value { float_list: { value: 7 } }
      }
      feature {
        key: "marital-status"
        value { bytes_list: { value: "Never-married" } }
      }
      feature {
        key: "occupation"
        value { bytes_list: { value: "Machine-op-inspct" } }
      }
      feature {
        key: "relationship"
        value { bytes_list: { value: "Own-child" } }
      }
      feature {
        key: "race"
        value { bytes_list: { value: "Black" } }
      }
      feature {
        key: "sex"
        value { bytes_list: { value: "Male" } }
      }
      feature {
        key: "capital-gain"
        value { float_list: { value: 0 } }
      }
      feature {
        key: "capital-loss"
        value { float_list: { value: 0 } }
      }
      feature {
        key: "hours-per-week"
        value { float_list: { value: 40 } }
      }
      feature {
        key: "native-country"
        value { bytes_list: { value: "United-States" } }
      }
    }
    """

_MODEL_NAME = 'my_model'

_CLASSIFICATION_REQUEST_TEXT_PB = """model_spec { name: "%s" }
    input {
      example_list {
        examples {
          %s
        }
      }
    }""" % (_MODEL_NAME, _PREDICT_TF_EXAMPLE_TEXT_PB)


model = tf.keras.models.load_model('./working_dir/exported_model_dir/1/')

example = text_format.Parse(_PREDICT_TF_EXAMPLE_TEXT_PB, tf.train.Example())
prediction = model.signatures['serving_default'](
    tf.constant([example.SerializeToString()], tf.string))

print(prediction['scores'].numpy())
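
The printed scores are a pair of numbers per example, one for each position of the one-hot label. Since the label table in `preprocessing_fn` was built from `['>50K', '<=50K']`, position 0 corresponds to `>50K` and position 1 to `<=50K`. The sketch below turns one row of scores into a readable label; `predicted_label` is a hypothetical helper and the example scores are made up, not output from the trained model.

```python
TABLE_KEYS = ['>50K', '<=50K']  # Same order as table_keys in preprocessing_fn.


def predicted_label(scores, table_keys=TABLE_KEYS):
    """Returns the label with the highest score, and that score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return table_keys[best_index], scores[best_index]


example_scores = [0.02, 0.97]  # Made-up values for illustration.
print(predicted_label(example_scores))  # ('<=50K', 0.97)
```

With the real output, the first row could be passed as `predicted_label(prediction['scores'].numpy()[0].tolist())`. Because the final layer uses a sigmoid per output, the two scores need not sum to 1.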