<div class="devsite-table-wrapper"><table class="tfo-notebook-buttons" align="left">
<td><a target="_blank" href="https://www.tensorflow.org/tfx/tutorials/transform/census">
<img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a></td>
<td><a target="_blank" href="https://colab.sandbox.google.com/github/tensorflow/tfx/blob/master/docs/tutorials/transform/census.ipynb">
<img src="https://www.tensorflow.org/images/colab_logo_32px.png">Run in Google Colab</a></td>
<td><a target="_blank" href="https://github.com/tensorflow/tfx/blob/master/docs/tutorials/transform/census.ipynb">
<img width=32px src="https://www.tensorflow.org/images/GitHub-Mark-32px.png">View source on GitHub</a></td>
<td><a target="_blank" href="https://storage.googleapis.com/tensorflow_docs/tfx/docs/tutorials/transform/census.ipynb">
<img width=32px src="https://www.tensorflow.org/images/download_logo_32px.png">Download notebook</a></td>
</table></div>

##### Copyright 2020 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Preprocessing data with TensorFlow Transform
***The Feature Engineering Component of TensorFlow Extended (TFX)***

This example colab notebook provides a somewhat more advanced example of how <a target='_blank' href='https://www.tensorflow.org/tfx/transform/'>TensorFlow Transform</a> (`tf.Transform`) can be used to preprocess data using exactly the same code for both training a model and serving inferences in production.

TensorFlow Transform is a library for preprocessing input data for TensorFlow, including creating features that require a full pass over the training dataset.  For example, using TensorFlow Transform you could:

* Normalize an input value by using the mean and standard deviation
* Convert strings to integers by generating a vocabulary over all of the input values
* Convert floats to integers by assigning them to buckets, based on the observed data distribution

TensorFlow has built-in support for manipulations on a single example or a batch of examples. `tf.Transform` extends these capabilities to support full passes over the entire training dataset.

The output of `tf.Transform` is exported as a TensorFlow graph which you can use for both training and serving. Using the same graph for both training and serving can prevent skew, since the same transformations are applied in both stages.

Key Point: In order to understand `tf.Transform` and how it works with Apache Beam, you'll need to know a little bit about Apache Beam itself.  The <a target='_blank' href='https://beam.apache.org/documentation/programming-guide/'>Beam Programming Guide</a> is a great place to start.

##What we're doing in this example

In this example we'll be processing a <a target='_blank' href='https://archive.ics.uci.edu/ml/machine-learning-databases/adult'>widely used dataset containing census data</a>, and training a model to do classification.  Along the way we'll be transforming the data using `tf.Transform`.

Key Point: As a modeler and developer, think about how this data is used and the potential benefits and harm a model's predictions can cause. A model like this could reinforce societal biases and disparities. Is a feature relevant to the problem you want to solve or will it introduce bias? For more information, read about <a target='_blank' href='https://developers.google.com/machine-learning/fairness-overview/'>ML fairness</a>.

Note: <a target='_blank' href='https://www.tensorflow.org/tfx/model_analysis'>TensorFlow Model Analysis</a> is a powerful tool for understanding how well your model predicts for various segments of your data, including understanding how your model may reinforce societal biases and disparities.

### Upgrade Pip

To avoid upgrading Pip in a system when running locally, check to make sure that we're running in Colab.  Local systems can of course be upgraded separately.

In [2]:
try:
  import colab
  !pip install --upgrade pip
except:
  pass

### Install TensorFlow Transform

**Note: In Google Colab, because of package updates, the first time you run this cell you must restart the runtime (Runtime > Restart runtime ...).**

In [3]:
!pip install tensorflow-transform



Collecting numpy<1.20,>=1.16


  Using cached numpy-1.19.5-cp37-cp37m-manylinux2010_x86_64.whl (14.8 MB)






Collecting httplib2<0.20.0,>=0.8
  Using cached httplib2-0.19.1-py3-none-any.whl (95 kB)




























Installing collected packages: numpy, httplib2
  Attempting uninstall: numpy
    Found existing installation: numpy 1.21.4


    Uninstalling numpy-1.21.4:
      Successfully uninstalled numpy-1.21.4


  Attempting uninstall: httplib2
    Found existing installation: httplib2 0.20.2
    Uninstalling httplib2-0.20.2:
      Successfully uninstalled httplib2-0.20.2


[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tfx 0.29.0 requires packaging<21,>=20, but you have packaging 21.3 which is incompatible.
tfx 0.29.0 requires tensorflow-hub<0.10,>=0.9.0, but you have tensorflow-hub 0.12.0 which is incompatible.[0m
Successfully installed httplib2-0.19.1 numpy-1.19.5


## Python check, imports, and globals
First we'll make sure that we're using Python 3, and then go ahead and install and import the stuff we need.

In [4]:
import sys

# Confirm that we're using Python 3
assert sys.version_info.major == 3, 'Oops, not running Python 3. Use Runtime > Change runtime type'

In [5]:
import math
import os
import pprint

import tensorflow as tf
print('TF: {}'.format(tf.__version__))

import apache_beam as beam
print('Beam: {}'.format(beam.__version__))

import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam
print('Transform: {}'.format(tft.__version__))

from tfx_bsl.public import tfxio
from tfx_bsl.coders.example_coder import RecordBatchToExamples

!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
!wget https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test

train = './adult.data'
test = './adult.test'

TF: 2.4.4


Beam: 2.34.0
Transform: 0.29.0


--2021-12-04 10:43:05--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.data
Resolving storage.googleapis.com (storage.googleapis.com)... 142.251.8.128, 74.125.204.128, 64.233.189.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.251.8.128|:443... connected.
HTTP request sent, awaiting response... 

200 OK
Length: 3974305 (3.8M) [application/octet-stream]
Saving to: ‘adult.data’

adult.data            0%[                    ]       0  --.-KB/s               


2021-12-04 10:43:05 (135 MB/s) - ‘adult.data’ saved [3974305/3974305]



--2021-12-04 10:43:05--  https://storage.googleapis.com/artifacts.tfx-oss-public.appspot.com/datasets/census/adult.test
Resolving storage.googleapis.com (storage.googleapis.com)... 142.250.157.128, 108.177.125.128, 64.233.189.128, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|142.250.157.128|:443... connected.
HTTP request sent, awaiting response... 

200 OK
Length: 2003153 (1.9M) [application/octet-stream]
Saving to: ‘adult.test’

adult.test            0%[                    ]       0  --.-KB/s               


2021-12-04 10:43:05 (177 MB/s) - ‘adult.test’ saved [2003153/2003153]



### Name our columns
We'll create some handy lists for referencing the columns in our dataset.

In [6]:
CATEGORICAL_FEATURE_KEYS = [
    'workclass',
    'education',
    'marital-status',
    'occupation',
    'relationship',
    'race',
    'sex',
    'native-country',
]
NUMERIC_FEATURE_KEYS = [
    'age',
    'capital-gain',
    'capital-loss',
    'hours-per-week',
]
OPTIONAL_NUMERIC_FEATURE_KEYS = [
    'education-num',
]
ORDERED_CSV_COLUMNS = [
    'age', 'workclass', 'fnlwgt', 'education', 'education-num',
    'marital-status', 'occupation', 'relationship', 'race', 'sex',
    'capital-gain', 'capital-loss', 'hours-per-week', 'native-country', 'label'
]
LABEL_KEY = 'label'

###Define our features and schema
Let's define a schema based on what types the columns are in our input.  Among other things this will help with importing them correctly.

In [7]:
RAW_DATA_FEATURE_SPEC = dict(
    [(name, tf.io.FixedLenFeature([], tf.string))
     for name in CATEGORICAL_FEATURE_KEYS] +
    [(name, tf.io.FixedLenFeature([], tf.float32))
     for name in NUMERIC_FEATURE_KEYS] +
    [(name, tf.io.VarLenFeature(tf.float32))
     for name in OPTIONAL_NUMERIC_FEATURE_KEYS] +
    [(LABEL_KEY, tf.io.FixedLenFeature([], tf.string))]
)

SCHEMA = tft.tf_metadata.dataset_metadata.DatasetMetadata(
    tft.tf_metadata.schema_utils.schema_from_feature_spec(RAW_DATA_FEATURE_SPEC)).schema

###Setting hyperparameters and basic housekeeping
Constants and hyperparameters used for training.  The bucket size includes all listed categories in the dataset description as well as one extra for "?" which represents unknown.

Note: The number of instances will be computed by `tf.Transform` in future versions, in which case it can be read from the metadata.  Similarly BUCKET_SIZES will not be needed as this information will be stored in the metadata for each of the columns.

In [8]:
testing = os.getenv("WEB_TEST_BROWSER", False)
NUM_OOV_BUCKETS = 1
if testing:
  TRAIN_NUM_EPOCHS = 1
  NUM_TRAIN_INSTANCES = 1
  TRAIN_BATCH_SIZE = 1
  NUM_TEST_INSTANCES = 1
else:
  TRAIN_NUM_EPOCHS = 16
  NUM_TRAIN_INSTANCES = 32561
  TRAIN_BATCH_SIZE = 128
  NUM_TEST_INSTANCES = 16281

# Names of temp files
TRANSFORMED_TRAIN_DATA_FILEBASE = 'train_transformed'
TRANSFORMED_TEST_DATA_FILEBASE = 'test_transformed'
EXPORTED_MODEL_DIR = 'exported_model_dir'

##Preprocessing with `tf.Transform`

###Create a `tf.Transform` preprocessing_fn
The _preprocessing function_ is the most important concept of tf.Transform. A preprocessing function is where the transformation of the dataset really happens. It accepts and returns a dictionary of tensors, where a tensor means a [`Tensor`](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/Tensor) or [`SparseTensor`](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/SparseTensor). There are two main groups of API calls that typically form the heart of a preprocessing function:

1. **TensorFlow Ops:** Any function that accepts and returns tensors, which usually means TensorFlow ops. These add TensorFlow operations to the graph that transforms raw data into transformed data one feature vector at a time.  These will run for every example, during both training and serving.
2. **TensorFlow Transform Analyzers:** Any of the analyzers provided by tf.Transform. Analyzers also accept and return tensors, but unlike TensorFlow ops they only run once, during training, and typically make a full pass over the entire training dataset. They create [tensor constants](https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/constant), which are added to your graph. For example, `tft.min` computes the minimum of a tensor over the training dataset. tf.Transform provides a fixed set of analyzers, but this will be extended in future versions.

Caution: When you apply your preprocessing function to serving inferences, the constants that were created by analyzers during training do not change.  If your data has trend or seasonality components, plan accordingly.

In [9]:
def preprocessing_fn(inputs):
  """Preprocess input columns into transformed columns."""
  # Since we are modifying some features and leaving others unchanged, we
  # start by setting `outputs` to a copy of `inputs.
  outputs = inputs.copy()

  # Scale numeric columns to have range [0, 1].
  for key in NUMERIC_FEATURE_KEYS:
    outputs[key] = tft.scale_to_0_1(inputs[key])

  for key in OPTIONAL_NUMERIC_FEATURE_KEYS:
    # This is a SparseTensor because it is optional. Here we fill in a default
    # value when it is missing.
    sparse = tf.sparse.SparseTensor(inputs[key].indices, inputs[key].values,
                                    [inputs[key].dense_shape[0], 1])
    dense = tf.sparse.to_dense(sp_input=sparse, default_value=0.)
    # Reshaping from a batch of vectors of size 1 to a batch to scalars.
    dense = tf.squeeze(dense, axis=1)
    outputs[key] = tft.scale_to_0_1(dense)

  # For all categorical columns except the label column, we generate a
  # vocabulary but do not modify the feature.  This vocabulary is instead
  # used in the trainer, by means of a feature column, to convert the feature
  # from a string to an integer id.
  for key in CATEGORICAL_FEATURE_KEYS:
    outputs[key] = tft.compute_and_apply_vocabulary(
        tf.strings.strip(inputs[key]),
        num_oov_buckets=NUM_OOV_BUCKETS,
        vocab_filename=key)

  # For the label column we provide the mapping from string to index.
  table_keys = ['>50K', '<=50K']
  with tf.init_scope():
    initializer = tf.lookup.KeyValueTensorInitializer(
        keys=table_keys,
        values=tf.cast(tf.range(len(table_keys)), tf.int64),
        key_dtype=tf.string,
        value_dtype=tf.int64)
    table = tf.lookup.StaticHashTable(initializer, default_value=-1)
  # Remove trailing periods for test data when the data is read with tf.data.
  label_str = tf.strings.regex_replace(inputs[LABEL_KEY], r'\.', '')
  label_str = tf.strings.strip(label_str)
  data_labels = table.lookup(label_str)
  transformed_label = tf.one_hot(
      indices=data_labels, depth=len(table_keys), on_value=1.0, off_value=0.0)
  outputs[LABEL_KEY] = tf.reshape(transformed_label, [-1, len(table_keys)])

  return outputs

###Transform the data
Now we're ready to start transforming our data in an Apache Beam pipeline.

1. Read in the data using the CSV reader
1. Transform it using a preprocessing pipeline that scales numeric data and converts categorical data from strings to int64 values indices, by creating a vocabulary for each category
1. Write out the result as a `TFRecord` of `Example` protos, which we will use for training a model later

<aside class="key-term"><b>Key Term:</b> <a target='_blank' href='https://beam.apache.org/'>Apache Beam</a> uses a <a target='_blank' href='https://beam.apache.org/documentation/programming-guide/#applying-transforms'>special syntax to define and invoke transforms</a>.  For example, in this line:

<code><blockquote>result = pass_this | 'name this step' >> to_this_call</blockquote></code>

The method <code>to_this_call</code> is being invoked and passed the object called <code>pass_this</code>, and <a target='_blank' href='https://stackoverflow.com/questions/50519662/what-does-the-redirection-mean-in-apache-beam-python'>this operation will be referred to as <code>name this step</code> in a stack trace</a>.  The result of the call to <code>to_this_call</code> is returned in <code>result</code>.  You will often see stages of a pipeline chained together like this:

<code><blockquote>result = apache_beam.Pipeline() | 'first step' >> do_this_first() | 'second step' >> do_this_last()</blockquote></code>

and since that started with a new pipeline, you can continue like this:

<code><blockquote>next_result = result | 'doing more stuff' >> another_function()</blockquote></code></aside>

In [10]:
def transform_data(train_data_file, test_data_file, working_dir):
  """Transform the data and write out as a TFRecord of Example protos.

  Read in the data using the CSV reader, and transform it using a
  preprocessing pipeline that scales numeric data and converts categorical data
  from strings to int64 values indices, by creating a vocabulary for each
  category.

  Args:
    train_data_file: File containing training data
    test_data_file: File containing test data
    working_dir: Directory to write transformed data and metadata to
  """

  # The "with" block will create a pipeline, and run that pipeline at the exit
  # of the block.
  with beam.Pipeline() as pipeline:
    with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
      # Create a TFXIO to read the census data with the schema. To do this we
      # need to list all columns in order since the schema doesn't specify the
      # order of columns in the csv.
      # We first read CSV files and use BeamRecordCsvTFXIO whose .BeamSource()
      # accepts a PCollection[bytes] because we need to patch the records first
      # (see "FixCommasTrainData" below). Otherwise, tfxio.CsvTFXIO can be used
      # to both read the CSV files and parse them to TFT inputs:
      # csv_tfxio = tfxio.CsvTFXIO(...)
      # raw_data = (pipeline | 'ToRecordBatches' >> csv_tfxio.BeamSource())
      csv_tfxio = tfxio.BeamRecordCsvTFXIO(
          physical_format='text',
          column_names=ORDERED_CSV_COLUMNS,
          schema=SCHEMA)

      # Read in raw data and convert using CSV TFXIO.  Note that we apply
      # some Beam transformations here, which will not be encoded in the TF
      # graph since we don't do the from within tf.Transform's methods
      # (AnalyzeDataset, TransformDataset etc.).  These transformations are just
      # to get data into a format that the CSV TFXIO can read, in particular
      # removing spaces after commas.
      raw_data = (
          pipeline
          | 'ReadTrainData' >> beam.io.ReadFromText(
              train_data_file, coder=beam.coders.BytesCoder())
          | 'FixCommasTrainData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'DecodeTrainData' >> csv_tfxio.BeamSource())

      # Combine data and schema into a dataset tuple.  Note that we already used
      # the schema to read the CSV data, but we also need it to interpret
      # raw_data.
      raw_dataset = (raw_data, csv_tfxio.TensorAdapterConfig())

      # The TFXIO output format is chosen for improved performance.
      transformed_dataset, transform_fn = (
          raw_dataset | tft_beam.AnalyzeAndTransformDataset(
              preprocessing_fn, output_record_batches=True))

      # Transformed metadata is not necessary for encoding.
      transformed_data, _ = transformed_dataset

      # Extract transformed RecordBatches, encode and write them to the given
      # directory.
      _ = (
          transformed_data
          | 'EncodeTrainData' >>
          beam.FlatMapTuple(lambda batch, _: RecordBatchToExamples(batch))
          | 'WriteTrainData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE)))

      # Now apply transform function to test data.  In this case we remove the
      # trailing period at the end of each line, and also ignore the header line
      # that is present in the test data file.
      raw_test_data = (
          pipeline
          | 'ReadTestData' >> beam.io.ReadFromText(
              test_data_file, skip_header_lines=1,
              coder=beam.coders.BytesCoder())
          | 'FixCommasTestData' >> beam.Map(
              lambda line: line.replace(b', ', b','))
          | 'RemoveTrailingPeriodsTestData' >> beam.Map(lambda line: line[:-1])
          | 'DecodeTestData' >> csv_tfxio.BeamSource())

      raw_test_dataset = (raw_test_data, csv_tfxio.TensorAdapterConfig())

      # The TFXIO output format is chosen for improved performance.
      transformed_test_dataset = (
          (raw_test_dataset, transform_fn)
          | tft_beam.TransformDataset(output_record_batches=True))

      # Transformed metadata is not necessary for encoding.
      transformed_test_data, _ = transformed_test_dataset

      # Extract transformed RecordBatches, encode and write them to the given
      # directory.
      _ = (
          transformed_test_data
          | 'EncodeTestData' >>
          beam.FlatMapTuple(lambda batch, _: RecordBatchToExamples(batch))
          | 'WriteTestData' >> beam.io.WriteToTFRecord(
              os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE)))

      # Will write a SavedModel and metadata to working_dir, which can then
      # be read by the tft.TFTransformOutput class.
      _ = (
          transform_fn
          | 'WriteTransformFn' >> tft_beam.WriteTransformFn(working_dir))

##Using our preprocessed data to train a model using tf.keras

To show how `tf.Transform` enables us to use the same code for both training and serving, and thus prevent skew, we're going to train a model.  To train our model and prepare our trained model for production we need to create input functions.  The main difference between our training input function and our serving input function is that training data contains the labels, and production data does not.  The arguments and returns are also somewhat different.

Note: This section uses tf.keras for training. If you are looking for an example using tf.estimator for training, please see the next section.

###Create an input function for training

In [11]:
def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """An input function reading from transformed data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, in the form of k.
  """
  def input_fn():
    return tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        label_key=LABEL_KEY,
        shuffle=True).prefetch(tf.data.experimental.AUTOTUNE)

  return input_fn

###Create an input function for serving

Let's create an input function that we could use in production, and prepare our trained model for serving.

In [12]:
def _make_serving_input_fn(tf_transform_output, raw_examples, batch_size):
  """An input function reading from raw data, converting to model input.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    raw_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input data for training or eval, in the form of k.
  """

  def get_ordered_raw_data_dtypes():
    result = []
    for col in ORDERED_CSV_COLUMNS:
      if col not in RAW_DATA_FEATURE_SPEC:
        result.append(0.0)
        continue
      spec = RAW_DATA_FEATURE_SPEC[col]
      if isinstance(spec, tf.io.FixedLenFeature):
        result.append(spec.dtype)
      else:
        result.append(0.0)
    return result

  def input_fn():
    dataset = tf.data.experimental.make_csv_dataset(
        file_pattern=raw_examples,
        batch_size=batch_size,
        column_names=ORDERED_CSV_COLUMNS,
        column_defaults=get_ordered_raw_data_dtypes(),
        prefetch_buffer_size=0,
        ignore_errors=True)

    tft_layer = tf_transform_output.transform_features_layer()

    def transform_dataset(data):
      raw_features = {}
      for key, val in data.items():
        if key not in RAW_DATA_FEATURE_SPEC:
          continue
        if isinstance(RAW_DATA_FEATURE_SPEC[key], tf.io.VarLenFeature):
          raw_features[key] = tf.RaggedTensor.from_tensor(
              tf.expand_dims(val, -1)).to_sparse()
          continue
        raw_features[key] = val
      transformed_features = tft_layer(raw_features)
      data_labels = transformed_features.pop(LABEL_KEY)
      return (transformed_features, data_labels)

    return dataset.map(
        transform_dataset,
        num_parallel_calls=tf.data.experimental.AUTOTUNE).prefetch(
            tf.data.experimental.AUTOTUNE)

  return input_fn

###Train, Evaluate, and Export our model

In [13]:
def export_serving_model(tf_transform_output, model, output_dir):
  """Exports a keras model for serving.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    model: A keras model to export for serving.
    output_dir: A directory where the model will be exported to.
  """
  # The layer has to be saved to the model for keras tracking purpases.
  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Serving tf.function model wrapper."""
    feature_spec = RAW_DATA_FEATURE_SPEC.copy()
    feature_spec.pop(LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)
    transformed_features = model.tft_layer(parsed_features)
    outputs = model(transformed_features)
    classes_names = tf.constant([['0', '1']])
    classes = tf.tile(classes_names, [tf.shape(outputs)[0], 1])
    return {'classes': classes, 'scores': outputs}

  concrete_serving_fn = serve_tf_examples_fn.get_concrete_function(
      tf.TensorSpec(shape=[None], dtype=tf.string, name='inputs'))
  signatures = {'serving_default': concrete_serving_fn}

  # This is required in order to make this model servable with model_server.
  versioned_output_dir = os.path.join(output_dir, '1')
  model.save(versioned_output_dir, save_format='tf', signatures=signatures)  

In [14]:
def train_and_evaluate(working_dir,
                       num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: The location of the Transform output.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  train_data_path_pattern = os.path.join(working_dir,
                                 TRANSFORMED_TRAIN_DATA_FILEBASE + '*')
  eval_data_path_pattern = os.path.join(working_dir,
                            TRANSFORMED_TEST_DATA_FILEBASE + '*')
  tf_transform_output = tft.TFTransformOutput(working_dir)

  train_input_fn = _make_training_input_fn(
      tf_transform_output, train_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  train_dataset = train_input_fn()

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output, eval_data_path_pattern, batch_size=TRAIN_BATCH_SIZE)
  validation_dataset = eval_input_fn()

  feature_spec = tf_transform_output.transformed_feature_spec().copy()
  feature_spec.pop(LABEL_KEY)

  inputs = {}
  for key, spec in feature_spec.items():
    if isinstance(spec, tf.io.VarLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=[None], name=key, dtype=spec.dtype, sparse=True)
    elif isinstance(spec, tf.io.FixedLenFeature):
      inputs[key] = tf.keras.layers.Input(
          shape=spec.shape, name=key, dtype=spec.dtype)
    else:
      raise ValueError('Spec type is not supported: ', key, spec)

  encoded_inputs = {}
  for key in inputs:
    feature = tf.expand_dims(inputs[key], -1)
    if key in CATEGORICAL_FEATURE_KEYS:
      num_buckets = tf_transform_output.num_buckets_for_transformed_feature(key)
      encoding_layer = (
          tf.keras.layers.experimental.preprocessing.CategoryEncoding(
              max_tokens=num_buckets, output_mode='binary', sparse=False))
      encoded_inputs[key] = encoding_layer(feature)
    else:
      encoded_inputs[key] = feature

  stacked_inputs = tf.concat(tf.nest.flatten(encoded_inputs), axis=1)
  output = tf.keras.layers.Dense(100, activation='relu')(stacked_inputs)
  output = tf.keras.layers.Dense(70, activation='relu')(output)
  output = tf.keras.layers.Dense(50, activation='relu')(output)
  output = tf.keras.layers.Dense(20, activation='relu')(output)
  output = tf.keras.layers.Dense(2, activation='sigmoid')(output)
  model = tf.keras.Model(inputs=inputs, outputs=output)

  model.compile(optimizer='adam',
                loss='binary_crossentropy',
                metrics=['accuracy'])
  pprint.pprint(model.summary())

  model.fit(train_dataset, validation_data=validation_dataset,
            epochs=TRAIN_NUM_EPOCHS,
            steps_per_epoch=math.ceil(num_train_instances / TRAIN_BATCH_SIZE),
            validation_steps=math.ceil(num_test_instances / TRAIN_BATCH_SIZE))

  # Export the model.
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  export_serving_model(tf_transform_output, model, exported_model_dir)

  metrics_values = model.evaluate(validation_dataset, steps=num_test_instances)
  metrics_labels = model.metrics_names
  return {l: v for l, v in zip(metrics_labels, metrics_values)}

###Put it all together
We've created all the stuff we need to preprocess our census data, train a model, and prepare it for serving.  So far we've just been getting things ready.  It's time to start running!

Note: Scroll the output from this cell to see the whole process.  The results will be at the bottom.

In [15]:
import tempfile
temp = os.path.join(tempfile.gettempdir(), 'keras')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)







Instructions for updating:
Use ref() instead.


Instructions for updating:
Use ref() instead.


Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


2021-12-04 10:43:07.088016: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:43:07.089022: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:No assets to write.


INFO:tensorflow:No assets to write.


'Counter' object has no attribute 'name'


'Counter' object has no attribute 'name'


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/3dfb612abc894c0ab0ae6895d85b5084/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/3dfb612abc894c0ab0ae6895d85b5084/saved_model.pb


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:No assets to write.


INFO:tensorflow:No assets to write.


'Counter' object has no attribute 'name'


'Counter' object has no attribute 'name'


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/c76371e6c4104068b035f1ba7ac0c160/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/c76371e6c4104068b035f1ba7ac0c160/saved_model.pb












INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2021-12-04 10:43:12.129285: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:43:12.129350: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets written to: /tmp/tmpwtmrrrxa/tftransform_tmp/a447c39aff834eaa8b3df63abd6a0d29/assets


INFO:tensorflow:Assets written to: /tmp/tmpwtmrrrxa/tftransform_tmp/a447c39aff834eaa8b3df63abd6a0d29/assets


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/a447c39aff834eaa8b3df63abd6a0d29/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpwtmrrrxa/tftransform_tmp/a447c39aff834eaa8b3df63abd6a0d29/saved_model.pb


value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2021-12-04 10:43:17.368791: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:43:17.368851: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...


value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore




value: "\n\013\n\tConst_3:0\022\tworkclass"



2021-12-04 10:43:18.716754: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:43:18.716809: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
education (InputLayer)          [(None,)]            0                                            
__________________________________________________________________________________________________
marital-status (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
native-country (InputLayer)     [(None,)]            0                                            
__________________________________________________________________________________________________
occupation (InputLayer)         [(None,)]            0                                            
______________________________________________________________________________________________

  1/255 [..............................] - ETA: 3:29 - loss: 0.6865 - accuracy: 0.6250

 26/255 [==>...........................] - ETA: 0s - loss: 0.6244 - accuracy: 0.7398  

 51/255 [=====>........................] - ETA: 0s - loss: 0.5800 - accuracy: 0.7474

















Epoch 2/16
  1/255 [..............................] - ETA: 0s - loss: 0.3805 - accuracy: 0.8125

 28/255 [==>...........................] - ETA: 0s - loss: 0.3415 - accuracy: 0.8388

 55/255 [=====>........................] - ETA: 0s - loss: 0.3422 - accuracy: 0.8390

















Epoch 3/16
  1/255 [..............................] - ETA: 0s - loss: 0.2260 - accuracy: 0.8984

 27/255 [==>...........................] - ETA: 0s - loss: 0.3245 - accuracy: 0.8461

 53/255 [=====>........................] - ETA: 0s - loss: 0.3279 - accuracy: 0.8465

















Epoch 4/16
  1/255 [..............................] - ETA: 0s - loss: 0.3585 - accuracy: 0.8438

 28/255 [==>...........................] - ETA: 0s - loss: 0.3177 - accuracy: 0.8503

 54/255 [=====>........................] - ETA: 0s - loss: 0.3174 - accuracy: 0.8489

















Epoch 5/16
  1/255 [..............................] - ETA: 0s - loss: 0.2696 - accuracy: 0.8672

 29/255 [==>...........................] - ETA: 0s - loss: 0.3188 - accuracy: 0.8470

 56/255 [=====>........................] - ETA: 0s - loss: 0.3166 - accuracy: 0.8500

















Epoch 6/16
  1/255 [..............................] - ETA: 0s - loss: 0.2152 - accuracy: 0.8750

 28/255 [==>...........................] - ETA: 0s - loss: 0.2969 - accuracy: 0.8531

 55/255 [=====>........................] - ETA: 0s - loss: 0.3029 - accuracy: 0.8538

















Epoch 7/16
  1/255 [..............................] - ETA: 0s - loss: 0.3019 - accuracy: 0.8438

 28/255 [==>...........................] - ETA: 0s - loss: 0.2985 - accuracy: 0.8513

 55/255 [=====>........................] - ETA: 0s - loss: 0.3017 - accuracy: 0.8533

















Epoch 8/16
  1/255 [..............................] - ETA: 0s - loss: 0.3706 - accuracy: 0.8047

 28/255 [==>...........................] - ETA: 0s - loss: 0.3027 - accuracy: 0.8504

 54/255 [=====>........................] - ETA: 0s - loss: 0.2989 - accuracy: 0.8555

















Epoch 9/16
  1/255 [..............................] - ETA: 0s - loss: 0.3020 - accuracy: 0.8359

 27/255 [==>...........................] - ETA: 0s - loss: 0.2756 - accuracy: 0.8722

 52/255 [=====>........................] - ETA: 0s - loss: 0.2831 - accuracy: 0.8672

















Epoch 10/16
  1/255 [..............................] - ETA: 0s - loss: 0.3941 - accuracy: 0.8438

 28/255 [==>...........................] - ETA: 0s - loss: 0.3108 - accuracy: 0.8540

 55/255 [=====>........................] - ETA: 0s - loss: 0.3054 - accuracy: 0.8556

















Epoch 11/16
  1/255 [..............................] - ETA: 0s - loss: 0.3224 - accuracy: 0.8516

 27/255 [==>...........................] - ETA: 0s - loss: 0.3029 - accuracy: 0.8520

 52/255 [=====>........................] - ETA: 0s - loss: 0.2973 - accuracy: 0.8536

















Epoch 12/16
  1/255 [..............................] - ETA: 0s - loss: 0.2806 - accuracy: 0.8438

 28/255 [==>...........................] - ETA: 0s - loss: 0.2851 - accuracy: 0.8572

 55/255 [=====>........................] - ETA: 0s - loss: 0.2905 - accuracy: 0.8563

















Epoch 13/16
  1/255 [..............................] - ETA: 0s - loss: 0.2458 - accuracy: 0.8828

 27/255 [==>...........................] - ETA: 0s - loss: 0.2951 - accuracy: 0.8588

 54/255 [=====>........................] - ETA: 0s - loss: 0.2903 - accuracy: 0.8612

















Epoch 14/16
  1/255 [..............................] - ETA: 0s - loss: 0.2696 - accuracy: 0.8672

 28/255 [==>...........................] - ETA: 0s - loss: 0.2661 - accuracy: 0.8772

 54/255 [=====>........................] - ETA: 0s - loss: 0.2691 - accuracy: 0.8755

















Epoch 15/16
  1/255 [..............................] - ETA: 0s - loss: 0.2345 - accuracy: 0.9062

 28/255 [==>...........................] - ETA: 0s - loss: 0.2568 - accuracy: 0.8839

 55/255 [=====>........................] - ETA: 0s - loss: 0.2620 - accuracy: 0.8805

















Epoch 16/16
  1/255 [..............................] - ETA: 0s - loss: 0.3131 - accuracy: 0.8047

 28/255 [==>...........................] - ETA: 0s - loss: 0.2761 - accuracy: 0.8651

 55/255 [=====>........................] - ETA: 0s - loss: 0.2726 - accuracy: 0.8699

















INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2021-12-04 10:43:37.584301: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.


INFO:tensorflow:Assets written to: /tmp/keras/exported_model_dir/1/assets


INFO:tensorflow:Assets written to: /tmp/keras/exported_model_dir/1/assets


    1/16281 [..............................] - ETA: 27:21 - loss: 0.2994 - accuracy: 0.8750

   38/16281 [..............................] - ETA: 22s - loss: 0.3561 - accuracy: 0.8388  

   73/16281 [..............................] - ETA: 23s - loss: 0.3438 - accuracy: 0.8486

  112/16281 [..............................] - ETA: 22s - loss: 0.3465 - accuracy: 0.8506

  152/16281 [..............................] - ETA: 21s - loss: 0.3488 - accuracy: 0.8489

  191/16281 [..............................] - ETA: 21s - loss: 0.3457 - accuracy: 0.8501

  232/16281 [..............................] - ETA: 21s - loss: 0.3468 - accuracy: 0.8498

  271/16281 [..............................] - ETA: 21s - loss: 0.3465 - accuracy: 0.8487

  312/16281 [..............................] - ETA: 20s - loss: 0.3468 - accuracy: 0.8495

  348/16281 [..............................] - ETA: 21s - loss: 0.3476 - accuracy: 0.8489

  385/16281 [..............................] - ETA: 21s - loss: 0.3468 - accuracy: 0.8492

  423/16281 [..............................] - ETA: 21s - loss: 0.3467 - accuracy: 0.8487

  460/16281 [..............................] - ETA: 21s - loss: 0.3465 - accuracy: 0.8489

  498/16281 [..............................] - ETA: 20s - loss: 0.3469 - accuracy: 0.8491

  536/16281 [..............................] - ETA: 20s - loss: 0.3462 - accuracy: 0.8495

  574/16281 [>.............................] - ETA: 20s - loss: 0.3465 - accuracy: 0.8493

  612/16281 [>.............................] - ETA: 20s - loss: 0.3469 - accuracy: 0.8490

  650/16281 [>.............................] - ETA: 20s - loss: 0.3466 - accuracy: 0.8492

  689/16281 [>.............................] - ETA: 20s - loss: 0.3464 - accuracy: 0.8493

  726/16281 [>.............................] - ETA: 20s - loss: 0.3475 - accuracy: 0.8490



  764/16281 [>.............................] - ETA: 20s - loss: 0.3471 - accuracy: 0.8490

  804/16281 [>.............................] - ETA: 20s - loss: 0.3471 - accuracy: 0.8489

  845/16281 [>.............................] - ETA: 20s - loss: 0.3471 - accuracy: 0.8488

  884/16281 [>.............................] - ETA: 20s - loss: 0.3467 - accuracy: 0.8492

  923/16281 [>.............................] - ETA: 20s - loss: 0.3471 - accuracy: 0.8490

  962/16281 [>.............................] - ETA: 20s - loss: 0.3470 - accuracy: 0.8492

 1001/16281 [>.............................] - ETA: 20s - loss: 0.3470 - accuracy: 0.8493

 1043/16281 [>.............................] - ETA: 20s - loss: 0.3471 - accuracy: 0.8490

 1081/16281 [>.............................] - ETA: 19s - loss: 0.3477 - accuracy: 0.8488

 1122/16281 [=>............................] - ETA: 19s - loss: 0.3470 - accuracy: 0.8491

 1162/16281 [=>............................] - ETA: 19s - loss: 0.3466 - accuracy: 0.8491

 1202/16281 [=>............................] - ETA: 19s - loss: 0.3466 - accuracy: 0.8491

 1241/16281 [=>............................] - ETA: 19s - loss: 0.3466 - accuracy: 0.8491

 1281/16281 [=>............................] - ETA: 19s - loss: 0.3471 - accuracy: 0.8490

 1320/16281 [=>............................] - ETA: 19s - loss: 0.3471 - accuracy: 0.8491

 1358/16281 [=>............................] - ETA: 19s - loss: 0.3470 - accuracy: 0.8492

 1398/16281 [=>............................] - ETA: 19s - loss: 0.3469 - accuracy: 0.8492

 1439/16281 [=>............................] - ETA: 19s - loss: 0.3472 - accuracy: 0.8490

 1480/16281 [=>............................] - ETA: 19s - loss: 0.3472 - accuracy: 0.8490

 1520/16281 [=>............................] - ETA: 19s - loss: 0.3470 - accuracy: 0.8491

 1560/16281 [=>............................] - ETA: 19s - loss: 0.3468 - accuracy: 0.8490

 1600/16281 [=>............................] - ETA: 19s - loss: 0.3468 - accuracy: 0.8491

 1640/16281 [==>...........................] - ETA: 19s - loss: 0.3470 - accuracy: 0.8490

 1678/16281 [==>...........................] - ETA: 18s - loss: 0.3471 - accuracy: 0.8491

 1719/16281 [==>...........................] - ETA: 18s - loss: 0.3472 - accuracy: 0.8491

 1759/16281 [==>...........................] - ETA: 18s - loss: 0.3472 - accuracy: 0.8491

 1799/16281 [==>...........................] - ETA: 18s - loss: 0.3469 - accuracy: 0.8491

 1839/16281 [==>...........................] - ETA: 18s - loss: 0.3466 - accuracy: 0.8492

 1877/16281 [==>...........................] - ETA: 18s - loss: 0.3467 - accuracy: 0.8492

 1917/16281 [==>...........................] - ETA: 18s - loss: 0.3469 - accuracy: 0.8491

 1957/16281 [==>...........................] - ETA: 18s - loss: 0.3470 - accuracy: 0.8490

 1997/16281 [==>...........................] - ETA: 18s - loss: 0.3472 - accuracy: 0.8491

 2037/16281 [==>...........................] - ETA: 18s - loss: 0.3470 - accuracy: 0.8491

 2076/16281 [==>...........................] - ETA: 18s - loss: 0.3471 - accuracy: 0.8490

 2117/16281 [==>...........................] - ETA: 18s - loss: 0.3470 - accuracy: 0.8491

 2156/16281 [==>...........................] - ETA: 18s - loss: 0.3470 - accuracy: 0.8491

 2195/16281 [===>..........................] - ETA: 18s - loss: 0.3471 - accuracy: 0.8490

 2233/16281 [===>..........................] - ETA: 18s - loss: 0.3472 - accuracy: 0.8491

 2271/16281 [===>..........................] - ETA: 18s - loss: 0.3472 - accuracy: 0.8491

 2311/16281 [===>..........................] - ETA: 18s - loss: 0.3469 - accuracy: 0.8491

 2351/16281 [===>..........................] - ETA: 18s - loss: 0.3471 - accuracy: 0.8490

 2390/16281 [===>..........................] - ETA: 17s - loss: 0.3471 - accuracy: 0.8491

 2428/16281 [===>..........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2466/16281 [===>..........................] - ETA: 17s - loss: 0.3471 - accuracy: 0.8490

 2506/16281 [===>..........................] - ETA: 17s - loss: 0.3472 - accuracy: 0.8491

 2545/16281 [===>..........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2583/16281 [===>..........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2620/16281 [===>..........................] - ETA: 17s - loss: 0.3469 - accuracy: 0.8491

 2659/16281 [===>..........................] - ETA: 17s - loss: 0.3468 - accuracy: 0.8491

 2698/16281 [===>..........................] - ETA: 17s - loss: 0.3469 - accuracy: 0.8491

 2738/16281 [====>.........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2777/16281 [====>.........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2816/16281 [====>.........................] - ETA: 17s - loss: 0.3471 - accuracy: 0.8490

 2856/16281 [====>.........................] - ETA: 17s - loss: 0.3471 - accuracy: 0.8490

 2897/16281 [====>.........................] - ETA: 17s - loss: 0.3471 - accuracy: 0.8491

 2937/16281 [====>.........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 2975/16281 [====>.........................] - ETA: 17s - loss: 0.3468 - accuracy: 0.8492

 3015/16281 [====>.........................] - ETA: 17s - loss: 0.3469 - accuracy: 0.8492

 3053/16281 [====>.........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 3094/16281 [====>.........................] - ETA: 17s - loss: 0.3468 - accuracy: 0.8491

 3135/16281 [====>.........................] - ETA: 17s - loss: 0.3470 - accuracy: 0.8491

 3176/16281 [====>.........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3216/16281 [====>.........................] - ETA: 16s - loss: 0.3471 - accuracy: 0.8490

 3255/16281 [====>.........................] - ETA: 16s - loss: 0.3471 - accuracy: 0.8490

 3294/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3332/16281 [=====>........................] - ETA: 16s - loss: 0.3469 - accuracy: 0.8491

 3369/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3406/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3442/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3477/16281 [=====>........................] - ETA: 16s - loss: 0.3469 - accuracy: 0.8491

 3515/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3553/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3591/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3629/16281 [=====>........................] - ETA: 16s - loss: 0.3470 - accuracy: 0.8491

 3668/16281 [=====>........................] - ETA: 16s - loss: 0.3471 - accuracy: 0.8491

 3707/16281 [=====>........................] - ETA: 16s - loss: 0.3469 - accuracy: 0.8491

 3746/16281 [=====>........................] - ETA: 16s - loss: 0.3469 - accuracy: 0.8491

 3783/16281 [=====>........................] - ETA: 16s - loss: 0.3468 - accuracy: 0.8492





























































































































































































































































































































































































































































































































































































































































{'accuracy': 0.8490878939628601, 'loss': 0.34699547290802}


## (Optional) Using our preprocessed data to train a model using tf.estimator

If you would rather use an Estimator model instead of a Keras model, the code
in this section shows how to do that.

###Create an input function for training

In [16]:
def _make_training_input_fn(tf_transform_output, transformed_examples,
                            batch_size):
  """Creates an input function reading from transformed data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.
    transformed_examples: Base filename of examples.
    batch_size: Batch size.

  Returns:
    The input function for training or eval.
  """
  def input_fn():
    """Input function for training and eval."""
    dataset = tf.data.experimental.make_batched_features_dataset(
        file_pattern=transformed_examples,
        batch_size=batch_size,
        features=tf_transform_output.transformed_feature_spec(),
        reader=tf.data.TFRecordDataset,
        shuffle=True)

    transformed_features = tf.compat.v1.data.make_one_shot_iterator(
        dataset).get_next()

    # Extract features and label from the transformed tensors.
    transformed_labels = tf.where(
        tf.equal(transformed_features.pop(LABEL_KEY), 1))

    return transformed_features, transformed_labels[:,1]

  return input_fn

###Create an input function for serving

Let's create an input function that we could use in production, and prepare our trained model for serving.

In [17]:
def _make_serving_input_fn(tf_transform_output):
  """Creates an input function reading from raw data.

  Args:
    tf_transform_output: Wrapper around output of tf.Transform.

  Returns:
    The serving input function.
  """
  raw_feature_spec = RAW_DATA_FEATURE_SPEC.copy()
  # Remove label since it is not available during serving.
  raw_feature_spec.pop(LABEL_KEY)

  def serving_input_fn():
    """Input function for serving."""
    # Get raw features by generating the basic serving input_fn and calling it.
    # Here we generate an input_fn that expects a parsed Example proto to be fed
    # to the model at serving time.  See also
    # tf.estimator.export.build_raw_serving_input_receiver_fn.
    raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
        raw_feature_spec, default_batch_size=None)
    serving_input_receiver = raw_input_fn()

    # Apply the transform function that was used to generate the materialized
    # data.
    raw_features = serving_input_receiver.features
    transformed_features = tf_transform_output.transform_raw_features(
        raw_features)

    return tf.estimator.export.ServingInputReceiver(
        transformed_features, serving_input_receiver.receiver_tensors)

  return serving_input_fn

###Wrap our input data in FeatureColumns
Our model will expect our data in TensorFlow FeatureColumns.

In [18]:
def get_feature_columns(tf_transform_output):
  """Returns the FeatureColumns for the model.

  Args:
    tf_transform_output: A `TFTransformOutput` object.

  Returns:
    A list of FeatureColumns.
  """
  # Wrap scalars as real valued columns.
  real_valued_columns = [tf.feature_column.numeric_column(key, shape=())
                         for key in NUMERIC_FEATURE_KEYS]

  # Wrap categorical columns.
  one_hot_columns = [
      tf.feature_column.indicator_column(
          tf.feature_column.categorical_column_with_identity(
              key=key,
              num_buckets=(NUM_OOV_BUCKETS +
                  tf_transform_output.vocabulary_size_by_name(
                      vocab_filename=key))))
      for key in CATEGORICAL_FEATURE_KEYS]

  return real_valued_columns + one_hot_columns

###Train, Evaluate, and Export our model

In [19]:
def train_and_evaluate(working_dir, num_train_instances=NUM_TRAIN_INSTANCES,
                       num_test_instances=NUM_TEST_INSTANCES):
  """Train the model on training data and evaluate on test data.

  Args:
    working_dir: Directory to read transformed data and metadata from and to
        write exported model to.
    num_train_instances: Number of instances in train set
    num_test_instances: Number of instances in test set

  Returns:
    The results from the estimator's 'evaluate' method
  """
  tf_transform_output = tft.TFTransformOutput(working_dir)

  run_config = tf.estimator.RunConfig()

  estimator = tf.estimator.LinearClassifier(
      feature_columns=get_feature_columns(tf_transform_output),
      config=run_config,
      loss_reduction=tf.losses.Reduction.SUM)

  # Fit the model using the default optimizer.
  train_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TRAIN_DATA_FILEBASE + '*'),
      batch_size=TRAIN_BATCH_SIZE)
  estimator.train(
      input_fn=train_input_fn,
      max_steps=TRAIN_NUM_EPOCHS * num_train_instances / TRAIN_BATCH_SIZE)

  # Evaluate model on test dataset.
  eval_input_fn = _make_training_input_fn(
      tf_transform_output,
      os.path.join(working_dir, TRANSFORMED_TEST_DATA_FILEBASE + '*'),
      batch_size=1)

  # Export the model.
  serving_input_fn = _make_serving_input_fn(tf_transform_output)
  exported_model_dir = os.path.join(working_dir, EXPORTED_MODEL_DIR)
  estimator.export_saved_model(exported_model_dir, serving_input_fn)

  return estimator.evaluate(input_fn=eval_input_fn, steps=num_test_instances)

###Put it all together
We've created all the stuff we need to preprocess our census data, train a model, and prepare it for serving.  So far we've just been getting things ready.  It's time to start running!

Note: Scroll the output from this cell to see the whole process.  The results will be at the bottom.

In [20]:
import tempfile
temp = os.path.join(tempfile.gettempdir(), 'estimator')

transform_data(train, test, temp)
results = train_and_evaluate(temp)
pprint.pprint(results)





INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:No assets to write.


INFO:tensorflow:No assets to write.


'Counter' object has no attribute 'name'


'Counter' object has no attribute 'name'


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/a7f3726df5bf498ca24bd528eebca9e9/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/a7f3726df5bf498ca24bd528eebca9e9/saved_model.pb


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:No assets to write.


INFO:tensorflow:No assets to write.


'Counter' object has no attribute 'name'


'Counter' object has no attribute 'name'


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/3466a3517ec243a39102fa6ad6e5fec2/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/3466a3517ec243a39102fa6ad6e5fec2/saved_model.pb












INFO:tensorflow:Saver not created because there are no variables in the graph to restore


2021-12-04 10:44:05.733070: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:05.733123: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets written to: /tmp/tmpi7o66bl8/tftransform_tmp/96186aa415404f0884cb3766b270b9b2/assets


INFO:tensorflow:Assets written to: /tmp/tmpi7o66bl8/tftransform_tmp/96186aa415404f0884cb3766b270b9b2/assets


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/96186aa415404f0884cb3766b270b9b2/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/tmpi7o66bl8/tftransform_tmp/96186aa415404f0884cb3766b270b9b2/saved_model.pb


value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


value: "\n\013\n\tConst_3:0\022\tworkclass"



2021-12-04 10:44:10.983401: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:10.983461: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


value: "\n\013\n\tConst_3:0\022\tworkclass"



2021-12-04 10:44:12.469671: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:12.469756: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore






INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpwufx88ji', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpwufx88ji', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_experimental_max_worker_delay_secs': None, '_session_creation_timeout_secs': 7200, '_checkpoint_save_graph_def': True, '_service': None, '_cluster_spec': ClusterSpec({}), '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.




Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Create CheckpointSaverHook.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Running local_init_op.


2021-12-04 10:44:15.191355: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:15.191419: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...


INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...


INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpwufx88ji/model.ckpt.


INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmpwufx88ji/model.ckpt.


INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...


INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...


INFO:tensorflow:loss = 88.72284, step = 0


INFO:tensorflow:loss = 88.72284, step = 0


INFO:tensorflow:global_step/sec: 432.87


INFO:tensorflow:global_step/sec: 432.87


INFO:tensorflow:loss = 33.484627, step = 100 (0.233 sec)


INFO:tensorflow:loss = 33.484627, step = 100 (0.233 sec)


INFO:tensorflow:global_step/sec: 764.774


INFO:tensorflow:global_step/sec: 764.774


INFO:tensorflow:loss = 42.72283, step = 200 (0.130 sec)


INFO:tensorflow:loss = 42.72283, step = 200 (0.130 sec)


INFO:tensorflow:global_step/sec: 763.549


INFO:tensorflow:global_step/sec: 763.549


INFO:tensorflow:loss = 55.91174, step = 300 (0.131 sec)


INFO:tensorflow:loss = 55.91174, step = 300 (0.131 sec)


INFO:tensorflow:global_step/sec: 755.175


INFO:tensorflow:global_step/sec: 755.175


INFO:tensorflow:loss = 39.204643, step = 400 (0.133 sec)


INFO:tensorflow:loss = 39.204643, step = 400 (0.133 sec)


INFO:tensorflow:global_step/sec: 792.262


INFO:tensorflow:global_step/sec: 792.262


INFO:tensorflow:loss = 41.268295, step = 500 (0.126 sec)


INFO:tensorflow:loss = 41.268295, step = 500 (0.126 sec)


INFO:tensorflow:global_step/sec: 743.725


INFO:tensorflow:global_step/sec: 743.725


INFO:tensorflow:loss = 51.267006, step = 600 (0.135 sec)


INFO:tensorflow:loss = 51.267006, step = 600 (0.135 sec)


INFO:tensorflow:global_step/sec: 806.716


INFO:tensorflow:global_step/sec: 806.716


INFO:tensorflow:loss = 42.03744, step = 700 (0.124 sec)


INFO:tensorflow:loss = 42.03744, step = 700 (0.124 sec)


INFO:tensorflow:global_step/sec: 763.135


INFO:tensorflow:global_step/sec: 763.135


INFO:tensorflow:loss = 42.66994, step = 800 (0.131 sec)


INFO:tensorflow:loss = 42.66994, step = 800 (0.131 sec)


INFO:tensorflow:global_step/sec: 779.496


INFO:tensorflow:global_step/sec: 779.496


INFO:tensorflow:loss = 48.643982, step = 900 (0.129 sec)


INFO:tensorflow:loss = 48.643982, step = 900 (0.129 sec)


INFO:tensorflow:global_step/sec: 787.431


INFO:tensorflow:global_step/sec: 787.431


INFO:tensorflow:loss = 41.668102, step = 1000 (0.127 sec)


INFO:tensorflow:loss = 41.668102, step = 1000 (0.127 sec)


INFO:tensorflow:global_step/sec: 737.697


INFO:tensorflow:global_step/sec: 737.697


INFO:tensorflow:loss = 40.340927, step = 1100 (0.135 sec)


INFO:tensorflow:loss = 40.340927, step = 1100 (0.135 sec)


INFO:tensorflow:global_step/sec: 755.647


INFO:tensorflow:global_step/sec: 755.647


INFO:tensorflow:loss = 31.146494, step = 1200 (0.133 sec)


INFO:tensorflow:loss = 31.146494, step = 1200 (0.133 sec)


INFO:tensorflow:global_step/sec: 785.653


INFO:tensorflow:global_step/sec: 785.653


INFO:tensorflow:loss = 30.96864, step = 1300 (0.127 sec)


INFO:tensorflow:loss = 30.96864, step = 1300 (0.127 sec)


INFO:tensorflow:global_step/sec: 759.461


INFO:tensorflow:global_step/sec: 759.461


INFO:tensorflow:loss = 38.621964, step = 1400 (0.132 sec)


INFO:tensorflow:loss = 38.621964, step = 1400 (0.132 sec)


INFO:tensorflow:global_step/sec: 777.328


INFO:tensorflow:global_step/sec: 777.328


INFO:tensorflow:loss = 44.518555, step = 1500 (0.129 sec)


INFO:tensorflow:loss = 44.518555, step = 1500 (0.129 sec)


INFO:tensorflow:global_step/sec: 741.005


INFO:tensorflow:global_step/sec: 741.005


INFO:tensorflow:loss = 45.997204, step = 1600 (0.135 sec)


INFO:tensorflow:loss = 45.997204, step = 1600 (0.135 sec)


INFO:tensorflow:global_step/sec: 734.846


INFO:tensorflow:global_step/sec: 734.846


INFO:tensorflow:loss = 50.39132, step = 1700 (0.136 sec)


INFO:tensorflow:loss = 50.39132, step = 1700 (0.136 sec)


INFO:tensorflow:global_step/sec: 752.826


INFO:tensorflow:global_step/sec: 752.826


INFO:tensorflow:loss = 45.41472, step = 1800 (0.133 sec)


INFO:tensorflow:loss = 45.41472, step = 1800 (0.133 sec)


INFO:tensorflow:global_step/sec: 757.018


INFO:tensorflow:global_step/sec: 757.018


INFO:tensorflow:loss = 46.133186, step = 1900 (0.132 sec)


INFO:tensorflow:loss = 46.133186, step = 1900 (0.132 sec)


INFO:tensorflow:global_step/sec: 700.757


INFO:tensorflow:global_step/sec: 700.757


INFO:tensorflow:loss = 34.684982, step = 2000 (0.143 sec)


INFO:tensorflow:loss = 34.684982, step = 2000 (0.143 sec)


INFO:tensorflow:global_step/sec: 741.709


INFO:tensorflow:global_step/sec: 741.709


INFO:tensorflow:loss = 39.637863, step = 2100 (0.135 sec)


INFO:tensorflow:loss = 39.637863, step = 2100 (0.135 sec)


INFO:tensorflow:global_step/sec: 772.066


INFO:tensorflow:global_step/sec: 772.066


INFO:tensorflow:loss = 45.70813, step = 2200 (0.129 sec)


INFO:tensorflow:loss = 45.70813, step = 2200 (0.129 sec)


INFO:tensorflow:global_step/sec: 776.263


INFO:tensorflow:global_step/sec: 776.263


INFO:tensorflow:loss = 39.104668, step = 2300 (0.129 sec)


INFO:tensorflow:loss = 39.104668, step = 2300 (0.129 sec)


INFO:tensorflow:global_step/sec: 768.016


INFO:tensorflow:global_step/sec: 768.016


INFO:tensorflow:loss = 36.262817, step = 2400 (0.130 sec)


INFO:tensorflow:loss = 36.262817, step = 2400 (0.130 sec)


INFO:tensorflow:global_step/sec: 754.04


INFO:tensorflow:global_step/sec: 754.04


INFO:tensorflow:loss = 43.80282, step = 2500 (0.132 sec)


INFO:tensorflow:loss = 43.80282, step = 2500 (0.132 sec)


INFO:tensorflow:global_step/sec: 742.917


INFO:tensorflow:global_step/sec: 742.917


INFO:tensorflow:loss = 48.113125, step = 2600 (0.135 sec)


INFO:tensorflow:loss = 48.113125, step = 2600 (0.135 sec)


INFO:tensorflow:global_step/sec: 753.394


INFO:tensorflow:global_step/sec: 753.394


INFO:tensorflow:loss = 43.442005, step = 2700 (0.133 sec)


INFO:tensorflow:loss = 43.442005, step = 2700 (0.133 sec)


INFO:tensorflow:global_step/sec: 768.985


INFO:tensorflow:global_step/sec: 768.985


INFO:tensorflow:loss = 34.593086, step = 2800 (0.130 sec)


INFO:tensorflow:loss = 34.593086, step = 2800 (0.130 sec)


INFO:tensorflow:global_step/sec: 756.393


INFO:tensorflow:global_step/sec: 756.393


INFO:tensorflow:loss = 38.085594, step = 2900 (0.132 sec)


INFO:tensorflow:loss = 38.085594, step = 2900 (0.132 sec)


INFO:tensorflow:global_step/sec: 792.717


INFO:tensorflow:global_step/sec: 792.717


INFO:tensorflow:loss = 42.41484, step = 3000 (0.126 sec)


INFO:tensorflow:loss = 42.41484, step = 3000 (0.126 sec)


INFO:tensorflow:global_step/sec: 763.25


INFO:tensorflow:global_step/sec: 763.25


INFO:tensorflow:loss = 42.457626, step = 3100 (0.131 sec)


INFO:tensorflow:loss = 42.457626, step = 3100 (0.131 sec)


INFO:tensorflow:global_step/sec: 747.998


INFO:tensorflow:global_step/sec: 747.998


INFO:tensorflow:loss = 52.64791, step = 3200 (0.134 sec)


INFO:tensorflow:loss = 52.64791, step = 3200 (0.134 sec)


INFO:tensorflow:global_step/sec: 733.804


INFO:tensorflow:global_step/sec: 733.804


INFO:tensorflow:loss = 36.78949, step = 3300 (0.136 sec)


INFO:tensorflow:loss = 36.78949, step = 3300 (0.136 sec)


INFO:tensorflow:global_step/sec: 747.473


INFO:tensorflow:global_step/sec: 747.473


INFO:tensorflow:loss = 43.02353, step = 3400 (0.134 sec)


INFO:tensorflow:loss = 43.02353, step = 3400 (0.134 sec)


INFO:tensorflow:global_step/sec: 766.967


INFO:tensorflow:global_step/sec: 766.967


INFO:tensorflow:loss = 42.971584, step = 3500 (0.131 sec)


INFO:tensorflow:loss = 42.971584, step = 3500 (0.131 sec)


INFO:tensorflow:global_step/sec: 759.238


INFO:tensorflow:global_step/sec: 759.238


INFO:tensorflow:loss = 31.898714, step = 3600 (0.133 sec)


INFO:tensorflow:loss = 31.898714, step = 3600 (0.133 sec)


INFO:tensorflow:global_step/sec: 770.209


INFO:tensorflow:global_step/sec: 770.209


INFO:tensorflow:loss = 43.47151, step = 3700 (0.128 sec)


INFO:tensorflow:loss = 43.47151, step = 3700 (0.128 sec)


INFO:tensorflow:global_step/sec: 750.127


INFO:tensorflow:global_step/sec: 750.127


INFO:tensorflow:loss = 40.073875, step = 3800 (0.133 sec)


INFO:tensorflow:loss = 40.073875, step = 3800 (0.133 sec)


INFO:tensorflow:global_step/sec: 731.607


INFO:tensorflow:global_step/sec: 731.607


INFO:tensorflow:loss = 33.494003, step = 3900 (0.137 sec)


INFO:tensorflow:loss = 33.494003, step = 3900 (0.137 sec)


INFO:tensorflow:global_step/sec: 753.01


INFO:tensorflow:global_step/sec: 753.01


INFO:tensorflow:loss = 40.401936, step = 4000 (0.133 sec)


INFO:tensorflow:loss = 40.401936, step = 4000 (0.133 sec)


INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4071...


INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 4071...


INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmpwufx88ji/model.ckpt.


INFO:tensorflow:Saving checkpoints for 4071 into /tmp/tmpwufx88ji/model.ckpt.


INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4071...


INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 4071...


INFO:tensorflow:Loss for final step: 51.911263.


INFO:tensorflow:Loss for final step: 51.911263.


value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_3:0\022\tworkclass"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_5:0\022\teducation"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_7:0\022\016marital-status"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\013\n\tConst_9:0\022\noccupation"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_11:0\022\014relationship"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_13:0\022\004race"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_15:0\022\003sex"



value: "\n\014\n\nConst_17:0\022\016native-country"



value: "\n\014\n\nConst_17:0\022\016native-country"



INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Saver not created because there are no variables in the graph to restore


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']


INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']


INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']


INFO:tensorflow:Signatures INCLUDED in export for Regress: ['regression']


INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']


INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']


INFO:tensorflow:Signatures INCLUDED in export for Train: None


INFO:tensorflow:Signatures INCLUDED in export for Train: None


INFO:tensorflow:Signatures INCLUDED in export for Eval: None


INFO:tensorflow:Signatures INCLUDED in export for Eval: None


INFO:tensorflow:Restoring parameters from /tmp/tmpwufx88ji/model.ckpt-4071


2021-12-04 10:44:22.080737: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:22.080796: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO:tensorflow:Restoring parameters from /tmp/tmpwufx88ji/model.ckpt-4071


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets added to graph.


INFO:tensorflow:Assets written to: /tmp/estimator/exported_model_dir/temp-1638614661/assets


INFO:tensorflow:Assets written to: /tmp/estimator/exported_model_dir/temp-1638614661/assets


INFO:tensorflow:SavedModel written to: /tmp/estimator/exported_model_dir/temp-1638614661/saved_model.pb


INFO:tensorflow:SavedModel written to: /tmp/estimator/exported_model_dir/temp-1638614661/saved_model.pb


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Done calling model_fn.


INFO:tensorflow:Starting evaluation at 2021-12-04T10:44:23Z


INFO:tensorflow:Starting evaluation at 2021-12-04T10:44:23Z


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Graph was finalized.


INFO:tensorflow:Restoring parameters from /tmp/tmpwufx88ji/model.ckpt-4071


2021-12-04 10:44:23.300547: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'libcusolver.so.10'; dlerror: libcusolver.so.10: cannot open shared object file: No such file or directory
2021-12-04 10:44:23.300668: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
INFO:tensorflow:Restoring parameters from /tmp/tmpwufx88ji/model.ckpt-4071


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Done running local_init_op.


INFO:tensorflow:Evaluation [1628/16281]


INFO:tensorflow:Evaluation [1628/16281]


INFO:tensorflow:Evaluation [3256/16281]


INFO:tensorflow:Evaluation [3256/16281]


INFO:tensorflow:Evaluation [4884/16281]


INFO:tensorflow:Evaluation [4884/16281]


INFO:tensorflow:Evaluation [6512/16281]


INFO:tensorflow:Evaluation [6512/16281]


INFO:tensorflow:Evaluation [8140/16281]


INFO:tensorflow:Evaluation [8140/16281]


INFO:tensorflow:Evaluation [9768/16281]


INFO:tensorflow:Evaluation [9768/16281]


INFO:tensorflow:Evaluation [11396/16281]


INFO:tensorflow:Evaluation [11396/16281]


INFO:tensorflow:Evaluation [13024/16281]


INFO:tensorflow:Evaluation [13024/16281]


INFO:tensorflow:Evaluation [14652/16281]


INFO:tensorflow:Evaluation [14652/16281]


INFO:tensorflow:Evaluation [16280/16281]


INFO:tensorflow:Evaluation [16280/16281]


INFO:tensorflow:Evaluation [16281/16281]


INFO:tensorflow:Evaluation [16281/16281]


INFO:tensorflow:Inference Time : 12.76048s


INFO:tensorflow:Inference Time : 12.76048s


INFO:tensorflow:Finished evaluation at 2021-12-04-10:44:35


INFO:tensorflow:Finished evaluation at 2021-12-04-10:44:35


INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.85123765, accuracy_baseline = 0.76377374, auc = 0.9019859, auc_precision_recall = 0.9672531, average_loss = 0.32398567, global_step = 4071, label/mean = 0.76377374, loss = 0.32398567, precision = 0.8828477, prediction/mean = 0.75662553, recall = 0.9284278


INFO:tensorflow:Saving dict for global step 4071: accuracy = 0.85123765, accuracy_baseline = 0.76377374, auc = 0.9019859, auc_precision_recall = 0.9672531, average_loss = 0.32398567, global_step = 4071, label/mean = 0.76377374, loss = 0.32398567, precision = 0.8828477, prediction/mean = 0.75662553, recall = 0.9284278


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmpwufx88ji/model.ckpt-4071


INFO:tensorflow:Saving 'checkpoint_path' summary for global step 4071: /tmp/tmpwufx88ji/model.ckpt-4071


{'accuracy': 0.85123765,
 'accuracy_baseline': 0.76377374,
 'auc': 0.9019859,
 'auc_precision_recall': 0.9672531,
 'average_loss': 0.32398567,
 'global_step': 4071,
 'label/mean': 0.76377374,
 'loss': 0.32398567,
 'precision': 0.8828477,
 'prediction/mean': 0.75662553,
 'recall': 0.9284278}


##What we did
In this example we used `tf.Transform` to preprocess a dataset of census data, and train a model with the cleaned and transformed data.  We also created an input function that we could use when we deploy our trained model in a production environment to perform inference.  By using the same code for both training and inference we avoid any issues with data skew.  Along the way we learned about creating an Apache Beam transform to perform the transformation that we needed for cleaning the data. We also saw how to use this transformed data to train a model using either `tf.keras` or `tf.estimator`.  This is just a small piece of what TensorFlow Transform can do!  We encourage you to dive into `tf.Transform` and discover what it can do for you.