# Product recommendation with Google Tensorflow 
#### Dataset download > 
* #### [Instacart](https://www.kaggle.com/c/instacart-market-basket-analysis)

#### Concepts, tools, libraries used >
* #### [Wide & Deep](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html)
* #### [Tensorflow](https://www.tensorflow.org/)
* #### [Petastorm](https://github.com/uber/petastorm)
* #### [Hyperopt](https://github.com/hyperopt/hyperopt)
* #### [MLFlow](https://mlflow.org/)


This is a series of three notebooks, and this is notebook #3.  The purpose of this notebook is to train, evaluate & deploy a "Wide & Deep" collaborative filtering recommender using the features engineered in the prior notebook.  
This notebook was run on a Synapse CPU-based pool. One step, the hyperparameter search, takes a few hours; you may get faster results with a GPU-based pool.
Before you run this notebook on Synapse, you need to define the packages it requires (Tensorflow, Petastorm, etc.) in a separate file and load them into the pool.

In [1]:
import pyspark.sql.functions as f
from pyspark.sql.types import *

import tensorflow as tf
from tensorflow.python.saved_model import tag_constants

from petastorm.spark import SparkDatasetConverter, make_spark_converter
from petastorm import TransformSpec

from hyperopt import hp, fmin, tpe, SparkTrials, STATUS_OK, space_eval

import mlflow
from mlflow.tracking import MlflowClient

import platform

import numpy as np
import pandas as pd

import datetime
import os
import requests


## Step 1: Prepare the Data

In our last notebook, both user and product features were prepared along with labels indicating whether a specific user-product combination was purchased in our training period.  Here, we will retrieve those data, combining them for input into our model:

In [2]:
# retrieve features and labels
product_features = spark.table('instacart.product_features')
user_features = spark.table('instacart.user_features')
labels = spark.table('instacart.labels')

# assemble full feature set
labeled_features = (
  labels
  .join(product_features, on='product_id')
  .join(user_features, on='user_id')
  )

# display results
display(labeled_features.limit(10))


Because of the large number of features, we'll need to capture some metadata on our fields.  This metadata will help us set up our data inputs in later steps:

In [3]:
# identify label column
label_col = 'label'

# identify categorical feature columns
cat_features = ['aisle_id','department_id','user_id','product_id']

# capture keys for each of the categorical feature columns
cat_keys={}
for col in cat_features:
  cat_keys[col] = (
    labeled_features
      .selectExpr('{0} as key'.format(col))
      .distinct()
      .orderBy('key')
      .groupBy()
        .agg(f.collect_list('key').alias('keys'))
      .collect()[0]['keys']
    )

# all other columns (except id, the label & the categorical features) are continuous features
num_features = labeled_features.drop(*(['id',label_col]+cat_features)).columns


Now we can split our data into training, validation & testing sets.  We split the data up front, rather than dynamically during training, so that we can stratify the sample on the label.  The stratified sample helps ensure the under-represented positive class (indicating a specific product was purchased in the training period) is consistently present in each of our data splits:

In [4]:
instance_count = labeled_features.count()
positive_count = labels.filter(f.expr('label=1')).count()

print('{0:.2f}% positive class across {1} instances'.format(100 * positive_count/instance_count, instance_count))


9.99% positive class across 13863737 instances


In [5]:
# fraction to hold for training
train_fraction = 0.6

# sample data, stratifying on labels, for training (fixed seed for repeatable splits)
train = (
  labeled_features
    .sampleBy(label_col, fractions={0: train_fraction, 1: train_fraction}, seed=42)
  )

# split remaining data into validation & testing datasets (with same stratification)
valid = (
  labeled_features
    .join(train, on='id', how='leftanti') # anti-join: keep rows not in train
    .sampleBy(label_col, fractions={0: 0.5, 1: 0.5}, seed=42)
  )

test = (
  labeled_features
    .join(train, on='id', how='leftanti') # keep rows not in train
    .join(valid, on='id', how='leftanti') # keep rows not in valid
  )

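Spark's sampleBy() keeps each row independently with the probability given for that row's label, so each split preserves the roughly 10% positive rate in expectation (per-class counts vary slightly rather than being exact). As a plain-Python illustration of that behavior and of the left anti-joins above, the sketch below splits a synthetic dataset. The helpers sample_by() and positive_rate() and the generated rows are hypothetical stand-ins, not Spark APIs, and the demo_ prefix keeps them from overwriting the notebook's train, valid and test DataFrames.

```python
import random

def sample_by(rows, label_key, fractions, seed=None):
    """Bernoulli-sample each row with the fraction for its label (0.0 for unlisted labels)."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fractions.get(r[label_key], 0.0)]

def positive_rate(rows):
    """Share of rows whose label is 1."""
    return sum(r["label"] for r in rows) / len(rows)

# synthetic stand-in for labeled_features: 100,000 rows with a ~10% positive class
data_rng = random.Random(0)
demo_rows = [{"id": i, "label": 1 if data_rng.random() < 0.1 else 0} for i in range(100_000)]

# 60% for training, sampled per label
demo_train = sample_by(demo_rows, "label", {0: 0.6, 1: 0.6}, seed=42)
train_ids = {r["id"] for r in demo_train}

# left anti-join on id, then split the remainder 50/50 per label
remaining = [r for r in demo_rows if r["id"] not in train_ids]
demo_valid = sample_by(remaining, "label", {0: 0.5, 1: 0.5}, seed=7)
valid_ids = {r["id"] for r in demo_valid}
demo_test = [r for r in remaining if r["id"] not in valid_ids]

for name, split in [("all", demo_rows), ("train", demo_train), ("valid", demo_valid), ("test", demo_test)]:
    print(f"{name:>5}: {len(split):>6} rows, {100 * positive_rate(split):.2f}% positive")
```

Because rows are drawn independently, the splits only approximate 60/20/20; fixing the seed is what makes a given split repeatable from run to run.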

The training, validation & testing datasets currently exist as Spark DataFrames and may be quite large.  Converting them to pandas DataFrames could cause out-of-memory errors, so instead we'll convert each Spark DataFrame into a [Petastorm](https://petastorm.readthedocs.io/en/latest/) dataset. Petastorm is a library that caches Spark data as Parquet files and provides high-speed, batched access to that data for libraries such as Tensorflow and PyTorch:

**NOTE** Petastorm may warn that a given cached file is too small.  Use the repartition() method to adjust the number of cached files generated for each dataset, and experiment with the count to find the best number of files for your scenario.

In [6]:
# configure temp cache for petastorm files (petastorm requires a fully qualified URL that includes the scheme, here abfss://)
spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF, 'abfss://recommender@salabcommercedatalake.dfs.core.windows.net/instacart/pstorm_cache')

# persist dataframe data to petastorm cache location
train_pstorm = make_spark_converter(train.repartition(4))  
valid_pstorm = make_spark_converter(valid.repartition(4)) 
test_pstorm = make_spark_converter(test.repartition(4)) 


Converting floating-point columns to float32
Converting floating-point columns to float32
Converting floating-point columns to float32


To make the data in the Petastorm cache accessible, we will need to define specs that read the data and transform it into the format expected by Tensorflow.  This format requires features to be presented as a dictionary and the label to be presented as a scalar value:

In [7]:
def get_data_specs(epochs=1, batch_size=128):
  
  # define functions to transform data into the required format
  def get_input_fn(dataset_context_manager):
    
    # re-structure a row as ({features}, label)
    def _to_tuple(row): 
      features = {}
      for col in cat_features + num_features:
        features[col] = getattr(row, col)
      return features, getattr(row, label_col)
    
    def fn(): # called by estimator to perform row structure conversion
      return dataset_context_manager.__enter__().map(_to_tuple)
    
    return fn

  # access petastorm cache as tensorflow dataset
  train_ds = train_pstorm.make_tf_dataset(batch_size=batch_size)
  valid_ds = valid_pstorm.make_tf_dataset()
  
  # define spec to return transformed data for model training & evaluation
  train_spec = tf.estimator.TrainSpec(
                input_fn=get_input_fn(train_ds), 
                max_steps=int( (train_pstorm.dataset_size * epochs) / batch_size )
                )
  eval_spec = tf.estimator.EvalSpec(
                input_fn=get_input_fn(valid_ds)
                )
  
  return train_spec, eval_spec

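The logic in get_data_specs() boils down to two small pieces: turning each batched row into a (features dictionary, label) pair, and converting epochs into a step budget with int(dataset_size * epochs / batch_size). The sketch below reproduces both in plain Python, using a namedtuple and lists in place of Petastorm rows and tensors. The column hypothetical_num_feature, the dataset size and the demo_ names are made up for illustration and deliberately avoid the notebook's own variable names.

```python
from collections import namedtuple

# hypothetical stand-ins for the notebook's column metadata
demo_cat_cols = ["aisle_id", "department_id", "user_id", "product_id"]
demo_num_cols = ["hypothetical_num_feature"]
demo_label_col = "label"

# a batched row: each field holds one value per record (lists stand in for tensors)
DemoRow = namedtuple("DemoRow", demo_cat_cols + demo_num_cols + [demo_label_col])

def to_tuple(row):
    """Re-structure a batched row as ({features}, label), the shape an Estimator input_fn returns."""
    features = {col: getattr(row, col) for col in demo_cat_cols + demo_num_cols}
    return features, getattr(row, demo_label_col)

def max_steps(dataset_size, epochs, batch_size):
    """Training steps needed to pass over the data `epochs` times, truncated as in get_data_specs()."""
    return int((dataset_size * epochs) / batch_size)

demo_batch = DemoRow(aisle_id=[6, 106], department_id=[2, 12], user_id=[1, 2],
                     product_id=[10, 20], hypothetical_num_feature=[0.5, 0.1], label=[0, 1])
demo_features, demo_label = to_tuple(demo_batch)
print(sorted(demo_features), demo_label)
print(max_steps(dataset_size=1_000_000, epochs=3, batch_size=128))  # hypothetical dataset size
```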

We can verify our specs by retrieving a single batch as follows.  Note that the default batch size for the training (first) spec is 128 records:

In [8]:
# retrieve specs
specs = get_data_specs()

# retrieve first batch from first (training) spec
next(
  iter(
    specs[0].input_fn().take(1)
    )
  )



({'aisle_id': <tf.Tensor: shape=(128,), dtype=int32, numpy=
  array([  6, 106,  31,  24,  17,  83,  78,  53,  83,  84,  61,  99,  24,
         112, 106, 114,  16,  84, 117,  37, 117,  45,  45,  67,  22,  20,
          88,  83,  24,  21,  66, 107,  31, 115,  83,  21,  81,  37,  83,
          24,  93,  78, 116, 115,  52,  85, 112,  86, 120,   7,  32,  81,
          37, 121,  83,  69,  83,  86,  21, 107,  83,  91,  83,  93,  59,
          83,  69,  43, 120,  21,  24,  46,  19,  75,  17,  19,  16,  92,
         123,  83,  88,  88,   3,  58,  92, 105,  52,  45, 108,  38,  83,
          92, 131,  51,  31,  74, 112, 107,  64, 123, 116,  24,  24,  16,
         105,  92,  86,  86,  63, 116, 107,  83,  61,  59,  93, 128,  32,
          83, 112,  79,  77,  24,  88,  83,  88,  84,  39,  24], dtype=int32)>,
  'department_id': <tf.Tensor: shape=(128,), dtype=int32, numpy=
  array([ 2, 12,  7,  4, 13,  4, 19, 16,  4, 16, 19, 15,  4,  3, 12, 17,  4,
         16, 19,  1, 19, 19, 19, 20, 11, 11, 13,  4,

## Step 2: Define the Model

With our data in place, we can now define the wide & deep model.  For this, we will make use of [Tensorflow's DNNLinearCombinedClassifier estimator](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedClassifier) which simplifies the definition of these kinds of models.

The feature inputs for the DNNLinearCombinedClassifier estimator are divided into those associated with a *wide*, linear model and those associated with a *deep* neural network.  The inputs to the wide model are the user and product ID combinations.  In this way, the linear model is being trained to memorize which products are purchased by which users.  These features may be brought into the model as simple categorical features [identified through an ordinal value](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_identity) or [hashed into a smaller number of buckets](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket).  The inclusion of a user-product [crossed hash](https://www.tensorflow.org/api_docs/python/tf/feature_column/crossed_column) allows the model to better understand user-product combinations:

**NOTE** Much of the logic that follows is encapsulated in functions.  This will make distributed processing occurring later in the notebook easier to implement and is fairly standard for most Tensorflow implementations.

In [9]:
def get_wide_features():

  wide_columns = []

  # user_id
  #wide_columns += [tf.feature_column.categorical_column_with_identity(
  #    key='user_id', 
  #    num_buckets=np.max(cat_keys['user_id'])+1 # create one bucket for each value from 0 to max
  #    )]
  wide_columns += [
    tf.feature_column.categorical_column_with_hash_bucket(
       key='user_id', 
       hash_bucket_size=1000,
       dtype=tf.dtypes.int64 # hash each ID into one of hash_bucket_size buckets
       )]

  # product_id
  #wide_columns += [
  #  tf.feature_column.categorical_column_with_identity(
  #    key='product_id', 
  #    num_buckets=np.max(cat_keys['product_id'])+1 # create one bucket for each value from 0 to max
  #    )]
  wide_columns += [
    tf.feature_column.categorical_column_with_hash_bucket(
       key='product_id', 
       hash_bucket_size=100,
       dtype=tf.dtypes.int64 # hash each ID into one of hash_bucket_size buckets
       )]

  # user-product crossed column: hash each (user_id, product_id) pair into one of hash_bucket_size buckets
  wide_columns += [
    tf.feature_column.crossed_column(
      [ tf.feature_column.categorical_column_with_identity(key='user_id', num_buckets=np.max(cat_keys['user_id'])+1),
        tf.feature_column.categorical_column_with_identity(key='product_id', num_buckets=np.max(cat_keys['product_id'])+1)
        ], 
      hash_bucket_size=1000
      )] 

  return wide_columns

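Hashing maps each ID to one of hash_bucket_size buckets, and a crossed column hashes the user-product pair jointly. The plain-Python sketch below illustrates the idea with hashlib.sha256 as a stand-in; Tensorflow uses its own fingerprint hashing, so the bucket numbers will not match, and the helper names and IDs are hypothetical. It also shows the trade-off of a small bucket count: with more distinct IDs than buckets, different users necessarily share buckets.

```python
import hashlib

def hash_bucket(value, num_buckets):
    """Map a value to a bucket in [0, num_buckets) using a stable hash of its string form.

    Stand-in only: Tensorflow uses its own fingerprint hash, so real bucket numbers differ.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets

def cross_bucket(user_id, product_id, num_buckets):
    """Hash the (user_id, product_id) pair jointly, the idea behind a crossed column."""
    return hash_bucket(f"{user_id}_X_{product_id}", num_buckets)

# hypothetical IDs
print(hash_bucket(202279, 1000), hash_bucket(33120, 100), cross_bucket(202279, 33120, 1000))

# 10,000 distinct user IDs cannot fit into 1,000 buckets without sharing
demo_user_ids = range(1, 10_001)
used_buckets = {hash_bucket(u, 1000) for u in demo_user_ids}
print(len(used_buckets), "buckets shared by", len(demo_user_ids), "user IDs")
```

With hash_bucket_size=1000 for user_id, the wide component therefore memorizes preferences per bucket of users rather than per individual user.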

The feature inputs for the deep (neural network) component of the model are the features that describe our users and products in more generalized ways. By avoiding specific user and product IDs, the deep model is trained to learn attributes that signal preferences between users and products. For the categorical features, an [embedding](https://www.tensorflow.org/api_docs/python/tf/feature_column/embedding_column) is used to succinctly capture the feature data.  The number of dimensions in each embedding (the fourth root of the largest ID value) is based on guidance in [this tutorial](https://tensorflow2.readthedocs.io/en/stable/tensorflow/g3doc/tutorials/wide_and_deep/):

In [10]:
def get_deep_features():
  
  deep_columns = []

  # categorical features
  for col in cat_features:

    # don't use user ID or product ID
    if col not in ['user_id','product_id']:

      # base column definition
      col_def = tf.feature_column.categorical_column_with_identity(
        key=col, 
        num_buckets=np.max(cat_keys[col])+1 # create one bucket for each value from 0 to max
        )

      # define embedding on base column def
      deep_columns += [tf.feature_column.embedding_column(
                          col_def, 
                          dimension=int(np.max(cat_keys[col])**0.25)
                          )] 

  # continuous features
  for col in num_features:
    deep_columns += [tf.feature_column.numeric_column(col)]  
    
  return deep_columns

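get_deep_features() gives each identity column one bucket per value from 0 to the largest ID, and sizes its embedding as the fourth root of that largest ID, truncated to an integer. The sketch below evaluates the same arithmetic for a few hypothetical maximum IDs; the helper names are illustrative only.

```python
def identity_buckets(max_id):
    """num_buckets for categorical_column_with_identity as used here: one per value 0..max_id."""
    return max_id + 1

def embedding_dimension(max_id):
    """Fourth-root rule from get_deep_features(): int(max_id ** 0.25)."""
    return int(max_id ** 0.25)

for max_id in (21, 134, 50_000):  # hypothetical maximum IDs
    print(f"max_id={max_id}: {identity_buckets(max_id)} buckets, "
          f"embedding dimension {embedding_dimension(max_id)}")
```

The rule grows very slowly: a categorical feature with a few hundred values gets an embedding of only 3 or 4 dimensions.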

With our features defined, we can now assemble the estimator:

**NOTE** The optimizers are passed as classes to address an issue identified [here](https://stackoverflow.com/questions/58108945/cannot-do-incremental-training-with-dnnregressor).

In [11]:
def get_model(hidden_layers, hidden_layer_nodes_initial_count, hidden_layer_nodes_count_decline_rate, dropout_rate):  
  
  # determine hidden_units structure: each layer has decline_rate times the nodes of the layer above it,
  # clamped to at least 1 node (a decline rate of 0.0 would otherwise produce zero-width layers)
  hidden_units = [
    max(1, int(hidden_layer_nodes_initial_count * (hidden_layer_nodes_count_decline_rate**i)))
    for i in range(int(hidden_layers))
    ]

  # get features
  wide_features = get_wide_features()
  deep_features = get_deep_features()
    
  # define model
  estimator = tf.estimator.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_features,
    linear_optimizer=tf.keras.optimizers.Ftrl,
    dnn_feature_columns=deep_features,
    dnn_hidden_units=hidden_units,
    dnn_dropout=dropout_rate,
    dnn_optimizer=tf.keras.optimizers.Adagrad
    )

  return estimator

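get_model() derives the layer widths from three hyperparameters: the number of layers, the width of the first layer, and a decline rate applied once per subsequent layer. The sketch below recomputes those widths in plain Python for the test-run settings and for the tuned settings reported by hyperopt later in this notebook. layer_widths() and its min_nodes parameter are hypothetical names used only for illustration; min_nodes=1 shows one way to keep a decline rate of 0.0 from producing zero-width layers.

```python
def layer_widths(hidden_layers, initial_count, decline_rate, min_nodes=0):
    """Width of each hidden layer: initial_count * decline_rate**i, truncated to an int.

    min_nodes (hypothetical) clamps every width to a lower bound.
    """
    return [max(min_nodes, int(initial_count * decline_rate ** i)) for i in range(int(hidden_layers))]

print(layer_widths(2, 100, 0.5))              # test-run settings
print(layer_widths(4.0, 90.0, 0.45))          # tuned settings reported by hyperopt
print(layer_widths(3, 50, 0.0))               # decline rate of 0.0: zero-width layers
print(layer_widths(3, 50, 0.0, min_nodes=1))  # clamped to at least one node
```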

## Step 3: Tune the Model

To tune the model, we need to define an evaluation metric.  By default (with two classes), the DNNLinearCombinedClassifier minimizes the [binary (sigmoid) cross entropy](https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits), also called log loss, which measures the distance between a predicted class probability and the actual class label.  (You can think of this metric as rewarding class predictions that are both accurate and confident.)
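For two classes, this loss is the average log loss, -mean(y*log(p) + (1-y)*log(1-p)), where p is the predicted probability of the positive class; softmax cross entropy over two classes gives the same value. The plain-Python sketch below computes it and provides a reference point: a model that always predicts a 10% positive rate, on data that is 10% positive, scores about 0.325, which is useful context for the loss values reported later in this notebook. log_loss() is a hypothetical helper, not a Tensorflow API.

```python
import math

def log_loss(labels, probabilities, eps=1e-12):
    """Mean binary cross entropy: -mean(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, eps), 1 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

# confident and right vs. confident and wrong, for a single positive example
print(log_loss([1], [0.9]), log_loss([1], [0.1]))

# constant-prediction baseline: 10% positives, always predict p = 0.10
demo_labels = [1] * 10 + [0] * 90
print(log_loss(demo_labels, [0.10] * len(demo_labels)))
```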

We'll tune our model around this metric but it might be nice to provide a more traditional metric to assist us with evaluation of the end result.  For recommenders where the goal is to present products in order from most likely to least likely to be selected, [mean average precision @ k (MAP@K)](https://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf) is often used.  This metric examines the average precision associated with a top-*k* number of recommendations.  The closer the value of MAP@K to 1.0, the better aligned those recommendations are with a customer's product selections. 
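Definitions of AP@k differ mainly in how they normalize. The plain-Python sketch below uses one common definition: average precision@i over the ranks i <= k that hold a purchased product, divided by min(k, number of purchased products), which we understand to match how tf.compat.v1.metrics.average_precision_at_k normalizes. MAP@k is then the mean over users. The function names and the sample rankings are hypothetical.

```python
def average_precision_at_k(ranked_items, relevant_items, k=10):
    """AP@k: sum of precision@i at each relevant rank i <= k, divided by min(k, #relevant)."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    hits, score = 0, 0.0
    for i, item in enumerate(ranked_items[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / i  # precision at rank i
    return score / min(k, len(relevant))

def mean_average_precision_at_k(rankings, relevants, k=10):
    """MAP@k: average AP@k across users."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)) / len(rankings)

# hypothetical users: products ranked by predicted probability, and the products each bought
demo_rankings = [["p1", "p2", "p3", "p4"], ["p7", "p5", "p6", "p8"]]
demo_purchased = [["p1", "p3"], ["p6"]]
print(mean_average_precision_at_k(demo_rankings, demo_purchased, k=3))
```

For the first user, hits at ranks 1 and 3 give (1/1 + 2/3) / 2, about 0.83; for the second, a single hit at rank 3 gives 1/3, so MAP@3 is about 0.58.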

To calculate MAP@K, we are [repurposing code presented by NVIDIA](https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow/Recommendation/WideAndDeep/utils/metrics.py) with their implementation of a wide and deep recommender for ad-placement:

In [12]:
# Adapted from: https://github.com/NVIDIA/DeepLearningExamples/blob/master/TensorFlow/Recommendation/WideAndDeep/utils/metrics.py
def map_custom_metric(features, labels, predictions):
  
  user_ids = tf.reshape(features['user_id'], [-1])
  predictions = predictions['probabilities'][:, 1]

  # sort user IDs 
  sorted_ids = tf.argsort(user_ids)
  
  # reorder values to align with sorted user IDs
  user_ids = tf.gather(user_ids, indices=sorted_ids)
  predictions = tf.gather(predictions, indices=sorted_ids)
  labels = tf.gather(labels, indices=sorted_ids)

  # get unique user IDs in dataset
  _, user_ids_idx, user_ids_items_count = tf.unique_with_counts(
      user_ids, 
      out_idx=tf.int64
      )
  
  # compute padding so every user's row of predictions is 30 columns wide
  pad_length = 30 - tf.reduce_max(user_ids_items_count)
  pad_fn = lambda x: tf.pad(x, [(0, 0), (0, pad_length)])
  preds = tf.RaggedTensor.from_value_rowids(
      predictions, user_ids_idx).to_tensor()
  labels = tf.RaggedTensor.from_value_rowids(
      labels, user_ids_idx).to_tensor()
  labels = tf.argmax(labels, axis=1)

  # calculate average precision at k
  return {
      'map@k': tf.compat.v1.metrics.average_precision_at_k(
          predictions=pad_fn(preds),
          labels=labels,
          k=10,
          name="streaming_map")
        }

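To see what map_custom_metric() actually scores, it helps to replay its reshaping on plain lists: rows are sorted and grouped by user, each user's predictions are padded with zeros to 30 columns, and the "relevant" position is the argmax of the user's labels, which is position 0 when the user has no positive label in the batch. Because every user then has exactly one relevant position, AP@k reduces to 1/rank. The sketch below uses hypothetical helper names and data; it shows that a user who appears only once in a batch always has the relevant position ranked first, since the padding zeros cannot outrank a positive probability.

```python
from itertools import groupby

def group_like_map_custom_metric(user_ids, probabilities, labels, width=30):
    """Replay map_custom_metric()'s reshaping on plain lists.

    Returns one (padded_predictions, label_index) pair per user, ordered by user ID.
    label_index is the argmax of the user's labels (first maximum), so it is 0
    when the user has no positive label in the batch.
    """
    rows = sorted(zip(user_ids, probabilities, labels), key=lambda r: r[0])
    grouped = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        preds = [p for _, p, _ in group]
        labs = [y for _, _, y in group]
        label_index = labs.index(max(labs))
        grouped.append((preds + [0.0] * (width - len(preds)), label_index))
    return grouped

def rank_of(padded_preds, label_index):
    """1-based rank of the labeled position when predictions are sorted in descending order."""
    target = padded_preds[label_index]
    return 1 + sum(1 for p in padded_preds if p > target)

# hypothetical batch: three users, each appearing once, none with a purchase
demo_groups = group_like_map_custom_metric([5, 3, 9], [0.04, 0.12, 0.02], [0, 0, 0])
print([rank_of(preds, idx) for preds, idx in demo_groups])  # AP@k is 1/rank for each user

# a user appearing twice, with the purchased product scored lower than the other one
print([rank_of(preds, idx) for preds, idx in group_like_map_custom_metric([1, 1], [0.2, 0.6], [1, 0])])
```

When evaluation batches are small relative to the number of users, most users likely appear once per batch, which is worth keeping in mind when interpreting MAP@K values close to 1.0.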

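We can now bring all of this logic together in a single function that builds, trains and evaluates the model for a given set of hyperparameters and returns the validation loss in the format hyperopt expects: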

In [13]:
def train_and_evaluate_model(hparams):
  
  # retrieve the basic model
  model = get_model(
    hparams['hidden_layers'], 
    hparams['hidden_layer_nodes_initial_count'], 
    hparams['hidden_layer_nodes_count_decline_rate'], 
    hparams['dropout_rate']
    )
  
  # add map@k metric
  model = tf.estimator.add_metrics(model, map_custom_metric)
  
  # retrieve data specs
  train_spec, eval_spec = get_data_specs( int(hparams['epochs']), int(hparams['batch_size']))
  
  # train and evaluate
  results = tf.estimator.train_and_evaluate(model, train_spec, eval_spec)
  
  # return loss metric
  return {'loss': results[0]['loss'], 'status': STATUS_OK}


We can now give our model a test run, just to make sure all the moving parts are working together:

In [14]:
hparams = {
  'hidden_layers':2,
  'hidden_layer_nodes_initial_count':100,
  'hidden_layer_nodes_count_decline_rate':0.5,
  'dropout_rate':0.25,
  'epochs':1,
  'batch_size':128
  }

train_and_evaluate_model(hparams)

Using temporary folder as model directory: /tmp/tmpdb9a_gq0

{'loss': 0.33366948, 'status': 'ok'}

With a successful test run completed, let's now perform hyperparameter tuning on the model. We will use [hyperopt](https://docs.databricks.com/applications/machine-learning/automl-hyperparam-tuning/index.html#hyperparameter-tuning-with-hyperopt) with SparkTrials to distribute the trials across the cluster, which lets us manage the total time required for this operation.

Regarding the hyperparameters, we will vary the number of hidden layers, the number of nodes in those layers and the drop-out rate for the deep neural-network portion of the model.  While we could also tune the number of epochs and the batch size for training, we restrict those to narrow ranges (3 or 4 epochs, and a batch size of 128 or 129) at this time:

In [15]:
# Define Hyperparameter Search Space
search_space = {
  'hidden_layers': hp.quniform('hidden_layers', 1, 5, 1)  # determines number of hidden layers
  ,'hidden_layer_nodes_initial_count': hp.quniform('hidden_layer_nodes_initial', 50, 201, 10)  # determines number of nodes in first hidden layer
  ,'hidden_layer_nodes_count_decline_rate': hp.quniform('hidden_layer_nodes_count_decline_rate', 0.0, 0.51, 0.05) # determines how number of nodes decline in layers below first hidden layer
  ,'dropout_rate': hp.quniform('dropout_rate', 0.0, 0.51, 0.05)
  ,'epochs': hp.quniform('epochs', 3, 4, 1) # narrow range: 3 or 4
  ,'batch_size': hp.quniform('batch_size', 128, 129, 1) # narrow range: 128 or 129
  }

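According to the hyperopt documentation, hp.quniform(label, low, high, q) returns a value like round(uniform(low, high) / q) * q. The plain-Python sketch below mimics that formula (the helper quniform() is hypothetical, not hyperopt's implementation) to show which values the search space above can actually produce: epochs of 3 or 4, a batch size of 128 or 129, and decline rates from 0.0 to 0.5 in steps of 0.05, including 0.0.

```python
import random

def quniform(rng, low, high, q):
    """Mimic hp.quniform's documented behavior: round(uniform(low, high) / q) * q."""
    return round(rng.uniform(low, high) / q) * q

demo_rng = random.Random(0)
epoch_values = {quniform(demo_rng, 3, 4, 1) for _ in range(1_000)}
batch_size_values = {quniform(demo_rng, 128, 129, 1) for _ in range(1_000)}
decline_values = {round(quniform(demo_rng, 0.0, 0.51, 0.05), 2) for _ in range(10_000)}

print(sorted(epoch_values), sorted(batch_size_values))
print(sorted(decline_values))
```

To pin a value exactly instead, a single-option choice such as hp.choice('batch_size', [128]) can be used; space_eval() then maps the returned index back to the value.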

In [16]:
# Perform Hyperparameter Search
argmin = fmin(
  fn=train_and_evaluate_model,
  space=search_space,
  algo=tpe.suggest,
  max_evals=100,
  trials=SparkTrials(parallelism=sc.defaultParallelism) # set to the number of executors for CPU-based clusters OR number of workers for GPU-based clusters
  )


100%|██████████| 100/100 [2:02:41<00:00, 73.61s/trial, best loss: 0.2978660464286804]   
Total Trials: 100: 100 succeeded, 0 failed, 0 cancelled.


In [17]:
# Show Optimized Hyperparameters
space_eval(search_space, argmin)


{'batch_size': 129.0,
 'dropout_rate': 0.0,
 'epochs': 4.0,
 'hidden_layer_nodes_count_decline_rate': 0.45,
 'hidden_layer_nodes_initial_count': 90.0,
 'hidden_layers': 4.0}

## Step 4: Evaluate the Model

Based on our optimized parameters, we can now train a final version of our model and explore the metrics associated with it:

In [18]:
hparams = space_eval(search_space, argmin)

model = get_model(
    hparams['hidden_layers'], 
    hparams['hidden_layer_nodes_initial_count'], 
    hparams['hidden_layer_nodes_count_decline_rate'], 
    hparams['dropout_rate']
    )
model = tf.estimator.add_metrics(model, map_custom_metric)

train_spec, eval_spec = get_data_specs( int(hparams['epochs']), int(hparams['batch_size']))

results = tf.estimator.train_and_evaluate(model, train_spec, eval_spec)

Using temporary folder as model directory: /tmp/tmpu3f56yav


In [19]:
results[0]


{'accuracy': 0.899375,
 'accuracy_baseline': 0.899375,
 'auc': 0.68152326,
 'auc_precision_recall': 0.18358128,
 'average_loss': 0.307229,
 'label/mean': 0.100625,
 'loss': 0.307229,
 'map@k': 1.0,
 'precision': 0.0,
 'prediction/mean': 0.10303449,
 'recall': 0.0,
 'global_step': 257846}

Using our test data, which the model did not see during training or hyperparameter tuning, we can better assess model performance.  Our test data, also stored in Petastorm, needs the same row-restructuring function used in get_data_specs().  In addition, we need to explicitly define the number of steps over which the data should be evaluated: by default, Petastorm's make_tf_dataset() repeats the data indefinitely, so without a step count the evaluation would never finish:

In [20]:
# Borrowed from get_data_specs() (defined above)
# ---------------------------------------------------------
# define functions to transform data into the required format
def get_input_fn(dataset_context_manager):

  def _to_tuple(row): # re-structure a row as ({features}, label)
    features = {}
    for col in cat_features + num_features:
      features[col] = getattr(row, col)
    return features, getattr(row, label_col)

  def fn(): # called by estimator to perform row structure conversion
    return dataset_context_manager.__enter__().map(_to_tuple)

  return fn
# ---------------------------------------------------------

# define batch size and number of steps
batch_size = 128
steps = int(test_pstorm.dataset_size/batch_size)

# retrieve test data
test_ds = test_pstorm.make_tf_dataset(batch_size=batch_size)

# evaluate against test data
results = model.evaluate(get_input_fn(test_ds), steps=steps)

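Because the test dataset repeats indefinitely, the steps argument is what ends the evaluation. The plain-Python sketch below models that with a generator that cycles over a hypothetical 1,000-row dataset forever, and itertools.islice to take int(len(rows) / batch_size) batches. It also shows that truncating the division skips the final partial batch (at most batch_size - 1 rows), which is negligible for a test set with millions of rows.

```python
from itertools import cycle, islice

def batches_forever(rows, batch_size):
    """Yield consecutive batches and start over at the end, like a dataset with unlimited epochs."""
    for start in cycle(range(0, len(rows), batch_size)):
        yield rows[start:start + batch_size]

demo_test_rows = list(range(1_000))  # hypothetical test set
demo_batch_size = 128
demo_steps = int(len(demo_test_rows) / demo_batch_size)  # same arithmetic as the evaluation cell

seen = [row for batch in islice(batches_forever(demo_test_rows, demo_batch_size), demo_steps)
        for row in batch]
print(demo_steps, len(seen), len(demo_test_rows) - len(seen))
```

Petastorm's make_tf_dataset() also accepts a num_epochs argument; setting num_epochs=1 should yield a finite dataset, in which case the step count is no longer needed to stop evaluation.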

In [21]:
# show results
results


{'accuracy': 0.9001098,
 'accuracy_baseline': 0.90010875,
 'auc': 0.68144107,
 'auc_precision_recall': 0.18812457,
 'average_loss': 0.30603984,
 'label/mean': 0.09989122,
 'loss': 0.30603984,
 'map@k': 0.9998661255879108,
 'precision': 1.0,
 'prediction/mean': 0.10357017,
 'recall': 1.083142e-05,
 'global_step': 257846}
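In both result sets, accuracy is essentially equal to accuracy_baseline (the accuracy of always predicting the majority class), and precision and recall are near zero. That pattern appears when almost no predicted probability exceeds the 0.5 decision threshold, even if the model ranks positives above negatives. The plain-Python sketch below reproduces it on hypothetical scores; the helper names are illustrative, and treating "p > 0.5" as a positive prediction is an assumption about the threshold rule.

```python
def threshold_metrics(labels, probabilities, threshold=0.5):
    """Accuracy, precision and recall after converting probabilities to 0/1 at `threshold`."""
    preds = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, yhat in pairs if y == 1 and yhat == 1)
    fp = sum(1 for y, yhat in pairs if y == 0 and yhat == 1)
    fn = sum(1 for y, yhat in pairs if y == 1 and yhat == 0)
    accuracy = sum(1 for y, yhat in pairs if y == yhat) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def accuracy_baseline(labels):
    """Accuracy of always predicting the majority class: max(mean, 1 - mean)."""
    mean = sum(labels) / len(labels)
    return max(mean, 1 - mean)

def pairwise_auc(labels, probabilities):
    """Probability that a random positive scores above a random negative (ties count 1/2)."""
    pos = [p for y, p in zip(labels, probabilities) if y == 1]
    neg = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# hypothetical scores: every positive outscores every negative, but all scores are below 0.5
demo_labels = [1] * 10 + [0] * 90
demo_probs = [0.3] * 10 + [0.1] * 90
print(threshold_metrics(demo_labels, demo_probs), accuracy_baseline(demo_labels),
      pairwise_auc(demo_labels, demo_probs))
```

Here accuracy equals the baseline and precision and recall are zero even though the ranking is perfect (AUC of 1.0); for a recommender that orders products by likelihood, threshold-free metrics such as AUC are the more relevant ones.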

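Our model produces similar results on the testing holdout, which suggests it generalizes beyond the data used for training and tuning.  Keep in mind that accuracy matches accuracy_baseline and precision and recall are near zero, meaning almost no predicted probabilities exceed the default 0.5 threshold, and that a MAP@K close to 1.0 partly reflects how map_custom_metric() groups each evaluation batch by user; AUC is the better indicator of how well the model ranks products.  With that in mind, the model is a reasonable candidate for live testing in our application infrastructure.
As a next step, we need to persist the trained model in a manner that enables deployment, for example with MLflow.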