# Project Goal
    
Build a basic TensorFlow Extended (TFX) pipeline that automatically executes tasks from ingestion to serving. The scenario uses synthetically generated patient data (Heart Rate, Temperature, Respiratory Rate, White Blood Cell Count) that has been labelled 1 (Septic) or 0 (Not-Septic). We create and run a TFX pipeline that trains a model to predict septic patients from these biological markers. The pipeline consists of three essential TFX components: ExampleGen, Trainer and Pusher. Together they cover a minimal ML workflow: importing data, training a model and exporting the trained model.

## Overview of steps
1. Install required software
1. Configure pipeline variables
1. Prepare the raw data 
1. Write the training pipeline (data ingestion, model training, model pushing)
1. Run the training pipeline
1. Push the model

# Import required software

In [1]:
# Commented out because the ODH notebook image raises a "Use %horus" error on direct pip commands.
# Run the commands below in a terminal instead.
#!pip install --upgrade pip
#!pip install tensorflow tfx --progress-bar off

In [2]:
import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext

TensorFlow version: 2.9.1
TFX version: 1.9.0


# Configure Pipeline Variables

The variables below define the pipeline; customize them as needed. With these settings, pipeline artifacts and the ML Metadata database are written under ../pipeline, and exported models under ../models. Instead of using the SchemaGen component to generate a schema, this tutorial hardcodes a feature spec in the trainer module.

In [3]:
import os

PIPELINE_NAME = "basic_pipeline"
MODEL_NAME = "sepsis"

# Output directory to store artifacts generated from the pipeline.
PIPELINE_ROOT = os.path.join('../pipeline', PIPELINE_NAME)
# Path to a SQLite DB file to use as an MLMD storage.
METADATA_PATH = os.path.join('../pipeline', 'metadata.db')
# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('../models', MODEL_NAME)

# Path to the training data
DATA_TRAIN = '../data/training_data'
# File name
TRAIN='septic_data_labelled.csv'
RAW_DATA = os.path.join(DATA_TRAIN,TRAIN)

from absl import logging
logging.set_verbosity(logging.INFO)  # Set default logging level

In [4]:
print(PIPELINE_ROOT)
print(METADATA_PATH)
print(SERVING_MODEL_DIR)
print(DATA_TRAIN)
print(RAW_DATA)

../pipeline/basic_pipeline
../pipeline/metadata.db
../models/sepsis
../data/training_data
../data/training_data/septic_data_labelled.csv
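
METADATA_PATH points to an ordinary SQLite file, so after a pipeline run it can be opened with Python's built-in sqlite3 module. The sketch below only lists table names and makes no assumption about the ML Metadata schema. `list_tables` is a hypothetical helper, and the demo runs against a throwaway database rather than the real metadata.db.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list:
    """Return the names of all tables in a SQLite database file, sorted."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Demo on a temporary database (hypothetical table name, for illustration only).
with tempfile.TemporaryDirectory() as tmp:
    demo_db = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(demo_db)
    conn.execute("CREATE TABLE example_table (id INTEGER PRIMARY KEY)")
    conn.commit()
    conn.close()
    demo_tables = list_tables(demo_db)

print(demo_tables)
```

After the pipeline has run, calling `list_tables(METADATA_PATH)` should show the tables that ML Metadata created.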


# Prepare and examine the training data

There are four numeric features in this dataset:

- Heart Rate
- Temperature
- Respiratory Rate
- White Blood Cell Count

There is a label of 0 or 1 indicating Not-Septic or Septic.

We will build a model which predicts sepsis. 

## View the data

You should see five comma-separated values per row: the four features followed by the label. For example, the first row ends in 0, which means that patient is not septic.

In [5]:
!head {RAW_DATA}

HR,Temp,Resp,WBC,isSeptic
40.0,110.0,12.0,4.54,0
41.5,79.0,26.0,4.23,0
41.9,61.0,14.0,18.13,0
40.1,89.0,20.0,3.40,0
35.6,136.0,13.0,9.21,0
35.8,83.0,26.0,8.95,0
35.6,83.0,18.0,21.70,0
35.4,85.0,14.0,3.92,0
38.3,238.0,25.0,6.96,0
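
Before training, it can help to confirm that every row parses as four numeric features plus a 0/1 label. The following is a minimal sketch using only the standard library. The sample rows are copied from the output above, and `summarize` is a hypothetical helper name, not part of the pipeline.

```python
import csv
import io
from collections import Counter

# First three data rows from the `head` output above.
SAMPLE = """HR,Temp,Resp,WBC,isSeptic
40.0,110.0,12.0,4.54,0
41.5,79.0,26.0,4.23,0
41.9,61.0,14.0,18.13,0
"""

FEATURES = ["HR", "Temp", "Resp", "WBC"]
LABEL = "isSeptic"


def summarize(csv_text: str) -> Counter:
    """Validate each row and return how many rows carry each label."""
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name in FEATURES:
            float(row[name])  # raises ValueError if a feature is not numeric
        label = int(row[LABEL])
        if label not in (0, 1):
            raise ValueError(f"unexpected label {label!r}")
        counts[label] += 1
    return counts


label_counts = summarize(SAMPLE)
print(label_counts)
```

To check the real file, pass its contents instead, for example `summarize(open(RAW_DATA).read())`.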


## Write model training code

This model training code will be saved to a separate file.

In this tutorial we use the generic Trainer of TFX, which supports Keras-based models. You need to write a Python file containing a run_fn function, which is the entry point for the Trainer component.

In [6]:
_vitals_trainer_module_file = '../src/vitals_trainer.py'

In [7]:
%%writefile {_vitals_trainer_module_file}

from typing import List
from absl import logging
import tensorflow as tf
from tensorflow import keras
from tensorflow_transform.tf_metadata import schema_utils

from tfx import v1 as tfx
from tfx_bsl.public import tfxio
from tensorflow_metadata.proto.v0 import schema_pb2

_FEATURE_KEYS = [
    'HR', 'Temp', 'Resp', 'WBC'
]

_LABEL_KEY = 'isSeptic'

_TRAIN_BATCH_SIZE = 20
_EVAL_BATCH_SIZE = 10

# Since we're not generating or creating a schema, we will instead create
# a feature spec.  Since there are a fairly small number of features this is
# manageable for this dataset.
_FEATURE_SPEC = {
    **{
        feature: tf.io.FixedLenFeature(shape=[1], dtype=tf.float32)
           for feature in _FEATURE_KEYS
       },
    _LABEL_KEY: tf.io.FixedLenFeature(shape=[1], dtype=tf.int64)
}


def _input_fn(file_pattern: List[str],
              data_accessor: tfx.components.DataAccessor,
              schema: schema_pb2.Schema,
              batch_size: int = 200) -> tf.data.Dataset:
  """Generates features and label for training.

  Args:
    file_pattern: List of paths or patterns of input tfrecord files.
    data_accessor: DataAccessor for converting input to RecordBatch.
    schema: schema of the input data.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch

  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  return data_accessor.tf_dataset_factory(
      file_pattern,
      tfxio.TensorFlowDatasetOptions(
          batch_size=batch_size, label_key=_LABEL_KEY),
      schema=schema).repeat()


def _build_keras_model() -> tf.keras.Model:
  """Creates a DNN Keras model for classifying patient data.

  Returns:
    A Keras Model.
  """
  # The model below is built with Functional API, please refer to
  # https://www.tensorflow.org/guide/keras/overview for all API options.
  inputs = [keras.layers.Input(shape=(1,), name=f) for f in _FEATURE_KEYS]
  d = keras.layers.concatenate(inputs)
  for _ in range(2):
    d = keras.layers.Dense(8, activation='relu')(d)
  # Two output logits, one per class: 0 (Not-Septic) and 1 (Septic).
  outputs = keras.layers.Dense(2)(d)

  model = keras.Model(inputs=inputs, outputs=outputs)
  model.compile(
      optimizer=keras.optimizers.Adam(1e-2),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[keras.metrics.SparseCategoricalAccuracy()])

  model.summary(print_fn=logging.info)
  return model


# TFX Trainer will call this function.
def run_fn(fn_args: tfx.components.FnArgs):
  """Train the model based on given args.

  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  """

  # This schema is usually either an output of SchemaGen or a manually-curated
  # version provided by pipeline author. A schema can also be derived from TFT
  # graph if a Transform component is used. In the case when either is missing,
  # `schema_from_feature_spec` could be used to generate schema from very simple
  # feature_spec, but the schema returned would be very primitive.
  schema = schema_utils.schema_from_feature_spec(_FEATURE_SPEC)

  train_dataset = _input_fn(
      fn_args.train_files,
      fn_args.data_accessor,
      schema,
      batch_size=_TRAIN_BATCH_SIZE)
  eval_dataset = _input_fn(
      fn_args.eval_files,
      fn_args.data_accessor,
      schema,
      batch_size=_EVAL_BATCH_SIZE)

  model = _build_keras_model()
  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps)

  # Save the model's architecture, weights, and training configuration in
  # SavedModel format, so it can be used without access to the original
  # Python code. The optimizer state is saved as well, so training can be
  # resumed from where it left off.
  # The result of the training should be saved in the
  # `fn_args.serving_model_dir` directory.
  model.save(
      fn_args.serving_model_dir,
      save_format='tf')

Overwriting ../src/vitals_trainer.py
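
Because the loss is configured with from_logits=True, the model's final layer emits raw logits rather than probabilities. To read a prediction as a probability, a softmax can be applied to the logits. The sketch below shows this in plain Python. The logit values are hypothetical and `softmax` is an illustrative helper, not something the trainer module defines.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one patient; index 1 corresponds to label 1 (Septic).
example_logits = [0.3, 1.7]
probabilities = softmax(example_logits)
predicted_label = probabilities.index(max(probabilities))
print(probabilities, predicted_label)
```

The predicted label is simply the index of the largest logit, so the softmax is only needed when a probability is wanted.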


## Write a pipeline definition

We define a function to create a TFX pipeline. A Pipeline object represents a TFX pipeline, which can be run using one of the pipeline orchestration systems that TFX supports.

In [8]:
_pipeline_file = '../src/pipeline.py'

In [9]:
#commented out writefile due to function not loading inside notebook
#%%writefile {_pipeline_file}

def _create_pipeline(pipeline_name: str,
                     pipeline_root: str,
                     data_root: str,
                     module_file: str,
                     serving_model_dir: str,
                     metadata_path: str) -> tfx.dsl.Pipeline:
  """Creates a three component patient pipeline with TFX."""
  # Brings data into the pipeline.
  example_gen = tfx.components.CsvExampleGen(input_base=data_root)

  # Uses user-provided Python function that trains a model.
  trainer = tfx.components.Trainer(
      module_file=os.path.abspath(module_file),
      examples=example_gen.outputs['examples'],
      train_args=tfx.proto.TrainArgs(num_steps=100),
      eval_args=tfx.proto.EvalArgs(num_steps=5))

  # Pushes the model to a filesystem destination.
  pusher = tfx.components.Pusher(
      model=trainer.outputs['model'],
      push_destination=tfx.proto.PushDestination(
          filesystem=tfx.proto.PushDestination.Filesystem(
              base_directory=serving_model_dir)))

  # Following three components will be included in the pipeline.
  components = [
      example_gen,
      trainer,
      pusher,
  ]

  # Use the function arguments rather than the notebook globals so the
  # pipeline can be created with different settings.
  return tfx.dsl.Pipeline(
      pipeline_name=pipeline_name,
      pipeline_root=pipeline_root,
      metadata_connection_config=tfx.orchestration.metadata
      .sqlite_metadata_connection_config(metadata_path),
      components=components)
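
As a small worked example of what the step counts mean: num_steps counts batches, not examples, so the number of examples consumed is steps times batch size. The batch sizes below are the _TRAIN_BATCH_SIZE and _EVAL_BATCH_SIZE values from the trainer module; `examples_seen` is a hypothetical helper used only for this arithmetic.

```python
def examples_seen(num_steps: int, batch_size: int) -> int:
    """Number of examples consumed when running num_steps batches."""
    return num_steps * batch_size


# TrainArgs(num_steps=100) with _TRAIN_BATCH_SIZE = 20
train_examples = examples_seen(num_steps=100, batch_size=20)
# EvalArgs(num_steps=5) with _EVAL_BATCH_SIZE = 10
eval_examples = examples_seen(num_steps=5, batch_size=10)

print(train_examples, eval_examples)  # prints: 2000 50
```

Since the input dataset is repeated indefinitely in the trainer module, these counts can include the same row more than once when a split holds fewer examples than that.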

# Run the pipeline

You should see "INFO:absl:Component Pusher is finished." at the end of the logs if the pipeline finished successfully, because Pusher is the last component of the pipeline.

The Pusher component copies the trained model to SERVING_MODEL_DIR, which is the models directory.

In [10]:
# if .ipynb_checkpoints exists, then pipeline will error due to split header mismatch
!rm -rf ../data/{training_data,serving_data}/.ipynb_checkpoints
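
CsvExampleGen reads every file under its input directory (the log shows it processing ../data/training_data/*), which is why a stray .ipynb_checkpoints folder causes trouble. If shell commands are not available, the same cleanup can be sketched in Python. `remove_notebook_checkpoints` is a hypothetical helper, and the demo runs on a temporary directory rather than the real data folder.

```python
import os
import shutil
import tempfile


def remove_notebook_checkpoints(root: str) -> int:
    """Delete every .ipynb_checkpoints directory under root; return the count."""
    removed = 0
    for dirpath, dirnames, _ in os.walk(root):
        if ".ipynb_checkpoints" in dirnames:
            shutil.rmtree(os.path.join(dirpath, ".ipynb_checkpoints"))
            # Remove it from the walk list so os.walk does not descend into it.
            dirnames.remove(".ipynb_checkpoints")
            removed += 1
    return removed


# Demo on a throwaway directory tree that mimics ../data.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = os.path.join(tmp, "training_data", ".ipynb_checkpoints")
    os.makedirs(checkpoint_dir)
    os.makedirs(os.path.join(tmp, "serving_data"))
    removed_count = remove_notebook_checkpoints(tmp)
    leftover = os.path.exists(checkpoint_dir)

print(removed_count, leftover)
```

For the notebook's layout, the call would be `remove_notebook_checkpoints('../data')`.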

In [11]:
tfx.orchestration.LocalDagRunner().run(
  _create_pipeline(
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      data_root=DATA_TRAIN,
      module_file=_vitals_trainer_module_file,
      serving_model_dir=SERVING_MODEL_DIR,
      metadata_path=METADATA_PATH))

INFO:absl:Generating ephemeral wheel package for '/opt/app-root/src/mlops-basic/src/vitals_trainer.py' (including modules: ['vitals_trainer', 'vitals_constants', 'vitals_transform']).
INFO:absl:User module package has hash fingerprint version 7a4f88ddd355534f2c448d0aaf5e38d35e174b568cbfa061cdee8ec483f6e629.
INFO:absl:Executing: ['/opt/app-root/bin/python3.8', '/tmp/tmphj94c967/_tfx_generated_setup.py', 'bdist_wheel', '--bdist-dir', '/tmp/tmponoto1xg', '--dist-dir', '/tmp/tmp1f9v_fgl']
INFO:absl:Successfully built user code wheel distribution at '../pipeline/basic_pipeline/_wheels/tfx_user_code_Trainer-0.0+7a4f88ddd355534f2c448d0aaf5e38d35e174b568cbfa061cdee8ec483f6e629-py3-none-any.whl'; target user module is 'vitals_trainer'.
INFO:absl:Full user module path is 'vitals_trainer@../pipeline/basic_pipeline/_wheels/tfx_user_code_Trainer-0.0+7a4f88ddd355534f2c448d0aaf5e38d35e174b568cbfa061cdee8ec483f6e629-py3-none-any.whl'
INFO:absl:Using deployment config:
 executor_specs {
  key: "CsvExam

INFO:absl:Processing input csv data ../data/training_data/* to TFExample.
INFO:absl:Examples generated.
INFO:absl:Value type <class 'NoneType'> of key version in exec_properties is not supported, going to drop it
INFO:absl:Value type <class 'list'> of key _beam_pipeline_args in exec_properties is not supported, going to drop it
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 4 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "../pipeline/basic_pipeline/CsvExampleGen/examples/4"
custom_properties {
  key: "input_fingerprint"
  value {
    string_value: "split:single_split,num_files:1,total_bytes:2389061,xor_checksum:1658278245,sum_checksum:1658278245"
  }
}
custom_properties {
  key: "name"
  value {
    string_value: "basic_pipeline:2022-07-21T00:24:47.584952:CsvExampleGen:examples:0"
  }
}
custom_properties {
  key: "span"
  value {
    int_value: 0
  }






INFO:tensorflow:Assets written to: ../pipeline/basic_pipeline/Trainer/model/5/Format-Serving/assets


INFO:tensorflow:Assets written to: ../pipeline/basic_pipeline/Trainer/model/5/Format-Serving/assets
INFO:absl:Training complete. Model written to ../pipeline/basic_pipeline/Trainer/model/5/Format-Serving. ModelRun written to ../pipeline/basic_pipeline/Trainer/model_run/5
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 5 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'model': [Artifact(artifact: uri: "../pipeline/basic_pipeline/Trainer/model/5"
custom_properties {
  key: "name"
  value {
    string_value: "basic_pipeline:2022-07-21T00:24:47.584952:Trainer:model:0"
  }
}
custom_properties {
  key: "tfx_version"
  value {
    string_value: "1.9.0"
  }
}
name: "basic_pipeline:2022-07-21T00:24:47.584952:Trainer:model:0"
, artifact_type: name: "Model"
base_type: MODEL
)], 'model_run': [Artifact(artifact: uri: "../pipeline/basic_pipeline/Trainer/model_run/5"
custom_properties {
  key: "name"
  va

In [12]:
# List files in created model directory.
!ls -R {SERVING_MODEL_DIR}

../models/sepsis:
1658362528  1658362900	1658363161

../models/sepsis/1658362528:
assets	keras_metadata.pb  saved_model.pb  variables

../models/sepsis/1658362528/assets:

../models/sepsis/1658362528/variables:
variables.data-00000-of-00001  variables.index

../models/sepsis/1658362900:
assets	keras_metadata.pb  saved_model.pb  variables

../models/sepsis/1658362900/assets:

../models/sepsis/1658362900/variables:
variables.data-00000-of-00001  variables.index

../models/sepsis/1658363161:
assets	keras_metadata.pb  saved_model.pb  variables

../models/sepsis/1658363161/assets:

../models/sepsis/1658363161/variables:
variables.data-00000-of-00001  variables.index
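
Each pipeline run pushes the model into a new numeric subdirectory (the names above look like Unix timestamps), and TensorFlow Serving by default serves the highest-numbered version. A minimal sketch for finding that newest version from Python follows. `latest_model_version` is a hypothetical helper, and the demo recreates the three directory names above in a temporary folder.

```python
import os
import tempfile
from typing import Optional


def latest_model_version(base_dir: str) -> Optional[str]:
    """Return the path of the highest-numbered version directory, or None."""
    versions = [
        name for name in os.listdir(base_dir)
        if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))
    ]
    if not versions:
        return None
    return os.path.join(base_dir, max(versions, key=int))


# Demo using the version names from the listing above.
with tempfile.TemporaryDirectory() as tmp:
    for version in ("1658362528", "1658362900", "1658363161"):
        os.makedirs(os.path.join(tmp, version))
    newest = os.path.basename(latest_model_version(tmp))

print(newest)  # prints: 1658363161
```

In the notebook, `latest_model_version(SERVING_MODEL_DIR)` would return the directory of the most recently pushed model.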


# BulkInferrer

In [13]:
# Create an InteractiveContext that uses PIPELINE_ROOT as its pipeline root.
# Because `metadata_connection_config` is not passed, the context does not
# use METADATA_PATH; it sets up its own ML Metadata store. To use your own
# database, pass `metadata_connection_config` as well. Calls to
# InteractiveContext are no-ops outside of the notebook.
context = InteractiveContext(pipeline_root=PIPELINE_ROOT)



In [14]:
INFERENCE_ROOT='../data/serving_data'
INFERENCE_EXAMPLES = os.path.join(INFERENCE_ROOT, 'septic_data_unlabelled.csv')
!head {INFERENCE_EXAMPLES}

Temp,HR,RR,WBC
39.37,82.65,15.97,6.23
39.88,91.09,19.25,14.43
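
The next cell configures ExampleGen with a single split that has one hash bucket. To illustrate the idea behind hash_buckets: each record is hashed into one of N buckets, and each split owns a share of those buckets, so one bucket sends everything to a single split while a 2:1 configuration (ExampleGen's default train/eval ratio) sends roughly two thirds to train. The sketch below uses hashlib as a stand-in hash; that is an assumption for illustration and not the hash function TFX uses. `assign_split` and `split_counts` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List, Tuple


def assign_split(record: str, splits: List[Tuple[str, int]]) -> str:
    """Pick a split for a record by hashing it into one of sum(buckets) buckets."""
    total = sum(buckets for _, buckets in splits)
    digest = hashlib.md5(record.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total
    upper = 0
    for name, buckets in splits:
        upper += buckets
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always below total")


def split_counts(records: List[str], splits: List[Tuple[str, int]]) -> Dict[str, int]:
    """Count how many records land in each split."""
    counts = {name: 0 for name, _ in splits}
    for record in records:
        counts[assign_split(record, splits)] += 1
    return counts


records = [f"row-{i}" for i in range(3000)]
single = split_counts(records, [("unlabelled", 1)])
two_to_one = split_counts(records, [("train", 2), ("eval", 1)])
print(single, two_to_one)
```

Hashing makes the assignment deterministic: the same record always lands in the same split, which is useful when the data is re-ingested.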

In [15]:
# Put every serving row into a single split named 'unlabelled'.
output = tfx.proto.Output(
    split_config=tfx.proto.SplitConfig(splits=[
        tfx.proto.SplitConfig.Split(name='unlabelled', hash_buckets=1)
    ]))

inference_example_gen = tfx.components.CsvExampleGen(
    # input_base is an external directory containing the CSV files.
    input_base=INFERENCE_ROOT,
    # output_config overrides the default train/eval split that ExampleGen outputs.
    output_config=output)
context.run(inference_example_gen, enable_cache=True)

INFO:absl:Running driver for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen
INFO:absl:Generating examples.
INFO:absl:Processing input csv data ../data/serving_data/* to TFExample.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen
INFO:absl:MetadataStore with DB connection initialized


.execution_id: 2
.component: CsvExampleGen
.component.inputs: {}
.component.outputs['examples']: Channel of type 'Examples' (1 artifact)
    Artifact of type 'Examples':
        .type: <class 'tfx.types.standard_artifacts.Examples'>
        .uri: ../pipeline/basic_pipeline/CsvExampleGen/examples/2
        .span: 0
        .split_names: ["unlabelled"]
        .version: 0
.exec_properties:
    ['input_base']: ../data/serving_data
    ['input_config']: { "splits": [ { "name": "single_split", "pattern": "*" } ] }
    ['output_config']: { "split_config": { "splits": [ { "hash_buckets": 1, "name": "unlabelled" } ] } }
    ['output_data_format']: 6
    ['output_file_format']: 5
    ['custom_config']: None
    ['range_config']: None
    ['span']: 0
    ['version']: None
    ['input_fingerprint']: split:single_split,num_files:1,total_bytes:63,xor_checksum:1658357963,sum_checksum:1658357963


In [16]:
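# https://github.com/tensorflow/tfx/issues/2478#issuecomment-770373362

# `trainer` only exists inside `_create_pipeline`, and this pipeline has no
# Evaluator, so import the trained model artifact and omit `model_blessing`.
# Assumption: the execution ID (5) is taken from the Trainer log above and
# changes on every run, so adjust the path to match your own run.
model_importer = tfx.dsl.Importer(
    source_uri=os.path.join(PIPELINE_ROOT, 'Trainer', 'model', '5'),
    artifact_type=tfx.types.standard_artifacts.Model)
context.run(model_importer)

bulk_inferrer = tfx.components.BulkInferrer(
    examples=inference_example_gen.outputs['examples'],
    model=model_importer.outputs['result'],
    data_spec=tfx.proto.DataSpec(example_splits=['unlabelled']),
    model_spec=tfx.proto.ModelSpec())

context.run(bulk_inferrer)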

In [None]:
import pprint
from tensorflow_serving.apis import prediction_log_pb2

pp = pprint.PrettyPrinter()

# Get the URI of the BulkInferrer output artifact, which is a directory
inference_uri = bulk_inferrer.outputs['inference_result'].get()[0].uri

# Get the list of files in this directory (all compressed TFRecord files)
tfrecord_filenames = [os.path.join(inference_uri, name)
                      for name in os.listdir(inference_uri)]

# Create a `TFRecordDataset` to read these files
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")

# Iterate over the first 2 records and decode them. BulkInferrer writes
# PredictionLog protos rather than tf.train.Example protos.
for tfrecord in dataset.take(2):
  serialized_example = tfrecord.numpy()
  prediction_log = prediction_log_pb2.PredictionLog()
  prediction_log.ParseFromString(serialized_example)
  pp.pprint(prediction_log)
