# Hands-on Exercise Part 2: ML Metadata and TF Serving

In this second exercise we run the complete pipeline using the same code as in the previous exercise. This time we clone it directly and execute the complete pipeline with the LocalDagRunner. We will also take a look at the ML Metadata store and TF Serving.

But first, let's install TFX again and *RESTART RUNTIME* afterwards.

In [1]:
! pip install -U tfx

Let's now import the required libraries.

In [2]:
import tfx
from tfx import v1 as tfxv1
import os
from time import time
import numpy as np

import logging
logging.disable(logging.WARNING)

## 1. Complete pipeline run with LocalDagRunner

To obtain the same setting as at the end of the previous exercise, we run the complete pipeline with the LocalDagRunner. The code for the individual components is identical and we clone it directly from GitHub.

In [3]:
! git clone https://github.com/roman807/tfx-training.git

We can now cd into the tfx-training directory and take a look at the files.

In [4]:
os.chdir('tfx-training')
os.listdir()

Next, we:

1.   Import the create_pipeline function from the `laptop_pipeline.py` module
2.   Define the required paths
3.   Run the complete pipeline locally with the LocalDagRunner

In [5]:
from laptop_pipeline import create_pipeline

DATA_ROOT = 'data'
SERVING_MODEL_DIR = 'serving_model'   # where we will save the final model for deployment with TF Serving
PIPELINE_NAME = 'laptop_pipeline'
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')

start = time()
tfxv1.orchestration.LocalDagRunner().run(
  create_pipeline(
      data_root=DATA_ROOT,
      transform_module_file='utils/transform_module.py',
      trainer_module_file='utils/trainer_module.py',
      pipeline_name=PIPELINE_NAME,
      pipeline_root=PIPELINE_ROOT,
      serving_model_dir=SERVING_MODEL_DIR,
      metadata_path=METADATA_PATH,
      ))
print(f'\ncompleted pipeline run in {np.round(time()-start)}s')

Let's confirm that the outputs of all components are present as expected.

In [8]:
os.listdir('pipelines/laptop_pipeline')
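
`os.listdir` only shows the top level of the pipeline root. To look one or two levels deeper, a small helper like the one below can be used. This is a minimal sketch using only the standard library; the name `list_tree` and the `max_depth` parameter are made up for illustration and are not part of TFX.

```python
import os

def list_tree(root, max_depth=2):
    """Return relative directory paths under `root`, at most `max_depth` levels deep."""
    root = os.path.abspath(root)
    found = []
    for current, dirs, _files in os.walk(root):
        dirs.sort()  # deterministic traversal order
        rel = os.path.relpath(current, root)
        depth = 0 if rel == '.' else rel.count(os.sep) + 1
        if depth >= max_depth:
            dirs[:] = []  # do not descend any further
        if rel != '.':
            found.append(rel)
    return found

# Only runs if the pipeline root from the cell above exists
if os.path.isdir('pipelines/laptop_pipeline'):
    for path in list_tree('pipelines/laptop_pipeline'):
        print(path)
```

Increase `max_depth` to see more of the nested output directories.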

## 2. ML Metadata store

When we ran our pipeline with the LocalDagRunner, the runner created the ML Metadata store locally at our specified path as a SQLite database. Let's create a connection to the metadata store and explore the content.

In [5]:
import ml_metadata as mlmd
from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = METADATA_PATH
connection_config.sqlite.connection_mode = 3 # READWRITE_OPENCREATE
store = metadata_store.MetadataStore(connection_config)
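
Because the store is a regular SQLite file, it can also be inspected without the MLMD client. The sketch below lists the tables in the file using Python's built-in `sqlite3` module and opens the file read-only so nothing is modified. The helper name `list_sqlite_tables` is an assumption for illustration; the table names you see depend on the installed ML Metadata version.

```python
import os
import sqlite3
from pathlib import Path

def list_sqlite_tables(db_path):
    """Return the table names in a SQLite file, opened read-only."""
    uri = Path(db_path).resolve().as_uri() + '?mode=ro'
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]

# Same location as METADATA_PATH in the pipeline cell above
db_path = os.path.join('metadata', 'laptop_pipeline', 'metadata.db')
if os.path.exists(db_path):
    print(list_sqlite_tables(db_path))
```

For anything beyond a quick look, the `store` object is the better interface, since it returns typed protos rather than raw rows.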

The cell below lists the methods available on the metadata store object.

In [7]:
[call for call in dir(store) if '__' not in call]

As a reminder:
*  Artifacts: information about inputs/outputs
*  Executions: records of component runs & runtime parameters
*  Contexts: conceptual groups of artifacts and executions in a workflow

Explore the Artifacts, Executions and Contexts by individually uncommenting them and running the cell below.


In [9]:
# store.get_artifact_types()
# store.get_artifacts()
# store.get_artifacts_by_type('Examples')
# store.get_artifacts_by_type('Schema')
# store.get_artifacts_by_type('#####')

# store.get_execution_types()
# store.get_executions()

# store.get_contexts()  

### Get the schema for later inference requests
Now, let's use the metadata store for the next part of this exercise. For an inference request with TF Serving we need the schema of the raw data samples (tf.Examples).

We first get the location of the schema from the metadata store, then load and parse the `schema.pbtxt` file with the TFX `io_utils` module.

In [35]:
from tensorflow_metadata.proto.v0 import schema_pb2
from tfx.utils import io_utils

# get the schema uri from the metadata store
# (index 0 is the first Schema artifact recorded in the store):
schema_uri = store.get_artifacts_by_type('Schema')[0].uri

# load and parse the schema:
schema_filename = os.path.join(schema_uri, "schema.pbtxt")
schema = io_utils.parse_pbtxt_file(file_name=schema_filename,
                                   message=schema_pb2.Schema())

## 3. TensorFlow Serving

Finally, let's explore how to spin up a server with TensorFlow Serving and deploy a model for inference requests.

First, we install `tensorflow-model-server` using command line tools:

In [10]:
!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update

In [11]:
! apt-get install tensorflow-model-server

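Let's now set the model base path as an environment variable for the server. TensorFlow Serving expects the directory that contains the numbered version subdirectories and serves the highest version by default, so we pass the parent directory of our latest pushed model.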

In [46]:
latest_pushed_model = os.path.join(SERVING_MODEL_DIR, max(os.listdir(SERVING_MODEL_DIR)))
os.environ["MODEL_DIR"] = os.path.join(os.getcwd(), os.path.split(latest_pushed_model)[0])
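
Note that `max(os.listdir(...))` compares directory names as strings, so `'9'` would sort above `'10'`, and any non-version entry in the directory could be picked up. If you need the latest version itself, a numeric comparison is safer. The helper below is a minimal sketch; the name `latest_version_dir` is hypothetical, and it assumes version directories have purely numeric names.

```python
import os

def latest_version_dir(base_dir):
    """Return the name of the numerically largest all-digit subdirectory of `base_dir`."""
    versions = [
        name for name in os.listdir(base_dir)
        if name.isdigit() and os.path.isdir(os.path.join(base_dir, name))
    ]
    if not versions:
        raise FileNotFoundError(f'no numeric version directories in {base_dir!r}')
    # Compare as integers, not as strings
    return max(versions, key=int)
```

For example, `latest_version_dir(SERVING_MODEL_DIR)` would return the version directory name to join onto the base path.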

Spin up the TF Serving server on localhost, port 8501 (REST API).

In [47]:
%%bash --bg 
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=laptop_price_predictor \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Let's take a look at `server.log` to verify that our server runs as expected.

In [12]:
! tail server.log
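
Besides reading the log, the TF Serving REST API offers a model status endpoint at `/v1/models/<model_name>` that reports the state of each loaded version. The sketch below queries it with the standard library. The function names are made up for illustration, and the check assumes the usual response layout with a `model_version_status` list whose entries carry a `state` field.

```python
import json
import urllib.request

def model_status_url(server_addr, model_name):
    """Build the model status URL for a server given as "host:port"."""
    return f'http://{server_addr}/v1/models/{model_name}'

def is_available(status):
    """True if any version in a model status response reports state AVAILABLE."""
    return any(
        entry.get('state') == 'AVAILABLE'
        for entry in status.get('model_version_status', [])
    )

def fetch_model_status(server_addr, model_name, timeout=5.0):
    """GET the model status and return the parsed JSON body."""
    url = model_status_url(server_addr, model_name)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

Once the server is up, `is_available(fetch_model_status('127.0.0.1:8501', 'laptop_price_predictor'))` should return `True`. If the call fails with a connection error, give the server a few more seconds to start.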

Now we define helper functions to prepare a sample for inference. We use the TF Serving REST API, which expects requests in JSON format, while our TensorFlow model expects input in Protobuf format.

Therefore, we first serialize our input to Protobuf (`make_serialized_examples`), then base64-encode the serialized examples so they can be embedded in JSON, and send them to the server with the requests library (`do_inference`).

In [49]:
import base64
import json
import requests

import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import dataset_metadata
from tensorflow_transform.tf_metadata import schema_utils
from tensorflow_transform import coders as tft_coders

import laptop_constants

def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec

def _make_proto_coder(schema):
  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_schema = schema_utils.schema_from_feature_spec(raw_feature_spec)
  return tft_coders.ExampleProtoCoder(raw_schema)

def make_serialized_examples(example_jsons, schema):
  """Converts a list of feature dicts into serialized proto examples.

  Note: removes the label feature from `schema` in place.
  """
  filtered_features = [
      feature for feature in schema.feature if feature.name != laptop_constants.LABEL_KEY
  ]
  del schema.feature[:]
  schema.feature.extend(filtered_features)

  proto_coder = _make_proto_coder(schema)

  serialized_examples = []
  for sample in example_jsons:
    one_example = {}
    for feature in schema.feature:
      name = feature.name
      # treat only absent or empty values as missing, so that 0 is kept
      if sample.get(name) not in (None, ''):
        if feature.type == schema_pb2.FLOAT:
          one_example[name] = [float(sample[name])]
        elif feature.type == schema_pb2.INT:
          one_example[name] = [int(sample[name])]
        elif feature.type == schema_pb2.BYTES:
          one_example[name] = [sample[name].encode('utf8')]
      else:
        one_example[name] = []

    serialized_example = proto_coder.encode(one_example)
    serialized_examples.append(serialized_example)
  
  return serialized_examples

In [50]:
def do_inference(server_addr, model_name, serialized_examples):
  """Sends requests to the model and prints the results.
  Args:
    server_addr: network address of model server in "host:port" format
    model_name: name of the model as understood by the model server
    serialized_examples: serialized examples of data to do inference on
  """
  parsed_server_addr = server_addr.split(':')

  host=parsed_server_addr[0]
  port=parsed_server_addr[1]
  json_examples = []
  
  for serialized_example in serialized_examples:
    # The encoding follows the guidelines in:
    # https://www.tensorflow.org/tfx/serving/api_rest
    example_bytes = base64.b64encode(serialized_example).decode('utf-8')
    predict_request = '{ "b64": "%s" }' % example_bytes
    json_examples.append(predict_request)

  json_request = '{ "instances": [' + ','.join(map(str, json_examples)) + ']}'

  server_url = 'http://' + host + ':' + port + '/v1/models/' + model_name + ':predict'
  response = requests.post(
      server_url, data=json_request, timeout=5.0)
  response.raise_for_status()
  prediction = response.json()
  print(json.dumps(prediction, indent=4))
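
The request body above is assembled with string formatting. An equivalent way to build it is to construct a Python dict and let `json.dumps` handle quoting and escaping. The sketch below shows this alternative; `build_predict_request` is a hypothetical helper name and is not used elsewhere in this notebook.

```python
import base64
import json

def build_predict_request(serialized_examples):
    """Build the JSON body for a :predict call from serialized example bytes."""
    instances = [
        # Binary values are wrapped as {"b64": "<base64 string>"}
        {'b64': base64.b64encode(example).decode('utf-8')}
        for example in serialized_examples
    ]
    return json.dumps({'instances': instances})

print(build_predict_request([b'abc']))
```

The result can be passed as `data=` to `requests.post` in the same way as `json_request` in `do_inference`.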

Let's define an example laptop and predict its price. Play around by changing some parameters (e.g. Company from Apple to Lenovo, a larger value for Inches, a different Ram) and see how the estimated price changes accordingly.

In [13]:
example_jsons = [
  {
      'laptop_ID': 1, 
      'Company': 'Apple', 
      'Product': 'MacBook',
      'TypeName': 'Ultrabook',
      'Inches': 13.3, 
      'ScreenResolution': '1999x900', 
      'Cpu': 'Intel Core i5 2.3GHz',
      'Ram': '8GB', 
      'Memory': '256GB SSD', 
      'Gpu': 'Intel Iris Plus Graphics 640', 
      'OpSys': 'macOS', 
      'Weight': '1.9kg', 
   
  }
]
serialized_examples = make_serialized_examples(
    example_jsons=example_jsons,
    schema=schema)

do_inference(server_addr='127.0.0.1:8501', 
     model_name='laptop_price_predictor',
     serialized_examples=serialized_examples)
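
To compare several variations in one request instead of editing the cell by hand, you can generate copies of the example that differ in a single feature. The helper below is a minimal sketch; the name `make_variants` is made up for illustration, and the `base` dict is shortened here, so use the full `example_jsons[0]` when sending variants to the server.

```python
def make_variants(base_example, feature, values):
    """Return one copy of `base_example` per value, with `feature` replaced."""
    if feature not in base_example:
        raise KeyError(f'unknown feature: {feature!r}')
    # {**base, feature: value} copies the dict, so the original stays unchanged
    return [{**base_example, feature: value} for value in values]

base = {'Company': 'Apple', 'Inches': 13.3, 'Ram': '8GB'}  # shortened for illustration
variants = make_variants(base, 'Company', ['Apple', 'Lenovo'])
print(variants)
```

Passing `make_variants(example_jsons[0], 'Company', ['Apple', 'Lenovo'])` to `make_serialized_examples` and then to `do_inference` sends both laptops in a single request.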

Congrats! You finished this exercise and saw how to run a complete pipeline, use the ML Metadata store to retrieve artifacts, and run inference with TF Serving.