# TFX Components Walk-through

The primary goal of this lab is to develop a high level understanding of core TFX components.

You will use the **TFX Interactive Context** to work with the TFX components interactively in a Jupyter notebook environment.

Working in an interactive notebook is useful when doing initial data exploration, experimenting with models, and designing ML pipelines. You should be aware that there are differences in the way interactive notebooks are orchestrated, and how they access metadata artifacts.

In a production deployment of TFX on GCP, you will use an orchestrator such as Kubeflow Pipelines, or Cloud Composer. In an interactive mode, the notebook itself is the orchestrator, running each TFX component as you execute the notebook cells.

In a production deployment, ML Metadata will be managed in a scalable database like CloudSQL, and artifacts in a persistent store such as Google Cloud Storage. In an interactive mode, both properties and payloads are stored in the local file system of the Jupyter host.

You will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and use TFX to analyze, understand, and pre-process the dataset and to train, analyze, validate, and deploy a multi-class classification model.


The lab is designed to be instructor-led. The instructor will walk you through the lab and provide commentary about each step.

In [None]:
import absl
import os
import tempfile

import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
import tfx

from pprint import pprint
from tensorflow_metadata.proto.v0 import schema_pb2, statistics_pb2, anomalies_pb2
from tensorflow_transform.tf_metadata import schema_utils
from tfx.components import CsvExampleGen
from tfx.components import BigQueryExampleGen
from tfx.components import Evaluator
from tfx.components import ExampleValidator
from tfx.components import ModelValidator
from tfx.components import Pusher
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Trainer
from tfx.components import Transform
from tfx.components.common_nodes.importer_node import ImporterNode
from tfx.orchestration import metadata
from tfx.orchestration import pipeline
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
from tfx.proto import evaluator_pb2
from tfx.proto import example_gen_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.proto.evaluator_pb2 import SingleSlicingSpec
from tfx.utils.dsl_utils import external_input


print("Tensorflow Version:", tf.__version__)
print("TFX Version:", tfx.__version__)

tf.enable_eager_execution()

## Configure lab settings

Set constants, location paths and other environment settings. 

In [None]:
PIPELINE_NAME='tfx-covertype-classifier'
PIPELINE_ROOT=os.path.join(os.sep, 'home', 'artifact-store', PIPELINE_NAME)
os.makedirs(PIPELINE_ROOT, exist_ok=True)

SERVING_MODEL_DIR=os.path.join(os.sep, 'home', 'serving_model')
os.makedirs(SERVING_MODEL_DIR, exist_ok=True)

DATA_ROOT = 'gs://workshop-datasets/covertype/full'

## Creating Interactive Context

TFX Interactive Context allows you to create and run TFX Components in an interactive mode. It is designed to support experimentation and development in a Jupyter Notebook environment. It is an experimental feature and major changes to interface and functionality are expected. When creating the interactive context you can specify the following parameters:
- `pipeline_name` - Optional name of the pipeline for ML Metadata tracking purposes. If not specified, a name will be generated for you.
- `pipeline_root` - Optional path to the root of the pipeline's outputs. If not specified, an ephemeral temporary directory will be created and used.
- `metadata_connection_config` - Optional `metadata_store_pb2.ConnectionConfig` instance used to configure the connection to an ML Metadata store. If not specified, an ephemeral SQLite MLMD connection contained in the `pipeline_root` directory with file name "metadata.sqlite" will be used.


In [None]:
context = InteractiveContext(
    pipeline_name=PIPELINE_NAME,
    pipeline_root=PIPELINE_ROOT,
    metadata_connection_config=None
)

## Ingesting data using ExampleGen

In any ML development process the first step is to ingest the training and test datasets. The `ExampleGen` component ingests data into a TFX pipeline. It consumes external files/services to generate a set of files in the `TFRecord` format, which will be used by other TFX components. It can also shuffle the data and split it into an arbitrary number of partitions.

### Configure CsvExampleGen

In this exercise, you use the `CsvExampleGen` specialization of `ExampleGen` to ingest CSV files from the GCS location. The component is configured to split the input data into two splits - `train` and `eval` - using a 4:1 ratio.
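
To build intuition for how hash buckets produce the 4:1 ratio, here is a minimal pure-Python sketch. It is not the `ExampleGen` implementation: the `assign_split` helper, the use of MD5 as the hash, and the toy records are all assumptions made for illustration.

```python
import hashlib


def assign_split(record: bytes, buckets=(("train", 4), ("eval", 1))) -> str:
    """Map a record to a split name using cumulative hash buckets."""
    total = sum(n for _, n in buckets)
    # A stable hash keeps the assignment deterministic across runs.
    bucket = int(hashlib.md5(record).hexdigest(), 16) % total
    upper = 0
    for name, n in buckets:
        upper += n
        if bucket < upper:
            return name
    raise AssertionError("unreachable: bucket is always below total")


# Hypothetical records standing in for CSV rows.
records = [f"row-{i}".encode() for i in range(10_000)]
counts = {"train": 0, "eval": 0}
for record in records:
    counts[assign_split(record)] += 1
print(counts)  # roughly four times as many train records as eval records
```

Because the assignment depends only on the record content, re-running the split on the same data places each record in the same partition.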

In [None]:
output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=4),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1)
    ]))

example_gen = tfx.components.CsvExampleGen(
    instance_name='Data_Extraction_Spliting',
    input=external_input(DATA_ROOT),
    output_config=output_config
)

### Run the CsvExampleGen component

In [None]:
context.run(example_gen)

### Examine the ingested data

In [None]:
train_uri = example_gen.outputs['examples'].get()[0].uri
tfrecord_filenames = [os.path.join(train_uri, name)
                      for name in os.listdir(train_uri)]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")
decoder = tfdv.TFExampleDecoder()
for tfrecord in dataset.take(2):
  serialized_example = tfrecord.numpy()
  example = decoder.decode(serialized_example)
  pprint(example)

## Generating statistics using StatisticsGen

The `StatisticsGen`  component generates data statistics that can be used by other TFX components. StatisticsGen uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). `StatisticsGen` generates statistics for each split in the `ExampleGen` component's output. In our case there are two splits: `train` and `eval`.
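
As a rough illustration of what "statistics for each split" means, the following sketch computes a few summary values for one numeric feature per split. It is a simplified stand-in, not what TensorFlow Data Validation computes; the `summarize` helper and the toy rows are hypothetical.

```python
from statistics import mean


def summarize(rows, feature):
    """Compute simple summary statistics for one numeric feature."""
    values = [row[feature] for row in rows if row.get(feature) is not None]
    return {
        "count": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
    }


# Hypothetical toy rows standing in for the two splits.
splits = {
    "train": [{"Slope": 3}, {"Slope": 12}, {"Slope": None}, {"Slope": 45}],
    "eval": [{"Slope": 7}, {"Slope": 21}],
}
stats = {name: summarize(rows, "Slope") for name, rows in splits.items()}
print(stats["train"])
print(stats["eval"])
```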

### Configure and  run the `StatisticsGen` component

In [None]:
statistics_gen = tfx.components.StatisticsGen(
    instance_name='Statistics_Generation',
    examples=example_gen.outputs['examples'])

context.run(statistics_gen)

### Visualize statistics

The generated statistics can be visualized using the `tfdv.visualize_statistics()` function from the [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started) library or using a utility method of the `InteractiveContext` object. In fact, most of the artifacts generated by the TFX components can be visualized using `InteractiveContext`.

In [None]:
context.show(statistics_gen.outputs['statistics'])

## Inferring data schema using SchemaGen

Some TFX components use a description of the input data called a schema. The schema is an instance of `schema.proto`. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. `SchemaGen` automatically generates the schema by inferring types, categories, and ranges from data statistics. The auto-generated schema is best-effort and only tries to infer basic properties of the data. It is expected that developers review and modify it as needed. `SchemaGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).

The `SchemaGen` component generates the schema using the statistics for the `train` split. The statistics for other splits are ignored.
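
The idea of best-effort schema inference can be sketched in plain Python. The `infer_schema` helper below is hypothetical and far simpler than what `SchemaGen` does; it only guesses a type, whether a feature is always present, and the set of observed string values. The toy rows are made up for illustration.

```python
def infer_schema(rows):
    """Infer a best-effort type, presence and string domain per feature."""
    schema = {}
    features = {key for row in rows for key in row}
    for name in sorted(features):
        values = [row[name] for row in rows if row.get(name) is not None]
        if all(isinstance(v, int) for v in values):
            kind = "INT"
        elif all(isinstance(v, (int, float)) for v in values):
            kind = "FLOAT"
        else:
            kind = "BYTES"
        # A feature is marked required only if no row is missing it.
        entry = {"type": kind, "required": len(values) == len(rows)}
        if kind == "BYTES":
            entry["domain"] = sorted(set(values))
        schema[name] = entry
    return schema


# Hypothetical rows from the train split.
train_rows = [
    {"Slope": 3, "Wilderness_Area": "Rawah"},
    {"Slope": 12, "Wilderness_Area": "Cache"},
    {"Slope": 45, "Wilderness_Area": None},
]
inferred = infer_schema(train_rows)
print(inferred)
```

An inferred schema like this reflects only what the sample happened to contain, which is why it usually needs manual review.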

### Configure and run the `SchemaGen` component

In [None]:
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)

context.run(schema_gen)

### Visualize the inferred schema

In [None]:
context.show(schema_gen.outputs['schema'])

## Updating the auto-generated schema

In most cases the auto-generated schemas must be fine tuned manually using insights from data exploration and/or domain knowledge about the data. For example, you know that in the `covertype` dataset there are seven types of forest cover (coded using 1-7 range) and that the value of the `Slope` feature should be in the 0-90 range. You can manually add these constraints to the auto-generated schema.



### Load the auto-generated schema proto file

In [None]:
schema_proto_path = '{}/{}'.format(schema_gen.outputs['schema'].get()[0].uri, 'schema.pbtxt')
schema = tfdv.load_schema_text(schema_proto_path)

### Modify the schema

You can use the protocol buffer APIs to modify the schema. 

In [None]:
tfdv.set_domain(schema, 'Cover_Type', schema_pb2.IntDomain(name='Cover_Type', min=1, max=7, is_categorical=True))
tfdv.set_domain(schema, 'Slope',  schema_pb2.IntDomain(name='Slope', min=0, max=90))

tfdv.display_schema(schema=schema)

#### Save the updated schema 

In [None]:
schema_dir = '/home/covertype_schema'
tf.io.gfile.makedirs(schema_dir)
schema_file = os.path.join(schema_dir, 'schema.pbtxt')

tfdv.write_schema_text(schema, schema_file)

!cat {schema_file}

## Importing the updated schema using ImporterNode

The `ImporterNode` component allows you to import an external artifact, including the schema file, so it can be used by other TFX components in your workflow. 


### Configure and run the `ImporterNode` component

In [None]:
schema_importer = ImporterNode(
    instance_name='Schema_Importer',
    source_uri=schema_dir,
    artifact_type=tfx.types.standard_artifacts.Schema,
    reimport=False
)

context.run(schema_importer)

### Visualize the imported schema

In [None]:
context.show(schema_importer.outputs['result'])

## Validating data with ExampleValidator

The `ExampleValidator` component identifies anomalies in data.  It identifies anomalies by comparing data statistics computed by the `StatisticsGen` component against a schema generated by `SchemaGen` or imported by `ImporterNode`.

`ExampleValidator` can detect different classes of anomalies. For example, it can:

- perform validity checks by comparing data statistics against a schema.
- detect training-serving skew by comparing training and serving data.
- detect data drift by looking at a series of data.


The `ExampleValidator` component validates the data in the `eval` split only. Other splits are ignored. 
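
The first kind of check, comparing statistics against a schema, can be sketched as follows. This is an illustrative stand-in rather than the TensorFlow Data Validation logic: the `find_anomalies` helper and the observed statistics are hypothetical, while the allowed ranges are the `Cover_Type` and `Slope` constraints added to the schema earlier in this lab.

```python
def find_anomalies(stats, schema):
    """Return {feature: description} for statistics that violate the schema."""
    anomalies = {}
    for name, domain in schema.items():
        if name not in stats:
            anomalies[name] = "feature missing from statistics"
            continue
        observed = stats[name]
        if observed["min"] < domain["min"] or observed["max"] > domain["max"]:
            anomalies[name] = (
                f"observed range [{observed['min']}, {observed['max']}] "
                f"outside [{domain['min']}, {domain['max']}]")
    return anomalies


schema = {"Cover_Type": {"min": 1, "max": 7}, "Slope": {"min": 0, "max": 90}}
# Hypothetical statistics for two versions of an eval split.
clean = {"Cover_Type": {"min": 1, "max": 7}, "Slope": {"min": 0, "max": 66}}
dirty = {"Cover_Type": {"min": 0, "max": 7}, "Slope": {"min": 0, "max": 66}}

print(find_anomalies(clean, schema))
print(find_anomalies(dirty, schema))
```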

### Configure and run the `ExampleValidator` component


In [None]:
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_importer.outputs['result'],
    instance_name="Data_Validation"
)

context.run(example_validator)

### Examine the output of `ExampleValidator`

The output artifact of `ExampleValidator` is the `anomalies.pbtxt` file describing an `anomalies_pb2.Anomalies` protobuf.

In [None]:
anomalies_uri = example_validator.outputs['anomalies'].get()[0].uri
anomalies_filename = os.path.join(anomalies_uri, "anomalies.pbtxt")
!cat $anomalies_filename

### Visualize validation results

The file `anomalies.pbtxt` can be visualized using `context.show`.

In [None]:
context.show(example_validator.outputs['anomalies'])

In our case no anomalies were detected in the `eval` split.

For a detailed deep dive into data validation and schema generation refer to the `lab-31-tfdv-structured-data` lab.

## Preprocessing data with Transform

The `Transform` component performs data transformation and feature engineering. The `Transform` component consumes `tf.Examples` emitted from the `ExampleGen` component and emits the transformed feature data and the `SavedModel` graph that was used to process the data. The emitted `SavedModel`  can then be used by serving components to make sure that the same data pre-processing logic is applied at training and serving.

The `Transform` component requires more code than many other components because of the arbitrary complexity of the feature engineering that you may need for the data and/or model that you're working with. It requires code files to be available which define the processing needed.

### Define the pre-processing module

To configure `Transform`, you need to encapsulate your pre-processing code in a Python function named `preprocessing_fn` and save it to a Python module that is then provided to the `Transform` component as an input. The module is loaded by `Transform`, and `preprocessing_fn` is called when the component runs.

In most cases, your implementation of the `preprocessing_fn` makes extensive use of [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) for performing feature engineering on your dataset.
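
The two kinds of transformation used in this lab, z-score scaling and vocabulary mapping, both need a full pass over the data before any single value can be transformed. The sketch below shows that pattern in plain Python. It is not TensorFlow Transform code: the `z_score` and `build_vocabulary_lookup` helpers are hypothetical, the use of the population standard deviation and the alphabetical tie-breaking are assumptions, and the toy values are made up.

```python
from collections import Counter
from statistics import mean, pstdev


def z_score(values):
    """Full-pass analysis (mean, std) followed by a per-value transform."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def build_vocabulary_lookup(values):
    """Build a frequency-ordered vocabulary and return a lookup function."""
    counts = Counter(values)
    # Ties are broken alphabetically here; that ordering is an assumption.
    ordered = sorted(counts, key=lambda term: (-counts[term], term))
    vocab = {term: index for index, term in enumerate(ordered)}

    def lookup(value):
        # Unseen terms map to one out-of-vocabulary bucket after the vocabulary.
        return vocab.get(value, len(vocab))

    return lookup


scaled = z_score([3, 12, 45])
lookup = build_vocabulary_lookup(
    ["Rawah", "Cache", "Rawah", "Neota", "Rawah", "Cache"])
print(scaled)
print([lookup(term) for term in ["Rawah", "Cache", "Neota", "Unknown"]])
```

The lookup function returned by `build_vocabulary_lookup` plays the role of the saved preprocessing graph: it captures the result of the analysis pass so the same mapping can be reused at serving time.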

In [None]:
_transform_module = 'covertype_transform.py'

In [None]:
%%writefile {_transform_module}

# Copyright 2019 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#            http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Covertype dataset transformation routines."""

import tensorflow as tf
import tensorflow_transform as tft

NUMERIC_FEATURES_KEYS = [
    'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology',
    'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways',
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Fire_Points'
]

CATEGORICAL_FEATURES_KEYS = ['Wilderness_Area', 'Soil_Type']

LABEL_KEY = 'Cover_Type'


def _transformed_name(key):
  return key + '_xf'


def _fill_in_missing(x):
  """Replaces missing values and converts a SparseTensor to a DenseTensor."""

  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)


def preprocessing_fn(inputs):
  """Preprocesses Covertype Dataset."""

  outputs = {}

  # Scale numerical features
  for key in NUMERIC_FEATURES_KEYS:
    outputs[_transformed_name(key)] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  # Generate vocabularies and map categorical features to integer indices
  for key in CATEGORICAL_FEATURES_KEYS:
    outputs[_transformed_name(key)] = tft.compute_and_apply_vocabulary(
        x=_fill_in_missing(inputs[key]), num_oov_buckets=1, vocab_filename=key)

  # Convert Cover_Type from 1-7 to 0-6
  outputs[_transformed_name(LABEL_KEY)] = _fill_in_missing(
      inputs[LABEL_KEY]) - 1

  return outputs

### Configure and run the `Transform` component

In [None]:
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_importer.outputs['result'],
    module_file=_transform_module)

context.run(transform)

### Examine the `Transform` component's outputs

The Transform component has 2 outputs:

- `transform_output` - contains the graph that can perform the preprocessing operations (this graph will be included in the serving and evaluation models).
- `transformed_examples` - contains the preprocessed training and evaluation data.

Take a peek at the `transform_output` artifact: it points to a directory containing 3 subdirectories:

In [None]:
os.listdir(transform.outputs['transform_output'].get()[0].uri)

The `transform_fn` subdirectory contains the actual preprocessing graph. The `metadata` subdirectory contains the schema of the original data. The `transformed_metadata` subdirectory contains the schema of the preprocessed data.

The `transformed_examples` folder contains `TFRecord` files with transformed data.

In [None]:
transform_uri = transform.outputs['transformed_examples'].get()[1].uri
tfrecord_filenames = [os.path.join(transform_uri, name)
                      for name in os.listdir(transform_uri)]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")
decoder = tfdv.TFExampleDecoder()
for tfrecord in dataset.take(3):
  serialized_example = tfrecord.numpy()
  example = decoder.decode(serialized_example)
  pprint(example)

### Train with the `Trainer` component

The `Trainer` component trains a model using TensorFlow.

`Trainer` takes:

- tf.Examples used for training and eval.
- A user provided module file that defines the trainer logic.
- A data schema created by `SchemaGen` or imported by `ImporterNode`.
- A proto definition of train args and eval args.
- An optional transform graph produced by upstream Transform component.
- An optional base model used for scenarios such as warm start.

Trainer generates  a `SavedModel` and an `EvalSavedModel`. 

#### Define the trainer module

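To configure `Trainer`, you need to encapsulate your training code in a Python module that is then provided to the `Trainer` as an input. The module must include the `trainer_fn` function, which returns a dictionary containing a `tf.estimator`-based estimator (`estimator`), the training and evaluation specifications (`train_spec`, `eval_spec`), and the eval input receiver function used by TensorFlow Model Analysis (`eval_input_receiver_fn`). If you prefer to work with `Keras`, you can do so and then convert the Keras model to an estimator using the `tf.keras.model_to_estimator()` function.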


In [None]:
_trainer_module_file = 'covertype_trainer.py'

In [None]:
%%writefile {_trainer_module_file}

# Copyright 2019 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#            http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Covertype Classifier training function."""

import tensorflow as tf

import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

NUMERIC_FEATURE_KEYS = [
    'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology',
    'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways',
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Fire_Points'
]

CATEGORICAL_FEATURE_KEYS = ['Wilderness_Area', 'Soil_Type']

LABEL_KEY = 'Cover_Type'
NUM_CLASSES = 7

EXPORTED_MODEL_NAME = 'covertype-classifier'


def _transformed_name(key):
  return key + '_xf'


def _get_raw_feature_spec(schema):
  return schema_utils.schema_as_feature_spec(schema).feature_spec


def _gzip_reader_fn(filenames):
  """Returns a TFRecord reader that can read gzip'ed files."""
  return tf.data.TFRecordDataset(filenames, compression_type='GZIP')


def _build_estimator(config,
                     numeric_feature_keys,
                     categorical_feature_keys,
                     hidden_units,
                     warm_start_from=None):
  """Build an estimator for predicting forest cover based on cartographic data."""

  num_feature_columns = [
      tf.feature_column.numeric_column(key) for key in numeric_feature_keys
  ]
  categorical_feature_columns = [
      tf.feature_column.categorical_column_with_identity(
          key, num_buckets=num_buckets, default_value=0)
      for key, num_buckets in categorical_feature_keys
  ]

  return tf.estimator.DNNLinearCombinedClassifier(
      config=config,
      n_classes=NUM_CLASSES,
      linear_feature_columns=categorical_feature_columns,
      dnn_feature_columns=num_feature_columns,
      dnn_hidden_units=hidden_units or [100, 70, 50, 25],
      warm_start_from=warm_start_from)


def _input_fn(filenames, feature_specs, label_key, batch_size=200):
  """Generates features and labels for training or evaluation."""

  dataset = tf.data.experimental.make_batched_features_dataset(
      file_pattern=filenames,
      batch_size=batch_size,
      features=feature_specs,
      label_key=label_key,
      reader=_gzip_reader_fn)

  return dataset


def _example_serving_receiver_fn(tf_transform_output, schema, label_key):
  """Builds the serving graph."""

  raw_feature_spec = _get_raw_feature_spec(schema)
  raw_feature_spec.pop(label_key)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  transformed_features = tf_transform_output.transform_raw_features(
      serving_input_receiver.features)

  return tf.estimator.export.ServingInputReceiver(
      transformed_features, serving_input_receiver.receiver_tensors)


def _eval_input_receiver_fn(tf_transform_output, schema, label_key):
  """Builds everything needed for the tf-model-analysis to run the model."""

  # Notice that the inputs are raw features, not transformed features here.
  raw_feature_spec = _get_raw_feature_spec(schema)

  raw_input_fn = tf.estimator.export.build_parsing_serving_input_receiver_fn(
      raw_feature_spec, default_batch_size=None)
  serving_input_receiver = raw_input_fn()

  features = serving_input_receiver.features.copy()
  transformed_features = tf_transform_output.transform_raw_features(features)

  # NOTE: Model is driven by transformed features (since training works on the
  # materialized output of TFT), but slicing will happen on raw features.
  features.update(transformed_features)

  return tfma.export.EvalInputReceiver(
      features=features,
      receiver_tensors=serving_input_receiver.receiver_tensors,
      labels=transformed_features[label_key])


def trainer_fn(hparams, schema):
  """Builds the objects required by the TFX Trainer."""

  train_batch_size = 40
  eval_batch_size = 40
  hidden_units = [128, 64]

  # Retrieve transformed feature specs
  tf_transform_output = tft.TFTransformOutput(hparams.transform_output)
  transformed_feature_spec = (
      tf_transform_output.transformed_feature_spec().copy())

  print(transformed_feature_spec)
  print(type(transformed_feature_spec))

  # Prepare transformed feature name lists
  # For categorical features retrieve vocabulary sizes
  transformed_label_key = _transformed_name(LABEL_KEY)
  transformed_numeric_feature_keys = [
      _transformed_name(key) for key in NUMERIC_FEATURE_KEYS
  ]
  transformed_categorical_feature_keys = [
      (_transformed_name(key),
       tf_transform_output.num_buckets_for_transformed_feature(
           _transformed_name(key))) for key in CATEGORICAL_FEATURE_KEYS
  ]

  # Create a training input function
  train_input_fn = lambda: _input_fn(
      filenames=hparams.train_files,
      feature_specs=tf_transform_output.transformed_feature_spec().copy(),
      batch_size=train_batch_size,
      label_key=transformed_label_key)

  # Create an evaluation input function
  eval_input_fn = lambda: _input_fn(
      filenames=hparams.eval_files,
      feature_specs=tf_transform_output.transformed_feature_spec().copy(),
      batch_size=eval_batch_size,
      label_key=transformed_label_key)

  # Create a training specification
  train_spec = tf.estimator.TrainSpec(
      train_input_fn, max_steps=hparams.train_steps)

  # Create an exporter and an evaluation specification
  serving_receiver_fn = lambda: _example_serving_receiver_fn(
      tf_transform_output, schema, LABEL_KEY)
  exporter = tf.estimator.FinalExporter(EXPORTED_MODEL_NAME,
                                        serving_receiver_fn)

  eval_spec = tf.estimator.EvalSpec(
      eval_input_fn,
      steps=hparams.eval_steps,
      exporters=[exporter],
      name=EXPORTED_MODEL_NAME)

  # Create runtime config
  run_config = tf.estimator.RunConfig(
      save_checkpoints_steps=999, keep_checkpoint_max=1)

  run_config = run_config.replace(model_dir=hparams.serving_model_dir)

  # Build an estimator
  estimator = _build_estimator(
      hidden_units=hidden_units,
      numeric_feature_keys=transformed_numeric_feature_keys,
      categorical_feature_keys=transformed_categorical_feature_keys,
      config=run_config,
      warm_start_from=hparams.warm_start_from)

  # Create an input receiver for TFMA processing
  receiver_fn = lambda: _eval_input_receiver_fn(tf_transform_output, schema,
                                                transformed_label_key)

  return {
      'estimator': estimator,
      'train_spec': train_spec,
      'eval_spec': eval_spec,
      'eval_input_receiver_fn': receiver_fn
  }

#### Create and run the Trainer component

In [None]:
trainer = Trainer(
    module_file=_trainer_module_file,
    examples=transform.outputs['transformed_examples'],
    schema=schema_importer.outputs['result'],
    transform_output=transform.outputs['transform_output'],
    train_args=trainer_pb2.TrainArgs(num_steps=10000),
    eval_args=trainer_pb2.EvalArgs(num_steps=5000))

context.run(trainer)

## Analyzing training runs with TensorBoard

In this step you will analyze the training run with [TensorBoard.dev](https://blog.tensorflow.org/2019/12/introducing-tensorboarddev-new-way-to.html). `TensorBoard.dev` is a managed service that enables you to easily host, track and share your ML experiments.

*There are some issues with the support for TensorBoard in the current release of AI Platform Notebooks. This is the reason for using TensorBoard.dev. When the issues are addressed the lab will be updated to use the built-in support for TensorBoard.*

### Retrieve the location of TensorBoard logs

In [None]:
train_uri = trainer.outputs['model'].get()[0].uri
logs_path = os.path.join(train_uri, 'serving_model_dir')
print(logs_path)

### Upload the logs and start TensorBoard.dev

1. Open a new JupyterLab terminal window

2. From the terminal window, execute the following command
```
tensorboard dev upload --logdir [YOUR_LOGDIR]
```

Where [YOUR_LOGDIR] is the path printed by the previous cell.

You will be asked to authorize `TensorBoard.dev` using your Google account. If you don't have a Google account or you don't want to authorize `TensorBoard.dev` you can skip this exercise.

After the authorization process completes, follow the link provided to view your experiment.

## Evaluating trained models with Evaluator
The `Evaluator` component analyzes model performance using the [TensorFlow Model Analysis library](https://www.tensorflow.org/tfx/model_analysis/get_started). It runs inference requests on particular subsets of the test dataset, based on which slices are defined by the developer. Knowing which slices should be analyzed requires domain knowledge of what is important in this particular use case or domain. 
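
To make the idea of slicing concrete, the sketch below computes accuracy for the whole dataset and for each value of one feature. It is not TensorFlow Model Analysis code; the `accuracy_by_slice` helper and the toy examples are hypothetical.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key):
    """Compute accuracy overall and for each value of `slice_key`."""
    hits, totals = defaultdict(int), defaultdict(int)
    for example in examples:
        for name in ("overall", f"{slice_key}={example[slice_key]}"):
            totals[name] += 1
            hits[name] += int(example["label"] == example["prediction"])
    return {name: hits[name] / totals[name] for name in totals}


# Hypothetical evaluation examples with model predictions attached.
examples = [
    {"Wilderness_Area": "Rawah", "label": 1, "prediction": 1},
    {"Wilderness_Area": "Rawah", "label": 2, "prediction": 2},
    {"Wilderness_Area": "Cache", "label": 3, "prediction": 1},
    {"Wilderness_Area": "Cache", "label": 3, "prediction": 3},
]
metrics = accuracy_by_slice(examples, "Wilderness_Area")
print(metrics)
```

In this toy data the overall accuracy hides the fact that one slice performs noticeably worse than the other, which is the kind of insight slicing is meant to surface.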


### Configure and run the Evaluator component

In [None]:
model_analyzer = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
)
context.run(model_analyzer)

### Visualize evaluation results
You can visualize the evaluation results using the `tfma.view.render_slicing_metrics()` function from TensorFlow Model Analysis library.

*Currently, there is an issue in the JupyterLab on AI Platform Notebooks that prevents `tfma.view.render_slicing_metrics()` from rendering. We will keep monitoring the issue and update this part of the lab as required.*

In [None]:
#evaluation_uri = model_analyzer.outputs['output'].get()[0].uri
#eval_result = tfma.load_eval_result(evaluation_uri)
#tfma.view.render_slicing_metrics(eval_result)

## Validating the model with the ModelValidator Component

The `ModelValidator` Component helps you validate your exported models, ensuring that they are "good enough" to be pushed to production.

`ModelValidator` compares new models against a baseline (such as the currently serving model) to determine if they're "good enough" relative to the baseline. It does so by evaluating both models on an eval dataset and computing their performance on metrics (e.g. AUC, loss). If the new model's metrics meet developer-specified criteria relative to the baseline model (e.g. AUC is not lower), the model is "blessed" (marked as good), indicating to the Pusher that it is ok to push the model to production.

Note: Currently developers can only specify criteria metrics for the whole evaluation split (dataset). A future version will support more granular criteria such as slices.
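
The blessing decision described above can be sketched as a simple comparison. This is an illustration only, not the `ModelValidator` implementation: the `is_blessed` helper, the metric values, and the choice to bless a candidate when no baseline exists are all assumptions.

```python
def is_blessed(candidate, baseline, metric="accuracy", higher_is_better=True):
    """Bless the candidate when it is at least as good as the baseline."""
    if baseline is None:
        # No baseline yet (first run): assumed here to bless the candidate.
        return True
    if higher_is_better:
        return candidate[metric] >= baseline[metric]
    return candidate[metric] <= baseline[metric]


# Hypothetical metrics computed on the same evaluation split.
baseline = {"accuracy": 0.71, "loss": 0.80}
candidate = {"accuracy": 0.74, "loss": 0.85}

print(is_blessed(candidate, baseline))
print(is_blessed(candidate, baseline, metric="loss", higher_is_better=False))
```

The second call shows why the choice of metric matters: the same candidate passes on accuracy but fails on loss.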


### Configure and run the `ModelValidator` component

In [None]:
model_validator = ModelValidator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'])
context.run(model_validator)

### Examine the output of `ModelValidator`

In [None]:
model_validator.outputs

In [None]:
blessing_uri = model_validator.outputs['blessing'].get()[0].uri
!ls -l {blessing_uri}

## Deploying models with Pusher

The `Pusher` component checks whether a model has been "blessed", and if so, deploys it by pushing the model to a well known file destination.
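
The push step amounts to a conditional copy into a versioned directory. The sketch below illustrates that flow with the standard library only. It is not the `Pusher` implementation: the `push_if_blessed` helper, the timestamp-based version name, and the placeholder model file are assumptions.

```python
import os
import shutil
import tempfile
import time


def push_if_blessed(model_dir, blessed, base_directory):
    """Copy `model_dir` to a new version directory when the model is blessed."""
    if not blessed:
        return None
    # A timestamp-based version name is an assumption of this sketch.
    destination = os.path.join(base_directory, str(int(time.time())))
    shutil.copytree(model_dir, destination)
    return destination


with tempfile.TemporaryDirectory() as workdir:
    # Create a placeholder "model" so there is something to copy.
    model_dir = os.path.join(workdir, "model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "saved_model.pb"), "w") as f:
        f.write("placeholder")

    serving_dir = os.path.join(workdir, "serving_model")
    skipped = push_if_blessed(model_dir, False, serving_dir)
    pushed = push_if_blessed(model_dir, True, serving_dir)
    pushed_files = os.listdir(pushed)
    print(skipped, pushed_files)
```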



### Configure and run the `Pusher` component

In [None]:
_serving_model_dir = os.path.join(os.sep, 'home', 'serving_model', 'covertype_classifier')

pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=model_validator.outputs['blessing'],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=_serving_model_dir)))

context.run(pusher)

### Examine the output of `Pusher`

In [None]:
pusher.outputs

In [None]:
latest_pushed_model = os.path.join(_serving_model_dir, max(os.listdir(_serving_model_dir)))
!saved_model_cli show --dir {latest_pushed_model} --all

## Next steps

This concludes the lab `lab-31-tfx-components-walkthrough`. The next labs in the series will guide you through developing a TFX pipeline, deploying and running the pipeline on **Kubeflow Pipelines**, and automating the pipeline build and deployment processes with **Cloud Build**.