# TFX Components Walk-through

The goal of this lab is to provide a high-level overview of the core TFX components.

You will work with the [Covertype Data Set](https://github.com/jarokaz/mlops-labs/blob/master/datasets/covertype/README.md) and build a TFX pipeline that trains and deploys a multi-class classification model to predict the type of forest cover from cartographic features.


**Setup Note:**
Currently, TFMA visualizations do not render properly in JupyterLab. It is recommended to run this notebook in Jupyter Classic Notebook. To switch to Classic Notebook select *Launch Classic Notebook* from the *Help* menu.

In [1]:
import absl
import os
import tempfile
import time

import tensorflow as tf
import tensorflow_data_validation as tfdv
import tensorflow_model_analysis as tfma
import tfx

from tensorflow_metadata.proto.v0 import schema_pb2, statistics_pb2, anomalies_pb2
from tensorflow_transform.tf_metadata import schema_utils

from ml_metadata.metadata_store import metadata_store
from ml_metadata.proto import metadata_store_pb2

from tfx.components import CsvExampleGen
from tfx.components import Evaluator
from tfx.components import ExampleValidator
from tfx.components import Pusher
from tfx.components import ResolverNode
from tfx.components import SchemaGen
from tfx.components import StatisticsGen
from tfx.components import Trainer
from tfx.components import Transform
from tfx.components.base import executor_spec
from tfx.components.common_nodes.importer_node import ImporterNode
from tfx.components.trainer import executor as trainer_executor

from tfx.dsl.experimental import latest_blessed_model_resolver

from tfx.orchestration.beam.beam_dag_runner import BeamDagRunner
from tfx.orchestration.ai_platform_pipelines import ai_platform_pipelines_dag_runner
from tfx.orchestration.metadata import sqlite_metadata_connection_config
from tfx.orchestration.pipeline import Pipeline
from tfx.orchestration import metadata

from tfx.proto import evaluator_pb2
from tfx.proto import example_gen_pb2
from tfx.proto import infra_validator_pb2
from tfx.proto import pusher_pb2
from tfx.proto import trainer_pb2
from tfx.proto.evaluator_pb2 import SingleSlicingSpec

from tfx.utils.dsl_utils import external_input

from tfx.types import Channel
from tfx.types.standard_artifacts import Examples
from tfx.types.standard_artifacts import ExampleStatistics
from tfx.types.standard_artifacts import ExampleAnomalies
from tfx.types.standard_artifacts import Model
from tfx.types.standard_artifacts import ModelEvaluation
from tfx.types.standard_artifacts import ModelBlessing
from tfx.types.standard_artifacts import InfraBlessing



## Set up the environment
### Verify TFX SDK Version

**Note**: this lab was developed and tested with the following TF ecosystem package versions:

`Tensorflow Version: 2.3.0`  
`TFX Version: 0.23.0.caip20200818`  
`TFDV Version: 0.23.0`  
`TFMA Version: 0.23.0`



In [4]:
print("Tensorflow Version:", tf.__version__)
print("TFX Version:", tfx.__version__)
print("TFDV Version:", tfdv.__version__)
print("TFMA Version:", tfma.VERSION_STRING)

absl.logging.set_verbosity(absl.logging.INFO)

Tensorflow Version: 2.3.0
TFX Version: 0.23.0.caip20200818
TFDV Version: 0.23.0
TFMA Version: 0.23.0


If the versions above do not match, update your packages in the current Jupyter kernel. 

### Update `PATH` with the location of TFX SDK.

In [5]:
os.environ['PATH'] += os.pathsep + '/home/jupyter/.local/bin'

## Configure lab settings

Set constants, location paths and other environment settings. 

In [6]:
PROJECT_ID = 'mlops-dev-env'
GCS_BUCKET_NAME = 'mlops-dev-workspace'
MODEL_STORE = f'gs://{GCS_BUCKET_NAME}/model-store'

### Settings for Beam runner

In [7]:
LOCAL_PIPELINE_ROOT= os.path.join(os.sep, 'home', 'jupyter', 'pipeline-root')
METADATA_PATH = os.path.join(LOCAL_PIPELINE_ROOT, 'metadata.sqlite')
metadata_connection_config=sqlite_metadata_connection_config(METADATA_PATH)
connection_config = metadata_store_pb2.ConnectionConfig()
connection_config.sqlite.filename_uri = METADATA_PATH
connection_config.sqlite.connection_mode = 3 # READWRITE_OPENCREATE

### Settings for Managed Pipelines runner

In [8]:
GCS_PIPELINE_ROOT = f'gs://{GCS_BUCKET_NAME}/pipeline-root'
BASE_IMAGE = f'gcr.io/caip-pipelines-assets/tfx:{tfx.__version__}'

### Source dataset

In [9]:
DATA_ROOT = 'gs://workshop-datasets/covertype/small'

## Configure TFX Pipeline

### Ingesting data using ExampleGen

In any ML development process, the first step is to ingest the training and test datasets. The `ExampleGen` component ingests data into a TFX pipeline. It consumes external files or services and generates a set of files in the `TFRecord` format, which are used by other TFX components. It can also shuffle the data and split it into an arbitrary number of partitions.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/ExampleGen.png width="300">


In this exercise, you use the `CsvExampleGen` specialization of `ExampleGen` to ingest CSV files from a GCS location. The component is configured to split the input data into two splits, `train` and `eval`, using a 4:1 ratio.

In [10]:
output_config = example_gen_pb2.Output(
    split_config=example_gen_pb2.SplitConfig(splits=[
        example_gen_pb2.SplitConfig.Split(name='train', hash_buckets=4),
        example_gen_pb2.SplitConfig.Split(name='eval', hash_buckets=1)
    ]))

example_gen = tfx.components.CsvExampleGen(
    instance_name='Data_Extraction_Spliting',
    input_base=DATA_ROOT,
    output_config=output_config
)
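
The 4:1 ratio follows from `hash_buckets`: each record is hashed into one of 4 + 1 = 5 buckets, and the first four buckets are assigned to `train`. The snippet below is a minimal sketch of that idea in plain Python, not the TFX implementation. The `assign_split` helper, the use of MD5, and the synthetic `row-N` records are assumptions made for illustration.

```python
import hashlib

# Mirrors the split configuration above: (split name, number of hash buckets).
SPLITS = (("train", 4), ("eval", 1))


def assign_split(record: str, splits=SPLITS) -> str:
    """Deterministically maps a record to a split using hash buckets."""
    total_buckets = sum(buckets for _, buckets in splits)
    # MD5 is a stand-in for whatever fingerprint the real component uses.
    digest = hashlib.md5(record.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total_buckets
    upper = 0
    for name, buckets in splits:
        upper += buckets
        if bucket < upper:
            return name
    raise ValueError("splits must define at least one bucket")


# Synthetic records, used only to show the resulting proportions.
rows = [f"row-{i}" for i in range(10000)]
train_share = sum(assign_split(r) == "train" for r in rows) / len(rows)
print(f"train share: {train_share:.3f}")
```

Because the assignment depends only on the record content, the same record lands in the same split on every run.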

#### Create a simple pipeline to test the component

In [11]:
pipeline_name = 'example_gen_only'
components = [example_gen]

pipeline = Pipeline(
    pipeline_name=pipeline_name,
    pipeline_root=f'{LOCAL_PIPELINE_ROOT}/{pipeline_name}',
    metadata_connection_config=metadata_connection_config,
    components=components
)

#### Run the pipeline

In [12]:
BeamDagRunner().run(pipeline)



INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting depends on [].
INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting is scheduled.
INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting is running.
INFO:absl:Running driver for CsvExampleGen.Data_Extraction_Spliting
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:Running executor for CsvExampleGen.Data_Extraction_Spliting
INFO:absl:Generating examples.
INFO:absl:Processing input csv data gs://workshop-datasets/covertype/small/* to TFExample.
INFO:absl:Examples generated.
INFO:absl:Running publisher for CsvExampleGen.Data_Extraction_Spliting
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting is finished.


#### Examine artifacts generated by the pipeline run

In [13]:
with metadata.Metadata(connection_config) as store:
    examples_artifacts = store.get_artifacts_by_type(Examples.TYPE_NAME)

for element in examples_artifacts:
    print(element.uri)
    print(element.properties)

INFO:absl:MetadataStore with DB connection initialized


/home/jupyter/pipeline-root/example_gen_only/CsvExampleGen.Data_Extraction_Spliting/examples/1
{'split_names': string_value: "[\"train\", \"eval\"]"
}


#### Examine the ingested data

In [14]:
examples_uri = examples_artifacts[-1].uri
tfrecord_filenames = [os.path.join(examples_uri, 'train', name)
                      for name in os.listdir(os.path.join(examples_uri, 'train'))]
dataset = tf.data.TFRecordDataset(tfrecord_filenames, compression_type="GZIP")
for tfrecord in dataset.take(2):
  example = tf.train.Example()
  example.ParseFromString(tfrecord.numpy())
  for name, feature in example.features.feature.items():
    # A tf.train.Feature holds exactly one kind of value list.
    if feature.HasField('bytes_list'):
      value = feature.bytes_list.value
    elif feature.HasField('float_list'):
      value = feature.float_list.value
    elif feature.HasField('int64_list'):
      value = feature.int64_list.value
    else:
      value = []
    print('{}: {}'.format(name, value))
  print('******')

Horizontal_Distance_To_Hydrology: [648]
Elevation: [3142]
Slope: [9]
Horizontal_Distance_To_Roadways: [757]
Wilderness_Area: [b'Commanche']
Vertical_Distance_To_Hydrology: [101]
Soil_Type: [b'C7757']
Cover_Type: [1]
Hillshade_3pm: [157]
Hillshade_Noon: [247]
Horizontal_Distance_To_Fire_Points: [1871]
Hillshade_9am: [223]
Aspect: [183]
******
Horizontal_Distance_To_Fire_Points: [451]
Hillshade_Noon: [227]
Hillshade_9am: [245]
Aspect: [124]
Horizontal_Distance_To_Hydrology: [60]
Elevation: [1967]
Horizontal_Distance_To_Roadways: [124]
Slope: [16]
Wilderness_Area: [b'Cache']
Vertical_Distance_To_Hydrology: [9]
Soil_Type: [b'C2704']
Cover_Type: [2]
Hillshade_3pm: [105]
******


### Generating statistics using StatisticsGen

The `StatisticsGen` component generates data statistics that can be used by other TFX components. `StatisticsGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started). It generates statistics for each split in the `ExampleGen` component's output. In this case there are two splits: `train` and `eval`.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/StatisticsGen.png width="200">

In [15]:
statistics_gen = tfx.components.StatisticsGen(
    instance_name='Statistics_Generation',
    examples=example_gen.outputs['examples'])

INFO:absl:Excluding no splits because exclude_splits is not set.
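
To make "statistics for each split" concrete, here is a rough sketch in plain Python of the kind of per-feature summary involved. The real component relies on TensorFlow Data Validation and produces a much richer result. The `summarize` helper is hypothetical, the `train` rows echo values from the records printed earlier, and the `eval` row is made up.

```python
from collections import Counter
from statistics import mean


def summarize(rows):
    """Computes simple per-feature statistics for a list of feature dicts."""
    stats = {}
    for name in sorted({key for row in rows for key in row}):
        values = [row[name] for row in rows if name in row]
        entry = {"count": len(values), "missing": len(rows) - len(values)}
        if all(isinstance(v, (int, float)) for v in values):
            # Numeric features: range and central tendency.
            entry.update(min=min(values), max=max(values), mean=mean(values))
        else:
            # Categorical features: the most frequent values.
            entry["top_values"] = Counter(values).most_common(3)
        stats[name] = entry
    return stats


splits = {
    "train": [
        {"Elevation": 3142, "Slope": 9, "Wilderness_Area": "Commanche"},
        {"Elevation": 1967, "Slope": 16, "Wilderness_Area": "Cache"},
    ],
    "eval": [{"Elevation": 2500, "Wilderness_Area": "Cache"}],
}
# One statistics object per split, as described above.
split_stats = {name: summarize(rows) for name, rows in splits.items()}
print(split_stats["train"]["Elevation"])
```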


### Inferring data schema using SchemaGen

Some TFX components use a description of the input data called a schema. The schema is an instance of `schema.proto`. It can specify data types for feature values, whether a feature has to be present in all examples, allowed value ranges, and other properties. `SchemaGen` automatically generates the schema by inferring types, categories, and ranges from data statistics. The auto-generated schema is best-effort and only tries to infer basic properties of the data. It is expected that developers review and modify it as needed. `SchemaGen` uses [TensorFlow Data Validation](https://www.tensorflow.org/tfx/data_validation/get_started).

The `SchemaGen` component generates the schema using the statistics for the `train` split. The statistics for other splits are ignored.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/SchemaGen.png width="200">

In [16]:
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)

INFO:absl:Excluding no splits because exclude_splits is not set.
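
The inference step can be pictured with a small plain-Python sketch. It is only an approximation under stated assumptions: the real `SchemaGen` works from statistics rather than raw rows and emits a `schema.proto` message, and the `infer_schema` helper and its dictionary layout are hypothetical.

```python
def infer_schema(rows):
    """Infers a toy schema: value type, presence, and string domains."""
    schema = {}
    for name in sorted({key for row in rows for key in row}):
        values = [row[name] for row in rows if name in row]
        if all(isinstance(v, int) for v in values):
            feature = {"type": "INT"}
        elif all(isinstance(v, (int, float)) for v in values):
            feature = {"type": "FLOAT"}
        else:
            # String features get a domain: the set of values seen in the data.
            feature = {"type": "BYTES", "domain": sorted(set(values))}
        # A feature is treated as required if every row provides it.
        feature["required"] = len(values) == len(rows)
        schema[name] = feature
    return schema


train_rows = [
    {"Elevation": 3142, "Wilderness_Area": "Commanche", "Soil_Type": "C7757"},
    {"Elevation": 1967, "Wilderness_Area": "Cache", "Soil_Type": "C2704"},
]
schema = infer_schema(train_rows)
print(schema["Wilderness_Area"])
```

A domain inferred this way only contains values that happened to appear in the data, which is one reason the generated schema should be reviewed by a developer.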


### Validating data with ExampleValidator

The `ExampleValidator` component identifies anomalies in data.  It identifies anomalies by comparing data statistics computed by the `StatisticsGen` component against a schema generated by `SchemaGen` or imported by `ImporterNode`.

`ExampleValidator` can detect different classes of anomalies. For example, it can:

- perform validity checks by comparing data statistics against a schema
- detect training-serving skew by comparing training and serving data
- detect data drift by looking at a series of data


The `ExampleValidator` component validates the data in the `eval` split only. Other splits are ignored. 

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/ExampleValidator.png width="350">

In [17]:
example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'],
    instance_name="Data_Validation"
)

INFO:absl:Excluding no splits because exclude_splits is not set.
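
The first kind of check, validity against a schema, can be sketched in plain Python as follows. This is an illustration only: the real component compares statistics (not individual rows) against the schema, and the `find_anomalies` helper, the toy schema layout, and the sample rows are all hypothetical.

```python
def find_anomalies(rows, schema):
    """Returns {feature: [messages]} for rows that violate a toy schema."""
    anomalies = {}
    for index, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                if spec.get("required"):
                    anomalies.setdefault(name, []).append(
                        f"row {index}: missing value")
                continue
            domain = spec.get("domain")
            if domain is not None and row[name] not in domain:
                anomalies.setdefault(name, []).append(
                    f"row {index}: unexpected value {row[name]!r}")
        for name in row:
            if name not in schema:
                anomalies.setdefault(name, []).append(
                    f"row {index}: feature not in schema")
    return anomalies


schema = {
    "Elevation": {"required": True},
    "Wilderness_Area": {"required": True, "domain": ["Cache", "Commanche"]},
}
eval_rows = [
    {"Elevation": 2500, "Wilderness_Area": "Cache"},
    {"Wilderness_Area": "Neota"},  # missing Elevation, value outside domain
]
anomalies = find_anomalies(eval_rows, schema)
print(anomalies)
```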


### Preprocessing data with Transform

The `Transform` component performs data transformation and feature engineering. The `Transform` component consumes `tf.Examples` emitted from the `ExampleGen` component and emits the transformed feature data and the `SavedModel` graph that was used to process the data. The emitted `SavedModel`  can then be used by serving components to make sure that the same data pre-processing logic is applied at training and serving.

The `Transform` component requires more code than many other components because of the arbitrary complexity of the feature engineering that you may need for the data and/or model that you're working with. It requires code files to be available which define the processing needed.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Transform.png width="400">

#### Define the pre-processing module

To configure `Transform`, you need to encapsulate your pre-processing code in a Python function named `preprocessing_fn` and save it to a Python module that is then provided to the `Transform` component as an input. The component loads this module and calls `preprocessing_fn` when it runs.

In most cases, your implementation of the `preprocessing_fn` makes extensive use of [TensorFlow Transform](https://www.tensorflow.org/tfx/guide/tft) for performing feature engineering on your dataset.

In [18]:
FEATURES_MODULE = 'features.py'

In [38]:
%%writefile {FEATURES_MODULE}
# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Covertype model features."""
import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

NUMERIC_FEATURE_KEYS = [
    'Elevation', 'Aspect', 'Slope', 'Horizontal_Distance_To_Hydrology',
    'Vertical_Distance_To_Hydrology', 'Horizontal_Distance_To_Roadways',
    'Hillshade_9am', 'Hillshade_Noon', 'Hillshade_3pm',
    'Horizontal_Distance_To_Fire_Points'
]

CATEGORICAL_FEATURE_KEYS = ['Wilderness_Area', 'Soil_Type']

LABEL_KEY = 'Cover_Type'
NUM_CLASSES = 7

def transformed_name(key):
  return key + '_xf'

Writing features.py


In [39]:
TRANSFORM_MODULE = 'preprocessing.py'

In [40]:
%%writefile {TRANSFORM_MODULE}
# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Covertype preprocessing.
This file defines a template for TFX Transform component.
"""

import tensorflow as tf
import tensorflow_transform as tft

import features

def _fill_in_missing(x):
  """Replace missing values in a SparseTensor.
  Fills in missing values of `x` with '' or 0, and converts to a dense tensor.
  Args:
    x: A `SparseTensor` of rank 2.  Its dense shape should have size at most 1
      in the second dimension.
  Returns:
    A rank 1 tensor where missing values of `x` have been filled in.
  """
  default_value = '' if x.dtype == tf.string else 0
  return tf.squeeze(
      tf.sparse.to_dense(
          tf.SparseTensor(x.indices, x.values, [x.dense_shape[0], 1]),
          default_value),
      axis=1)

def preprocessing_fn(inputs):
  """Preprocesses Covertype Dataset."""

  outputs = {}

  # Scale numerical features
  for key in features.NUMERIC_FEATURE_KEYS:
    outputs[features.transformed_name(key)] = tft.scale_to_z_score(
        _fill_in_missing(inputs[key]))

  # Generate vocabularies and maps categorical features
  for key in features.CATEGORICAL_FEATURE_KEYS:
    outputs[features.transformed_name(key)] = tft.compute_and_apply_vocabulary(
        x=_fill_in_missing(inputs[key]), num_oov_buckets=1, vocab_filename=key)

  # Convert Cover_Type to dense tensor
  outputs[features.transformed_name(features.LABEL_KEY)] = _fill_in_missing(
      inputs[features.LABEL_KEY])

  return outputs


Writing preprocessing.py
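
The two TensorFlow Transform calls used in `preprocessing_fn` both follow an analyze-then-apply pattern: a full pass over the data computes a summary (mean and standard deviation, or a vocabulary), which is then applied to every example. The plain-Python sketch below illustrates that pattern under stated assumptions. It uses the population standard deviation, breaks frequency ties alphabetically, and works on made-up values; the helper names are hypothetical and none of this is the library's code.

```python
from collections import Counter
from statistics import mean, pstdev


def fit_z_score(values):
    """Analyze phase: one full pass to get the mean and standard deviation."""
    return mean(values), pstdev(values)


def fit_vocabulary(values):
    """Analyze phase: orders distinct values by descending frequency."""
    counts = Counter(values)
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: index for index, value in enumerate(ordered)}


def apply_vocabulary(value, vocabulary):
    """Transform phase: unseen values fall into the single OOV bucket."""
    return vocabulary.get(value, len(vocabulary))


# Made-up values, for illustration only.
elevations = [3142.0, 1967.0, 2500.0]
mu, sigma = fit_z_score(elevations)
scaled = [(x - mu) / sigma for x in elevations]

vocab = fit_vocabulary(["Cache", "Commanche", "Cache"])
print(scaled)
print(apply_vocabulary("Cache", vocab), apply_vocabulary("Rawah", vocab))
```

With `num_oov_buckets=1`, as in the module above, any value missing from the vocabulary maps to one extra index placed after the known entries, which the sketch mimics with `len(vocabulary)`.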


Configure the component.

In [22]:
transform = Transform(
    examples=example_gen.outputs['examples'],
    schema=schema_gen.outputs['schema'],
    module_file=TRANSFORM_MODULE)



### Train with the `Trainer` component

The `Trainer` component trains a model using TensorFlow.

`Trainer` takes:

- tf.Examples used for training and eval.
- A user provided module file that defines the trainer logic.
- A data schema created by `SchemaGen` or imported by `ImporterNode`.
- A proto definition of train args and eval args.
- An optional transform graph produced by an upstream `Transform` component.
- An optional base model used for scenarios such as warm start.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Trainer.png width="400">


#### Define the trainer module

To configure `Trainer`, you need to encapsulate your training code in a Python module that is then provided to the `Trainer` as an input. 


In [23]:
TRAINER_MODULE_FILE = 'model.py'

In [41]:
%%writefile {TRAINER_MODULE_FILE}
# Copyright 2020 Google LLC. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""The Covertype classifier DNN keras model."""

import absl
import os

import tensorflow as tf
import tensorflow_model_analysis as tfma
import tensorflow_transform as tft
from tensorflow_transform.tf_metadata import schema_utils

import features

HIDDEN_UNITS = [16, 8]
LEARNING_RATE = 0.001
TRAIN_BATCH_SIZE = 64
EVAL_BATCH_SIZE = 64

LOCAL_LOG_DIR = '/tmp/logs'


def _gzip_reader_fn(filenames):
  """Small utility returning a record reader that can read gzip'ed files."""
  return tf.data.TFRecordDataset(filenames, compression_type='GZIP')


def _get_serve_tf_examples_fn(model, tf_transform_output):
  """Returns a function that parses a serialized tf.Example and applies TFT."""

  model.tft_layer = tf_transform_output.transform_features_layer()

  @tf.function
  def serve_tf_examples_fn(serialized_tf_examples):
    """Returns the output to be used in the serving signature."""
    feature_spec = tf_transform_output.raw_feature_spec()
    feature_spec.pop(features.LABEL_KEY)
    parsed_features = tf.io.parse_example(serialized_tf_examples, feature_spec)

    transformed_features = model.tft_layer(parsed_features)

    return model(transformed_features)

  return serve_tf_examples_fn


def _input_fn(file_pattern, tf_transform_output, batch_size=200):
  """Generates features and label for tuning/training.
  Args:
    file_pattern: input tfrecord file pattern.
    tf_transform_output: A TFTransformOutput.
    batch_size: representing the number of consecutive elements of returned
      dataset to combine in a single batch
  Returns:
    A dataset that contains (features, indices) tuple where features is a
      dictionary of Tensors, and indices is a single Tensor of label indices.
  """
  transformed_feature_spec = (
      tf_transform_output.transformed_feature_spec().copy())

  dataset = tf.data.experimental.make_batched_features_dataset(
      file_pattern=file_pattern,
      batch_size=batch_size,
      features=transformed_feature_spec,
      reader=_gzip_reader_fn,
      label_key=features.transformed_name(features.LABEL_KEY))

  return dataset

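def _build_keras_model(tf_transform_output, hidden_units, learning_rate):
  """Creates a DNN Keras model for classifying Covertype data.
  Args:
    tf_transform_output: A TFTransformOutput, used to look up vocabulary sizes.
    hidden_units: [int], the layer sizes of the DNN (input layer first).
    learning_rate: float, the learning rate passed to the optimizer.
  Returns:
    A keras Model.
  """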

  numeric_columns = [
      tf.feature_column.numeric_column(
          key=features.transformed_name(key), 
          shape=())
      for key in features.NUMERIC_FEATURE_KEYS
  ]

  categorical_columns = [
      tf.feature_column.categorical_column_with_identity(
          key=features.transformed_name(key), 
          num_buckets=tf_transform_output.num_buckets_for_transformed_feature(features.transformed_name(key)), 
          default_value=0)
      for key in features.CATEGORICAL_FEATURE_KEYS
  ]

  indicator_columns = [
      tf.feature_column.indicator_column(categorical_column)
      for categorical_column in categorical_columns
  ]

  model = _wide_and_deep_classifier(
      # TODO(b/139668410) replace with premade wide_and_deep keras model
      wide_columns=indicator_columns,
      deep_columns=numeric_columns,
      dnn_hidden_units=hidden_units,
      learning_rate=learning_rate)
  return model


def _wide_and_deep_classifier(wide_columns, deep_columns, dnn_hidden_units, learning_rate):
  """Builds a simple keras wide and deep model.
  Args:
    wide_columns: Feature columns wrapped in indicator_column for wide (linear)
      part of the model.
    deep_columns: Feature columns for deep part of the model.
    dnn_hidden_units: [int], the layer sizes of the hidden DNN.
  Returns:
    A Wide and Deep Keras model
  """
  
  input_layers = {
      column.key: tf.keras.layers.Input(name=column.key, shape=(), dtype=tf.float32)
      for column in deep_columns
  }
  
  input_layers.update({
      column.categorical_column.key: tf.keras.layers.Input(name=column.categorical_column.key, shape=(), dtype=tf.int32)
      for column in wide_columns
  })
    
  deep = tf.keras.layers.DenseFeatures(deep_columns)(input_layers)
  for numnodes in dnn_hidden_units:
    deep = tf.keras.layers.Dense(numnodes)(deep)
  wide = tf.keras.layers.DenseFeatures(wide_columns)(input_layers)

  output = tf.keras.layers.Dense(features.NUM_CLASSES, activation='softmax')(
               tf.keras.layers.concatenate([deep, wide]))

  model = tf.keras.Model(input_layers, output)
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
      metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])
  model.summary(print_fn=absl.logging.info)
  return model

def _copy_tensorboard_logs(local_path, saved_model_path):
    """Copies Tensorboard logs to the subfolder in the GCS SavedModel location."""

    if saved_model_path[0:5] == 'gs://':
        pattern = '{}/*/events.out.tfevents.*'.format(local_path)
        dest_path = saved_model_path.rstrip('/') + '/' + 'logs'
        local_files = tf.io.gfile.glob(pattern)
        dest_log_files = [local_file.replace(local_path, dest_path) for local_file in local_files]
        for local_file, dest_file in zip(local_files, dest_log_files):
            tf.io.gfile.copy(local_file, dest_file)

# TFX Trainer will call this function.
def run_fn(fn_args):
  """Trains a model based on given args.
  Args:
    fn_args: Holds args used to train the model as name/value pairs.
  """
  
  tf_transform_output = tft.TFTransformOutput(fn_args.transform_output)
    
  train_dataset = _input_fn(fn_args.train_files, tf_transform_output, TRAIN_BATCH_SIZE)
  eval_dataset = _input_fn(fn_args.eval_files, tf_transform_output, EVAL_BATCH_SIZE)
    
  model = _build_keras_model(
      tf_transform_output=tf_transform_output,
      hidden_units=HIDDEN_UNITS,
      learning_rate=LEARNING_RATE
  )

  tensorboard_callback = tf.keras.callbacks.TensorBoard(
      log_dir=LOCAL_LOG_DIR, update_freq='batch')
  callbacks = [ 
      tensorboard_callback
  ]

  model.fit(
      train_dataset,
      steps_per_epoch=fn_args.train_steps,
      validation_data=eval_dataset,
      validation_steps=fn_args.eval_steps,
      verbose=2,
      callbacks=callbacks)
    
  signatures = {
      'serving_default':
          _get_serve_tf_examples_fn(model,
                                    tf_transform_output).get_concrete_function(
                                        tf.TensorSpec(
                                            shape=[None],
                                            dtype=tf.string,
                                            name='examples')),
  }
  
  model.save(fn_args.serving_model_dir, save_format='tf', signatures=signatures)
  _copy_tensorboard_logs(LOCAL_LOG_DIR, fn_args.serving_model_dir)
    


Writing model.py



As of the 0.23 release of TFX, the `Trainer` component only supports passing a single field - `num_steps` - through the `train_args` and `eval_args` arguments. 

In [25]:
trainer = Trainer(
    custom_executor_spec=executor_spec.ExecutorClassSpec(trainer_executor.GenericExecutor),
    module_file=TRAINER_MODULE_FILE,
    transformed_examples=transform.outputs["transformed_examples"],
    schema=schema_gen.outputs["schema"],
    transform_graph=transform.outputs["transform_graph"],
    train_args=trainer_pb2.TrainArgs(num_steps=5000),
    eval_args=trainer_pb2.EvalArgs(num_steps=1000))
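
Because `train_args` and `eval_args` carry only a step count, the amount of data processed depends on the batch sizes hard-coded in `model.py`. A quick back-of-the-envelope calculation, assuming each step consumes exactly one full batch:

```python
TRAIN_BATCH_SIZE = 64   # from model.py
EVAL_BATCH_SIZE = 64    # from model.py
TRAIN_STEPS = 5000      # TrainArgs(num_steps=5000)
EVAL_STEPS = 1000       # EvalArgs(num_steps=1000)

# Examples consumed = number of steps * examples per step.
train_examples = TRAIN_STEPS * TRAIN_BATCH_SIZE
eval_examples = EVAL_STEPS * EVAL_BATCH_SIZE
print(train_examples, eval_examples)
```

If a split holds fewer examples than these totals, the run relies on the input dataset repeating, which is the expected default behavior of the dataset helper used in `_input_fn`.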

### Evaluating trained models with Evaluator
The `Evaluator` component analyzes model performance using the [TensorFlow Model Analysis library](https://www.tensorflow.org/tfx/model_analysis/get_started). It runs inference requests on particular subsets of the test dataset, based on which slices are defined by the developer. Knowing which slices should be analyzed requires domain knowledge of what is important in this particular use case or domain. 

The `Evaluator` can also optionally validate a newly trained model against a previous model. In this lab, you only train one model, so the `Evaluator` will automatically label the model as "blessed".


<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Evaluator.png width="400">

You can use the `ResolverNode` to pick the previous model to compare against.  The model resolver is only required if performing model validation in addition to evaluation. In this case we validate against the latest blessed model. If no model has been blessed before (as in this case) the evaluator will make our candidate the first blessed model.

In [26]:
model_resolver = ResolverNode(
      instance_name='latest_blessed_model_resolver',
      resolver_class=latest_blessed_model_resolver.LatestBlessedModelResolver,
      model=Channel(type=Model),
      model_blessing=Channel(type=ModelBlessing))

Configure evaluation metrics and slices.

In [27]:
accuracy_threshold = tfma.MetricThreshold(
                value_threshold=tfma.GenericValueThreshold(
                    lower_bound={'value': 0.5},
                    upper_bound={'value': 0.99}),
                change_threshold=tfma.GenericChangeThreshold(
                    absolute={'value': 0.0001},
                    direction=tfma.MetricDirection.HIGHER_IS_BETTER),
                )

metrics_specs = tfma.MetricsSpec(
                   metrics = [
                       tfma.MetricConfig(class_name='SparseCategoricalAccuracy',
                           threshold=accuracy_threshold),
                       tfma.MetricConfig(class_name='ExampleCount')])

eval_config = tfma.EvalConfig(
    model_specs=[
        tfma.ModelSpec(label_key='Cover_Type')
    ],
    metrics_specs=[metrics_specs],
    slicing_specs=[
        tfma.SlicingSpec(),
        tfma.SlicingSpec(feature_keys=['Wilderness_Area'])
    ]
)
eval_config

model_specs {
  label_key: "Cover_Type"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "Wilderness_Area"
}
metrics_specs {
  metrics {
    class_name: "SparseCategoricalAccuracy"
    threshold {
      value_threshold {
        lower_bound {
          value: 0.5
        }
        upper_bound {
          value: 0.99
        }
      }
      change_threshold {
        absolute {
          value: 0.0001
        }
        direction: HIGHER_IS_BETTER
      }
    }
  }
  metrics {
    class_name: "ExampleCount"
  }
}
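
The thresholds in this configuration combine two checks: the accuracy must lie between 0.5 and 0.99, and, when a baseline model exists, it must beat the baseline by at least 0.0001. The following plain-Python sketch shows one way to read that logic. It is not TFMA code: the `passes_thresholds` helper is hypothetical, the inclusive comparisons are an assumption, and the accuracy values are made up.

```python
from typing import Optional


def passes_thresholds(candidate: float,
                      baseline: Optional[float],
                      lower_bound: float = 0.5,
                      upper_bound: float = 0.99,
                      min_absolute_change: float = 0.0001) -> bool:
    """Toy version of the blessing decision for a HIGHER_IS_BETTER metric."""
    # Value threshold: the metric must fall inside the allowed band.
    if not lower_bound <= candidate <= upper_bound:
        return False
    # Change threshold: only applies when a baseline model exists.
    if baseline is None:
        return True
    return candidate - baseline >= min_absolute_change


first_model = passes_thresholds(0.72, None)   # no baseline yet
improved = passes_thresholds(0.72, 0.70)      # better than the baseline
regressed = passes_thresholds(0.70, 0.72)     # worse than the baseline
print(first_model, improved, regressed)
```

The upper bound may look odd for an accuracy metric, but a value suspiciously close to 1.0 is often a sign of label leakage, so rejecting it is a reasonable safeguard.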

In [28]:
model_analyzer = Evaluator(
    examples=example_gen.outputs['examples'],
    model=trainer.outputs['model'],
    baseline_model=model_resolver.outputs['model'],
    eval_config=eval_config
)
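
The two `SlicingSpec` entries in `eval_config` request metrics for the whole evaluation set and for each value of `Wilderness_Area`. As a rough illustration of what slicing means, the plain-Python sketch below computes accuracy both ways. The `sliced_accuracy` helper and the predictions are hypothetical; TFMA computes these metrics at scale with its own machinery.

```python
from collections import defaultdict


def sliced_accuracy(examples, slice_key=None):
    """Returns accuracy per slice; slice_key=None means the overall slice."""
    hits, totals = defaultdict(int), defaultdict(int)
    for example in examples:
        slice_name = "overall" if slice_key is None else example[slice_key]
        totals[slice_name] += 1
        hits[slice_name] += int(example["label"] == example["prediction"])
    return {name: hits[name] / totals[name] for name in totals}


# Hypothetical predictions, for illustration only.
examples = [
    {"Wilderness_Area": "Cache", "label": 2, "prediction": 2},
    {"Wilderness_Area": "Cache", "label": 1, "prediction": 2},
    {"Wilderness_Area": "Commanche", "label": 1, "prediction": 1},
    {"Wilderness_Area": "Commanche", "label": 3, "prediction": 3},
]
overall = sliced_accuracy(examples)
by_area = sliced_accuracy(examples, slice_key="Wilderness_Area")
print(overall, by_area)
```

A model can look acceptable overall while performing poorly on one slice, which is why per-slice metrics are worth inspecting.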

### Deploying models with Pusher

The `Pusher` component checks whether a model has been "blessed", and if so, deploys it by pushing the model to a well-known file destination.

<img src=https://github.com/GoogleCloudPlatform/mlops-on-gcp/raw/master/images/Pusher.png width="400">



In [29]:
serving_model_dir = '{}/covertype_classifier'.format(MODEL_STORE)
pusher = Pusher(
    model=trainer.outputs['model'],
    model_blessing=model_analyzer.outputs['blessing'],
    push_destination=pusher_pb2.PushDestination(
        filesystem=pusher_pb2.PushDestination.Filesystem(
            base_directory=serving_model_dir)))
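
The gating behavior can be pictured with a small standard-library sketch: the model directory is copied to the destination only when the blessing is present. The `push_if_blessed` helper, the timestamp-based version folder, and the placeholder model file are assumptions for illustration, not the `Pusher` implementation.

```python
import os
import shutil
import tempfile
import time


def push_if_blessed(model_dir: str, base_directory: str, blessed: bool):
    """Copies model_dir to base_directory/<version> only when blessed."""
    if not blessed:
        return None
    # A timestamp-based version folder is an assumption of this sketch.
    destination = os.path.join(base_directory, str(int(time.time())))
    shutil.copytree(model_dir, destination)
    return destination


with tempfile.TemporaryDirectory() as workspace:
    # Stand-in for a trained model directory.
    model_dir = os.path.join(workspace, "trained_model")
    os.makedirs(model_dir)
    with open(os.path.join(model_dir, "saved_model.pb"), "w") as handle:
        handle.write("placeholder")

    serving_dir = os.path.join(workspace, "covertype_classifier")
    skipped = push_if_blessed(model_dir, serving_dir, blessed=False)
    pushed = push_if_blessed(model_dir, serving_dir, blessed=True)
    pushed_files = os.listdir(pushed)
    print(skipped, pushed_files)
```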

## Running the pipeline

### Create the pipeline

In [30]:
pipeline_name = 'covertype-end-to-end'
components = [example_gen, 
              statistics_gen, 
              schema_gen,
              example_validator,
              transform,
              trainer,
              model_resolver,
              model_analyzer,
              pusher]

pipeline = Pipeline(
    pipeline_name=pipeline_name,
    pipeline_root=f'{LOCAL_PIPELINE_ROOT}/{pipeline_name}',
    metadata_connection_config=metadata_connection_config,
    components=components
)
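
The order of the `components` list does not determine execution order. The runner derives it from the channels that connect the components. The sketch below reproduces that idea with a simple topological sort in plain Python. The dependency map is read off the component definitions in this notebook (using class names rather than instance names), and the `execution_order` helper is hypothetical, not the runner's code.

```python
# Upstream dependencies, as implied by how the components are wired together.
DEPENDENCIES = {
    "CsvExampleGen": [],
    "ResolverNode": [],
    "StatisticsGen": ["CsvExampleGen"],
    "SchemaGen": ["StatisticsGen"],
    "ExampleValidator": ["StatisticsGen", "SchemaGen"],
    "Transform": ["SchemaGen", "CsvExampleGen"],
    "Trainer": ["Transform", "SchemaGen"],
    "Evaluator": ["CsvExampleGen", "Trainer", "ResolverNode"],
    "Pusher": ["Trainer", "Evaluator"],
}


def execution_order(dependencies):
    """Returns one valid run order using a simple topological sort."""
    order, done = [], set()
    pending = dict(dependencies)
    while pending:
        # A component is ready once all of its upstream components have run.
        ready = [name for name, upstream in pending.items()
                 if all(u in done for u in upstream)]
        if not ready:
            raise ValueError("dependency cycle detected")
        for name in ready:
            order.append(name)
            done.add(name)
            del pending[name]
    return order


order = execution_order(DEPENDENCIES)
print(order)
```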

### Run the pipeline locally using Beam Runner

In [31]:
BeamDagRunner().run(pipeline)

INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting depends on [].
INFO:absl:Component CsvExampleGen.Data_Extraction_Spliting is scheduled.
INFO:absl:Component ResolverNode.latest_blessed_model_resolver depends on [].
INFO:absl:Component ResolverNode.latest_blessed_model_resolver is scheduled.
INFO:absl:Component StatisticsGen.Statistics_Generation depends on ['Run[CsvExampleGen.Data_Extraction_Spliting]'].
INFO:absl:Component StatisticsGen.Statistics_Generation is scheduled.
INFO:absl:Component SchemaGen depends on ['Run[StatisticsGen.Statistics_Generation]'].
INFO:absl:Component SchemaGen is scheduled.
INFO:absl:Component ExampleValidator.Data_Validation depends on ['Run[StatisticsGen.Statistics_Generation]', 'Run[SchemaGen]'].
INFO:absl:Component ExampleValidator.Data_Validation is scheduled.
INFO:absl:Component Transform depends on ['Run[SchemaGen]', 'Run[CsvExampleGen.Data_Extraction_Spliting]'].
INFO:absl:Component Transform is scheduled.
INFO:absl:Component Trainer depend

Instructions for updating:
Use eager execution and: 
`tf.data.TFRecordDataset(path)`


INFO:absl:Processing schema from statistics for split eval.
INFO:absl:Schema written to /home/jupyter/pipeline-root/covertype-end-to-end/SchemaGen/schema/5/schema.pbtxt.
INFO:absl:Running publisher for SchemaGen
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component SchemaGen is finished.
INFO:absl:Component Transform is running.
INFO:absl:Running driver for Transform
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Transform


Instructions for updating:
Schema is a deprecated, use schema_utils.schema_from_feature_spec to create a `Schema`


INFO:absl:We decided to produce LargeList and LargeBinary types.
INFO:absl:We decided to produce LargeList and LargeBinary types.
INFO:absl:We decided to produce LargeList and LargeBinary types.
INFO:absl:Feature Aspect has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Cover_Type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Elevation has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_3pm has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_9am has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_Noon has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Fire_Points has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Hydrology has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Roadways has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Slope has no shape. Setting to VarLenSparseTensor

Instructions for updating:
Use ref() instead.


INFO:absl:Feature Aspect has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Cover_Type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Elevation has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_3pm has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_9am has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Hillshade_Noon has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Fire_Points has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Hydrology has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Horizontal_Distance_To_Roadways has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Slope has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Soil_Type has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Vertical_Distance_To_Hydrology has no shape. Setting to VarLenSparseTensor.
INFO:absl:Feature Wilderne



  child 0, item: int64
Elevation: large_list<item: int64>
  child 0, item: int64
Hillshade_3pm: large_list<item: int64>
  child 0, item: int64
Hillshade_9am: large_list<item: int64>
  child 0, item: int64
Hillshade_Noon: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Fire_Points: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Roadways: large_list<item: int64>
  child 0, item: int64
Slope: large_list<item: int64>
  child 0, item: int64
Soil_Type: large_list<item: large_binary>
  child 0, item: large_binary
Vertical_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Wilderness_Area: large_list<item: large_binary>
  child 0, item: large_binary, tensor_representations={'Hillshade_9am': varlen_sparse_tensor {
  column_name: "Hillshade_9am"
}
, 'Aspect': varlen_sparse_tensor {
  column_name: "Aspect"
}
, 'Horizontal_Distance_To_Hydrology': varlen_s

Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /home/jupyter/pipeline-root/covertype-end-to-end/Transform/transform_graph/6/.temp_path/tftransform_tmp/40eb5f4f2ce344199a38c0b93b5b85a9/saved_model.pb
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
'Counter' object has no attribute 'name'
INFO:tensorflow:SavedModel written to: /home/jupyter/pipeline-root/covertype-end-to-end/Transform/transform_graph/6/.temp_path/tftransform_tmp/2dd12b6929e14c93ad3f5eecda06ef88/saved_model.pb


  name: "Aspect"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Cover_Type"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Elevation"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Hillshade_3pm"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Hillshade_9am"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Hillshade_Noon"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
    min_count: 1
  }
}
feature {
  name: "Horizontal_Distance_To_Fire_Points"
  value_count {
    min: 1
    max: 1
  }
  type: INT
  presence {
    min_fraction: 1.0
   



  child 0, item: int64
Cover_Type: large_list<item: int64>
  child 0, item: int64
Elevation: large_list<item: int64>
  child 0, item: int64
Hillshade_3pm: large_list<item: int64>
  child 0, item: int64
Hillshade_9am: large_list<item: int64>
  child 0, item: int64
Hillshade_Noon: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Fire_Points: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Roadways: large_list<item: int64>
  child 0, item: int64
Slope: large_list<item: int64>
  child 0, item: int64
Soil_Type: large_list<item: large_binary>
  child 0, item: large_binary
Vertical_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Wilderness_Area: large_list<item: large_binary>
  child 0, item: large_binary, tensor_representations={'Cover_Type': varlen_sparse_tensor {
  column_name: "Cover_Type"
}
, 'Soil_Type': varlen_sparse_tensor {
  column_name: "



  child 0, item: int64
Cover_Type: large_list<item: int64>
  child 0, item: int64
Elevation: large_list<item: int64>
  child 0, item: int64
Hillshade_3pm: large_list<item: int64>
  child 0, item: int64
Hillshade_9am: large_list<item: int64>
  child 0, item: int64
Hillshade_Noon: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Fire_Points: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Horizontal_Distance_To_Roadways: large_list<item: int64>
  child 0, item: int64
Slope: large_list<item: int64>
  child 0, item: int64
Soil_Type: large_list<item: large_binary>
  child 0, item: large_binary
Vertical_Distance_To_Hydrology: large_list<item: int64>
  child 0, item: int64
Wilderness_Area: large_list<item: large_binary>
  child 0, item: large_binary, tensor_representations={'Hillshade_9am': varlen_sparse_tensor {
  column_name: "Hillshade_9am"
}
, 'Aspect': varlen_sparse_tensor {
  column_name

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:Assets written to: /home/jupyter/pipeline-root/covertype-end-to-end/Transform/transform_graph/6/.temp_path/tftransform_tmp/5790db65ed174aff878bab9e60ebdf9e/assets
INFO:tensorflow:SavedModel written to: /home/jupyter/pipeline-root/covertype-end-to-end/Transform/transform_graph/6/.temp_path/tftransform_tmp/5790db65ed174aff878bab9e60ebdf9e/saved_model.pb
value: "\n\013\n\tConst_3:0\022\017Wilderness_Area"

value: "\n\013\n\tConst_5:0\022\tSoil_Type"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
value: "\n\013\n\tConst_3:0\022\017Wilderness_Area"

value: "\n\013\n\tConst_5:0\022\tSoil_Type"

INFO:tensorflow:Saver not created because there are no variables in the graph to restore
value: "\n\013\n\tConst_3:0\022\017Wild

INFO:absl:Running publisher for Transform
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Transform is finished.
INFO:absl:Component Trainer is running.
INFO:absl:Running driver for Trainer
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Trainer
INFO:absl:Train on the 'train' split when train_args.splits is not set.
INFO:absl:Evaluate on the 'eval' split when eval_args.splits is not set.
INFO:absl:Training model.
INFO:absl:Model: "functional_1"
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Layer (type)                    Output Shape         Param #     Connected to                     
INFO:absl:Aspect_xf (InputLayer)          [(None,)]            0                                            
INFO:absl:__________________________________________________________________________________________________
INFO:absl:Elevation_xf (InputLayer)       [(None,)

Instructions for updating:
use `tf.profiler.experimental.stop` instead.
5000/5000 - 26s - loss: 0.7066 - sparse_categorical_accuracy: 0.7040 - val_loss: 0.6493 - val_sparse_categorical_accuracy: 0.7203
INFO:tensorflow:Saver not created because there are no variables in the graph to restore
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /home/jupyter/pipeline-root/covertype-end-to-end/Trainer/model/7/serving_model_dir/assets


INFO:absl:Training complete. Model written to /home/jupyter/pipeline-root/covertype-end-to-end/Trainer/model/7/serving_model_dir. ModelRun written to /home/jupyter/pipeline-root/covertype-end-to-end/Trainer/model_run/7
INFO:absl:Running publisher for Trainer
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Trainer is finished.
INFO:absl:Component Evaluator is running.
INFO:absl:Running driver for Evaluator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for Evaluator
INFO:absl:Request was made to ignore the baseline ModelSpec and any change thresholds. This is likely because a baseline model was not provided: updated_config=
model_specs {
  label_key: "Cover_Type"
}
slicing_specs {
}
slicing_specs {
  feature_keys: "Wilderness_Area"
}
metrics_specs {
  metrics {
    class_name: "SparseCategoricalAccuracy"
    threshold {
      value_threshold {
        lower_bound {
          value: 0.5
        }
        upper_bound {
        



INFO:absl:Evaluation complete. Results written to /home/jupyter/pipeline-root/covertype-end-to-end/Evaluator/evaluation/8.
INFO:absl:Checking validation results.
INFO:absl:Blessing result True written to /home/jupyter/pipeline-root/covertype-end-to-end/Evaluator/blessing/8.
INFO:absl:Running publisher for Evaluator
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Component Evaluator is finished.
INFO:absl:Component ExampleValidator.Data_Validation is running.
INFO:absl:Running driver for ExampleValidator.Data_Validation
INFO:absl:MetadataStore with DB connection initialized
INFO:absl:Running executor for ExampleValidator.Data_Validation
INFO:absl:Validating schema against the computed statistics for split train.
INFO:absl:Validation complete for split train. Anomalies written to /home/jupyter/pipeline-root/covertype-end-to-end/ExampleValidator.Data_Validation/anomalies/9/train.
INFO:absl:Validating schema against the computed statistics for split eval.
INFO:absl:Validat

### Examine the artifacts created by the pipeline run

#### Visualize statistics

In [32]:
# Look up the statistics artifacts recorded in the metadata store.
with metadata.Metadata(connection_config) as store:
    stats_artifacts = store.get_artifacts_by_type(ExampleStatistics.TYPE_NAME)

# Use the last artifact returned, assumed here to come from the most recent run.
stats_path = stats_artifacts[-1].uri
train_stats_file = os.path.join(stats_path, 'train', 'stats_tfrecord')
eval_stats_file = os.path.join(stats_path, 'eval', 'stats_tfrecord')
train_stats = tfdv.load_statistics(train_stats_file)
eval_stats = tfdv.load_statistics(eval_stats_file)

# Compare the eval and train statistics side by side.
tfdv.visualize_statistics(lhs_statistics=eval_stats, rhs_statistics=train_stats,
                          lhs_name='EVAL_DATASET', rhs_name='TRAIN_DATASET')

INFO:absl:MetadataStore with DB connection initialized
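
The statistics cell takes `stats_artifacts[-1]`, which relies on the store returning artifacts in creation order. The following is a minimal sketch of making that selection explicit. It does not use the real ML Metadata artifact type: `ArtifactRecord`, `latest_artifact`, and `split_stats_file` are hypothetical names, the URIs are made up, and the sketch assumes each artifact exposes an integer `id` that grows with creation order.

```python
import os
from typing import NamedTuple, Sequence


class ArtifactRecord(NamedTuple):
    """Hypothetical stand-in for an artifact returned by the metadata store."""
    id: int
    uri: str


def latest_artifact(artifacts: Sequence[ArtifactRecord]) -> ArtifactRecord:
    """Return the artifact with the highest id, failing loudly on an empty list."""
    if not artifacts:
        raise ValueError('No artifacts of the requested type were found.')
    return max(artifacts, key=lambda a: a.id)


def split_stats_file(stats_uri: str, split: str) -> str:
    """Build the per-split statistics path in the same way as the cell above."""
    return os.path.join(stats_uri, split, 'stats_tfrecord')


# Illustrative records only; the URIs are made up.
records = [
    ArtifactRecord(id=7, uri='/tmp/example-root/StatisticsGen/statistics/7'),
    ArtifactRecord(id=2, uri='/tmp/example-root/StatisticsGen/statistics/2'),
]
latest = latest_artifact(records)
train_path = split_stats_file(latest.uri, 'train')
```

Raising on an empty list gives a clearer error than the `IndexError` that `[-1]` would produce when the pipeline has not yet produced any statistics.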


#### Visualize model evaluations

In [33]:
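# Look up the model evaluation artifacts recorded in the metadata store.
with metadata.Metadata(connection_config) as store:
    model_eval_artifacts = store.get_artifacts_by_type(ModelEvaluation.TYPE_NAME)

# Load the last evaluation returned and render its metrics per Wilderness_Area slice.
model_eval_path = model_eval_artifacts[-1].uri
eval_result = tfma.load_eval_result(model_eval_path)
tfma.view.render_slicing_metrics(
    eval_result, slicing_column='Wilderness_Area')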

INFO:absl:MetadataStore with DB connection initialized


SlicingMetricsViewer(config={'weightedExamplesColumn': 'example_count'}, data=[{'slice': 'Wilderness_Area:Cach…

## Submit a pipeline run to Managed Pipelines

### Create a custom Docker image

Write the Dockerfile.

In [47]:
%%writefile Dockerfile
FROM gcr.io/caip-pipelines-assets/tfx:latest
WORKDIR /pipeline
COPY ./*.py ./
ENV PYTHONPATH="/pipeline:${PYTHONPATH}"

Overwriting Dockerfile


Create the Skaffold build configuration.

In [48]:
TAG = 'latest'
# str.format collapses '{{{{' to '{{', so this yields '{{.IMAGE_NAME}}:latest'.
SK_TEMPLATE = "{{{{.IMAGE_NAME}}}}:{}".format(TAG)
CUSTOM_IMAGE = f'gcr.io/{PROJECT_ID}/caip-tfx-custom-demo'

skaffold_template = f"""
apiVersion: skaffold/v2beta3
kind: Config
metadata:
  name: my-pipeline
build:
  artifacts:
  - image: '{CUSTOM_IMAGE}'
    context: .
    docker:
      dockerfile: Dockerfile
  tagPolicy:
    envTemplate:
      template: "{{SK_TEMPLATE}}"
"""

with open('skaffold.yaml', 'w') as f:
    f.write(skaffold_template.format(**globals()))

!cat skaffold.yaml


apiVersion: skaffold/v2beta3
kind: Config
metadata:
  name: my-pipeline
build:
  artifacts:
  - image: 'gcr.io/mlops-dev-env/caip-tfx-custom-demo'
    context: .
    docker:
      dockerfile: Dockerfile
  tagPolicy:
    envTemplate:
      template: "{{.IMAGE_NAME}}:latest"
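
The Skaffold cell mixes an f-string with two `str.format` calls, so the brace escaping is easy to misread. The sketch below isolates the three stages on a shortened template. The image name is a hypothetical placeholder, and the final call passes the value as an explicit keyword instead of `**globals()`; that is a design choice of this sketch, not what the notebook does.

```python
TAG = 'latest'
# Hypothetical image name, used only for this illustration.
IMAGE = 'gcr.io/example-project/example-image'

# Stage 0: str.format collapses '{{{{' to '{{' and fills the single '{}' with TAG.
sk_template = "{{{{.IMAGE_NAME}}}}:{}".format(TAG)

# Stage 1: the f-string fills {IMAGE} immediately and turns '{{sk_template}}'
# into the literal placeholder '{sk_template}'.
template = f"image: '{IMAGE}'\ntemplate: \"{{sk_template}}\"\n"

# Stage 2: str.format fills the remaining placeholder. Substituted values are
# not re-parsed, so the braces inside sk_template survive unchanged.
rendered = template.format(sk_template=sk_template)
print(rendered)
```

Passing only the names the template needs avoids a `KeyError` or accidental substitution if the template ever gains another brace-delimited token that happens to match a notebook global.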


Build and push the image.

In [49]:
!skaffold build 

Generating tags...
 - gcr.io/mlops-dev-env/caip-tfx-custom-demo -> WARN[0000] {{.IMAGE_NAME}} is deprecated, envTemplate's template should only specify the tag value. See https://skaffold.dev/docs/pipeline-stages/taggers/ 
gcr.io/mlops-dev-env/caip-tfx-custom-demo:latest
Checking cache...
 - gcr.io/mlops-dev-env/caip-tfx-custom-demo: Not found. Building
Building [gcr.io/mlops-dev-env/caip-tfx-custom-demo]...
Sending build context to Docker daemon  14.85kB
Step 1/4 : FROM gcr.io/caip-pipelines-assets/tfx:latest
 ---> 19c14dda2bb5
Step 2/4 : WORKDIR /pipeline
 ---> Using cache
 ---> 73280faa1be8
Step 3/4 : COPY ./*.py ./
 ---> Using cache
 ---> 1157ab40c72a
Step 4/4 : ENV PYTHONPATH="/pipeline:${PYTHONPATH}"
 ---> Running in 163d88f504ea
 ---> 310c1af00c54
Successfully built 310c1af00c54
Successfully tagged gcr.io/mlops-dev-env/caip-tfx-custom-demo:latest
The push refers to repository [gcr.io/mlops-dev-env/caip-tfx-custom-demo]

ed8503093cd8: Preparing 
096a31859079: Preparing 
040e39b76

### Create the pipeline

In [50]:
pipeline_name = 'covertype-end-to-end'
pipeline_root = f'{GCS_PIPELINE_ROOT}/{pipeline_name}'

components = [example_gen, 
              statistics_gen, 
              schema_gen,
              example_validator,
              transform,
              trainer,
              model_resolver,
              model_analyzer,
              pusher]

pipeline = Pipeline(
    pipeline_name=pipeline_name,
    pipeline_root=pipeline_root,
    components=components
)

### Configure the Managed Pipelines runner

In [51]:
pipeline_display_name = 'Covertype Classifier Training'

runner = ai_platform_pipelines_dag_runner.AIPlatformPipelinesDagRunner(
    config=ai_platform_pipelines_dag_runner.AIPlatformPipelinesDagRunnerConfig(
        project_id=PROJECT_ID,
        display_name=pipeline_display_name,
        default_image=CUSTOM_IMAGE))

### Submit run

In [52]:
api_key = ''  # Placeholder: set your API key here before submitting the run.

runner.run(pipeline, api_key=api_key)

INFO:absl:Compiled JSON request: {"name": "projects/mlops-dev-env/pipelineJobs/covertype-end-to-end_20200901042132", "displayName": "Covertype Classifier Training", "spec": {"pipelineContext": "covertype-end-to-end", "steps": {"ResolverNode.latest_blessed_model_resolver": {"resolver": {"resolverPolicy": "LATEST_BLESSED_MODEL"}, "cachePolicy": {}}, "ExampleValidator.Data_Validation": {"task": {"inputs": {"schema": {"stepOutput": {"step": "SchemaGen", "output": "schema"}}, "statistics": {"stepOutput": {"step": "StatisticsGen.Statistics_Generation", "output": "statistics"}}}, "executionProperties": {"exclude_splits": {"stringValue": "[]"}}, "outputs": {"anomalies": {"artifact": {"customProperties": {"custom:pipeline_name": {"stringValue": "covertype-end-to-end"}, "tfx_type": {"stringValue": "tfx.types.standard_artifacts.ExampleAnomalies"}, "type_name": {"stringValue": "ExampleAnomalies"}, "custom:producer_component": {"stringValue": "ExampleValidator.Data_Validation"}, "custom:name": {"st

Execution triggered. Job name: projects/mlops-dev-env/pipelineJobs/covertype-end-to-end_20200901042132


'projects/mlops-dev-env/pipelineJobs/covertype-end-to-end_20200901042132'
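
The returned job name can be split into its parts when you want to log or track submissions. This is a minimal sketch based only on the shape of the string shown above. `PipelineJobName` and `parse_job_name` are hypothetical helpers, and reading the numeric suffix as a `%Y%m%d%H%M%S` timestamp is an assumption inferred from this one example, not a documented format.

```python
from datetime import datetime
from typing import NamedTuple


class PipelineJobName(NamedTuple):
    """Hypothetical container for the parts of a pipeline job name."""
    project: str
    pipeline: str
    submitted_at: datetime


def parse_job_name(job_name: str) -> PipelineJobName:
    """Split 'projects/<project>/pipelineJobs/<pipeline>_<timestamp>'."""
    parts = job_name.split('/')
    if len(parts) != 4 or parts[0] != 'projects' or parts[2] != 'pipelineJobs':
        raise ValueError(f'Unexpected job name format: {job_name!r}')
    # Split on the last underscore so pipeline names may contain underscores.
    pipeline, _, stamp = parts[3].rpartition('_')
    # Assumption: the suffix is a %Y%m%d%H%M%S timestamp.
    submitted_at = datetime.strptime(stamp, '%Y%m%d%H%M%S')
    return PipelineJobName(parts[1], pipeline, submitted_at)


job = parse_job_name(
    'projects/mlops-dev-env/pipelineJobs/covertype-end-to-end_20200901042132')
print(job.project, job.pipeline, job.submitted_at.isoformat())
```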

## License

<font size=-1>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [https://www.apache.org/licenses/LICENSE-2.0](https://www.apache.org/licenses/LICENSE-2.0)

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.</font>