# Model Analysis and Validation

The model analysis component performs deep analysis of the training results and helps you validate your exported models, ensuring that they are "good enough" to be pushed to production.

This evaluation step helps guarantee that a model is promoted for serving only if it satisfies the quality criteria. The criteria include improved performance compared to previous models and fair performance on various data subsets. The output of this step is a set of performance metrics (RMSE is the evaluation metric in this notebook) and a decision on whether to promote the model to production.

# Installing Libraries & Dependencies

In [1]:
#!pip3 install -r requirements.txt

In [2]:
!cat requirements.txt

tensorflow==2.2.0
tensorflow_model_analysis==0.22.0
apache_beam[gcp]==2.20.0
pyarrow==0.16.0
tfx-bsl==0.22.0
google-cloud-storage==1.28.0

In [3]:
import warnings
warnings.filterwarnings('ignore')
warnings.filterwarnings('ignore', category=DeprecationWarning)
warnings.filterwarnings('ignore', category=FutureWarning)

import logging
logging.getLogger('tensorflow').setLevel(logging.ERROR)

# TFMA

When the model is exported after the training step, it must be evaluated on the test dataset to assess the model quality before deciding whether the model should be deployed. 

TensorFlow Model Analysis (TFMA) is a library for evaluating TensorFlow models. It allows us to evaluate our models in a distributed manner, using the same metrics defined in our trainer. These metrics can be computed over different slices (segments) of the data, such as `season`, `weather`, or `daytype` in this notebook, and visualized. We also track model performance over time so that we can be aware of and react to changes.

# Enable TFMA visualization in Jupyter Notebook

If running in a local Jupyter notebook, then these Jupyter extensions must be installed in the environment before running Jupyter. 

We don't need to install the Jupyter extensions inside the Kubeflow model analysis component.

Note: If Jupyter notebook is already installed in our home directory, add --user to these commands. If Jupyter is installed as root, or using a virtual environment, the parameter --sys-prefix might be required.

In [4]:
#!jupyter nbextension enable --py widgetsnbextension --user
#!jupyter nbextension install --py --symlink tensorflow_model_analysis --user
#!jupyter nbextension enable tensorflow_model_analysis --user --py 

# Importing Libraries

In [5]:
import json
import os
import logging
import re
import pandas as pd
import tensorflow as tf
import tensorflow_model_analysis as tfma
import apache_beam as beam

tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

from tensorflow.python.lib.io import file_io
from prettytable import PrettyTable
from ipywidgets.embed import embed_data
from io import BytesIO

import pkg_resources
from google.cloud import storage

print('TF version: {}'.format(tf.__version__))
print('TFMA version: {}'.format(pkg_resources.get_distribution("tensorflow_model_analysis").version))
print('INFO: Beam version -- {}'.format(pkg_resources.get_distribution("apache_beam").version))
print('INFO: Pyarrow version -- {}'.format(pkg_resources.get_distribution("pyarrow").version))


TF version: 2.2.0
TFMA version: 0.22.0
INFO: Beam version -- 2.20.0
INFO: Pyarrow version -- 0.16.0


# Input Arguments

Example of input arguments for the model analysis component

In [40]:
PROJECT = "irn-70656-dev-1307100302"
REGION = 'europe-west1'
BUCKET = "bike-sharing-pipeline-metadata"
PIPELINE_VERSION = "v0_1"
DATA_VERSION = "200909_154702"
MODEL_VERSION = "200909_163139"
TRIAL_ID = None
RUNNER = "DirectRunner" # DirectRunner or DataflowRunner to run on Dataflow

# Setting Paths 

Setting up some globals for the gcs files

In [None]:
# Set up some globals for gcs file
HANDLER = 'gs://' # ../ for local data, gs:// for cloud data

BASE_DIR = os.path.join(HANDLER, BUCKET, PIPELINE_VERSION)
RUN_DIR = os.path.join(BASE_DIR, 'run', DATA_VERSION)
DATA_DIR = os.path.join(RUN_DIR, 'data_transform')
MODEL_DIR = os.path.join(RUN_DIR, 'model_training', MODEL_VERSION, str(TRIAL_ID) if TRIAL_ID is not None else "")
OUTPUT_DIR = os.path.join(RUN_DIR, 'model_analysis', MODEL_VERSION)

In [45]:
# Set up some globals for gcs file
HANDLER = 'gs://' # ../ for local data, gs:// for cloud data

BASE_DIR = HANDLER + BUCKET+'/'+PIPELINE_VERSION
RUN_DIR = BASE_DIR+'/run/'+DATA_VERSION
DATA_DIR = RUN_DIR+'/data_transform'
MODEL_DIR = RUN_DIR+'/model_training/' + MODEL_VERSION + ('/' + str(TRIAL_ID) if TRIAL_ID is not None else "") 
OUTPUT_DIR = RUN_DIR+'/model_analysis/' + MODEL_VERSION
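
The two path cells above build the same GCS locations, once with `os.path.join` and once with string concatenation. As a minimal sketch of a third option, the helper below (`gcs_join` is a hypothetical name, not part of the pipeline) joins segments with `posixpath`, so the separator is always `/` regardless of the operating system and an empty trial segment leaves no trailing slash.

```python
import posixpath


def gcs_join(bucket, *parts):
    """Builds a gs:// path from a bucket and path segments (hypothetical helper)."""
    # Drop empty segments (e.g. a missing trial id) and stray slashes
    cleaned = [part.strip('/') for part in parts if part]
    return 'gs://' + posixpath.join(bucket, *cleaned)


# Values copied from the input arguments above; TRIAL_ID is None, hence the empty segment
model_dir = gcs_join('bike-sharing-pipeline-metadata', 'v0_1', 'run',
                     '200909_154702', 'model_training', '200909_163139', '')
print(model_dir)
```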

In [9]:
TEST_PATH = DATA_DIR+'/test*'

In [47]:
# Features, labels, and key columns
NUMERIC_FEATURE_KEYS=["temp", "atemp", "humidity", "windspeed"] 
CATEGORICAL_FEATURE_KEYS=["season", "weather", "daytype"] 
KEY_COLUMN = "datetime"
LABEL_COLUMN = "count"

def transformed_name(key):
    return key 

# Functions

In [10]:
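def list_dirs(path, pattern):
    """Returns the sorted entries of a GCS directory whose names match a regex pattern."""

    # Directory entries may carry a trailing slash, so strip it before matching
    runs = tf.io.gfile.listdir(path)
    runs = [run.replace('/', '') for run in runs]
    # Keep only the matched part of each entry whose start matches the pattern
    matches = [re.match(pattern, run) for run in runs]
    runs = sorted(match.group(0) for match in matches if match is not None)

    return runs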
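
To see the filtering logic of `list_dirs` without access to GCS, here is a small sketch that applies the same steps to an in-memory listing. The function name `filter_versions` and the sample listing are made up for illustration.

```python
import re


def filter_versions(names, pattern):
    """Keeps the entries whose start matches `pattern` and returns them sorted."""
    cleaned = [name.replace('/', '') for name in names]
    matches = [re.match(pattern, name) for name in cleaned]
    return sorted(match.group(0) for match in matches if match is not None)


# Hypothetical directory listing mixing run folders and unrelated entries
listing = ['200909_154702/', 'tmp/', '200801_101500/', 'README.md']
run_ids = filter_versions(listing, r'\d{6}_\d{6}')
print(run_ids)
```

Because the identifiers are timestamps in `YYMMDD_HHMMSS` form, sorting them as strings also sorts them chronologically, which is why the notebook can take the last element as the most recent one.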

# 2- Evaluation slices

## 2.1- Specify model to use for evaluation

Now, we can specify the EvalSavedModel we saved previously in order to use it for evaluation.

In [11]:
# Specify model to use for evaluation
eval_saved_model_path = MODEL_DIR+'/eval_saved_model/'
eval_saved_model_path = eval_saved_model_path + list_dirs(eval_saved_model_path, r'(\d)+')[-1]

eval_shared_model = tfma.default_eval_shared_model(eval_saved_model_path)

In [12]:
def copy_gcs_dir(bucket_name, source_dir, destination_dir, mute=False):
    """Copies a GCS directory with all its blobs.
    
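    Args:
        bucket_name (string): bucket name
        source_dir: path to the directory containing the blobs to copy (a leading gs://{bucket_name}/ is stripped)
        destination_dir: path to the directory where the blobs are copied (a leading gs://{bucket_name}/ is stripped)
        mute (bool): if True, do not print a message for each copied blob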
    """

    # Create storage client and set bucket
    storage_client = storage.Client()
    storage_bucket = storage_client.bucket(bucket_name)

    # List all blobs in source directory
    generic_source_dir = source_dir.replace('gs://'+bucket_name+'/', '')
    generic_destination_dir = destination_dir.replace('gs://'+bucket_name+'/', '')

    blobs = storage_client.list_blobs(bucket_name, prefix=generic_source_dir)

    # Copy all blobs from source_dir to destination_dir
    for blob in blobs:

        # Define path inside destination directory
        short_destination = blob.name.replace(generic_source_dir, '')

        # Copy blob
        blob_copy = storage_bucket.copy_blob(
            blob, storage_bucket, generic_destination_dir+'/'+short_destination)

        if not mute:
            print("INFO: Blob {} in bucket {} copied to blob {} in bucket {}.\n".format(
                blob.name, bucket_name, blob_copy.name, bucket_name))


In [13]:
# copy eval_saved_model to output_dir
copy_gcs_dir(BUCKET, eval_saved_model_path+'/', OUTPUT_DIR+'/eval_saved_model', mute=False)

INFO: Blob v0_1/run/200909_154702/model_training/200909_163139/eval_saved_model/1599669181/ in bucket bike-sharing-pipeline-metadata copied to blob v0_1/run/200909_154702/model_analysis/200909_163139/eval_saved_model/ in bucket bike-sharing-pipeline-metadata.

INFO: Blob v0_1/run/200909_154702/model_training/200909_163139/eval_saved_model/1599669181/assets/ in bucket bike-sharing-pipeline-metadata copied to blob v0_1/run/200909_154702/model_analysis/200909_163139/eval_saved_model/assets/ in bucket bike-sharing-pipeline-metadata.

INFO: Blob v0_1/run/200909_154702/model_training/200909_163139/eval_saved_model/1599669181/assets/daytype in bucket bike-sharing-pipeline-metadata copied to blob v0_1/run/200909_154702/model_analysis/200909_163139/eval_saved_model/assets/daytype in bucket bike-sharing-pipeline-metadata.

INFO: Blob v0_1/run/200909_154702/model_training/200909_163139/eval_saved_model/1599669181/assets/season in bucket bike-sharing-pipeline-metadata copied to blob v0_1/run/20090

## 2.2- Defining slices of evaluation

To define the slice you want to visualize, create a `tfma.slicer.SingleSliceSpec`.

Let's create a whole list of SliceSpecs, which will allow us to select any of the slices in the list. The list includes an empty spec for the overall slice and one spec per categorical feature. With TFMA, we can also create feature crosses to analyze combinations of features by passing several columns to a single SliceSpec; this notebook does not add any crosses.

In [14]:
feature_slices = [[x] for x in CATEGORICAL_FEATURE_KEYS]

# Defining slices of evaluation
# An empty spec is required for the 'Overall' slice 
slices = [tfma.slicer.SingleSliceSpec()] \
       + [tfma.slicer.SingleSliceSpec(columns=x) for x in feature_slices]
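
To make the idea of slicing concrete, the following sketch computes RMSE overall and per value of one column in plain Python. It is only a conceptual illustration of what a per-slice metric means, not how TFMA computes it; the example records and the `prediction` field are invented.

```python
import math
from collections import defaultdict


def rmse(pairs):
    """Root mean squared error over (label, prediction) pairs."""
    return math.sqrt(sum((y - p) ** 2 for y, p in pairs) / len(pairs))


def rmse_by_slice(examples, column=None):
    """Groups examples by the value of `column` (or into one overall group)."""
    groups = defaultdict(list)
    for example in examples:
        key = 'Overall' if column is None else (column, example[column])
        groups[key].append((example['count'], example['prediction']))
    return {key: rmse(pairs) for key, pairs in groups.items()}


# Invented examples: label column `count` plus a hypothetical model prediction
examples = [
    {'season': 'fall', 'count': 10.0, 'prediction': 7.0},
    {'season': 'fall', 'count': 20.0, 'prediction': 24.0},
    {'season': 'spring', 'count': 5.0, 'prediction': 5.0},
]
overall = rmse_by_slice(examples)
by_season = rmse_by_slice(examples, 'season')
print(overall)
print(by_season)
```

A model can look acceptable overall while performing poorly on one slice, which is the situation the slice list is meant to expose.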

## 2.3- Write tfma evaluation results

### Method 1: using run_model_analysis

We can now use `tfma.run_model_analysis` for evaluation. Since this uses Beam's local runner, it's mainly for local, small-scale experimentation. 

`tfma.run_model_analysis` requires an EvalSavedModel and a list of SliceSpecs.

It will create an EvalResult, and use it to create a SlicingMetricsViewer using tfma.view.render_slicing_metrics, which will render a visualization of our dataset using the slice we created.

In [15]:
# This assumes your data is a TFRecords file containing records in the format
# your model is expecting, e.g. tf.train.Example if you're using
# tf.parse_example in your model.

#eval_dir = OUTPUT_DIR+ \
#    '/eval_result'

#eval_result = tfma.run_model_analysis(
#    eval_shared_model=eval_shared_model,
#    data_location=TEST_PATH,
#    file_format='tfrecords',
#    slice_spec=slices,
#    output_path=eval_dir
#)

### Method 2: Apache Beam pipeline

Apache Beam is required to run distributed analysis. By default, Apache Beam runs in local mode but can also run in distributed mode using **Google Cloud Dataflow**.

For **distributed evaluation**, construct an Apache Beam pipeline using a distributed runner. In the pipeline, use `tfma.ExtractEvaluateAndWriteResults` to run the evaluation and write out the results. The results can then be loaded for visualization using `tfma.load_eval_result`.

In [16]:
eval_dir = OUTPUT_DIR+ \
    '/eval_result'

with beam.Pipeline(runner='DirectRunner') as p:
    _ = (p
        # You can change the source as appropriate, e.g. read from BigQuery.
        | 'ReadData' >> beam.io.ReadFromTFRecord(TEST_PATH)
        | 'ExtractEvaluateAndWriteResults' >>
        tfma.ExtractEvaluateAndWriteResults(
            eval_shared_model=eval_shared_model,
            output_path=eval_dir,
            compute_confidence_intervals=False,
            slice_spec=slices
        ))

print("INFO: TFMA Evaluation Job exported to {}".format(eval_dir))

# Load slices evaluation results
eval_result = tfma.load_eval_result(eval_dir)
print("INFO: TFMA Evaluation Results loaded from {}".format(eval_result))



INFO: TFMA Evaluation Job exported to gs://bike-sharing-pipeline-metadata/v0_1/run/200909_154702/model_analysis/200909_163139/eval_result
INFO: TFMA Evaluation Results loaded from EvalResult(slicing_metrics=[((), {'': {'': {'rmse': {'doubleValue': 339.1432800292969}, 'average_loss': {'doubleValue': 115018.15625}, 'label/mean': {'doubleValue': 260.27410888671875}, 'post_export_metrics/example_count': {'doubleValue': 2178.0}, 'mae': {'doubleValue': 247.89231872558594}, 'prediction/mean': {'doubleValue': 0.11727461218833923}}}}), ((('season', 'fall'),), {'': {'': {'average_loss': {'doubleValue': 139115.953125}, 'label/mean': {'doubleValue': 289.0715026855469}, 'post_export_metrics/example_count': {'doubleValue': 811.0}, 'mae': {'doubleValue': 288.7800598144531}, 'prediction/mean': {'doubleValue': 0.2914578914642334}, 'rmse': {'doubleValue': 372.9825134277344}}}}), ((('weather', 'clear'),), {'': {'': {'prediction/mean': {'doubleValue': 0.13379088044166565}, 'rmse': {'doubleValue': 360.0813

After creating our EvalResult, we can use it to create a SlicingMetricsViewer using `tfma.view.render_slicing_metrics`, which will render a visualization of our dataset using the slices we created.

To show the overall evaluation metrics viewer, we can simply use:

In [17]:
tfma.view.render_slicing_metrics(eval_result)

SlicingMetricsViewer(config={'weightedExamplesColumn': 'post_export_metrics/example_count'}, data=[{'slice': '…

To use tfma.view.render_slicing_metrics over a specified slice, we can either use the name of the column (by setting slicing_column) or provide a tfma.slicer.SingleSliceSpec (by setting slicing_spec).

In [18]:
tfma.view.render_slicing_metrics(eval_result, slicing_column='season')

SlicingMetricsViewer(config={'weightedExamplesColumn': 'post_export_metrics/example_count'}, data=[{'slice': '…

In [19]:
tfma.view.render_slicing_metrics(eval_result, slicing_spec=tfma.slicer.SingleSliceSpec(columns=['weather']))

SlicingMetricsViewer(config={'weightedExamplesColumn': 'post_export_metrics/example_count'}, data=[{'slice': '…

<span style="color:red">Note that widgets are not rendered correctly in JupyterLab, but we will be able to see them later using HTML.</span>

# 3- Tracking Model Performance Over Time

Our training dataset is used to train our model, and will hopefully be representative of our test dataset and of the inference data sent to our model in production. However, in many cases the distribution of new data drifts far enough that the performance of our model decays over time.

That means that we need to monitor and measure our model's performance on an ongoing basis, so that we can be aware of and react to changes.

Model drift can occur when there is some form of change to feature data or target dependencies. We can broadly classify these changes into the following three categories: concept drift, data drift, and upstream data changes.

* **Concept Drift**: When statistical properties of the target variable change, the very concept of what you are trying to predict changes as well. 

* **Data Drift**: The features used to train a model are selected from the input data. When statistical properties of this input data change, it will have a downstream impact on the model’s quality. 

* **Upstream Data Changes**: Sometimes there can be operational changes in the data pipeline upstream which could have an impact on the model quality. For example, changes to feature encoding such as switching from Fahrenheit to Celsius.

Given that there will be such changes after a model is deployed to production, your best course of action is to monitor for changes and take action when changes occur. Having a feedback loop from a monitoring system, and refreshing models over time, will help avoid model staleness.
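
As a rough illustration of monitoring for such changes, the sketch below measures how far the mean of a feature has moved from a reference sample, in units of the reference standard deviation. The function name, the sample temperatures, and the threshold of 3.0 are all assumptions chosen for the example; a production monitor would typically use more robust statistics.

```python
import statistics


def mean_shift_in_std(reference, current):
    """Absolute shift of the mean, expressed in reference standard deviations."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)
    return abs(statistics.mean(current) - ref_mean) / ref_std


# Invented readings: the same temperatures, first in Fahrenheit, then in Celsius,
# mimicking an upstream unit change
temps_f = [50.0, 59.0, 68.0, 77.0, 86.0]
temps_c = [(t - 32.0) * 5.0 / 9.0 for t in temps_f]

shift = mean_shift_in_std(temps_f, temps_c)
drifted = shift > 3.0  # hypothetical alert threshold
print(shift, drifted)
```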

![](./model_decay.jpg)

At every pipeline run, the trained model is evaluated by the model analysis component (overall evaluation and slice evaluation) using TFMA, and the resulting EvalResult is saved to `eval_dir`, which has the following layout:

gs://[bucket]/[pipeline_version]/run/[data_version]/model_analysis/[model_version]/eval_result

It doesn't make sense to compare models from different pipeline versions, as these models could have completely different objectives, labels, and so on. We therefore compare only models that share the pipeline version of the currently trained model.

The next cell collects the EvalResult directories of all previously evaluated models within the same pipeline version from the GCS directory.

In [20]:
print("INFO: Comparing with previous trial results.")

# Get list of all runs within the same pipeline version
eval_dirs = []
runs = list_dirs(BASE_DIR+'/run', r'(\d){6}_(\d){6}')

# Get list of all evaluation results (only best hypertune trials)
for run in runs:
    run_eval_dir = BASE_DIR+'/run/'+run+'/model_analysis/'
    models = list_dirs(run_eval_dir, r'(\d){6}_(\d){6}')
    for model in models:
        model_eval_dir = run_eval_dir + model + '/eval_result'
        # Keep only directories that contain a written TFMA evaluation config
        if tf.io.gfile.exists(model_eval_dir + '/eval_config.json'):
            eval_dirs.append(model_eval_dir)

INFO: Comparing with previous trial results.


In [21]:
eval_dirs

['gs://bike-sharing-pipeline-metadata/v0_1/run/200909_154702/model_analysis/200909_163139/eval_result']

We can load the list of historical evaluation results by providing their file paths to `tfma.load_eval_results`.

In [22]:
# Load best trial evaluation results
eval_results = tfma.load_eval_results(
    eval_dirs,
    tfma.constants.MODEL_CENTRIC_MODE
)

To show the overall time series of all provided evaluation results, we can simply use:

In [23]:
tfma.view.render_time_series(eval_results)

TimeSeriesViewer(config={'isModelCentric': True}, data=[{'metrics': {'': {'': {'post_export_metrics/example_co…

<span style="color:red">Note that widgets are not rendered correctly in JupyterLab, but we will be able to see them when embedded into HTML.</span>

# 4- How does it look today?

The problem with the above approach is that each historical model in this pipeline version was evaluated on its own test dataset, the one created at the time that model was trained.

Instead, we need to evaluate all models on the same new test dataset, in order to compare how they perform today on recent data. For this particular use case, there are two main challenges to address:
* We need to compare deployed models only; the presence of an EvalSavedModel or EvalResult path doesn't necessarily mean that a particular model is deployed.
* We need to adapt our new dataset to old models by removing target labels that the old models did not take into consideration.

In [24]:
print("INFO: Measuring performance on new test data.")

def get_eval_result(eval_saved_model_dir, output_dir, data_dir, slice_spec):
    """Runs tfma model analysis locally to compute slicing metrics."""

    eval_shared_model = tfma.default_eval_shared_model(eval_saved_model_path=eval_saved_model_dir)

    return tfma.run_model_analysis(
        eval_shared_model=eval_shared_model,
        data_location=data_dir,
        file_format='tfrecords',
        slice_spec=slice_spec,
        output_path=output_dir,
        extractors=None)

INFO: Measuring performance on new test data.


In [25]:
print("INFO: Comparing with previous trial results.")

eval_today_dirs = []
eval_today_results_dict = {}

# Get list of all runs within the same pipeline version
runs = list_dirs(BASE_DIR+'/run', r'(\d){6}_(\d){6}')

# Get list of all evaluation results (only best hypertune trials)
for run in runs:

    run_eval_saved_model_dir = BASE_DIR+'/run/'+run+'/model_analysis/'
    models = list_dirs(run_eval_saved_model_dir, r'(\d){6}_(\d){6}')
    for model in models:
        model_eval_saved_model_dir = run_eval_saved_model_dir + model + '/eval_saved_model'
        # Only models whose EvalSavedModel was copied to the analysis directory can be re-evaluated
        if tf.io.gfile.exists(model_eval_saved_model_dir + '/saved_model.pb'):

            # Re-evaluate this historical model on the current test data
            run_eval_today_dir = OUTPUT_DIR + '/eval_today_result/eval_result_{}_{}'.format(run, model)
            eval_today_results_dict[run, model] = get_eval_result(model_eval_saved_model_dir, run_eval_today_dir, TEST_PATH, slices)
            eval_today_dirs.append(run_eval_today_dir)
     



INFO: Comparing with previous trial results.


Load evaluation results

In [26]:
eval_today_dirs

['gs://bike-sharing-pipeline-metadata/v0_1/run/200909_154702/model_analysis/200909_163139/eval_today_result/eval_result_200909_154702_200909_163139']

In [27]:
# Load evaluation results on new test data
eval_today_results = tfma.load_eval_results(
    eval_today_dirs,
    tfma.constants.MODEL_CENTRIC_MODE
)

We can view the time series for evaluations on new data as follows:

In [28]:
tfma.view.render_time_series(eval_today_results)

TimeSeriesViewer(config={'isModelCentric': True}, data=[{'metrics': {'': {'': {'post_export_metrics/example_co…

The list of EvalResult paths would look like this:

In [29]:
from pprint import pprint
pprint(eval_today_dirs)

['gs://bike-sharing-pipeline-metadata/v0_1/run/200909_154702/model_analysis/200909_163139/eval_today_result/eval_result_200909_154702_200909_163139']


# 5- Write Evaluation Results to Artifacts

The display functions of TFMA render views as widgets so that they can be displayed in a Jupyter notebook. It would be useful to visualize these results in the Kubeflow Pipelines UI as well.

The Kubeflow Pipelines UI offers built-in support for several types of visualizations, which we can use for this purpose. An output artifact is an output emitted by a pipeline component, which the Kubeflow Pipelines UI understands and can render as rich visualizations. 

It’s useful for pipeline components to include artifacts so that you can evaluate performance, make quick decisions about a run, or compare different runs. Artifacts also make it possible to understand how the pipeline’s various components work. An artifact can range from a plain textual view of the data to rich interactive visualizations.

To make use of this programmable UI, our pipeline component must write a JSON file to the component’s local filesystem. We can do this at any point during the pipeline execution.

Available output viewers:
* Confusion matrix 
* Markdown 
* ROC curve
* Table
* TensorBoard
* Web app 

The web-app viewer provides flexibility for **rendering our custom tfma output**. We can specify an HTML file that our component creates, and the Kubeflow Pipelines UI renders that HTML in the output page. 
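
As a minimal sketch of the metadata file described above, the helper below writes a single web-app output entry to `mlpipeline-ui-metadata.json`. The helper name, the temporary directory, and the `gs://my-bucket/index.html` source are placeholders for illustration; the keys mirror the ones used later in this notebook.

```python
import json
import os
import tempfile


def write_ui_metadata(outputs, directory):
    """Writes the list of output viewers to mlpipeline-ui-metadata.json."""
    path = os.path.join(directory, 'mlpipeline-ui-metadata.json')
    with open(path, 'w') as f:
        json.dump({'outputs': outputs}, f)
    return path


# Hypothetical web-app viewer pointing at an HTML file stored in GCS
web_app = {'type': 'web-app', 'storage': 'gcs', 'source': 'gs://my-bucket/index.html'}

out_dir = tempfile.mkdtemp()  # stands in for the component's working directory
metadata_path = write_ui_metadata([web_app], out_dir)

with open(metadata_path) as f:
    loaded = json.load(f)
print(loaded)
```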

`tfma.view.render_slicing_metrics` renders the slicing metrics view as a widget.

`tfma.view.render_time_series` renders the time series view as a widget.

To learn more about widgets, refer to the [following link](https://ipywidgets.readthedocs.io/en/latest/examples/Widget%20Basics.html).

Jupyter interactive widgets can be serialized and embedded into **static web pages** that load outside of a Jupyter notebook, using the module `ipywidgets.embed`. This module provides several functions for embedding widgets into HTML documents programmatically.

We can use the function `embed_minimal_html` to create a simple, stand-alone HTML page. Because we want to control the structure of the HTML document in which the widgets are embedded, we instead use `embed_data`, which returns JSON exports of specific parts of the widget state.

For more information about how this can be done, refer to the [following link](https://ipywidgets.readthedocs.io/en/latest/embedding.html#python-interface).
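
The HTML templates in the next cell are filled in with `str.format`, which is why literal braces in their JavaScript are written as `{{` and `}}`. The short sketch below shows that behavior on a cut-down template; the template text, the URL, and the view payload are invented for the example.

```python
import json

# Doubled braces survive formatting as literal braces; {0}, {1} and {url} are substituted
template = (
    '<script>require.config({{ paths: {{ "widget": "{url}" }} }});</script>\n'
    '<div id="view-{0}">{1}</div>'
)

view_json = json.dumps({'model_id': 'abc'})  # hypothetical widget view spec
rendered = template.format(0, view_json, url='https://example.com/widget')
print(rendered)
```

Braces inside the substituted JSON are left alone, because `str.format` only interprets braces in the template itself.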

In [30]:
# html scripts for analysis artifacts
_STATIC_HTML_TEMPLATE = """
<html>
  <head>
    <title>Slicing Metrics</title>
    <!-- Load RequireJS, used by the IPywidgets for dependency management -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/require.js/2.3.4/require.min.js"
            integrity="sha256-Ae2Vz/4ePdIu6ZyI/5ZGsYnb+m0JlOmKPjt6XZ9JJkA="
            crossorigin="anonymous">
    </script>
    <!-- Load IPywidgets bundle for embedding. -->
    <script src="https://unpkg.com/@jupyter-widgets/html-manager@^*/dist/embed-amd.js"
            crossorigin="anonymous">
    </script>
    
    <!-- Load vulcanized tfma code from prebuilt js file. -->
    <script src="https://raw.githubusercontent.com/tensorflow/model-analysis/v0.22.0/tensorflow_model_analysis/static/vulcanized_tfma.js">
    </script> 
    <!-- Tell RequireJS where to find the TFMA widget module. -->
    <script>
      require.config({{
        paths: {{
          "tfma_widget_js": "https://raw.githubusercontent.com/tensorflow/model-analysis/v0.22.0/tensorflow_model_analysis/static/index", 
        }} 
      }});
    </script>
    <!-- The state of all the widget models on the page -->
    <script type="application/vnd.jupyter.widget-state+json">
      {manager_state}
    </script>
  </head>
  <body>
    <h1>Slicing Metrics</h1>
    {slicing_widget_views}
    <h1>Comparison with Past Trial Results</h1>
    {ts_widget_views}
    <h1>How Does It Look Today</h1>
    {ts_today_widget_views}
  </body>
</html>
"""
_SLICING_METRICS_WIDGET_TEMPLATE = """
    <div id="slicing-metrics-widget-{0}">
      <script type="application/vnd.jupyter.widget-view+json">
        {1}
      </script>
    </div>
"""
_TS_METRICS_WIDGET_TEMPLATE = """
    <div id="ts-metrics-widget-{0}">
      <script type="application/vnd.jupyter.widget-view+json">
        {1}
      </script>
    </div>
"""
_TS_TODAY_METRICS_WIDGET_TEMPLATE = """
    <div id="ts-today-metrics-widget-{0}">
      <script type="application/vnd.jupyter.widget-view+json">
        {1}
      </script>
    </div>
"""

We use **embed_data** to get JSON exports of specific parts of the widget state.

In [31]:
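def generate_static_html_output(eval_result, eval_results, eval_today_results,
     slicing_specs, html_output_dir):
    """Embeds the TFMA views in a static HTML page and registers it as a pipeline artifact.

    Args:
        eval_result: EvalResult of the current model, rendered as slicing metrics
        eval_results: historical EvalResults, rendered as a time series
        eval_today_results: EvalResults on the current test data, rendered as a time series
        slicing_specs: list of SingleSliceSpec to render (None renders the overall view only)
        html_output_dir: directory (for example in GCS) where index.html is written
    """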

    # Slicing Metrics Eval Result
    if slicing_specs is not None:
        slicing_metrics_views = [
          tfma.view.render_slicing_metrics(
              eval_result,
              slicing_spec=slicing_spec)
          for slicing_spec in slicing_specs
        ]
    else:
        slicing_metrics_views = [
          tfma.view.render_slicing_metrics(eval_result)
        ]
        
    # Time series Eval Result
    ts_metrics_view = tfma.view.render_time_series(
            eval_results, display_full_path=False)
    ts_today_metrics_view = tfma.view.render_time_series(
            eval_today_results, display_full_path=False)
        
    slicing_data = embed_data(views=slicing_metrics_views)
    manager_state = json.dumps(slicing_data['manager_state'])
    slicing_widget_views = [json.dumps(view) for view in slicing_data['view_specs']]
    slicing_views_html = ""
    
    for idx, view in enumerate(slicing_widget_views):
        slicing_views_html += _SLICING_METRICS_WIDGET_TEMPLATE.format(idx, view)
    
    ts_data = embed_data(views=ts_metrics_view)
    ts_today_data = embed_data(views=ts_today_metrics_view)
    ts_widget_views = [json.dumps(view) for view in ts_data['view_specs']]
    ts_today_widget_views = [json.dumps(view) for view in ts_today_data['view_specs']]
    ts_views_html = ""
    ts_today_views_html = ""

    for idx, view in enumerate(ts_widget_views):
        ts_views_html += _TS_METRICS_WIDGET_TEMPLATE.format(idx, view)
    for idx, view in enumerate(ts_today_widget_views):
        ts_today_views_html += _TS_TODAY_METRICS_WIDGET_TEMPLATE.format(idx, view)       
    
    rendered_template = _STATIC_HTML_TEMPLATE.format(
        manager_state=manager_state,
        slicing_widget_views=slicing_views_html,
        ts_widget_views=ts_views_html,
        ts_today_widget_views=ts_today_views_html)
    
    _OUTPUT_HTML_FILE = "index.html"

    static_html_path = html_output_dir+'/'+_OUTPUT_HTML_FILE
       
    with open(os.path.join(".", _OUTPUT_HTML_FILE), "wb") as f:
        f.write(rendered_template.encode('utf-8'))

    print("INFO: Writing html artifacts to {}".format(static_html_path))    
    # upload file to gs
    tf.io.gfile.copy(
        _OUTPUT_HTML_FILE,
        static_html_path,
        overwrite=True
    )   

    metadata = {
        'outputs' : [
            {
            'type': 'web-app',
            'storage': 'gcs',
            'source': static_html_path,
            }
        ]
    }
    
    with file_io.FileIO('./mlpipeline-ui-metadata.json', 'w') as f:
        json.dump(metadata, f)

In [32]:
generate_static_html_output(eval_result, eval_results, eval_today_results,
     slices, OUTPUT_DIR)
print("INFO: HTML artifacts generated successfully.")

INFO: Writing html artifacts to gs://bike-sharing-pipeline-metadata/v0_1/run/200909_154702/model_analysis/200909_163139/index.html
INFO: HTML artifacts generated successfully.


# OK or KO

In [33]:
eval_today_results_dict.keys()

dict_keys([('200909_154702', '200909_163139')])

In [34]:
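# Collect one row per re-evaluated model: RMSE on the current test data
rows = []
for (data_version, model_version), result in eval_today_results_dict.items():
    # Assumes the first entry is the overall slice, as in the output printed earlier
    overall_metrics = result.slicing_metrics[0][1]
    rows.append({'pipeline_version': PIPELINE_VERSION,
                 'data_version': data_version,
                 'model_version': model_version,
                 'rmse': overall_metrics['']['']['rmse']['doubleValue'],
                 'eval_data_version': DATA_VERSION})

dfPerf = pd.DataFrame(rows, columns=['pipeline_version', 'data_version', 'model_version',
                                     'rmse', 'eval_data_version'])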

In [35]:
dfPerf

Unnamed: 0,pipeline_version,data_version,model_version,rmse,eval_data_version
0,v0_1,200909_154702,200909_163139,339.14328,200909_154702
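
The RMSE lookup above relies on the position of the overall slice in the list. As a sketch of a more explicit alternative, the helper below searches for a slice by its key in a structure shaped like the `slicing_metrics` printed earlier. The helper name is hypothetical and the sample values are rounded from that output.

```python
def metric_for_slice(slicing_metrics, slice_key, metric_name):
    """Returns the value of a metric for one slice, or raises KeyError."""
    for key, metrics in slicing_metrics:
        if key == slice_key:
            # Two empty-string levels precede the metric name in the printed output
            return metrics[''][''][metric_name]['doubleValue']
    raise KeyError('slice {!r} not found'.format(slice_key))


# Sample shaped like the EvalResult output above (values rounded)
sample = [
    ((), {'': {'': {'rmse': {'doubleValue': 339.14}}}}),
    ((('season', 'fall'),), {'': {'': {'rmse': {'doubleValue': 372.98}}}}),
]

overall_rmse = metric_for_slice(sample, (), 'rmse')
fall_rmse = metric_for_slice(sample, (('season', 'fall'),), 'rmse')
print(overall_rmse, fall_rmse)
```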


In [36]:
if dfPerf.loc[dfPerf.model_version == MODEL_VERSION, "rmse"].item() == min(dfPerf["rmse"]):
    deployment_flag = 'OK'
else: 
    deployment_flag = 'KO'



In [37]:
print(deployment_flag)

OK
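
The decision rule above marks the current model `OK` only when its RMSE is the lowest of all models evaluated on the same data. The sketch below restates that rule in plain Python; the function name, the model identifiers, the scores, and the optional `tolerance` parameter are assumptions added for illustration and are not part of the pipeline.

```python
def deployment_flag_for(rmse_by_model, candidate, tolerance=0.0):
    """Returns 'OK' if the candidate's RMSE is within `tolerance` of the best RMSE."""
    best_rmse = min(rmse_by_model.values())
    return 'OK' if rmse_by_model[candidate] <= best_rmse + tolerance else 'KO'


# Invented scores for two models evaluated on the same test data
scores = {'model_a': 340.0, 'model_b': 352.5}

flag_a = deployment_flag_for(scores, 'model_a')
flag_b = deployment_flag_for(scores, 'model_b')
flag_b_tolerant = deployment_flag_for(scores, 'model_b', tolerance=15.0)
print(flag_a, flag_b, flag_b_tolerant)
```

With `tolerance=0.0` this matches the strict comparison in the notebook; a small positive tolerance would avoid rejecting a model over differences that fall within evaluation noise.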


In [38]:
dfPerf.to_csv("dfPerf.csv", index=False)

dfPerf_dir = OUTPUT_DIR + '/perf_models_today.csv'

# upload file to gs
tf.io.gfile.copy(
    "dfPerf.csv",
    dfPerf_dir,
    overwrite=True
)   
    

In [39]:
dfPerf_schema = dfPerf.columns.to_list()

metadata = {
    'outputs' : [
        {
        'type': 'table',
        'storage': 'gcs',
        'format': 'csv',
        'header': dfPerf_schema,
        'source': dfPerf_dir
        }
    ]
}

with file_io.FileIO('mlpipeline-ui-metadata.json', 'w') as f:
    json.dump(metadata, f)