# Model analysis using TFX Pipeline and TensorFlow Model Analysis 

As you tweak your model or train it with a new dataset, you need to check whether your model has improved or become worse. 

Checking only top-level metrics such as accuracy might not be enough: a model can score well overall while performing poorly on a particular slice of the data.

Every trained model should be evaluated before it is pushed to production.
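To make the point about top-level metrics concrete, here is a minimal sketch in plain Python. The `(label, prediction)` pairs are made up for illustration and do not come from the penguin dataset; the helper names `accuracy` and `accuracy_by_label` are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical (label, prediction) pairs, for illustration only.
examples = (
    [("Adelie", "Adelie")] * 4
    + [("Gentoo", "Gentoo")] * 4
    + [("Chinstrap", "Adelie")] * 2  # every Chinstrap example is misclassified
)


def accuracy(pairs):
    """Fraction of pairs whose prediction equals the label."""
    pairs = list(pairs)
    return sum(label == pred for label, pred in pairs) / len(pairs)


def accuracy_by_label(pairs):
    """Accuracy computed separately for each true label (one slice per label)."""
    groups = defaultdict(list)
    for label, pred in pairs:
        groups[label].append((label, pred))
    return {label: accuracy(group) for label, group in groups.items()}


overall = accuracy(examples)
per_slice = accuracy_by_label(examples)
print("overall:", overall)
print("per slice:", per_slice)
```

In this toy data the overall accuracy is 0.8, yet the Chinstrap slice sits at 0.0, which the single top-level number hides.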

We will add an **Evaluator** component to the pipeline created in the previous tutorial. The Evaluator component performs deep analysis of your models and compares the new model against a baseline to determine whether it is "good enough".

The **Evaluator** is implemented using the TensorFlow Model Analysis library.
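The "good enough" decision can be pictured as two checks on a metric: an absolute floor, and a comparison against the baseline. The sketch below is plain Python and is not TFMA's implementation or API; the function name `passes_thresholds` and the default values for `lower_bound` and `min_change` are assumptions chosen only for illustration.

```python
def passes_thresholds(candidate, baseline=None, lower_bound=0.6, min_change=-1e-10):
    """Decide whether a candidate metric value is acceptable.

    candidate:   metric of the newly trained model (higher is better).
    baseline:    metric of the previously accepted model, or None if there is none.
    lower_bound: hypothetical absolute floor the candidate must reach.
    min_change:  hypothetical smallest allowed value of (candidate - baseline).
    """
    # Check 1: the candidate must clear the absolute floor.
    if candidate < lower_bound:
        return False
    # Check 2: if a baseline exists, the candidate must not be worse than it
    # by more than the allowed margin.
    if baseline is not None and (candidate - baseline) < min_change:
        return False
    return True


print(passes_thresholds(0.9, baseline=0.85))  # clears the floor and improves on the baseline
print(passes_thresholds(0.8, baseline=0.9))   # clears the floor but regresses
print(passes_thresholds(0.5))                 # below the floor
```

When no baseline is available, for example on the first run of a pipeline, only the absolute check applies in this sketch.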

```python
import tensorflow as tf
print('TensorFlow version: {}'.format(tf.__version__))
from tfx import v1 as tfx
print('TFX version: {}'.format(tfx.__version__))
```

```
TensorFlow version: 2.8.1
TFX version: 1.7.1
```


### Set up variables

```python
import os

from absl import logging

PIPELINE_NAME = "penguin-tfma"

# Output directory to store artifacts generated from the pipeline.
PIPELINE_ROOT = os.path.join('pipelines', PIPELINE_NAME)
# Path to a SQLite DB file to use as an MLMD storage.
METADATA_PATH = os.path.join('metadata', PIPELINE_NAME, 'metadata.db')
# Output directory where created models from the pipeline will be exported.
SERVING_MODEL_DIR = os.path.join('serving_model', PIPELINE_NAME)

logging.set_verbosity(logging.INFO)  # Set default logging level
```

### Prepare example data

We will use the same Palmer Penguins dataset. 

There are four numeric features in this dataset, which are already normalized to the range [0, 1]. We will build a classification model that predicts the species of the penguins.

Because TFX ExampleGen reads inputs from a directory, we need to create a directory and copy the dataset to it.


```python
import urllib.request
import tempfile

DATA_ROOT = tempfile.mkdtemp(prefix='tfx-data')  # Create a temporary directory.
_data_url = 'https://raw.githubusercontent.com/tensorflow/tfx/master/tfx/examples/penguin/data/labelled/penguins_processed.csv'
_data_filepath = os.path.join(DATA_ROOT, "data.csv")
urllib.request.urlretrieve(_data_url, _data_filepath)
```

```
('/tmp/tfx-dataog_5ulgf/data.csv', <http.client.HTTPMessage at 0x7fdfb699d6a0>)
```
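Before handing the directory to ExampleGen, it can help to glance at the downloaded file and confirm that the features really lie in [0, 1]. The following is a small sketch using only the standard library. The helper names `preview_csv` and `columns_outside_unit_range` are hypothetical, and the label column name `species` is an assumption about the CSV header.

```python
import csv
from itertools import islice


def preview_csv(path, n=5):
    """Return the header row and the first n data rows of a CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(islice(reader, n))
    return header, rows


def columns_outside_unit_range(path, label_column="species"):
    """Return the sorted names of non-label columns with a value outside [0, 1].

    Assumes every non-label column is numeric and that the label column
    is named `label_column` (an assumption, not taken from the file).
    """
    bad = set()
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            for name, value in row.items():
                if name != label_column and not 0.0 <= float(value) <= 1.0:
                    bad.add(name)
    return sorted(bad)
```

With the cell above executed, `preview_csv(_data_filepath)` would show the header and first rows, and `columns_outside_unit_range(_data_filepath)` should return an empty list if the normalization claim holds.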

### Create a pipeline

We will add an **Evaluator** component to the pipeline we created in the Simple TFX Pipeline Tutorial.