# 05Tools: Model Explainability - Example-Based
## IN ACTIVE DEVELOPMENT - NOT COMPLETE

Model explainability helps you understand model outputs (predictions).  There are two approaches:
- Feature-Based Explanations - columns/features attributions
    - How much did each feature contribute to a specific prediction
    - Uses a baseline for comparison, usually based on a central value for each feature from the training data
    - Helpful for recognizing bias and finding areas for improvement
    - Read more about [feature attributions and methods](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)
    - Examples in [github.com/GoogleCloudPlatform/vertex-ai-samples](https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/notebooks/official/explainable_ai)
- Example-Based Explanations - row/example attributions
    - Return similar examples (neighbors) from the source data to help understand predictions
    - Along with a prediction, get the examples from the source data that are most similar to the instance being predicted, to further understand "why?"
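
The idea of returning neighbors can be sketched in plain Python. This is a minimal illustration and not how Vertex AI implements it: the embeddings, the example ids, and the `nearest_neighbors` helper are all hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math

def nearest_neighbors(query, examples, k=2):
    """Return the ids of the k examples closest to query (Euclidean distance)."""
    scored = [(math.dist(query, vector), example_id) for example_id, vector in examples.items()]
    scored.sort()
    return [example_id for _, example_id in scored[:k]]

# Hypothetical 2-D embeddings keyed by example id
examples = {
    'row_a': [0.0, 0.0],
    'row_b': [1.0, 1.0],
    'row_c': [5.0, 5.0],
}

neighbors = nearest_neighbors([0.9, 1.2], examples, k=2)
print(neighbors)
```

The returned ids point back to rows in the source data, which is what makes the explanation readable: the reviewer looks at real examples rather than at feature weights.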

This notebook covers example-based explanations.  For a review of feature-based explanations see the notebook [05Tools - Explainability - Feature-Based.ipynb](./05Tools%20-%20Explainability%20-%20Feature-Based.ipynb).

Vertex AI can serve explanations during online and batch predictions.  

### Prerequisites:
-  At least one of the notebooks in this series [05, 05a-05i]
    - each of these creates a model, adds it to the Vertex AI Model Registry, and updates a Vertex AI Endpoint

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_explain_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_explain_console.png" width="45%">
</p>

---
## Setup

inputs:

In [52]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [53]:
REGION = 'us-central1'
EXPERIMENT = '05-explanability'
SERIES = '05'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
TRAIN_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-7:latest'
DEPLOY_IMAGE ='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'
TRAIN_COMPUTE = 'n1-standard-4'
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters
EPOCHS = 10
BATCH_SIZE = 100

packages:

In [54]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf
import pkg_resources

from datetime import datetime
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

import json
import numpy as np

clients:

In [55]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [56]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

In [57]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [58]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) then [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this permission to the service account with the provided instructions.
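
The check described in the note can also be done in code. The sketch below is one possible approach, assuming the table printed by `gcloud` has been captured as text; the `has_role` helper is hypothetical, and the listing is copied from the output above.

```python
REQUIRED_ROLE = 'roles/storage.objectAdmin'

def has_role(gcloud_table_output, role):
    """Return True if role appears as a row in the table printed by gcloud."""
    rows = [line.strip() for line in gcloud_table_output.splitlines()]
    # skip blank lines and the 'ROLE' header row
    roles = [row for row in rows if row and row != 'ROLE']
    return role in roles

# Output copied from the cell above
listing = """ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin
"""

print(has_role(listing, REQUIRED_ROLE))
```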

environment:

In [59]:
!rm -rf {DIR}
!mkdir -p {DIR}

Experiment Tracking:

In [60]:
FRAMEWORK = 'tf'
TASK = 'encoder'
MODEL_TYPE = 'dnn'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

---
## Get Vertex AI Experiments Tensorboard Instance Name
[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview) provides managed [Tensorboard](https://www.tensorflow.org/tensorboard) instances that you can use to track Tensorboard experiments (a training run or hyperparameter tuning sweep).  

The training job will show up as an experiment for the Tensorboard instance and have the same name as the training job ID.

This code checks to see if a Tensorboard Instance has been created in the project, retrieves it if so, creates it otherwise:

In [61]:
tb = aiplatform.Tensorboard.list(filter=f"labels.series={SERIES}")
if tb:
    tb = tb[0]
else: 
    tb = aiplatform.Tensorboard.create(display_name = SERIES, labels = {'series' : f'{SERIES}'})

In [62]:
tb.resource_name

'projects/1026793852137/locations/us-central1/tensorboards/7179142426307592192'

---
## Setup Vertex AI Experiments

The code in this section initializes the experiment and starts a run that represents this notebook.  Throughout the notebook sections for model training and evaluation, information will be logged to the experiment using:
- [.log_params](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_params)
- [.log_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics)
- [.log_time_series_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_time_series_metrics)

In [63]:
aiplatform.init(experiment = EXPERIMENT_NAME, experiment_tensorboard = tb.resource_name)

---
## Training

### Python File for Training

This notebook trains a TensorFlow model with the same inputs as `05` (and `05a` through `05i`) but creates a trained encoder.  It processes the inputs through a series of progressively smaller hidden layers to create a learned encoding representation.
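
To illustrate how narrowing layers produce a compact representation, here is a minimal sketch in plain Python. It is not the training script: it assumes 30 input features and the encoder widths 25, 20, and 15 used in the script below, the weights are random and untrained, and biases and batch normalization are left out.

```python
import random

def dense_relu(inputs, weights):
    """Apply one fully connected layer (no bias) followed by ReLU."""
    return [max(0.0, sum(x * w for x, w in zip(inputs, unit_weights))) for unit_weights in weights]

def random_weights(n_in, n_out, rng):
    """One list of n_in weights per output unit (hypothetical, untrained)."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)
layer_sizes = [30, 25, 20, 15]  # assumed input width, then the three encoder layers

activations = [rng.uniform(-1, 1) for _ in range(layer_sizes[0])]
widths = [len(activations)]
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    activations = dense_relu(activations, random_weights(n_in, n_out, rng))
    widths.append(len(activations))

print(widths)
```

The final 15 values play the role of the learned encoding: a shorter vector per row that can be compared across rows.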

**Write the script:**

In [64]:
SCRIPT_PATH = f'./{DIR}/train.py'

In [65]:
%%writefile {SCRIPT_PATH}

# package import
from tensorflow.python.framework import dtypes
from tensorflow_io.bigquery import BigQueryClient
import tensorflow as tf
from google.cloud import bigquery
from google.cloud import aiplatform
import argparse
import os
import sys

# import argument to local variables
parser = argparse.ArgumentParser()
# the flag passed on the command line, dest: attribute name on args, default: value used when the flag is absent, type: type to convert to, help: description of the argument
parser.add_argument('--epochs', dest = 'epochs', default = 10, type = int, help = 'Number of Epochs')
parser.add_argument('--batch_size', dest = 'batch_size', default = 32, type = int, help = 'Batch Size')
parser.add_argument('--var_target', dest = 'var_target', type=str)
parser.add_argument('--var_omit', dest = 'var_omit', type=str, nargs='*')
parser.add_argument('--project_id', dest = 'project_id', type=str)
parser.add_argument('--bq_project', dest = 'bq_project', type=str)
parser.add_argument('--bq_dataset', dest = 'bq_dataset', type=str)
parser.add_argument('--bq_table', dest = 'bq_table', type=str)
parser.add_argument('--region', dest = 'region', type=str)
parser.add_argument('--experiment', dest = 'experiment', type=str)
parser.add_argument('--series', dest = 'series', type=str)
parser.add_argument('--experiment_name', dest = 'experiment_name', type=str)
parser.add_argument('--run_name', dest = 'run_name', type=str)
args = parser.parse_args()

# clients
bq = bigquery.Client(project = args.project_id)
aiplatform.init(project = args.project_id, location = args.region)

# Vertex AI Experiment
expRun = aiplatform.ExperimentRun.create(run_name = args.run_name, experiment = args.experiment_name)
expRun.log_params({'experiment': args.experiment, 'series': args.series, 'project_id': args.project_id})

# get schema from bigquery source
query = f"SELECT * FROM {args.bq_project}.{args.bq_dataset}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{args.bq_table}'"
schema = bq.query(query).to_dataframe()

# get number of classes from bigquery source
nclasses = bq.query(query = f'SELECT DISTINCT {args.var_target} FROM {args.bq_project}.{args.bq_dataset}.{args.bq_table} WHERE {args.var_target} is not null').to_dataframe()
nclasses = nclasses.shape[0]
expRun.log_params({'data_source': f'bq://{args.bq_project}.{args.bq_dataset}.{args.bq_table}', 'nclasses': nclasses, 'var_split': 'splits', 'var_target': args.var_target})

# Make a list of columns to omit
OMIT = args.var_omit + ['splits']

# use schema to prepare a list of columns to read from BigQuery
selected_fields = schema[~schema.column_name.isin(OMIT)].column_name.tolist()

# all the columns in this data source are either float64 or int64
output_types = [dtypes.float64 if x=='FLOAT64' else dtypes.int64 for x in schema[~schema.column_name.isin(OMIT)].data_type.tolist()]

# remap input data to Tensorflow inputs of features and target
def transTable(row_dict):
    target = row_dict.pop(args.var_target)
    target = tf.one_hot(tf.cast(target, tf.int64), nclasses)
    target = tf.cast(target, tf.float32)
    features = [tf.cast(v, tf.float32) for v in row_dict.values()]
    features = tf.stack(features)
    return(
        features, 
        {
            'logistic': target, 
            'classification': target, 
            'decoder': features}
    )

# function to setup a bigquery reader with Tensorflow I/O
def bq_reader(split):
    reader = BigQueryClient()

    training = reader.read_session(
        parent = f"projects/{args.project_id}",
        project_id = args.bq_project,
        table_id = args.bq_table,
        dataset_id = args.bq_dataset,
        selected_fields = selected_fields,
        output_types = output_types,
        row_restriction = f"splits='{split}'",
        requested_streams = 3
    )
    
    return training

# setup feed for train, validate and test
train = bq_reader('TRAIN').parallel_read_rows().prefetch(1).map(transTable).shuffle(args.batch_size*10).batch(args.batch_size)
validate = bq_reader('VALIDATE').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
test = bq_reader('TEST').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
expRun.log_params({'training.batch_size': args.batch_size, 'training.shuffle': 10*args.batch_size, 'training.prefetch': 1})

# Three targets: logistic, autoencoder (decoder), classification from encoder

# inputs
features = tf.keras.layers.Input(shape = (len(selected_fields)-1,), name = 'features')

# normalize here
normalized = tf.keras.layers.BatchNormalization(name = 'batch_normalization_layer')(features)

# logistic
logistic = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'logistic')(normalized)

# encoder
encode = tf.keras.layers.Dense(25, activation = tf.nn.relu)(normalized)#(features)
encode = tf.keras.layers.Dense(20, activation = tf.nn.relu)(encode)
encode = tf.keras.layers.Dense(15, activation = tf.nn.relu, name = 'encoder')(encode)

# classifier
classifier = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'classification')(encode)

# decoder
decode = tf.keras.layers.Dense(20, activation = tf.nn.relu)(encode)
decode = tf.keras.layers.Dense(25, activation = tf.nn.relu)(decode)
decode = tf.keras.layers.Dense(features.shape[1], activation = tf.nn.sigmoid, name = 'decoder')(decode)

# the model
model = tf.keras.Model(
    inputs = features,
    outputs = [logistic, classifier, decode],
    name = args.experiment
)

# compile
model.compile(
    optimizer = tf.keras.optimizers.Adam(), #SGD or Adam
    loss = {
        'logistic': tf.keras.losses.CategoricalCrossentropy(),
        'classification': tf.keras.losses.CategoricalCrossentropy(),
        'decoder': tf.keras.losses.BinaryCrossentropy()
    },
    metrics = {
        'logistic': 'accuracy',
        'classification': 'accuracy',
        'decoder': 'accuracy'
    }
)

# setup tensorboard logs and train
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'],
    histogram_freq=1
)
history = model.fit(
    train, 
    epochs = args.epochs, 
    callbacks = [tensorboard_callback], 
    validation_data = validate
)
expRun.log_params({'training.epochs': history.params['epochs']})
for e in range(0, history.params['epochs']):
    expRun.log_time_series_metrics(
        {
            'train_loss': history.history['loss'][e],
            'train_logistic_loss': history.history['logistic_loss'][e],
            'train_classification_loss': history.history['classification_loss'][e],
            'train_decoder_loss': history.history['decoder_loss'][e],
            'train_logistic_accuracy': history.history['logistic_accuracy'][e],
            'train_classification_accuracy': history.history['classification_accuracy'][e],
            'train_decoder_accuracy': history.history['decoder_accuracy'][e],
            'val_loss': history.history['val_loss'][e],
            'val_logistic_loss': history.history['val_logistic_loss'][e],
            'val_classification_loss': history.history['val_classification_loss'][e],
            'val_decoder_loss': history.history['val_decoder_loss'][e],
            'val_logistic_accuracy': history.history['val_logistic_accuracy'][e],
            'val_classification_accuracy': history.history['val_classification_accuracy'][e],
            'val_decoder_accuracy': history.history['val_decoder_accuracy'][e],
        }
    )

# test evaluations:
metrics = model.evaluate(test)
expRun.log_metrics(
    {
        'test_loss': metrics[0],
        'test_logistic_loss': metrics[1],
        'test_classification_loss': metrics[2],
        'test_decoder_loss': metrics[3],
        'test_logistic_accuracy': metrics[4],
        'test_classification_accuracy': metrics[5],
        'test_decoder_accuracy': metrics[6]
    }
)

# val evaluations:
metrics = model.evaluate(validate)
expRun.log_metrics(
    {
        'val_loss': metrics[0],
        'val_logistic_loss': metrics[1],
        'val_classification_loss': metrics[2],
        'val_decoder_loss': metrics[3],
        'val_logistic_accuracy': metrics[4],
        'val_classification_accuracy': metrics[5],
        'val_decoder_accuracy': metrics[6]
    }
)

# training evaluations:
metrics = model.evaluate(train)
expRun.log_metrics(
    {
        'train_loss': metrics[0],
        'train_logistic_loss': metrics[1],
        'train_classification_loss': metrics[2],
        'train_decoder_loss': metrics[3],
        'train_logistic_accuracy': metrics[4],
        'train_classification_accuracy': metrics[5],
        'train_decoder_accuracy': metrics[6]
    }
)

# extract encode layer
encode_model = tf.keras.Model(
    inputs = model.input,
    outputs = model.get_layer('encoder').output,
    name = args.experiment+'_encoder'
)

# output the model save files
encode_model.save(os.getenv("AIP_MODEL_DIR")+'encoder/')
model.save(os.getenv("AIP_MODEL_DIR")+'full/')
expRun.log_params({'model.save': os.getenv("AIP_MODEL_DIR")})
expRun.end_run()

Writing ./temp/05-explanability/train.py
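
The script above logs evaluation results by position (`metrics[0]` through `metrics[6]`), which depends on the order Keras reports them in. A minimal sketch of pairing names with values explicitly is shown below; the `label_metrics` helper is hypothetical and the values are made up for illustration. In Keras, `model.metrics_names` reports this order, and `model.evaluate(..., return_dict=True)` returns the names directly, so either could supply `names` instead of a hand-written list.

```python
def label_metrics(prefix, names, values):
    """Pair metric names with values and prefix each name, e.g. 'test_loss'."""
    if len(names) != len(values):
        raise ValueError('names and values must have the same length')
    return {f'{prefix}_{name}': value for name, value in zip(names, values)}

# Names in the order the script indexes them; values are made up for illustration
names = ['loss', 'logistic_loss', 'classification_loss', 'decoder_loss',
         'logistic_accuracy', 'classification_accuracy', 'decoder_accuracy']
values = [0.7, 0.1, 0.2, 0.4, 0.99, 0.98, 0.5]

test_metrics = label_metrics('test', names, values)
print(test_metrics['test_decoder_loss'])
```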


### Setup Training Job

Run the job with [`aiplatform.CustomJob.from_local_script()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomJob#google_cloud_aiplatform_CustomJob_from_local_script).

In [66]:
CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
    "--var_target=" + VAR_TARGET,
    "--var_omit=" + VAR_OMIT,
    "--project_id=" + PROJECT_ID,
    "--bq_project=" + BQ_PROJECT,
    "--bq_dataset=" + BQ_DATASET,
    "--bq_table=" + BQ_TABLE,
    "--region=" + REGION,
    "--experiment=" + EXPERIMENT,
    "--series=" + SERIES,
    "--experiment_name=" + EXPERIMENT_NAME,
    "--run_name=" + RUN_NAME
]
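
The training script declares `--var_omit` with `nargs='*'`, so how the value is passed matters. The sketch below uses only `argparse` with that same declaration; `other_col` is a hypothetical second column name. It suggests that several columns joined by spaces inside a single `--var_omit=` item would arrive as one string, so separate list items may be needed when omitting more than one column.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--var_omit', dest='var_omit', type=str, nargs='*')

# One column passed with '=': parsed as a one-element list
single = parser.parse_args(['--var_omit=transaction_id']).var_omit

# Two columns joined by a space inside one '=' argument stay a single string
joined = parser.parse_args(['--var_omit=transaction_id other_col']).var_omit

# The flag followed by separate values yields one list item per column
separate = parser.parse_args(['--var_omit'] + 'transaction_id other_col'.split()).var_omit

print(single, joined, separate)
```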

In [67]:
customJob = aiplatform.CustomJob.from_local_script(
    display_name = f'{SERIES}_{EXPERIMENT}_{TIMESTAMP}',
    script_path = SCRIPT_PATH,
    container_uri = TRAIN_IMAGE,
    args = CMDARGS,
    requirements = ['tensorflow_io', f'google-cloud-aiplatform>={aiplatform.__version__}', f"protobuf=={pkg_resources.get_distribution('protobuf').version}"],
    replica_count = 1,
    machine_type = TRAIN_COMPUTE,
    accelerator_type = 'NVIDIA_TESLA_K80',
    accelerator_count = 1,
    base_output_dir = f"{URI}/models/{TIMESTAMP}",
    staging_bucket = f"{URI}/models/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Training script copied to:
gs://statmike-mlops-349915/05/05-explanability/models/20221107225712/aiplatform-2022-11-07-22:57:22.438-aiplatform_custom_trainer_script-0.1.tar.gz.


### Run Training Job

In [68]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/2373693596885843968
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/2373693596885843968')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/2373693596885843968?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7179142426307592192+experiments+2373693596885843968
CustomJob projects/1026793852137/locations/us-central1/customJobs/2373693596885843968 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/2373693596885843968 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/2373693596885843968 current state:
JobState.JOB_STATE_PENDING


In [72]:
customJob.display_name

'05_05-explanability_20221107225712'

In [73]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/2373693596885843968'

Create hyperlinks to job and tensorboard here:

In [74]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Custom Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Custom Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/2373693596885843968/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7179142426307592192+experiments+2373693596885843968


---
---
# need to upload both model versions here
---
---

## Serving

### Upload The Model

In [None]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name

else:
    print('This is a new model, creating in model registry')
    parent_model = ''

if upload_model:
    model = aiplatform.Model.upload(
        display_name = f'{SERIES}_{EXPERIMENT}',
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model = parent_model,
        serving_container_image_uri = DEPLOY_IMAGE,
        artifact_uri = f"{URI}/models/{TIMESTAMP}/model",
        is_default_version = True,
        version_aliases = [RUN_NAME],
        version_description = RUN_NAME,
        labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}        
    )

>**Note** on Version Aliases:
>A version alias is expected to start with `a-z` and may only include characters from `[a-zA-Z0-9-]`.
>
>**Retrieve a Model Resource**
>[aiplatform.Model()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model)
>```Python
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}') # retrieves default version
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}@run-{TIMESTAMP}') # retrieves specific version
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}', version = f'run-{TIMESTAMP}') # retrieves specific version
>```
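
The alias rule in the note can be checked before uploading. The pattern below is a sketch built only from the wording of the note; the service may enforce further constraints (length, final character) that are not captured here, and `looks_like_valid_alias` is a hypothetical helper.

```python
import re

# First character a-z, remaining characters from [a-zA-Z0-9-], per the note above
ALIAS_PATTERN = re.compile(r'[a-z][a-zA-Z0-9-]*')

def looks_like_valid_alias(alias):
    """Return True if alias matches the pattern described in the note."""
    return ALIAS_PATTERN.fullmatch(alias) is not None

print(looks_like_valid_alias('run-20221107225712'))  # True
print(looks_like_valid_alias('20221107225712'))      # False, starts with a digit
print(looks_like_valid_alias('run_20221107225712'))  # False, underscore not allowed
```

This is why `RUN_NAME` is built as `run-` plus the timestamp: a bare timestamp would start with a digit.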

In [21]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/5114853325623263232?project=statmike-mlops-349915


### Vertex AI Experiment Update and Review

In [22]:
expRun = aiplatform.ExperimentRun(run_name = RUN_NAME, experiment = EXPERIMENT_NAME)

In [23]:
expRun.log_params({
    'model.uri': model.uri,
    'model.display_name': model.display_name,
    'model.name': model.name,
    'model.resource_name': model.resource_name,
    'model.version_id': model.version_id,
    'model.versioned_resource_name': model.versioned_resource_name,
    'customJobs.display_name': customJob.display_name,
    'customJobs.resource_name': customJob.resource_name,
    'customJobs.link': job_link,
    'customJobs.tensorboard': board_link
})

Complete the experiment run:

In [24]:
expRun.update_state(state = aiplatform.gapic.Execution.State.COMPLETE)

Retrieve the experiment:

In [25]:
exp = aiplatform.Experiment(experiment_name = EXPERIMENT_NAME)

In [26]:
exp.get_data_frame()

| | experiment_name | run_name | run_type | state | param.training.epochs | param.project_id | param.model.save | param.customJobs.link | param.var_target | param.training.shuffle | ... | metric.test_loss | metric.train_auprc | metric.val_auprc | metric.val_accuracy | time_series_metric.val_auprc | time_series_metric.train_loss | time_series_metric.val_loss | time_series_metric.train_auprc | time_series_metric.val_accuracy | time_series_metric.train_accuracy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | experiment-05-05a-tf-classification-dnn | run-20221024120130 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/05/05a/models/20221... | https://console.cloud.google.com/vertex-ai/loc... | Class | 1000.0 | ... | 0.005175 | 0.999484 | 0.999625 | 0.999221 | 0.999625 | 0.00499 | 0.005194 | 0.999486 | 0.999221 | 0.999211 |
| 1 | experiment-05-05a-tf-classification-dnn | run-20221024105747 | system.ExperimentRun | RUNNING | | statmike-mlops-349915 | | | Class | 1000.0 | ... | | | | | | | | | | |
| 2 | experiment-05-05a-tf-classification-dnn | run-20220927105742 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/05/05a/models/20220... | https://console.cloud.google.com/vertex-ai/loc... | Class | 1000.0 | ... | 0.003754 | 0.999627 | 0.999623 | 0.999256 | 0.999623 | 0.003305 | 0.005162 | 0.999691 | 0.999256 | 0.999434 |
| 3 | experiment-05-05a-tf-classification-dnn | run-20220926133308 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/05/05a/202209261333... | https://console.cloud.google.com/vertex-ai/loc... | Class | 1000.0 | ... | 0.003983 | 0.999516 | 0.999576 | 0.999256 | 0.999576 | 0.003341 | 0.005201 | 0.999685 | 0.999256 | 0.999377 |
| 4 | experiment-05-05a-tf-classification-dnn | run-20220921144806 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/fraud/models/05/05a... | | Class | 1000.0 | ... | 0.003512 | 0.999593 | 0.99953 | 0.999221 | 0.99953 | 0.003472 | 0.005431 | 0.999696 | 0.999221 | 0.999386 |
| 5 | experiment-05-05a-tf-classification-dnn | run-20220921095821 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/fraud/models/05/05a... | | Class | 1000.0 | ... | 0.003337 | 0.999583 | 0.999579 | 0.999363 | 0.999579 | 0.003238 | 0.005702 | 0.999687 | 0.999363 | 0.999382 |
| 6 | experiment-05-05a-tf-classification-dnn | run-20220827023541 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/fraud/models/05/05a... | https://console.cloud.google.com/vertex-ai/loc... | Class | 1000.0 | ... | 0.003704 | 0.999621 | 0.999438 | 0.999292 | 0.999438 | 0.003527 | 0.005493 | 0.999656 | 0.999292 | 0.999412 |
| 7 | experiment-05-05a-tf-classification-dnn | run-20220826104731 | system.ExperimentRun | COMPLETE | 10.0 | statmike-mlops-349915 | gs://statmike-mlops-349915/fraud/models/05/05a... | https://console.cloud.google.com/vertex-ai/loc... | Class | 1000.0 | ... | 0.003707 | 0.999551 | 0.999529 | 0.999292 | 0.999529 | 0.0033 | 0.005549 | 0.999691 | 0.999292 | 0.999391 |


Review the Experiments TensorBoard to compare runs:

In [27]:
print(f"The Experiment TensorBoard Link:\nhttps://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{exp.name}")

The Experiment TensorBoard Link:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+7179142426307592192+experiments+experiment-05-05a-tf-classification-dnn


In [28]:
expRun.get_time_series_data_frame()

| | step | wall_time | val_auprc | train_loss | val_loss | train_auprc | val_accuracy | train_accuracy |
|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2022-10-24 12:11:24.854000+00:00 | 0.999464 | 0.072239 | 0.012053 | 0.997569 | 0.998902 | 0.982763 |
| 1 | 2 | 2022-10-24 12:11:24.924000+00:00 | 0.999489 | 0.010422 | 0.007457 | 0.999258 | 0.999256 | 0.998921 |
| 2 | 3 | 2022-10-24 12:11:24.998000+00:00 | 0.999543 | 0.007516 | 0.006526 | 0.999372 | 0.999186 | 0.999057 |
| 3 | 4 | 2022-10-24 12:11:25.071000+00:00 | 0.999564 | 0.006592 | 0.006114 | 0.999404 | 0.999186 | 0.999158 |
| 4 | 5 | 2022-10-24 12:11:25.134000+00:00 | 0.999617 | 0.005945 | 0.005827 | 0.999434 | 0.999186 | 0.99922 |
| 5 | 6 | 2022-10-24 12:11:25.194000+00:00 | 0.99962 | 0.00568 | 0.005683 | 0.999453 | 0.999221 | 0.999189 |
| 6 | 7 | 2022-10-24 12:11:25.261000+00:00 | 0.999623 | 0.005423 | 0.005475 | 0.999475 | 0.999221 | 0.99925 |
| 7 | 8 | 2022-10-24 12:11:25.335000+00:00 | 0.999624 | 0.005266 | 0.005368 | 0.99946 | 0.999221 | 0.999246 |
| 8 | 9 | 2022-10-24 12:11:25.403000+00:00 | 0.999625 | 0.005241 | 0.005245 | 0.999467 | 0.999256 | 0.999224 |
| 9 | 10 | 2022-10-24 12:11:25.465000+00:00 | 0.999625 | 0.00499 | 0.005194 | 0.999486 | 0.999221 | 0.999211 |


---

## Get Vertex AI Endpoint And Deployed Model

In [9]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    print(f"There does not appear to be an endpoint for SERIES = {SERIES}")

Endpoint Exists: projects/1026793852137/locations/us-central1/endpoints/1961322035766362112


In [10]:
endpoint.display_name

'05'

In [11]:
model = aiplatform.Model(
    model_name = endpoint.list_models()[0].model+f'@{endpoint.list_models()[0].model_version_id}'
)

In [12]:
model.display_name

'05_05h'

In [13]:
model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/model_05_05h@1'

In [14]:
model.uri

'gs://statmike-mlops-349915/05/05h/models/20220927230247/6/model'

## Load the Model and Review Signature
Load the model currently deployed on the series endpoint (stored in `model` above) and review its serving signature.

In [15]:
reloaded_model = tf.saved_model.load(model.uri)

2022-10-18 22:58:59.222417: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2299995000 Hz
2022-10-18 22:58:59.222767: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x563d97bf8690 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2022-10-18 22:58:59.222814: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2022-10-18 22:58:59.224041: I tensorflow/core/common_runtime/process_util.cc:146] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.


In [23]:
reloaded_model.signatures.keys()

KeysView(_SignatureMap({'serving_default': <ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F63B812DED0>}))

In [22]:
reloaded_model.signatures['serving_default']

<ConcreteFunction signature_wrapper(Amount, Time, V1, V10, V11, V12, V13, V14, V15, V16, V17, V18, V19, V2, V20, V21, V22, V23, V24, V25, V26, V27, V28, V3, V4, V5, V6, V7, V8, V9) at 0x7F63B812DED0>

In [19]:
reloaded_model.signatures['serving_default'].structured_input_signature

((),
 {'V20': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V20'),
  'Time': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Time'),
  'V10': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V10'),
  'V19': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V19'),
  'Amount': TensorSpec(shape=(None, 1), dtype=tf.float32, name='Amount'),
  'V17': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V17'),
  'V2': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V2'),
  'V7': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V7'),
  'V6': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V6'),
  'V13': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V13'),
  'V28': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V28'),
  'V21': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V21'),
  'V12': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V12'),
  'V16': TensorSpec(shape=(None, 1), dtype=tf.float32, name='V16'),
  'V9': TensorSpec(shape=(None, 1), dtype

In [20]:
reloaded_model.signatures['serving_default'].structured_outputs

{'prediction_layer': TensorSpec(shape=(None, 2), dtype=tf.float32, name='prediction_layer')}

In [28]:
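# tf.saved_model.load returns a low-level object rather than a Keras model, so `.layers` is not available here
for layer in reloaded_model.layers: layer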

AttributeError: '_UserObject' object has no attribute 'layers'

In [29]:
#for v in reloaded_model.signatures['serving_default'].trainable_variables: print(v)

In [35]:
#!saved_model_cli show --dir {DIR}/model --all

---
## Retrieve Records For Prediction & Explanation

In [14]:
n = 1000
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT {n}").to_dataframe()

In [15]:
pred.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35337 | 1.092844 | -0.01323 | 1.359829 | 2.731537 | -0.707357 | 0.873837 | -0.79613 | 0.437707 | 0.39677 | ... | -0.167647 | 0.027557 | 0.592115 | 0.219695 | 0.03697 | 0.010984 | 0.0 | 0 | a1b10547-d270-48c0-b902-7a0f735dadc7 | TEST |
| 1 | 60481 | 1.238973 | 0.035226 | 0.063003 | 0.641406 | -0.260893 | -0.580097 | 0.049938 | -0.034733 | 0.405932 | ... | -0.057718 | 0.104983 | 0.537987 | 0.589563 | -0.046207 | -0.006212 | 0.0 | 0 | 814c62c8-ade4-47d5-bf83-313b0aafdee5 | TEST |
| 2 | 139587 | 1.870539 | 0.211079 | 0.224457 | 3.889486 | -0.380177 | 0.249799 | -0.577133 | 0.179189 | -0.120462 | ... | 0.180776 | -0.060226 | -0.228979 | 0.080827 | 0.009868 | -0.036997 | 0.0 | 0 | d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0 | TEST |
| 3 | 162908 | -3.368339 | -1.980442 | 0.153645 | -0.159795 | 3.847169 | -3.516873 | -1.209398 | -0.292122 | 0.760543 | ... | -1.171627 | 0.214333 | -0.159652 | -0.060883 | 1.294977 | 0.120503 | 0.0 | 0 | 802f3307-8e5a-4475-b795-5d5d8d7d0120 | TEST |


Remove columns not included as features in the model:

In [16]:
newobs = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')
#newobs[0]
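
The column filtering above can be pictured without pandas. The sketch below is a plain-Python equivalent for illustration only: `drop_columns` is a hypothetical helper, and the record is shortened to a few values taken from the first row shown earlier.

```python
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id'

def drop_columns(records, omit):
    """Return copies of the records without the keys listed in omit."""
    omit = set(omit)
    return [{key: value for key, value in record.items() if key not in omit} for record in records]

# A shortened record using a few values from the first row shown above
records = [{
    'Time': 35337, 'V1': 1.092844, 'Amount': 0.0,
    'Class': 0, 'transaction_id': 'a1b10547-d270-48c0-b902-7a0f735dadc7', 'splits': 'TEST',
}]

instances = drop_columns(records, VAR_OMIT.split() + [VAR_TARGET, 'splits'])
print(instances)
```

Only the feature columns remain, which matches the inputs named in the serving signature reviewed earlier.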

In [17]:
len(newobs)

1000

---
## Example-Based Explanations

**IN DEVELOPMENT**