![ga4](https://www.google-analytics.com/collect?v=2&tid=G-6VDTYWLKX6&cid=1&en=page_view&sid=1&dl=statmike%2Fvertex-ai-mlops%2F05+-+TensorFlow&dt=05Tools+-+Explainability+-+Example-Based.ipynb)
<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/05%20-%20TensorFlow/05Tools%20-%20Explainability%20-%20Example-Based.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/05%20-%20TensorFlow/05Tools%20-%20Explainability%20-%20Example-Based.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/05%20-%20TensorFlow/05Tools%20-%20Explainability%20-%20Example-Based.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https%3A//raw.githubusercontent.com/statmike/vertex-ai-mlops/main/05%20-%20TensorFlow/05Tools%20-%20Explainability%20-%20Example-Based.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# 05Tools: Model Explainability - Example-Based
## IN ACTIVE DEVELOPMENT - NOT COMPLETE

Model explainability helps you understand why a model produced a given output (prediction).  There are two approaches here:
- Feature-Based Explanations - columns/features attributions
    - How much did each feature contribute to a specific prediction
    - Uses a baseline for comparison, usually based on a central value for each feature from the training data
    - Helpful for recognizing bias and finding areas for improvement
    - Read more about [feature attributions and methods](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview#feature_attributions)
    - Examples in [github.com/GoogleCloudPlatform/vertex-ai-samples](https://github.com/GoogleCloudPlatform/vertex-ai-samples/tree/main/notebooks/official/explainable_ai)
- Example-Based Explanations - row/example attributions
    - Return similar examples (neighbors) to help understand predictions
    - Along with a prediction, get the examples from the source data that are most similar to the instance being predicted, to further understand "why?"
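
The idea of returning neighbors can be illustrated with a brute-force lookup. This is a minimal sketch, not how Vertex AI performs the search: the example ids and 2-D vectors are hypothetical, and the choice of Euclidean distance is an assumption made for illustration.

```python
import math

def nearest_neighbors(query, examples, k=3):
    """Return the k (distance, example_id) pairs closest to query by Euclidean distance."""
    scored = [
        (math.dist(query, vector), example_id)
        for example_id, vector in examples.items()
    ]
    return sorted(scored)[:k]

# Hypothetical 2-D embeddings keyed by a made-up example id.
examples = {
    "txn-a": [0.0, 0.0],
    "txn-b": [1.0, 1.0],
    "txn-c": [5.0, 5.0],
    "txn-d": [0.9, 1.2],
}
neighbors = nearest_neighbors([1.0, 1.0], examples, k=2)
print(neighbors)
```

In this notebook the vectors being compared would come from the learned encoder rather than from raw feature values.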

This notebook covers example-based explanations.  For a review of feature-based explanations see the notebook [05Tools - Explainability - Feature-Based.ipynb](./05Tools%20-%20Explainability%20-%20Feature-Based.ipynb).

Vertex AI can serve explanations during online and batch predictions.  

### Prerequisites:
-  At least one of the notebooks in this series [05, 05a-05i]
    - each of these creates a model, adds it to the Vertex AI Model Registry, and updates a Vertex AI Endpoint

### Conceptual Flow & Workflow
<p align="center">
  <img alt="Conceptual Flow" src="../architectures/slides/05tools_explain_arch.png" width="45%">
&nbsp; &nbsp; &nbsp; &nbsp;
  <img alt="Workflow" src="../architectures/slides/05tools_explain_console.png" width="45%">
</p>

---
## Setup

inputs:

In [9]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [35]:
REGION = 'us-central1'
EXPERIMENT = 'ebe'
SERIES = '05ebe'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
TRAIN_IMAGE = 'us-docker.pkg.dev/vertex-ai/training/tf-gpu.2-7:latest'
DEPLOY_IMAGE ='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-7:latest'
TRAIN_COMPUTE = 'n1-standard-4'
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters
EPOCHS = 10
BATCH_SIZE = 100

packages:

In [36]:
from google.cloud import aiplatform
from google.cloud import bigquery

import tensorflow as tf
import pkg_resources

from datetime import datetime
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

import json
import numpy as np

clients:

In [37]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()

parameters:

In [38]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

In [39]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [40]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) then [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this permission to the service account with the provided instructions.

environment:

In [41]:
!rm -rf {DIR}
!mkdir -p {DIR}

Experiment Tracking:

In [42]:
FRAMEWORK = 'tf'
TASK = 'classification'
MODEL_TYPE = 'dnn'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

---
## Get Vertex AI Experiments Tensorboard Instance Name
[Vertex AI Experiments](https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview) provides managed [Tensorboard](https://www.tensorflow.org/tensorboard) instances that you can use to track Tensorboard Experiments (a training run or hyperparameter tuning sweep).  

The training job will show up as an experiment for the Tensorboard instance and have the same name as the training job ID.

This code checks whether a Tensorboard instance labeled for this series already exists in the project. It retrieves the instance if so and creates one otherwise:

In [43]:
tb = aiplatform.Tensorboard.list(filter=f"labels.series={SERIES}")
if tb:
    tb = tb[0]
else: 
    tb = aiplatform.Tensorboard.create(display_name = SERIES, labels = {'series' : f'{SERIES}'})

In [44]:
tb.resource_name

'projects/1026793852137/locations/us-central1/tensorboards/2899904743654555648'

---
## Setup Vertex AI Experiments

The code in this section initializes the experiment.  The run that represents this notebook is created by the training script, and information from model training and evaluation is logged to it using:
- [.log_params](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_params)
- [.log_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_metrics)
- [.log_time_series_metrics](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform#google_cloud_aiplatform_log_time_series_metrics)

In [45]:
aiplatform.init(experiment = EXPERIMENT_NAME, experiment_tensorboard = tb.resource_name)

---
## Training

### Python File for Training

This notebook trains a TensorFlow model with the same inputs as `05` (and `05a` through `05i`) but also creates a trained encoder.  It processes the inputs through a series of progressively smaller hidden layers to create a learned encoding representation.
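
To get a feel for the size of the narrowing stack, the parameter count of fully connected layers can be worked out by hand. This is a rough sketch: the hidden widths 25, 20, and 15 come from the training script in this section, while the input width of 30 is an assumed value for illustration and is not read from the source table. The batch normalization layer is left out of the count.

```python
def dense_param_count(widths):
    """Weights plus biases for a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))

N_FEATURES = 30  # assumption: illustrative input width, not read from the table
encoder_widths = [N_FEATURES, 25, 20, 15]
decoder_widths = [15, 20, 25, N_FEATURES]

encoder_params = dense_param_count(encoder_widths)
decoder_params = dense_param_count(decoder_widths)
print(encoder_params, decoder_params)
```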

**Write the script:**

In [46]:
SCRIPT_PATH = f'./{DIR}/train.py'

In [47]:
%%writefile {SCRIPT_PATH}

# package import
from tensorflow.python.framework import dtypes
from tensorflow_io.bigquery import BigQueryClient
import tensorflow as tf
from google.cloud import bigquery
from google.cloud import aiplatform
import argparse
import os
import sys

# parse command-line arguments into local variables
parser = argparse.ArgumentParser()
# add_argument parameters: the flag name, dest: attribute name on args, default: value used when the flag is absent, type: type to convert to, help: description of argument
parser.add_argument('--epochs', dest = 'epochs', default = 10, type = int, help = 'Number of Epochs')
parser.add_argument('--batch_size', dest = 'batch_size', default = 32, type = int, help = 'Batch Size')
parser.add_argument('--var_target', dest = 'var_target', type=str)
parser.add_argument('--var_omit', dest = 'var_omit', type=str, nargs='*')
parser.add_argument('--project_id', dest = 'project_id', type=str)
parser.add_argument('--bq_project', dest = 'bq_project', type=str)
parser.add_argument('--bq_dataset', dest = 'bq_dataset', type=str)
parser.add_argument('--bq_table', dest = 'bq_table', type=str)
parser.add_argument('--region', dest = 'region', type=str)
parser.add_argument('--experiment', dest = 'experiment', type=str)
parser.add_argument('--series', dest = 'series', type=str)
parser.add_argument('--experiment_name', dest = 'experiment_name', type=str)
parser.add_argument('--run_name', dest = 'run_name', type=str)
args = parser.parse_args()

# clients
bq = bigquery.Client(project = args.project_id)
aiplatform.init(project = args.project_id, location = args.region)

# Vertex AI Experiment
expRun = aiplatform.ExperimentRun.create(run_name = args.run_name, experiment = args.experiment_name)
expRun.log_params({'experiment': args.experiment, 'series': args.series, 'project_id': args.project_id})

# get schema from bigquery source
query = f"SELECT * FROM {args.bq_project}.{args.bq_dataset}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{args.bq_table}'"
schema = bq.query(query).to_dataframe()

# get number of classes from bigquery source
nclasses = bq.query(query = f'SELECT DISTINCT {args.var_target} FROM {args.bq_project}.{args.bq_dataset}.{args.bq_table} WHERE {args.var_target} is not null').to_dataframe()
nclasses = nclasses.shape[0]
expRun.log_params({'data_source': f'bq://{args.bq_project}.{args.bq_dataset}.{args.bq_table}', 'nclasses': nclasses, 'var_split': 'splits', 'var_target': args.var_target})

# Make a list of columns to omit
OMIT = args.var_omit + ['splits']

# use schema to prepare a list of columns to read from BigQuery
selected_fields = schema[~schema.column_name.isin(OMIT)].column_name.tolist()

# all the columns in this data source are either float64 or int64
output_types = [dtypes.float64 if x=='FLOAT64' else dtypes.int64 for x in schema[~schema.column_name.isin(OMIT)].data_type.tolist()]

# remap input data to Tensorflow inputs of features and target
def transTable(row_dict):
    target = row_dict.pop(args.var_target)
    target = tf.one_hot(tf.cast(target, tf.int64), nclasses)
    target = tf.cast(target, tf.float32)
    features = [tf.cast(v, tf.float32) for v in row_dict.values()]
    features = tf.stack(features)
    return(
        features, 
        {
            'logistic': target, 
            'classification': target, 
            'decoder': features}
    )

# function to setup a bigquery reader with Tensorflow I/O
def bq_reader(split):
    reader = BigQueryClient()

    training = reader.read_session(
        parent = f"projects/{args.project_id}",
        project_id = args.bq_project,
        table_id = args.bq_table,
        dataset_id = args.bq_dataset,
        selected_fields = selected_fields,
        output_types = output_types,
        row_restriction = f"splits='{split}'",
        requested_streams = 3
    )
    
    return training

# setup feed for train, validate and test
train = bq_reader('TRAIN').parallel_read_rows().prefetch(1).map(transTable).shuffle(args.batch_size*10).batch(args.batch_size)
validate = bq_reader('VALIDATE').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
test = bq_reader('TEST').parallel_read_rows().prefetch(1).map(transTable).batch(args.batch_size)
expRun.log_params({'training.batch_size': args.batch_size, 'training.shuffle': 10*args.batch_size, 'training.prefetch': 1})

# Three targets: logistic, autoencoder, classification from encoder

# inputs
features = tf.keras.layers.Input(shape = (len(selected_fields)-1,), name = 'features')

# normalize here
normalized = tf.keras.layers.BatchNormalization(name = 'batch_normalization_layer')(features)

# logistic
logistic = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'logistic')(normalized)

# encoder
encode = tf.keras.layers.Dense(25, activation = tf.nn.relu)(normalized)
encode = tf.keras.layers.Dense(20, activation = tf.nn.relu)(encode)
encode = tf.keras.layers.Dense(15, activation = tf.nn.relu, name = 'encoder')(encode)

# classifier
classifier = tf.keras.layers.Dense(nclasses, activation = tf.nn.softmax, name = 'classification')(encode)

# decoder
decode = tf.keras.layers.Dense(20, activation = tf.nn.relu)(encode)
decode = tf.keras.layers.Dense(25, activation = tf.nn.relu)(decode)
decode = tf.keras.layers.Dense(features.shape[1], activation = tf.nn.sigmoid, name = 'decoder')(decode)

# the model
model = tf.keras.Model(
    inputs = features,
    outputs = [logistic, classifier, decode],
    name = args.experiment
)

# compile
model.compile(
    optimizer = tf.keras.optimizers.Adam(), #SGD or Adam
    loss = {
        'logistic': tf.keras.losses.CategoricalCrossentropy(),
        'classification': tf.keras.losses.CategoricalCrossentropy(),
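        # note: binary cross-entropy expects targets in [0, 1]; the decoder target here is the unscaled feature vector
        'decoder': tf.keras.losses.BinaryCrossentropy()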
    },
    metrics = {
        'logistic': ['accuracy', tf.keras.metrics.AUC(curve = 'PR', name = 'auprc')],
        'classification': ['accuracy', tf.keras.metrics.AUC(curve = 'PR', name = 'auprc')],
        'decoder': tf.keras.metrics.RootMeanSquaredError(name = 'rmse')
    }
)

# setup tensorboard logs and train
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir=os.environ['AIP_TENSORBOARD_LOG_DIR'],
    histogram_freq=1
)
history = model.fit(
    train, 
    epochs = args.epochs, 
    callbacks = [tensorboard_callback], 
    validation_data = validate
)
expRun.log_params({'training.epochs': history.params['epochs']})
for e in range(0, history.params['epochs']):
    expRun.log_time_series_metrics(
        {
            'train_loss': history.history['loss'][e],
            'train_logistic_loss': history.history['logistic_loss'][e],
            'train_classification_loss': history.history['classification_loss'][e],
            'train_decoder_loss': history.history['decoder_loss'][e],
            'train_logistic_accuracy': history.history['logistic_accuracy'][e],
            'train_classification_accuracy': history.history['classification_accuracy'][e],
            'train_logistic_auprc': history.history['logistic_auprc'][e],
            'train_classification_auprc': history.history['classification_auprc'][e],
            'train_decoder_rmse': history.history['decoder_rmse'][e],
            'val_loss': history.history['val_loss'][e],
            'val_logistic_loss': history.history['val_logistic_loss'][e],
            'val_classification_loss': history.history['val_classification_loss'][e],
            'val_decoder_loss': history.history['val_decoder_loss'][e],
            'val_logistic_accuracy': history.history['val_logistic_accuracy'][e],
            'val_classification_accuracy': history.history['val_classification_accuracy'][e],
            'val_logistic_auprc': history.history['val_logistic_auprc'][e],
            'val_classification_auprc': history.history['val_classification_auprc'][e],
            'val_decoder_rmse': history.history['val_decoder_rmse'][e],
        }
    )

# test evaluations:
metrics = model.evaluate(test)
expRun.log_metrics(
    {
        'test_loss': metrics[0],
        'test_logistic_loss': metrics[1],
        'test_classification_loss': metrics[2],
        'test_decoder_loss': metrics[3],
        'test_logistic_accuracy': metrics[4],
        'test_logistic_auprc': metrics[5],
        'test_classification_accuracy': metrics[6],
        'test_classification_auprc': metrics[7],
        'test_decoder_rmse': metrics[8]
    }
)

# extract encode layer
encode_model = tf.keras.Model(
    inputs = model.input,
    outputs = model.get_layer('encoder').output,
    name = args.experiment+'_encoder'
)

# output the model save files
encode_model.save(os.getenv("AIP_MODEL_DIR")+'encoder/')
model.save(os.getenv("AIP_MODEL_DIR"))
expRun.log_params({'model.save': os.getenv("AIP_MODEL_DIR")})
expRun.end_run()

Writing ./temp/ebe/train.py
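
The script logs test metrics by their position in the list returned by `model.evaluate`. A name-based pairing is less sensitive to ordering. The helper below is a hypothetical sketch: it assumes the names would come from `model.metrics_names` and the values from `model.evaluate(test)`, and the sample names and numbers are made up for illustration.

```python
def label_metrics(names, values, prefix="test_"):
    """Pair each metric name with its value instead of relying on list positions."""
    if len(names) != len(values):
        raise ValueError("names and values must have the same length")
    return {f"{prefix}{name}": float(value) for name, value in zip(names, values)}

# Hypothetical names and values, for illustration only.
names = ["loss", "logistic_loss", "logistic_accuracy"]
values = [0.25, 0.125, 0.5]
labeled = label_metrics(names, values)
print(labeled)
```

Whether the names line up with the evaluated values in the same order is worth checking against the TensorFlow version in the training image before relying on this.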


### Setup Training Job

Run the job with [`aiplatform.CustomJob.from_local_script()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomJob#google_cloud_aiplatform_CustomJob_from_local_script).

In [48]:
CMDARGS = [
    "--epochs=" + str(EPOCHS),
    "--batch_size=" + str(BATCH_SIZE),
    "--var_target=" + VAR_TARGET,
    "--var_omit", *VAR_OMIT.split(), # pass each space-delimited name as its own token for nargs='*'
    "--project_id=" + PROJECT_ID,
    "--bq_project=" + BQ_PROJECT,
    "--bq_dataset=" + BQ_DATASET,
    "--bq_table=" + BQ_TABLE,
    "--region=" + REGION,
    "--experiment=" + EXPERIMENT,
    "--series=" + SERIES,
    "--experiment_name=" + EXPERIMENT_NAME,
    "--run_name=" + RUN_NAME
]
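
The training script declares `--var_omit` with `nargs='*'`, so the way the values are passed on the command line matters. The sketch below reproduces only that one argument definition (plus `--epochs` as a following flag) to show how `argparse` treats the two forms. The column name `other_col` is hypothetical.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--var_omit", dest="var_omit", type=str, nargs="*")
parser.add_argument("--epochs", dest="epochs", type=int, default=10)

# One token joined with '=': the whole string stays a single list element.
joined = parser.parse_args(["--var_omit=transaction_id other_col", "--epochs=5"])

# Separate tokens: each column name becomes its own list element.
separate = parser.parse_args(["--var_omit", "transaction_id", "other_col", "--epochs=5"])

print(joined.var_omit)
print(separate.var_omit)
```

With a single column name both forms give the same one-element list. The difference only shows up once more than one name is supplied.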

In [49]:
customJob = aiplatform.CustomJob.from_local_script(
    display_name = f'{SERIES}_{EXPERIMENT}_{TIMESTAMP}',
    script_path = SCRIPT_PATH,
    container_uri = TRAIN_IMAGE,
    args = CMDARGS,
    requirements = ['tensorflow_io', f'google-cloud-aiplatform>={aiplatform.__version__}', f"protobuf=={pkg_resources.get_distribution('protobuf').version}"],
    replica_count = 1,
    machine_type = TRAIN_COMPUTE,
    accelerator_type = 'NVIDIA_TESLA_K80',
    accelerator_count = 1,
    base_output_dir = f"{URI}/models/{TIMESTAMP}",
    staging_bucket = f"{URI}/models/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

Training script copied to:
gs://statmike-mlops-349915/05ebe/ebe/models/20221111130741/aiplatform-2022-11-11-13:07:58.859-aiplatform_custom_trainer_script-0.1.tar.gz.
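
The training script reads its output locations from environment variables such as `AIP_MODEL_DIR` and `AIP_TENSORBOARD_LOG_DIR`. As a sketch of how those relate to `base_output_dir`, the hypothetical helper below mirrors the documented layout of subfolders under the base directory. Treat the exact folder names as an assumption to confirm against the Vertex AI custom training documentation.

```python
def training_output_dirs(base_output_dir):
    """Derive the output locations Vertex AI passes to a custom training container."""
    base = base_output_dir.rstrip("/")
    return {
        "AIP_MODEL_DIR": f"{base}/model/",
        "AIP_CHECKPOINT_DIR": f"{base}/checkpoints/",
        "AIP_TENSORBOARD_LOG_DIR": f"{base}/logs/",
    }

dirs = training_output_dirs("gs://statmike-mlops-349915/05ebe/ebe/models/20221111130741")
# The script saves the encoder in a subfolder of the model directory.
encoder_dir = dirs["AIP_MODEL_DIR"] + "encoder/"
print(dirs["AIP_MODEL_DIR"])
print(encoder_dir)
```

These two paths correspond to the `artifact_uri` values used later when the full model and the latent model are uploaded.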


### Run Training Job

In [50]:
customJob.run(
    service_account = SERVICE_ACCOUNT,
    tensorboard = tb.resource_name
)

Creating CustomJob
CustomJob created. Resource name: projects/1026793852137/locations/us-central1/customJobs/5997525592361140224
To use this CustomJob in another session:
custom_job = aiplatform.CustomJob.get('projects/1026793852137/locations/us-central1/customJobs/5997525592361140224')
View Custom Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/5997525592361140224?project=1026793852137
View Tensorboard:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+2899904743654555648+experiments+5997525592361140224
CustomJob projects/1026793852137/locations/us-central1/customJobs/5997525592361140224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/5997525592361140224 current state:
JobState.JOB_STATE_PENDING
CustomJob projects/1026793852137/locations/us-central1/customJobs/5997525592361140224 current state:
JobState.JOB_STATE_PENDING


In [51]:
customJob.display_name

'05ebe_ebe_20221111130741'

In [52]:
customJob.resource_name

'projects/1026793852137/locations/us-central1/customJobs/5997525592361140224'

Create hyperlinks to job and tensorboard here:

In [53]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"
board_link = f"https://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{customJob.resource_name.split('/')[-1]}"

print(f'Review the Custom Job here:\n{job_link}')
print(f'Review the TensorBoard From the Job here:\n{board_link}')

Review the Custom Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/5997525592361140224/cpu?cloudshell=false&project=statmike-mlops-349915
Review the TensorBoard From the Job here:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+2899904743654555648+experiments+5997525592361140224
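
The links above are assembled from resource names. The same string handling can be wrapped in small helpers, sketched below with hypothetical function names and checked against the values printed in this notebook.

```python
def resource_id(resource_name):
    """Return the trailing id of a resource name such as projects/.../customJobs/123."""
    return resource_name.rstrip("/").split("/")[-1]

def tensorboard_experiment_link(region, tensorboard_resource_name, experiment_id):
    """Build the TensorBoard experiment URL using the same pattern as the cell above."""
    board = tensorboard_resource_name.replace("/", "+")
    return (
        f"https://{region}.tensorboard.googleusercontent.com/experiment/"
        f"{board}+experiments+{experiment_id}"
    )

job = "projects/1026793852137/locations/us-central1/customJobs/5997525592361140224"
tb_name = "projects/1026793852137/locations/us-central1/tensorboards/2899904743654555648"
link = tensorboard_experiment_link("us-central1", tb_name, resource_id(job))
print(link)
```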


---

## Serving

### Upload The Model - Full Model

In [54]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name

else:
    print('This is a new model, creating in model registry')
    parent_model = ''

if upload_model:
    model = aiplatform.Model.upload(
        display_name = f'{SERIES}_{EXPERIMENT}',
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model =  parent_model,
        serving_container_image_uri = DEPLOY_IMAGE,
        artifact_uri = f"{URI}/models/{TIMESTAMP}/model",
        is_default_version = True,
        version_aliases = [RUN_NAME],
        version_description = RUN_NAME,
        labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}        
    )

This is a new model, creating in model registry
Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/model_05ebe_ebe/operations/4984962863073329152
Model created. Resource name: projects/1026793852137/locations/us-central1/models/2080236417333592064@1
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/2080236417333592064@1')


>**Note** on Version Aliases:
>A version alias is expected to start with a lowercase letter (`a-z`) and may otherwise contain only characters from `[a-zA-Z0-9-]`.
>
>**Retrieve a Model Resource**
>[aiplatform.Model()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model)
>```Python
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}') # retrieves default version
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}@{RUN_NAME}') # retrieves specific version by alias
>model = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}', version = RUN_NAME) # retrieves specific version by alias
>```
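
The alias rule in the note can be checked locally before uploading. This is a minimal sketch that transcribes only the pattern stated in the note. The service may enforce further constraints (for example on length) that are not captured here, and `is_valid_alias` is a hypothetical helper name.

```python
import re

# Pattern transcribed from the note: first character a-z, then letters, digits, or hyphens.
ALIAS_PATTERN = re.compile(r"[a-z][a-zA-Z0-9-]*")

def is_valid_alias(alias):
    """Check an alias against the pattern described in the note above."""
    return ALIAS_PATTERN.fullmatch(alias) is not None

print(is_valid_alias("run-20221111130741"))
print(is_valid_alias("20221111130741"))
print(is_valid_alias("run_20221111130741"))
```

The `run-` prefix used for `RUN_NAME` in this notebook satisfies the rule, while a bare timestamp would not because it starts with a digit.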

In [55]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/2080236417333592064?project=statmike-mlops-349915


### Upload The Model - Latent Model

In [56]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT}_latent AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model_latent = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        parent_model = modelmatch[0].resource_name

else:
    print('This is a new model, creating in model registry')
    parent_model = ''

if upload_model:
    model_latent = aiplatform.Model.upload(
        display_name = f'{SERIES}_{EXPERIMENT}_latent',
        model_id = f'model_{SERIES}_{EXPERIMENT}_latent',
        parent_model =  parent_model,
        serving_container_image_uri = DEPLOY_IMAGE,
        artifact_uri = f"{URI}/models/{TIMESTAMP}/model/encoder",
        is_default_version = True,
        version_aliases = [RUN_NAME],
        version_description = RUN_NAME,
        labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}        
    )

This is a new model, creating in model registry
Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/model_05ebe_ebe_latent/operations/3049259448234147840
Model created. Resource name: projects/1026793852137/locations/us-central1/models/791081023998787584@1
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/791081023998787584@1')


>**Note** on Version Aliases:
>A version alias is expected to start with a lowercase letter (`a-z`) and may otherwise contain only characters from `[a-zA-Z0-9-]`.
>
>**Retrieve a Model Resource**
>[aiplatform.Model()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model)
>```Python
>model_latent = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}_latent') # retrieves default version
>model_latent = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}_latent@{RUN_NAME}') # retrieves specific version by alias
>model_latent = aiplatform.Model(model_name = f'model_{SERIES}_{EXPERIMENT}_latent', version = RUN_NAME) # retrieves specific version by alias
>```

In [57]:
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model_latent.name}?project={PROJECT_ID}')

Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/791081023998787584?project=statmike-mlops-349915


### Vertex AI Experiment Update

In [58]:
expRun = aiplatform.ExperimentRun(run_name = RUN_NAME, experiment = EXPERIMENT_NAME)

In [59]:
expRun.log_params({
    'model.uri': model.uri,
    'model.display_name': model.display_name,
    'model.name': model.name,
    'model.resource_name': model.resource_name,
    'model.version_id': model.version_id,
    'model.versioned_resource_name': model.versioned_resource_name,
    'customJobs.display_name': customJob.display_name,
    'customJobs.resource_name': customJob.resource_name,
    'customJobs.link': job_link,
    'customJobs.tensorboard': board_link,
    'model_latent.uri': model_latent.uri,
    'model_latent.display_name': model_latent.display_name,
    'model_latent.name': model_latent.name,
    'model_latent.resource_name': model_latent.resource_name,
    'model_latent.version_id': model_latent.version_id,
    'model_latent.versioned_resource_name': model_latent.versioned_resource_name,
})

Complete the experiment run:

In [60]:
expRun.update_state(state = aiplatform.gapic.Execution.State.COMPLETE)

### Vertex AI Experiment Review

Retrieve the experiment:

In [61]:
exp = aiplatform.Experiment(experiment_name = EXPERIMENT_NAME)

In [62]:
exp.get_data_frame()

Unnamed: 0,experiment_name,run_name,run_type,state,param.model.display_name,param.model.resource_name,param.training.epochs,param.customJobs.display_name,param.var_split,param.var_target,...,time_series_metric.train_classification_accuracy,time_series_metric.train_decoder_rmse,time_series_metric.train_classification_loss,time_series_metric.val_classification_loss,time_series_metric.train_logistic_loss,time_series_metric.val_classification_accuracy,time_series_metric.val_loss,time_series_metric.train_decoder_loss,time_series_metric.val_classification_auprc,time_series_metric.val_logistic_loss
0,experiment-05ebe-ebe-tf-classification-dnn,run-20221111130741,system.ExperimentRun,COMPLETE,05ebe_ebe,projects/1026793852137/locations/us-central1/m...,10.0,05ebe_ebe_20221111130741,splits,Class,...,0.996944,19360.314453,6146540.0,290826.28125,0.006737,0.998301,-6.298358e+21,-4.817085e+21,0.997781,0.011533


Review the Experiments TensorBoard to compare runs:

In [63]:
print(f"The Experiment TensorBoard Link:\nhttps://{REGION}.tensorboard.googleusercontent.com/experiment/{tb.resource_name.replace('/', '+')}+experiments+{exp.name}")

The Experiment TensorBoard Link:
https://us-central1.tensorboard.googleusercontent.com/experiment/projects+1026793852137+locations+us-central1+tensorboards+2899904743654555648+experiments+experiment-05ebe-ebe-tf-classification-dnn


In [64]:
expRun.get_time_series_data_frame()

Unnamed: 0,step,wall_time,train_logistic_accuracy,train_logistic_auprc,val_logistic_accuracy,val_logistic_auprc,train_loss,val_decoder_loss,train_classification_auprc,val_decoder_rmse,train_classification_accuracy,train_decoder_rmse,train_classification_loss,val_classification_loss,train_logistic_loss,val_classification_accuracy,val_loss,train_decoder_loss,val_classification_auprc,val_logistic_loss
0,1,2022-11-11 13:22:30.935000+00:00,0.961282,0.995567,0.998902,0.999252,-3922428000000000.0,-2.778904e+16,0.993438,19337.955078,0.970973,19360.318359,1256.744,12067.05,0.082286,0.998301,-2.778904e+16,-3922428000000000.0,0.997781,0.010563
1,2,2022-11-11 13:22:31.041000+00:00,0.999263,0.999428,0.999115,0.999439,-4.254539e+17,-1.38508e+18,0.995486,19337.955078,0.99654,19360.318359,28323.11,112917.9,0.006046,0.998301,-1.38508e+18,-4.254539e+17,0.997781,0.008255
2,3,2022-11-11 13:22:31.151000+00:00,0.999316,0.999519,0.999186,0.999436,-5.149885e+18,-1.158629e+19,0.996023,19337.955078,0.996953,19360.320312,126757.6,6504.652,0.005185,0.998301,-1.158629e+19,-5.149885e+18,0.997781,0.007913
3,4,2022-11-11 13:22:31.265000+00:00,0.999263,0.999502,0.99915,0.999294,-2.748444e+19,-5.098997e+19,0.995486,19337.955078,0.99654,19360.308594,328757.8,265674.0,0.005984,0.998301,-5.098997e+19,-2.748444e+19,0.997781,0.012127
4,5,2022-11-11 13:22:31.393000+00:00,0.999277,0.999463,0.999221,0.999305,-9.755336e+19,-1.611581e+20,0.99544,19337.955078,0.996505,19360.308594,750521.8,857852.9,0.006815,0.998301,-1.611581e+20,-9.755336e+19,0.997781,0.011169
5,6,2022-11-11 13:22:31.509000+00:00,0.999281,0.999492,0.999186,0.999304,-2.71919e+20,-4.157706e+20,0.995452,19337.955078,0.996514,19360.318359,1263110.0,2624982.0,0.006452,0.998301,-4.157706e+20,-2.71919e+20,0.997781,0.01211
6,7,2022-11-11 13:22:31.623000+00:00,0.999347,0.999411,0.999221,0.999257,-6.455541e+20,-9.341038e+20,0.996023,19337.955078,0.996953,19360.320312,2090514.0,616301.3,0.006775,0.998301,-9.341038e+20,-6.455541e+20,0.997781,0.013139
7,8,2022-11-11 13:22:31.729000+00:00,0.999285,0.999399,0.999221,0.999213,-1.366705e+21,-1.896183e+21,0.995463,19337.955078,0.996523,19360.324219,3117826.0,2851568.0,0.008828,0.998301,-1.896183e+21,-1.366705e+21,0.997781,0.017978
8,9,2022-11-11 13:22:31.817000+00:00,0.999281,0.999394,0.998973,0.999253,-2.653926e+21,-3.563255e+21,0.99544,19337.955078,0.996505,19360.316406,4900270.0,7423477.0,0.008871,0.998301,-3.563255e+21,-2.653926e+21,0.997781,0.014108
9,10,2022-11-11 13:22:31.913000+00:00,0.99929,0.999427,0.999044,0.999348,-4.817085e+21,-6.298358e+21,0.996012,19337.955078,0.996944,19360.314453,6146540.0,290826.3,0.006737,0.998301,-6.298358e+21,-4.817085e+21,0.997781,0.011533
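
The `train_decoder_loss`, `val_decoder_loss`, and total loss columns above are large negative numbers that grow with each step. One possible reading, offered as an assumption rather than a diagnosis: the decoder is trained with binary cross-entropy against feature values that are not confined to [0, 1], and the formula is unbounded below for such targets. The sketch below evaluates the per-element formula directly. The clipping constant and the target value of 100 are illustrative choices.

```python
import math

def binary_crossentropy(target, prediction, eps=1e-7):
    """Per-element binary cross-entropy with the prediction clipped away from 0 and 1."""
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

in_range = binary_crossentropy(1.0, 0.9)        # target inside [0, 1]
out_of_range = binary_crossentropy(100.0, 1.0)  # hypothetical unscaled feature value
print(in_range, out_of_range)
```

If this reading applies, scaling the decoder targets into [0, 1] or switching the decoder to a regression loss such as mean squared error would be natural things to try.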


### Create/Retrieve The Endpoint For This Series

#### Endpoint - Full Model

In [65]:
endpoints = aiplatform.Endpoint.list(filter = f"display_name = {SERIES} AND labels.series = {SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Creating Endpoint
Create Endpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368/operations/635330047963561984
Endpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1026793852137/locations/us-central1/endpoints/8032666914671034368')
Endpoint Created: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/8032666914671034368?project=statmike-mlops-349915


In [66]:
endpoint.display_name

'05ebe'

In [67]:
endpoint.traffic_split

{}

In [68]:
deployed_models = endpoint.list_models()
#deployed_models

#### Endpoint - Latent Model

In [69]:
endpoints = aiplatform.Endpoint.list(filter = f"display_name = {SERIES}_latent AND labels.series={SERIES}")
if endpoints:
    endpoint_latent = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint_latent = aiplatform.Endpoint.create(
        display_name = f"{SERIES}_latent",
        labels = {'series' : SERIES}
    )
    print(f"Endpoint Created: {endpoint_latent.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint_latent.name}?project={PROJECT_ID}')

Creating Endpoint
Create Endpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464/operations/3785597987309223936
Endpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1026793852137/locations/us-central1/endpoints/3139505919532990464')
Endpoint Created: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/3139505919532990464?project=statmike-mlops-349915


In [70]:
endpoint_latent.display_name

'05ebe_latent'

In [71]:
endpoint_latent.traffic_split

{}

In [72]:
deployed_models = endpoint_latent.list_models()
#deployed_models
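
Both endpoint lookups above build nearly the same filter string by hand, with unquoted values. A minimal sketch of a helper that builds the filter with quoted values, which is the form shown in the Vertex AI list-filter documentation. The name `endpoint_filter` is hypothetical and not part of the `aiplatform` SDK; the helper only builds a string and makes no API call.

```python
def endpoint_filter(display_name: str, series: str) -> str:
    """Build a list filter for one display name within one series label.

    Hypothetical helper for illustration; it is not part of the SDK.
    """
    for value in (display_name, series):
        # Keep the sketch simple: refuse values that would need escaping.
        if '"' in value:
            raise ValueError(f"unsupported character in filter value: {value!r}")
    return f'display_name="{display_name}" AND labels.series="{series}"'


SERIES = "05ebe"  # value shown by endpoint.display_name above
full_filter = endpoint_filter(SERIES, SERIES)
latent_filter = endpoint_filter(f"{SERIES}_latent", SERIES)
print(full_filter)
print(latent_filter)
```

The resulting strings could be passed as `filter=` to `aiplatform.Endpoint.list(...)` in place of the inline f-strings.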

### Deploy Model To Endpoint - Full Model

In [73]:
print('Deploying model with 100% of traffic...')
endpoint.deploy(
    model = model,
    deployed_model_display_name = model.display_name,
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying model with 100% of traffic...
Deploying Model projects/1026793852137/locations/us-central1/models/2080236417333592064 to Endpoint : projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368/operations/1761229954806185984
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368


#### Remove Deployed Models without Traffic

In [74]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 05ebe_ebe with version 1 has traffic = 100


In [75]:
endpoint.traffic_split

{'4613783886613184512': 100}
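
The cleanup loop above keeps a deployed model when its ID is a key in `traffic_split` and undeploys it otherwise. A minimal sketch of that decision as a pure function, so it can be checked without calling the API. `split_by_traffic` is a hypothetical name, and the second model ID is made up for illustration. Note one deliberate difference from the loop: this sketch also treats an ID mapped to 0 as having no traffic, which is an assumption beyond the original key-presence test.

```python
def split_by_traffic(deployed_ids, traffic_split):
    """Partition deployed model IDs into (keep, undeploy) lists.

    A model is kept only if traffic_split maps its ID to a value above 0.
    """
    keep, undeploy = [], []
    for model_id in deployed_ids:
        if traffic_split.get(model_id, 0) > 0:
            keep.append(model_id)
        else:
            undeploy.append(model_id)
    return keep, undeploy


traffic_split = {"4613783886613184512": 100}  # as printed above
deployed_ids = ["4613783886613184512", "1111111111111111111"]  # second ID is made up
keep, undeploy = split_by_traffic(deployed_ids, traffic_split)
print("keep:", keep)
print("undeploy:", undeploy)
```

In the notebook, each ID in `undeploy` would then be passed to `endpoint.undeploy(deployed_model_id = ...)`.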

In [76]:
#endpoint.list_models()

### Deploy Model To Endpoint - Latent Model

In [77]:
print('Deploying model with 100% of traffic...')
endpoint_latent.deploy(
    model = model_latent,
    deployed_model_display_name = model_latent.display_name,
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying model with 100% of traffic...
Deploying Model projects/1026793852137/locations/us-central1/models/791081023998787584 to Endpoint : projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464/operations/2447184473050054656
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464


#### Remove Deployed Models without Traffic

In [78]:
for deployed_model in endpoint_latent.list_models():
    if deployed_model.id in endpoint_latent.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint_latent.traffic_split[deployed_model.id]}")
    else:
        endpoint_latent.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 05ebe_ebe_latent with version 1 has traffic = 100


In [79]:
endpoint_latent.traffic_split

{'3825653951823347712': 100}

In [80]:
#endpoint_latent.list_models()

---
## Prediction

For more detail on requesting predictions, see the [05Tools - Prediction](./05Tools%20-%20Prediction.ipynb) notebook.

### Prepare a record for prediction: instance and parameters lists

In [81]:
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT 10").to_dataframe()

In [82]:
pred.head(4)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0,0,a1b10547-d270-48c0-b902-7a0f735dadc7,TEST
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0,0,814c62c8-ade4-47d5-bf83-313b0aafdee5,TEST
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0,0,d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0,TEST
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0,0,802f3307-8e5a-4475-b795-5d5d8d7d0120,TEST


In [83]:
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
newob

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}

In [84]:
newob = list(newob.values())
newob

[35337,
 1.0928441854981998,
 -0.0132303486713432,
 1.35982868199426,
 2.7315370965921004,
 -0.707357349219652,
 0.8738370029866129,
 -0.7961301510622031,
 0.437706509544851,
 0.39676985012996396,
 0.587438102569443,
 -0.14979756231827498,
 0.29514781622888103,
 -1.30382621882143,
 -0.31782283120234495,
 -2.03673231037199,
 0.376090905274179,
 -0.30040350116459497,
 0.433799615590844,
 -0.145082264348681,
 -0.240427548108996,
 0.0376030733329398,
 0.38002620963091405,
 -0.16764742731151097,
 0.0275573495476881,
 0.59211469704354,
 0.219695164116351,
 0.0369695108704894,
 0.010984441006191,
 0.0]

In [85]:
instances = [json_format.ParseDict(newob, Value())]

In [86]:
#instances
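
`list(newob.values())` relies on the DataFrame columns already being in the order the model expects. A minimal sketch that makes the order explicit instead, assuming the order shown in the output above (`Time`, `V1` to `V28`, `Amount`). `FEATURE_ORDER` and `to_instance` are hypothetical names, and the record below uses placeholder zeros for most features.

```python
# Assumed feature order, taken from the dictionary printed above.
FEATURE_ORDER = ["Time"] + [f"V{i}" for i in range(1, 29)] + ["Amount"]


def to_instance(record, feature_order=FEATURE_ORDER):
    """Return the record's feature values as a list in a fixed order.

    Extra keys (for example the target or split columns) are ignored;
    a missing feature raises KeyError rather than silently shifting values.
    """
    missing = [name for name in feature_order if name not in record]
    if missing:
        raise KeyError(f"record is missing features: {missing}")
    return [record[name] for name in feature_order]


# Placeholder record: zeros everywhere except the two values copied from above.
record = {name: 0.0 for name in FEATURE_ORDER}
record.update({"Time": 35337, "V1": 1.0928441854981998, "Class": 0, "splits": "TEST"})
instance = to_instance(record)
print(len(instance), instance[:2])
```

The resulting list can be wrapped with `json_format.ParseDict(instance, Value())` exactly as in the cell above.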

### Get Predictions: Python Client

In [87]:
prediction = endpoint.predict(instances = instances)
prediction

Prediction(predictions=[{'classification': [1.0, 0.0], 'decoder': [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0], 'logistic': [1.0, 5.18541871e-16]}], deployed_model_id='4613783886613184512', model_version_id='1', model_resource_name='projects/1026793852137/locations/us-central1/models/model_05ebe_ebe', explanations=None)

In [88]:
prediction.predictions[0]

{'classification': [1.0, 0.0],
 'decoder': [1.0,
  1.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0,
  0.0],
 'logistic': [1.0, 5.18541871e-16]}

In [89]:
np.argmax(prediction.predictions[0]['logistic'])

0
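
The `np.argmax` call above returns only the index of the predicted class. A minimal sketch, in plain Python, that returns the index together with its score from one prediction dictionary. `top_class` is a hypothetical helper; the sample dictionary copies the `classification` and `logistic` values printed above and omits `decoder` for brevity.

```python
def top_class(prediction, key="logistic"):
    """Return (index, score) of the highest score under the given output key.

    Ties resolve to the first index, matching np.argmax.
    """
    scores = prediction[key]
    index = max(range(len(scores)), key=scores.__getitem__)
    return index, scores[index]


# Values copied from prediction.predictions[0] above (decoder omitted).
response = {"classification": [1.0, 0.0], "logistic": [1.0, 5.18541871e-16]}
label, score = top_class(response)
print(label, score)
```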

In [90]:
endpoint_latent.predict(instances = instances).predictions

[[6600611840.0,
  6586029060.0,
  6598967810.0,
  6539714050.0,
  6605868030.0,
  0.0,
  6588803580.0,
  6590979070.0,
  0.0,
  0.0,
  6597103100.0,
  6592574980.0,
  0.0,
  6605071870.0,
  6577715200.0]]
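
The latent endpoint returns one 15-value vector per instance. Example-based explanations work by finding stored examples whose vectors lie close to the query vector. A minimal sketch of that idea using a brute-force Euclidean search in plain Python. The example IDs and three-value vectors are made up for illustration, and `nearest_examples` is a hypothetical helper; the managed service uses its own nearest-neighbor index rather than a loop like this.

```python
import math


def nearest_examples(query, examples, k=2):
    """Return the k (example_id, distance) pairs closest to the query vector.

    examples maps an example ID to a vector of the same length as query.
    """
    distances = [
        (math.dist(query, vector), example_id)
        for example_id, vector in examples.items()
    ]
    # Sort by distance (ties broken by ID) and keep the k closest.
    return [(example_id, distance) for distance, example_id in sorted(distances)[:k]]


# Made-up latent vectors standing in for stored training examples.
examples = {
    "train-a": [0.0, 1.0, 2.0],
    "train-b": [5.0, 5.0, 5.0],
    "train-c": [0.5, 1.0, 2.5],
}
query = [0.0, 1.0, 2.0]
neighbors = nearest_examples(query, examples, k=2)
for example_id, distance in neighbors:
    print(example_id, round(distance, 4))
```

With real data, `query` would be the vector returned by `endpoint_latent.predict(...)` and `examples` would hold latent vectors for the training records.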

---
## Example-Based Explanations

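**IN DEVELOPMENT**

The example-based explanation steps are not implemented yet. The cells below clean up by deleting both endpoints; `force = True` undeploys any deployed models before the endpoint is deleted.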

In [92]:
endpoint.delete(force = True)

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368/operations/4220195351350476800
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/8032666914671034368
Delete Endpoint  backing LRO: projects/1026793852137/locations/us-central1/operations/8831881369777864704
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/8032666914671034368


In [93]:
endpoint_latent.delete(force = True)

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464/operations/100527592213315584
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
Delete Endpoint  backing LRO: projects/1026793852137/locations/us-central1/operations/1805140051173048320
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/3139505919532990464
