# 02b - Vertex AI - AutoML with clients (code)
Use the Vertex AI Python client to recreate the no-code approach of notebook 02a in Python. This trains a model with AutoML and deploys it to an Endpoint for online predictions and explanations.

**Prerequisites:**
-  01 - BigQuery - Table Data Source

**Overview:**
-  Use Python client google.cloud.aiplatform for Vertex AI
   -  Create a dataset
      -  aiplatform.TabularDataset
      -  Link BigQuery table
   -  Train Model with AutoML
      -  aiplatform.AutoMLTabularTrainingJob
   -  Evaluate
      -  Review the model in GCP Console > Vertex AI > Models
   -  Deploy to Endpoint
      -  Endpoint = aiplatform.Endpoint
      -  Endpoint.deploy
   -  Online Predictions
      -  Endpoint.predict
   -  Explanations
      -  Endpoint.explain
   -  Batch Prediction Job
      -  aiplatform.BatchPredictionJob

**Resources:**
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
-  [AutoML Tabular Training Job With Python Client](https://cloud.google.com/vertex-ai/docs/training/automl-api#aiplatform_create_training_pipeline_tabular_classification_sample-python)
-  [Interpreting Explanations](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

**Related Training:**
-  todo


---
## Vertex AI - Conceptual Flow

<img src="architectures/slides/slide_09.png">

---
## Vertex AI - Workflow

<img src="architectures/slides/slide_10.png">

---
## Setup

inputs:

In [1]:
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
DATANAME = 'fraud'
NOTEBOOK = '02b'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # space-delimited string; add more column names separated by spaces

packages:

In [2]:
from google.cloud import aiplatform
from datetime import datetime

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)
# Note: this rebinds the name `bigquery` from the imported module to a client instance
bigquery = bigquery.Client()

parameters:

In [4]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
DIR = f"temp/{NOTEBOOK}"

environment:

In [5]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Create Dataset (link to BigQuery table)

In [6]:
dataset = aiplatform.TabularDataset.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}', 
    bq_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/691911073727/locations/us-central1/datasets/5901008537529614336/operations/8153941672825192448
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/691911073727/locations/us-central1/datasets/5901008537529614336
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/691911073727/locations/us-central1/datasets/5901008537529614336')
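
The `bq_source` argument above is a URI of the form `bq://project.dataset.table`. As a minimal sketch, the helpers below (hypothetical names, not part of the Vertex AI client) build that string and split it back into its parts, which can help catch a malformed table reference before a dataset is created. The sketch assumes that none of the three parts contains a dot.

```python
def make_bq_uri(project: str, dataset: str, table: str) -> str:
    """Build a BigQuery URI in the form used for bq_source above."""
    return f"bq://{project}.{dataset}.{table}"


def split_bq_uri(uri: str) -> tuple:
    """Split a bq:// URI into (project, dataset, table).

    Assumes none of the three parts contains a dot.
    """
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a BigQuery URI: {uri!r}")
    parts = uri[len(prefix):].split(".")
    if len(parts) != 3:
        raise ValueError(f"expected project.dataset.table, got {uri!r}")
    return tuple(parts)


# Same values as the inputs cell in this notebook.
uri = make_bq_uri("statmike-mlops", "fraud", "fraud_prepped")
print(uri)
print(split_bq_uri(uri))
```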


---
## Train Model with AutoML

In [7]:
column_specs = list(set(dataset.column_names) - set(VAR_OMIT.split()) - set([VAR_TARGET, 'splits']))

In [8]:
column_specs = dict.fromkeys(column_specs, 'auto')
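
The two cells above remove the omitted, target, and split columns from the dataset's column names, then map every remaining column to `'auto'`. The sketch below repeats that logic on a shortened, illustrative column list so it runs without a dataset. It uses a list comprehension instead of a set difference so that column order is preserved; the resulting dictionary has the same keys either way.

```python
# Illustrative subset of the table's columns (the real table has more V* columns).
column_names = ["Time", "V1", "V2", "Amount", "Class", "transaction_id", "splits"]

VAR_TARGET = "Class"
VAR_OMIT = "transaction_id"  # space-delimited list of columns to leave out

# Columns that must not be used as features.
excluded = set(VAR_OMIT.split()) | {VAR_TARGET, "splits"}

# Keep the original column order, unlike a plain set difference.
features = [name for name in column_names if name not in excluded]

# 'auto' leaves the choice of transformation for each feature to AutoML.
column_specs = dict.fromkeys(features, "auto")
print(column_specs)
```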

In [9]:
tabular_classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    optimization_prediction_type = 'classification',
    column_specs = column_specs,
    labels = {'notebook':f'{NOTEBOOK}'}
)

In [10]:
# temporary fix for issue on 9/19/21 that can be removed within 1 week
tabular_classification_job._add_additional_experiments(['training_pipeline_version=legacy'])

In [11]:
model = tabular_classification_job.run(
    dataset = dataset,
    target_column = VAR_TARGET,
    # predefined_split_column_name = 'splits',
    training_fraction_split = 0.8,
    validation_fraction_split = 0.1,
    test_fraction_split = 0.1,
    budget_milli_node_hours = 3000,
    model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    disable_early_stopping = False,
    model_labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/3727161899230429184?project=691911073727
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/3727161899230429184 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/3727161899230429184 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/3727161899230429184 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/3727161899230429184 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud
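
The training call above sets `budget_milli_node_hours = 3000`, which is expressed in thousandths of a node hour (3 node hours here), and splits the data 80/10/10. The sketch below converts the budget and runs a simple sanity check on the fractions. Both helper names are hypothetical, and the check is only an assumption about what a sensible configuration looks like, not the service's own validation.

```python
def milli_node_hours_to_hours(milli_node_hours: int) -> float:
    """Convert a training budget from milli node hours to node hours."""
    return milli_node_hours / 1000


def check_split_fractions(training: float, validation: float, test: float) -> float:
    """Return the sum of the fractions, raising if they look inconsistent."""
    fractions = (training, validation, test)
    if any(f < 0 for f in fractions):
        raise ValueError(f"fractions must be non-negative: {fractions}")
    total = sum(fractions)
    # Small tolerance so floating point rounding does not trigger the check.
    if total > 1.0 + 1e-9:
        raise ValueError(f"fractions sum to {total}, which exceeds 1")
    return total


print(milli_node_hours_to_hours(3000))
print(check_split_fractions(0.8, 0.1, 0.1))
```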

---
## Endpoint and Deployment

In [12]:
endpoint = aiplatform.Endpoint.create(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/691911073727/locations/us-central1/endpoints/2843161147568291840/operations/2471454474246291456
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/691911073727/locations/us-central1/endpoints/2843161147568291840
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/691911073727/locations/us-central1/endpoints/2843161147568291840')


In [13]:
endpoint.deploy(
    model = model,
    deployed_model_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

INFO:google.cloud.aiplatform.models:Deploying Model projects/691911073727/locations/us-central1/models/7914434230313025536 to Endpoint : projects/691911073727/locations/us-central1/endpoints/2843161147568291840
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/691911073727/locations/us-central1/endpoints/2843161147568291840/operations/5380779833527631872
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/691911073727/locations/us-central1/endpoints/2843161147568291840


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [14]:
pred = bigquery.query(query = f"SELECT * FROM {DATANAME}.{DATANAME}_prepped WHERE splits='TEST' LIMIT 10").to_dataframe()

In [15]:
pred.head(4)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,75176,1.235603,0.041383,0.675286,0.836279,-0.675016,-0.657342,-0.154209,-0.067491,0.602617,...,0.088164,0.396205,0.324557,0.18293,-0.017115,0.014979,0.0,0,6f40e111-2131-4031-aef1-268b2daf02a3,TEST
1,112225,-0.285756,0.965688,2.147689,2.838137,1.104026,1.462921,-0.835272,-0.409875,-0.810586,...,-1.222639,0.044689,1.324343,-0.07767,0.128146,0.179651,0.0,0,c9bdeaf0-79d4-4faf-8aaa-14127a0e3513,TEST
2,113420,1.890283,0.241366,-0.165823,4.068924,-0.146807,0.140339,-0.25809,0.084852,-0.056567,...,0.097449,0.01918,0.060621,0.136349,-0.010745,-0.049814,0.0,0,611f4879-490d-4ab5-9e4f-c7944d6165eb,TEST
3,121910,-1.227268,1.555572,1.245848,4.071686,1.154573,2.058276,-1.179951,-2.21114,-2.248168,...,-0.380109,-1.485002,0.237729,0.524141,-0.037543,0.121962,0.0,0,3212444f-1412-49ae-adfa-a1a0ca54d8c5,TEST


In [19]:
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
#newob

The instance must match the format of the variables that the deployed model expects. AutoML may convert the type of some variables during training. In this example `Time` is passed as a string, so the following cell casts it:

In [20]:
newob['Time'] = str(newob['Time'])

In [21]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())
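
The cells above reduce a row to its feature columns and cast `Time` to a string before wrapping it for the client. A minimal sketch of that preparation with plain dictionaries follows. The helper name `prepare_instance` is hypothetical, and the row is a shortened copy of the first record shown above.

```python
def prepare_instance(record: dict, omit: set, string_fields: list) -> dict:
    """Drop non-feature keys and cast the selected fields to str."""
    instance = {key: value for key, value in record.items() if key not in omit}
    for name in string_fields:
        if name in instance:
            instance[name] = str(instance[name])
    return instance


# Shortened copy of the first TEST row (most V* columns left out).
row = {
    "Time": 75176,
    "V1": 1.235603,
    "Amount": 0.0,
    "Class": 0,
    "transaction_id": "6f40e111-2131-4031-aef1-268b2daf02a3",
    "splits": "TEST",
}

newob = prepare_instance(
    row,
    omit={"transaction_id", "Class", "splits"},
    string_fields=["Time"],
)
print(newob)
```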

### Get Predictions: Python Client

In [22]:
prediction = endpoint.predict(instances=instances, parameters=parameters)

In [23]:
prediction

Prediction(predictions=[{'classes': ['0', '1'], 'scores': [0.9997270703315735, 0.0002728732360992581]}], deployed_model_id='2414290040084496384', explanations=None)

In [24]:
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'0'
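
The cell above selects the class whose score is highest. The same selection can be sketched without NumPy, using the prediction values returned in this notebook. The function name `top_class` is hypothetical.

```python
# Prediction payload copied from the output above.
prediction = {
    "classes": ["0", "1"],
    "scores": [0.9997270703315735, 0.0002728732360992581],
}


def top_class(pred: dict) -> tuple:
    """Return (class label, score) for the highest scoring class."""
    scores = pred["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return pred["classes"][best], scores[best]


label, score = top_class(prediction)
print(label, score)
```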

### Get Predictions: REST

In [25]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [26]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    {
      "classes": [
        "0",
        "1"
      ],
      "scores": [
        0.99972707033157349,
        0.00027287323609925812
      ]
    }
  ],
  "deployedModelId": "2414290040084496384"
}
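
The REST call above posts a JSON body with an `instances` list to a URL built from the region and the endpoint's resource name. The sketch below assembles both pieces without sending a request. The helper names are hypothetical, and the instance is a shortened illustration rather than a complete record.

```python
import json


def predict_url(region: str, endpoint_resource_name: str) -> str:
    """Build the predict URL in the same pattern as the curl command above."""
    return (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"{endpoint_resource_name}:predict"
    )


def predict_body(instances: list) -> str:
    """Serialize instances into the JSON request body."""
    return json.dumps({"instances": instances})


url = predict_url(
    "us-central1",
    "projects/691911073727/locations/us-central1/endpoints/2843161147568291840",
)
# Shortened instance for illustration; a real request needs every feature.
body = predict_body([{"Time": "75176", "Amount": 0.0}])
print(url)
print(body)
```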


### Get Predictions: gcloud (CLI)

In [27]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'classes': ['0', '1'], 'scores': [0.9997270703315735, 0.0002728732360992581]}]
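
The gcloud command above passes only the trailing ID of the endpoint. Taking the last path segment works whether the input is a full resource name or already a bare ID, as this small sketch shows. The function name `resource_id` is hypothetical.

```python
def resource_id(resource_name: str) -> str:
    """Return the last path segment of a resource name."""
    return resource_name.rsplit("/", 1)[-1]


# Endpoint resource name from the creation log in this notebook.
full_name = "projects/691911073727/locations/us-central1/endpoints/2843161147568291840"
print(resource_id(full_name))
print(resource_id("2843161147568291840"))
```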


### Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [59]:
batch = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    model_name = model.name,
    instances_format = "bigquery",
    predictions_format = "bigquery",
    bigquery_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    bigquery_destination_prefix = f"{PROJECT_ID}",
    generate_explanation = True,
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/7908657396220690432?project=691911073727
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/7908657396220690432 current state:
JobState.JOB_STAT

---
## Explanations
Interpretation guide: [Interpreting Explanations](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

In [28]:
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [29]:
explanation.predictions

[{'scores': [0.9997270703315735, 0.0002728732360992581],
  'classes': ['0', '1']}]

In [30]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)

attribution:
baseline output 0.9999616146087646
instance output 0.9997270703315735
output_index [0]
output display value 0
approximation error 0.006383512585511742
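
Feature attributions describe how much each feature moved the output from the baseline value toward the value for this instance. As a rough sketch, the code below computes that shift from the two outputs printed above and ranks a few attributions by magnitude. The attribution values are copied from this run and are only a partial subset, so their sum is not expected to match the shift.

```python
# Values copied from this notebook's output.
baseline_output = 0.9999616146087646
instance_output = 0.9997270703315735

# Partial: only some of the feature attributions from this run.
feature_attributions = {
    "Amount": -2.691480848524305e-05,
    "Time": 0.0,
    "V1": 0.0,
    "V13": -5.365080303615994e-05,
}

# How far the instance's output sits from the baseline output.
output_shift = instance_output - baseline_output

# Largest absolute contribution first.
ranked = sorted(
    feature_attributions.items(),
    key=lambda item: abs(item[1]),
    reverse=True,
)

for name, value in ranked:
    print(f"{name:>8} {value:+.3e}")
print(f"shift from baseline: {output_shift:+.3e}")
```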


In [31]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.9999616146087646
instance_output_value: 0.9997270703315735
feature_attributions {
  struct_value {
    fields {
      key: "Amount"
      value {
        number_value: -2.691480848524305e-05
      }
    }
    fields {
      key: "Time"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V1"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V10"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V11"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V12"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V13"
      value {
        number_value: -5.365080303615994e-05
      }
    }
    fields {
      key: "V14"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V15"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "V16"
      value {
        number_value: 0.0
      }
 

---
## Remove Resources
See notebook "XX - Cleanup".