# 02a - Vertex AI - AutoML in GCP Console (no code)
Use Vertex AI from the GCP Console for a no-code approach to building a custom model with AutoML and deploying it for predictions.

**Prerequisites:**
-  01 - BigQuery - Table Data Source

**Overview:**
-  Use the GCP Console > Vertex AI to
   -  Create a Dataset
      -  Pick data type and objective: Tabular, Regression/Classification
      -  Link to BigQuery Table for source data
   -  Create a Training Job
      -  Select the Dataset
      -  Objective = Classification with AutoML
   -  Evaluate Model
   -  Deploy Model to Endpoint
   -  Create a Batch Prediction Job 
      -  Use the Model and write predictions to BigQuery

**Resources:**
-  [AutoML Tabular Training Job With GCP Console](https://cloud.google.com/vertex-ai/docs/training/automl-console#tabular)

**Related Training:**
-  [Tutorial for AutoML Tabular](https://cloud.google.com/vertex-ai/docs/tutorials/tabular-automl)


---
## Vertex AI - Conceptual Flow

<img src="architectures/slides/02a_arch.png">

---
## Vertex AI - Workflow

<img src="architectures/slides/02a_console.png">

---
## Create Dataset (link to BigQuery table)

From the Console:
- Go to Vertex AI
- Select `Datasets`
- Select `CREATE DATASET`

<img src="architectures/notebooks/02A_Screenshots/ds_1.png">

- Name the dataset `02a`
- Select `Tabular` and `Regression/classification`
    - [More on Model Types](https://cloud.google.com/vertex-ai/docs/start/automl-model-types)
- Click `Create`

<img src="architectures/notebooks/02A_Screenshots/ds_2.png">

- Under Select a data source pick `Select a table or view from BigQuery`
- Enter the BigQuery path (or browse) to the prepped table created in notebook 01
- Click `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/ds_3.png">

- The `ANALYZE` tab for the dataset will be displayed for review:

<img src="architectures/notebooks/02A_Screenshots/ds_4.png">

- Going back to the `Datasets` dashboard will display the registered dataset

<img src="architectures/notebooks/02A_Screenshots/ds_5.png">
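
For readers who later want to script this step, the same dataset registration can be sketched with the Vertex AI Python SDK. This is a minimal sketch and not part of the no-code flow: the helper names `bq_uri` and `create_tabular_dataset` are hypothetical, and the project, dataset, and table names are assumed from the values used in the prediction cells of this notebook (`statmike-mlops`, `fraud`, `fraud_prepped`).

```python
def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build the bq:// URI form used for a BigQuery table source."""
    return f"bq://{project}.{dataset}.{table}"


def create_tabular_dataset(project: str, region: str, display_name: str, source: str):
    """Register a BigQuery table as a Vertex AI tabular dataset (hypothetical helper)."""
    # Imported lazily so the URI helper above works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    return aiplatform.TabularDataset.create(display_name=display_name, bq_source=source)


# Assumed names, taken from the prediction cells in this notebook.
source = bq_uri("statmike-mlops", "fraud", "fraud_prepped")
print(source)

# With credentials configured, the console steps correspond roughly to:
# dataset = create_tabular_dataset("statmike-mlops", "us-central1", "02a", source)
```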

---
## Train Model with AutoML

On the Vertex AI console, select `Training`:

<img src="architectures/notebooks/02A_Screenshots/train_1.png">

Next to `Training` (near the top), select `CREATE`
- For Dataset enter `02a`
- For Objective make sure `Classification` is selected
- Use `AutoML` for the method
- Click `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/train_2.png">

For `Model Details`:
- Keep the default `Model name` which appends a datetime to the end of the dataset name
- For `Target column` select the column to train predictions for
- Expand `ADVANCED OPTIONS`:
    - Select `Manual` for the Data split method
    - Select the `splits` variable that was created in Notebook 01
- Click `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/train_3.png">

For `Training options`:
- Click the `-` symbol next to any rows for variables that should be excluded from training, like the `transaction_id`
- More on Advanced Options:
    - [Model Weights](https://cloud.google.com/vertex-ai/docs/datasets/prepare-tabular#weight)
    - [Optimization Objectives](https://cloud.google.com/vertex-ai/docs/training/tabular-opt-obj)
        - Pick AUC PR (Due to imbalance in Class)
- Click `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/train_4.png">
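
The AUC PR recommendation in the training options reflects class imbalance. A small worked example shows why accuracy alone can mislead when positives are rare. The counts here are made up for illustration (they are not the fraud table's real distribution), and `binary_metrics` is a hypothetical helper: a model that never predicts the positive class still scores very high accuracy while finding none of the positives.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Define precision and recall as 0.0 when their denominators are zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical data: 2 positives in 1000 rows, and a model that always predicts 0.
y_true = [0] * 998 + [1] * 2
y_pred = [0] * 1000

accuracy, precision, recall = binary_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
# accuracy=0.998 precision=0.000 recall=0.000
```

Precision and recall both depend on the rare positive class, which is why an objective built on them is a reasonable fit for an imbalanced target.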

For `Compute and pricing`:
- Enter a `Budget` of 1 node hour
    - A guide for choosing the training budget can be found [here](https://cloud.google.com/vertex-ai/docs/training/automl-console#tabular)
- Make sure `Enable early stopping` is toggled on
- Click `START TRAINING`

<img src="architectures/notebooks/02A_Screenshots/train_5.png">

Return to the Vertex AI console `Training` Menu:
- Once the model completes training the name will be accompanied by a green check mark

<img src="architectures/notebooks/02A_Screenshots/train_6.png">
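
The console choices in this section map fairly directly onto the Vertex AI Python SDK. The sketch below is an approximation under stated assumptions, not the method this notebook uses: `auto_column_specs`, `node_hours_to_milli`, and `train_automl_classifier` are hypothetical helper names, the column list is abbreviated, and `"maximize-au-prc"` is assumed to be the SDK spelling of the AUC PR objective.

```python
def auto_column_specs(columns, target, omit):
    """Map each feature column to the 'auto' transformation, skipping the target and omitted columns."""
    skip = set(omit) | {target}
    return {name: "auto" for name in columns if name not in skip}


def node_hours_to_milli(node_hours: float) -> int:
    """Convert a budget in node hours to the milli node hours the SDK expects."""
    return int(round(node_hours * 1000))


def train_automl_classifier(dataset, display_name, target, split_column, column_specs, node_hours=1):
    """Start an AutoML tabular classification job (hypothetical helper)."""
    # Imported lazily so the pure helpers above work without the SDK installed.
    from google.cloud import aiplatform

    job = aiplatform.AutoMLTabularTrainingJob(
        display_name=display_name,
        optimization_prediction_type="classification",
        optimization_objective="maximize-au-prc",  # assumed to match AUC PR in the console
        column_specs=column_specs,
    )
    return job.run(
        dataset=dataset,
        target_column=target,
        predefined_split_column_name=split_column,  # the Manual split in the console
        budget_milli_node_hours=node_hours_to_milli(node_hours),
        disable_early_stopping=False,  # early stopping stays enabled
        model_display_name=display_name,
    )


# Abbreviated, illustrative column list; the real table has more V columns.
columns = ["Time", "V1", "V2", "Amount", "Class", "transaction_id", "splits"]
specs = auto_column_specs(columns, target="Class", omit=["transaction_id"])
print(specs)
print(node_hours_to_milli(1))
```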

---
## Model: Evaluate, Select, Deploy

On the Vertex AI console, select `Models`

<img src="architectures/notebooks/02A_Screenshots/model_1.png">

Select the model that was just trained - starts with `02a`:
- This brings up the `EVALUATE` tab for the model

<img src="architectures/notebooks/02A_Screenshots/model_2.png">

Select the tab labeled `DEPLOY & TEST`:

<img src="architectures/notebooks/02A_Screenshots/model_3.png">

---
## Endpoint

While still on the Vertex AI `Models` section with the `DEPLOY & TEST` tab selected:
- Select `DEPLOY TO ENDPOINT`

<img src="architectures/notebooks/02A_Screenshots/model_3.png">

In the `Deploy to endpoint` menus, complete `Define your endpoint`:
- For Endpoint name use `02a`
- Keep the defaults for Location and Access
- Select `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/endpoint_1.png">

In the `Model settings` section:
- Set `Traffic split` to 100
- Set the minimum number of compute nodes to 1
- Keep the remaining default values for max nodes, scaling, logging, and explainability
- Select `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/endpoint_2.png">

In the `Model monitoring` section:
- Toggle `Enable model monitoring for this endpoint` on
    - For the monitoring job use the name `02a`
    - Use defaults for the other menu items
- Select `CONTINUE`

<img src="architectures/notebooks/02A_Screenshots/endpoint_3.png">

In the `Monitoring objectives` section:
- Select `Prediction Drift Detection` under Monitoring objective
- Select `DEPLOY`

<img src="architectures/notebooks/02A_Screenshots/endpoint_4.png">

Once the model is done being deployed to the endpoint, click the `Endpoints` section of Vertex AI:
- Select the endpoint that starts with `02a`
- Review the endpoint dashboard for the deployed model

<img src="architectures/notebooks/02A_Screenshots/endpoint_5.png">
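
A scripted counterpart to the deployment steps might look like the following. It is a minimal sketch that covers only endpoint creation and deployment (model monitoring is not included); `deploy_settings` and `deploy_model` are hypothetical helpers, and a maximum of one node is an assumption rather than a value stated in this walkthrough.

```python
def deploy_settings(traffic_percentage: int = 100, min_nodes: int = 1, max_nodes: int = 1) -> dict:
    """Validate and collect the deployment settings chosen in the console."""
    if not 0 <= traffic_percentage <= 100:
        raise ValueError("traffic_percentage must be between 0 and 100")
    if min_nodes < 1 or max_nodes < min_nodes:
        raise ValueError("need 1 <= min_nodes <= max_nodes")
    return {
        "traffic_percentage": traffic_percentage,
        "min_replica_count": min_nodes,
        "max_replica_count": max_nodes,  # assumption: the console default is not stated here
    }


def deploy_model(model, endpoint_name: str, settings: dict):
    """Create an endpoint and deploy a trained model to it (hypothetical helper)."""
    # Imported lazily so deploy_settings works without the SDK installed.
    from google.cloud import aiplatform

    endpoint = aiplatform.Endpoint.create(display_name=endpoint_name)
    model.deploy(endpoint=endpoint, deployed_model_display_name=endpoint_name, **settings)
    return endpoint


settings = deploy_settings()
print(settings)

# With a trained aiplatform.Model in hand:
# endpoint = deploy_model(model, "02a", settings)
```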


---
## Batch

In the Vertex AI console select the `Batch predictions` section:

<img src="architectures/notebooks/02A_Screenshots/batch_1.png">

Select `Create`:
- Name the prediction `02a`
- For model name select the model that starts with `02a`
- For Select source, pick BigQuery table
- Provide the location of the BigQuery source table
- For storage location pick the BigQuery output format
- Provide the project for output in BigQuery
- Select `Generate feature importance`
- Select `Enable feature attributions for this model`
- Select `CREATE`

<img src="architectures/notebooks/02A_Screenshots/batch_2.png">

Once the batch prediction job completes it will be listed with a green checkmark under `Batch Predictions`

<img src="architectures/notebooks/02A_Screenshots/batch_3.png">

Selecting the batch prediction job that starts with `02a` brings up the details of the prediction job

<img src="architectures/notebooks/02A_Screenshots/batch_4.png">

Select the linked BigQuery output table next to `Export location`:

<img src="architectures/notebooks/02A_Screenshots/batch_5.png">
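
The batch prediction form can also be approximated in code. The sketch below only assembles keyword arguments for the SDK's `Model.batch_predict`; `batch_predict_args` is a hypothetical helper, and the source table URI is an assumption (the walkthrough does not name the table used as batch input).

```python
def batch_predict_args(job_name: str, source_table_uri: str, output_project: str, explain: bool = True) -> dict:
    """Collect keyword arguments mirroring the console's batch prediction form."""
    return {
        "job_display_name": job_name,
        "bigquery_source": source_table_uri,
        "instances_format": "bigquery",
        "bigquery_destination_prefix": f"bq://{output_project}",
        "predictions_format": "bigquery",
        "generate_explanation": explain,  # feature importance / attributions
    }


# Assumed source table; substitute the table you want scored.
args = batch_predict_args("02a", "bq://statmike-mlops.fraud.fraud_prepped", "statmike-mlops")
for key, value in args.items():
    print(f"{key} = {value!r}")

# With a trained aiplatform.Model in hand:
# job = model.batch_predict(**args)
```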

---
## Prediction

inputs:

In [1]:
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
DATANAME = 'fraud'
NOTEBOOK = '02a'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [2]:
from google.cloud import aiplatform

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bigquery = bigquery.Client()

parameters:

In [4]:
DIR = f"temp/{NOTEBOOK}"

environment:

In [5]:
!rm -rf {DIR}
!mkdir -p {DIR}

### Prepare a record for prediction: instance and parameters lists

In [6]:
pred = bigquery.query(query = f"SELECT * FROM {DATANAME}.{DATANAME}_prepped WHERE splits='TEST' LIMIT 10").to_dataframe()

In [7]:
pred.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46100 | 0.971963 | -0.064002 | 1.864457 | 2.52122 | -0.700822 | 1.660426 | -1.206327 | 0.717368 | 0.1709 | ... | -0.005019 | -0.259129 | 0.152646 | 0.207666 | 0.093585 | 0.022994 | 0.0 | 0 | 3eddd943-117e-4ba9-a09b-c0e2fe8c5647 | TEST |
| 1 | 80430 | -0.9297 | 0.194664 | 1.549227 | 1.69343 | 1.038639 | -0.214545 | -0.032843 | 0.248009 | -0.598265 | ... | 0.263542 | -0.060427 | -1.22793 | -0.634669 | 0.062523 | 0.343741 | 0.0 | 0 | 8f25f7a0-63a0-4a7b-a0de-cdebe813ccee | TEST |
| 2 | 123919 | 1.890586 | 0.271313 | -0.157228 | 4.064907 | -0.109914 | 0.150175 | -0.230044 | 0.061367 | -0.119159 | ... | 0.092487 | 0.030498 | 0.074608 | 0.134964 | -0.008043 | -0.048927 | 0.0 | 0 | a0da598e-391b-4b9b-a0ba-f8977d0b43b0 | TEST |
| 3 | 141191 | -2.857621 | -0.307727 | 1.521266 | 4.500119 | 1.812809 | 2.276221 | -0.425395 | 0.895603 | -1.564402 | ... | 1.374592 | -1.723268 | 0.127809 | 0.20751 | -0.08213 | -0.009999 | 0.0 | 0 | 7159db57-0ebd-4b99-a1b4-de6996ae0bd7 | TEST |


In [8]:
#newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET])]].to_dict(orient='records')[0]
#newob

The prediction request must use the variable types that the deployed model expects, and AutoML may convert the type of some variables during training. In this example `Time` is sent as a string, so the following cell casts it before the instance is built:

In [9]:
newob['Time'] = str(newob['Time'])

In [10]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())
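
The record preparation in the cells above (drop the omitted and target columns, then cast `Time` to a string) can be illustrated without pandas or protobuf. This is a simplified sketch on an abbreviated copy of the first test row; `to_instance` is a hypothetical helper, not part of the notebook.

```python
def to_instance(record: dict, omit, target: str, as_string=()) -> dict:
    """Drop omitted/target columns and cast selected fields to str."""
    skip = set(omit) | {target}
    instance = {key: value for key, value in record.items() if key not in skip}
    for key in as_string:
        if key in instance:
            instance[key] = str(instance[key])
    return instance


# Abbreviated version of the first row shown above (most V columns left out).
record = {
    "Time": 46100,
    "V1": 0.971963,
    "Amount": 0.0,
    "Class": 0,
    "transaction_id": "3eddd943-117e-4ba9-a09b-c0e2fe8c5647",
    "splits": "TEST",
}

instance = to_instance(record, omit=["transaction_id"], target="Class", as_string=["Time"])
print(instance)
# {'Time': '46100', 'V1': 0.971963, 'Amount': 0.0, 'splits': 'TEST'}
```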

### Get Predictions: Python Client

In [11]:
endpoint = aiplatform.Endpoint.list(filter=f'display_name={NOTEBOOK}')[0]
endpoint.display_name

'02a'

In [12]:
prediction = endpoint.predict(instances=instances, parameters=parameters)
prediction

Prediction(predictions=[{'classes': ['0', '1'], 'scores': [0.9989382028579712, 0.001061751274392009]}], deployed_model_id='6280067370228645888', explanations=None)

In [13]:
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'0'

### Get Predictions: REST

In [14]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [15]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    {
      "classes": [
        "0",
        "1"
      ],
      "scores": [
        0.99893820285797119,
        0.001061751274392009
      ]
    }
  ],
  "deployedModelId": "6280067370228645888"
}
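
The REST response is plain JSON, so the predicted class can be picked out with the standard library alone. The sketch below parses a copy of the response shown above and returns the class with the highest score; `top_class` is a hypothetical helper name.

```python
import json

# Copy of the REST response body shown above.
response_text = """
{
  "predictions": [
    {
      "classes": ["0", "1"],
      "scores": [0.99893820285797119, 0.001061751274392009]
    }
  ],
  "deployedModelId": "6280067370228645888"
}
"""


def top_class(prediction: dict):
    """Return (class label, score) for the highest-scoring class."""
    scores = prediction["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return prediction["classes"][best], scores[best]


response = json.loads(response_text)
label, score = top_class(response["predictions"][0])
print(label, score)
```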


### Get Predictions: gcloud (CLI)

In [16]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'classes': ['0', '1'], 'scores': [0.9989382028579712, 0.001061751274392009]}]


---
## Explanations
- [Interpretation Guide](https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular)

In [17]:
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [18]:
explanation.predictions

[{'scores': [0.9989382028579712, 0.001061751274392009], 'classes': ['0', '1']}]

In [19]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)

attribution:
baseline output 0.9885345101356506
instance output 0.9989382028579712
output_index [0]
output display value 0
approximation error 0.014094694411026211


In [20]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.9885345101356506
instance_output_value: 0.9989382028579712
feature_attributions {
  struct_value {
    fields {
      key: "Amount"
      value {
        number_value: -0.0002703732914394803
      }
    }
    fields {
      key: "Time"
      value {
        number_value: -9.998348024156358e-05
      }
    }
    fields {
      key: "V1"
      value {
        number_value: 0.001893440882364909
      }
    }
    fields {
      key: "V10"
      value {
        number_value: 0.0004069407780965169
      }
    }
    fields {
      key: "V11"
      value {
        number_value: -0.001008821858300103
      }
    }
    fields {
      key: "V12"
      value {
        number_value: 0.003252426783243815
      }
    }
    fields {
      key: "V13"
      value {
        number_value: -7.198916541205512e-06
      }
    }
    fields {
      key: "V14"
      value {
        number_value: -0.0009798407554626465
      }
    }
    fields {
      key: "V15"
      value {
        num
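
One way to read the attribution output is to rank features by the magnitude of their attribution and compare against the gap between the instance and baseline outputs. The sketch below uses only the values visible in the (truncated) output above, so the feature list is incomplete; as a general expectation rather than something verified here, the full set of attributions should sum approximately to that gap, within the reported approximation error.

```python
# Values copied from the output above; the listing is truncated after V14.
baseline_output = 0.9885345101356506
instance_output = 0.9989382028579712
attributions = {
    "Amount": -0.0002703732914394803,
    "Time": -9.998348024156358e-05,
    "V1": 0.001893440882364909,
    "V10": 0.0004069407780965169,
    "V11": -0.001008821858300103,
    "V12": 0.003252426783243815,
    "V13": -7.198916541205512e-06,
    "V14": -0.0009798407554626465,
}

# Gap that the complete set of attributions is expected to explain.
delta = instance_output - baseline_output
print(f"instance - baseline = {delta:.6f}")

# Rank the visible features by absolute attribution, largest first.
ranked = sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)
for name, value in ranked:
    print(f"{name:>6}: {value:+.6f}")
```

Among the visible features, `V12` and `V1` push the score for class `0` up the most, while `V11` and `V14` pull it down slightly.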

---
## Remove Resources
see notebook "99 - Cleanup"
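
For orientation only, teardown generally has to undeploy models from an endpoint before the endpoint can be deleted. The sketch below shows that ordering with a hypothetical `cleanup` helper and a stand-in `_Recorder` class for a dry run; it is not the contents of the cleanup notebook, and it assumes the SDK objects expose `undeploy_all()` and `delete()`.

```python
def cleanup(endpoint=None, model=None, dataset=None):
    """Remove resources in dependency order; returns the names of what was removed."""
    removed = []
    if endpoint is not None:
        endpoint.undeploy_all()  # models must be undeployed before the endpoint is deleted
        endpoint.delete()
        removed.append("endpoint")
    if model is not None:
        model.delete()
        removed.append("model")
    if dataset is not None:
        dataset.delete()
        removed.append("dataset")
    return removed


class _Recorder:
    """Stand-in object for a dry run; records the calls it receives."""

    def __init__(self):
        self.calls = []

    def undeploy_all(self):
        self.calls.append("undeploy_all")

    def delete(self):
        self.calls.append("delete")


fake_endpoint, fake_model = _Recorder(), _Recorder()
removed = cleanup(endpoint=fake_endpoint, model=fake_model)
print(removed, fake_endpoint.calls, fake_model.calls)
```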