# AutoML with Vertex AI - Python Clients

This notebook uses the Python Clients for Vertex AI to recreate this workflow:
- Create a Dataset linked to a BigQuery table
- Train an AutoML model for the Dataset
- Deploy the Model to an Endpoint
- Create Batch Predictions
- Create Online Predictions

**Prerequisites**
- `00 - Initial Setup`

**Resources**
- API: https://googleapis.dev/python/aiplatform/latest/aiplatform.html

**Overview**

<img src="architectures/statmike-mlops-A2.png">

---
## Setup

Import Libraries

In [1]:
from google.cloud import aiplatform

Parameters

In [2]:
# Locations
REGION = 'us-central1'
PROJECT_ID = 'statmike-mlops'
PARENT = f"projects/{PROJECT_ID}/locations/{REGION}"

# current time to help with unique resource labeling
from datetime import datetime
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)

---
## Create Dataset (link to BigQuery table)

In [5]:
# dataset
dataset = aiplatform.TabularDataset.create(
    display_name='bq_digits_code',
    bq_source='bq://statmike-mlops.digits.digits_source'
)

INFO:google.cloud.aiplatform.datasets.dataset:Creating TabularDataset
INFO:google.cloud.aiplatform.datasets.dataset:Create TabularDataset backing LRO: projects/691911073727/locations/us-central1/datasets/3579561259294523392/operations/5385608338841010176
INFO:google.cloud.aiplatform.datasets.dataset:TabularDataset created. Resource name: projects/691911073727/locations/us-central1/datasets/3579561259294523392
INFO:google.cloud.aiplatform.datasets.dataset:To use this TabularDataset in another session:
INFO:google.cloud.aiplatform.datasets.dataset:ds = aiplatform.TabularDataset('projects/691911073727/locations/us-central1/datasets/3579561259294523392')
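
The `bq_source` value above follows the `bq://project.dataset.table` form. As a minimal sketch, a small helper could assemble that string from its parts so the project ID is not retyped by hand. The name `bq_uri` is hypothetical and not part of the `aiplatform` client.

```python
def bq_uri(project: str, dataset: str, table: str) -> str:
    """Build a BigQuery table URI in the bq://project.dataset.table form.

    Hypothetical helper for illustration; not part of google-cloud-aiplatform.
    """
    for part in (project, dataset, table):
        if not part:
            raise ValueError("project, dataset and table must all be non-empty")
    return f"bq://{project}.{dataset}.{table}"


# Values taken from the cell above
source_uri = bq_uri("statmike-mlops", "digits", "digits_source")
print(source_uri)
```

The resulting string could then be passed as `bq_source` in place of the literal.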


---
## Train Model with AutoML

In [27]:
# column transformations: request the 'auto' transformation
# for each of the 64 feature columns p0..p63
transformations = []

for i in range(64):
    transformations.append({'auto': {'column_name': 'p'+str(i)}})
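
The same list can be built in one expression. This sketch is an equivalent list comprehension, shown only as a more compact alternative to the loop:

```python
# One 'auto' transformation per feature column p0..p63,
# equivalent to appending inside a for loop.
transformations = [{'auto': {'column_name': f'p{i}'}} for i in range(64)]

# Quick sanity check of the first and last entries
print(len(transformations), transformations[0], transformations[-1])
```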

In [28]:
tabular_classification_job = aiplatform.AutoMLTabularTrainingJob(
    display_name='bq_digits_code_'+TIMESTAMP,
    optimization_prediction_type='classification',
    column_transformations=transformations
)

In [29]:
model = tabular_classification_job.run(
        dataset=dataset,
        target_column='target',
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
        budget_milli_node_hours=1000,
        model_display_name='bq_digits_code_'+TIMESTAMP,
        disable_early_stopping=False,
        sync=True,
    )

INFO:google.cloud.aiplatform.training_jobs:View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/3915092625673158656?project=691911073727
INFO:google.cloud.aiplatform.training_jobs:AutoMLTabularTrainingJob projects/691911073727/locations/us-central1/trainingPipelines/3915092625673158656 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud
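
In the `run()` call above, `budget_milli_node_hours` is expressed in thousandths of a node hour, so 1000 requests one node hour, and the 0.8 / 0.1 / 0.1 split fractions add up to 1. The sketch below shows checks that could run before starting a training job. Both helper names are made up for illustration, and requiring the fractions to sum to exactly 1 is this sketch's own assumption about the intended split.

```python
import math


def node_hours_to_milli(node_hours: float) -> int:
    """Convert node hours to milli node hours (1 node hour = 1000)."""
    if node_hours <= 0:
        raise ValueError("node_hours must be positive")
    return int(round(node_hours * 1000))


def check_splits(training: float, validation: float, test: float) -> None:
    """Raise if any fraction is outside [0, 1] or the three do not sum to 1."""
    fractions = (training, validation, test)
    if any(f < 0 or f > 1 for f in fractions):
        raise ValueError("each split fraction must be between 0 and 1")
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"split fractions sum to {sum(fractions)}, expected 1")


# Values from the run() call above
check_splits(0.8, 0.1, 0.1)
budget = node_hours_to_milli(1)
print(budget)
```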

---
## Endpoint and Deployment

In [36]:
# endpoint
endpoint = aiplatform.Endpoint.create(display_name='bq_digits_code_'+TIMESTAMP)

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/691911073727/locations/us-central1/endpoints/6358150287986786304/operations/732897342558044160
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/691911073727/locations/us-central1/endpoints/6358150287986786304
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/691911073727/locations/us-central1/endpoints/6358150287986786304')
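
The resource names in the log output share the layout `projects/{project}/locations/{location}/{collection}/{id}`. As a sketch, a hypothetical helper (not part of the `aiplatform` client) could split such a name into its parts, for example to store only the endpoint ID for a later session:

```python
def parse_resource_name(resource_name: str) -> dict:
    """Split a Vertex AI resource name into its components.

    Hypothetical helper; assumes the six-segment layout
    projects/{project}/locations/{location}/{collection}/{id}.
    """
    parts = resource_name.split('/')
    if len(parts) != 6 or parts[0] != 'projects' or parts[2] != 'locations':
        raise ValueError(f"unexpected resource name: {resource_name!r}")
    return {
        'project': parts[1],
        'location': parts[3],
        'collection': parts[4],
        'id': parts[5],
    }


# Endpoint resource name copied from the log output above
info = parse_resource_name(
    'projects/691911073727/locations/us-central1/endpoints/6358150287986786304'
)
print(info['collection'], info['id'])
```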


In [37]:
# deploy model to endpoint
endpoint.deploy(
    model=model,
    deployed_model_display_name='bq_digits_code_DEPLOYED_'+TIMESTAMP,
    traffic_percentage=100,
    machine_type='n1-standard-4',
    min_replica_count=1,
    max_replica_count=1,
    sync=True
)

INFO:google.cloud.aiplatform.models:Deploying Model projects/691911073727/locations/us-central1/models/6423689977095258112 to Endpoint : projects/691911073727/locations/us-central1/endpoints/6358150287986786304
INFO:google.cloud.aiplatform.models:Deploy Endpoint model backing LRO: projects/691911073727/locations/us-central1/endpoints/6358150287986786304/operations/1025631318337126400
INFO:google.cloud.aiplatform.models:Endpoint model deployed. Resource name: projects/691911073727/locations/us-central1/endpoints/6358150287986786304


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [40]:
%%bigquery pred
SELECT *
FROM `digits.digits_source`
LIMIT 10



In [41]:
pred.head(1)

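| | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | ... | p56 | p57 | p58 | p59 | p60 | p61 | p62 | p63 | target | target_OE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 5.0 | 16.0 | 15.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | ... | 0.0 | 6.0 | 16.0 | 16.0 | 16.0 | 16.0 | 7.0 | 0.0 | 2 | Even |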


In [42]:
newob = pred.loc[:0,'p0':'p63'].to_dict(orient='records')[0]
#newob

In [44]:
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

instances=[json_format.ParseDict(newob, Value())]
parameters=json_format.ParseDict({}, Value())

### Prediction: Online

In [57]:
# get predictions
prediction = endpoint.predict(instances=instances,parameters=parameters)

In [58]:
prediction

Prediction(predictions=[{'classes': ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6'], 'scores': [0.001207366120070219, 0.0007000679615885019, 0.009571815840899944, 0.02577057108283043, 0.0006502495962195098, 0.0413755476474762, 0.9119542837142944, 0.003187478519976139, 0.002909382106736302, 0.002673302078619599]}], deployed_model_id='3852248939075993600', explanations=None)

In [59]:
import numpy as np
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'2'
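
The `classes` and `scores` lists are aligned by position, so the predicted label is the class at the index of the highest score. As a sketch, the same lookup can be done without `numpy`, using the prediction values printed above. The helper names are illustrative only.

```python
# Prediction values copied from the online prediction output above
prediction_row = {
    'classes': ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6'],
    'scores': [0.001207366120070219, 0.0007000679615885019,
               0.009571815840899944, 0.02577057108283043,
               0.0006502495962195098, 0.0413755476474762,
               0.9119542837142944, 0.003187478519976139,
               0.002909382106736302, 0.002673302078619599],
}


def ranked_classes(row: dict) -> list:
    """Return (class, score) pairs sorted from highest to lowest score."""
    pairs = zip(row['classes'], row['scores'])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def top_class(row: dict) -> tuple:
    """Return the (class, score) pair with the highest score."""
    return ranked_classes(row)[0]


print(top_class(prediction_row))
print([label for label, _ in ranked_classes(prediction_row)[:3]])
```

Ranking all classes, rather than taking only the maximum, also shows which digits the model considers the closest alternatives.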

### Explanation
Interpretation guide:
- https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular

In [94]:
# get explanations
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [95]:
explanation.predictions

[{'classes': ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6'],
  'scores': [0.001207366120070219,
   0.0007000679615885019,
   0.009571815840899944,
   0.02577057108283043,
   0.0006502495962195098,
   0.0413755476474762,
   0.9119542837142944,
   0.003187478519976139,
   0.002909382106736302,
   0.002673302078619599]}]

In [97]:
# attribution for the first (and only) instance
attribution = explanation.explanations[0].attributions[0]

print("attribution:")
print("baseline output", attribution.baseline_output_value)
print("instance output", attribution.instance_output_value)
print("output_index", attribution.output_index)
print("output display value", attribution.output_display_name)
print("approximation error", attribution.approximation_error)

attribution:
baseline output 0.0020054997876286507
instance output 0.9119542837142944
output_index [6]
output display value 2
approximation error 0.025497518298939547
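
For Shapley-style attribution methods, the per-feature attributions are expected to add up, approximately, to the difference between the instance output and the baseline output. Assuming that property applies to this model, the sketch below computes the amount the features would collectively account for, using the two values printed above:

```python
# Values copied from the attribution output above
baseline_output = 0.0020054997876286507
instance_output = 0.9119542837142944

# Amount of predicted score that the feature attributions should
# jointly explain, under the assumption stated in the lead-in.
attribution_gap = instance_output - baseline_output
print(round(attribution_gap, 6))
```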


In [98]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.0020054997876286507
instance_output_value: 0.9119542837142944
feature_attributions {
  struct_value {
    fields {
      key: "p0"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p1"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p10"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p11"
      value {
        number_value: 0.01840283994429878
      }
    }
    fields {
      key: "p12"
      value {
        number_value: 0.06219814651246582
      }
    }
    fields {
      key: "p13"
      value {
        number_value: 0.008858684051249708
      }
    }
    fields {
      key: "p14"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p15"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p16"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p17"
      value {
        number_value: 0.0


### Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [92]:
# batch predictions
batch = aiplatform.BatchPredictionJob.create(
    job_display_name='bq_digits_code_'+TIMESTAMP,
    model_name=model.name,
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://statmike-mlops.digits.digits_source",
    bigquery_destination_prefix="statmike-mlops",
    generate_explanation=True
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/3040831348009861120
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/691911073727/locations/us-central1/batchPredictionJobs/3040831348009861120')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/3040831348009861120?project=691911073727
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/3040831348009861120 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob  run. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/3040831348009861120
