# AutoML with Vertex AI - Python Clients
## Orchestrated with Vertex AI - Pipelines

This notebook uses Python Clients for Vertex AI to recreate this workflow:
- Create a Dataset linked to a BigQuery table
- Train an AutoML model on the Dataset
- Deploy the Model to an Endpoint
- Create Batch Predictions
- Create Online Predictions

**Prerequisites**
- `00 - Initial Setup`
- `01 - BigQuery Data`

**Resources**
- API: https://googleapis.dev/python/aiplatform/latest/aiplatform.html
- Google Cloud pipeline components: https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#google-cloud-components
- Component source: https://github.com/kubeflow/pipelines/tree/master/components/google-cloud
- Codelab: https://codelabs.developers.google.com/vertex-mlmd-pipelines#0

**Overview**

<img src="architectures/statmike-mlops-A2.png">

---

In [95]:
from google.cloud import aiplatform
from datetime import datetime
import kfp
import kfp.v2.dsl as dsl
from google_cloud_pipeline_components import aiplatform as gcc_aip

In [126]:
# Locations
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
BUCKET_NAME='gs://{}/digits/model/02c-automl'.format(PROJECT_ID)
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
EXPERIMENT_NAME = '02c-automl'
JOB_NAME = 'job-'+EXPERIMENT_NAME+'_'+TIMESTAMP
MODEL_DIR = '{}/{}'.format(BUCKET_NAME, JOB_NAME)
PIPELINE_ROOT = f"{MODEL_DIR}/pipeline_root/"
PARENT = "projects/" + PROJECT_ID + "/locations/" + REGION

# files
PACKAGE = EXPERIMENT_NAME

# Resources
DEPLOY_COMPUTE='n1-standard-4'
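
The naming constants above build on one another, so it can help to see the resolved values for a single run. The following is a minimal sketch, assuming a fixed timestamp instead of `datetime.now()`; the helper `build_run_paths` is hypothetical and not part of the notebook.

```python
def build_run_paths(project_id: str, experiment: str, timestamp: str) -> dict:
    """Compose the job name and Cloud Storage paths the same way the cell above does."""
    bucket = f"gs://{project_id}/digits/model/{experiment}"
    job_name = f"job-{experiment}_{timestamp}"
    model_dir = f"{bucket}/{job_name}"
    return {
        "JOB_NAME": job_name,
        "MODEL_DIR": model_dir,
        # the trailing slash matters: a later cell appends the file name directly
        "PIPELINE_ROOT": f"{model_dir}/pipeline_root/",
    }


# timestamp fixed here for reproducibility; the notebook uses datetime.now()
paths = build_run_paths("statmike-mlops", "02c-automl", "20210908124535")
for name, value in paths.items():
    print(f"{name} = {value}")
```

Because `TIMESTAMP` is evaluated once per kernel session, re-running the configuration cell yields a new `JOB_NAME` and therefore a new `PIPELINE_ROOT`.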

Grant the service account the `roles/storage.objectAdmin` role: in the Console, open IAM, select the account `projectnumber-compute@developer.gserviceaccount.com`, click Edit, and add the role.
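
The same grant can probably be made from the command line instead of the Console. The sketch below only builds and prints a `gcloud projects add-iam-policy-binding` command; it does not run it. The helper `iam_binding_command` is hypothetical, and the service account address is the one this notebook prints in its next cell, so substitute your own project number.

```python
import shlex


def iam_binding_command(project_id: str, service_account: str, role: str) -> str:
    """Build a `gcloud projects add-iam-policy-binding` command line as a string."""
    args = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account}",
        f"--role={role}",
    ]
    # shlex.join quotes any argument that would need quoting in a POSIX shell
    return shlex.join(args)


cmd = iam_binding_command(
    "statmike-mlops",
    "691911073727-compute@developer.gserviceaccount.com",  # assumed from this notebook's output
    "roles/storage.objectAdmin",
)
print(cmd)
```

Printing the command first makes it easy to review the member and role before running it in a terminal.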

In [127]:
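# Active gcloud account; on a notebook instance this is typically the Compute Engine default service account
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)'
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]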
SERVICE_ACCOUNT

'691911073727-compute@developer.gserviceaccount.com'

In [128]:
aiplatform.init(project=PROJECT_ID, location=REGION)

---


In [129]:
!rm -rf {PACKAGE}
!mkdir {PACKAGE}

In [130]:
# Pipeline definition; the name is hard-coded to 'testname' (it was previously EXPERIMENT_NAME)
@kfp.dsl.pipeline(name='testname', pipeline_root=PIPELINE_ROOT)
def pipeline(
    project: str = PROJECT_ID,
    display_name: str = JOB_NAME,
    deploy_machine: str = DEPLOY_COMPUTE,
    bq_source: str = "bq://statmike-mlops.digits.digits_source",
    
):
    
    # dataset
    dataset = gcc_aip.TabularDatasetCreateOp(
        project = project,
        display_name = display_name,
        bq_source = bq_source
    )
    
    # training
    transformations = []
    for i in range(64):
        transformations.append({'auto': {'column_name': 'p'+str(i)}})
        
    model = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project = project,
        display_name = display_name,
        optimization_prediction_type = "classification",
        budget_milli_node_hours = 1000,
        disable_early_stopping=False,
        column_transformations = transformations,
        dataset = dataset.outputs['dataset'],
        target_column = "target",
        training_fraction_split=0.8,
        validation_fraction_split=0.1,
        test_fraction_split=0.1,
    )
    
    # Endpoint Deployment
    endpoint = gcc_aip.ModelDeployOp(
        model = model.outputs["model"],
        project = project,
        machine_type = deploy_machine
    )
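
The training step above passes three plain-Python settings that are easy to inspect outside the pipeline: the per-column transformations, the data split fractions, and the training budget. This is a small standalone sketch that rebuilds those values with the same numbers; it does not call any Vertex AI API.

```python
import math

# one "auto" transformation per pixel column p0..p63, as in the pipeline loop
transformations = [{"auto": {"column_name": f"p{i}"}} for i in range(64)]

# split fractions used in the training step; here they sum to 1
splits = {"training": 0.8, "validation": 0.1, "test": 0.1}
assert math.isclose(sum(splits.values()), 1.0)

# the budget is given in milli node hours: 1000 milli node hours = 1 node hour
budget_milli_node_hours = 1000
budget_node_hours = budget_milli_node_hours / 1000

print(len(transformations), transformations[0], transformations[-1])
print(budget_node_hours)
```

The list comprehension is equivalent to the `for` loop with `append` used in the pipeline function.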

In [131]:
# Compile the pipeline function to a JSON job spec
from kfp.v2 import compiler

compiler.Compiler().compile(
    pipeline_func=pipeline, package_path=f"{PACKAGE}/{EXPERIMENT_NAME}.json"
)

In [132]:
# Copy the compiled spec to GCS
!gsutil cp {PACKAGE}/*.json $PIPELINE_ROOT

Copying file://02c-automl/02c-automl.json [Content-Type=application/json]...
/ [1 files][ 12.0 KiB/ 12.0 KiB]                                                
Operation completed over 1 objects/12.0 KiB.                                     


In [133]:
JOB_NAME

'job-02c-automl_20210908124535'

In [134]:
EXPERIMENT_NAME

'02c-automl'

In [135]:
# Make Job
plJob = aiplatform.PipelineJob(
    display_name=EXPERIMENT_NAME,
    template_path=f"{PIPELINE_ROOT}{EXPERIMENT_NAME}.json",
    pipeline_root=PIPELINE_ROOT
)

In [136]:
# Run Job
response = plJob.run(service_account = SERVICE_ACCOUNT)

INFO:google.cloud.aiplatform.pipeline_jobs:Creating PipelineJob
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob created. Resource name: projects/691911073727/locations/us-central1/pipelineJobs/testname-20210908124552
INFO:google.cloud.aiplatform.pipeline_jobs:To use this PipelineJob in another session:
INFO:google.cloud.aiplatform.pipeline_jobs:pipeline_job = aiplatform.PipelineJob.get('projects/691911073727/locations/us-central1/pipelineJobs/testname-20210908124552')
INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/testname-20210908124552?project=691911073727
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/691911073727/locations/us-central1/pipelineJobs/testname-20210908124552 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/691911073727/locations/us-central1/pipelineJobs/testname-20210908124552 current s

In [165]:
aiplatform.get_pipeline_df(pipeline='testname')

| | pipeline_name | run_name | param.input:project | param.input:display_name | param.input:bq_source | param.input:deploy_machine |
|---|---|---|---|---|---|---|
| 0 | testname | testname-20210908124552 | statmike-mlops | job-02c-automl_20210908124535 | bq://statmike-mlops.digits.digits_source | n1-standard-4 |


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [137]:
%%bigquery pred
SELECT *
FROM `digits.digits_source`
LIMIT 10

Query complete after 0.01s: 100%|██████████| 1/1 [00:00<00:00, 469.32query/s]                          
Downloading: 100%|██████████| 10/10 [00:01<00:00,  5.14rows/s]


In [138]:
pred.head(4)

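| | p0 | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | ... | p56 | p57 | p58 | p59 | p60 | p61 | p62 | p63 | target | target_OE |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 5.0 | 16.0 | 15.0 | 5.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2.0 | ... | 0.0 | 6.0 | 16.0 | 16.0 | 16.0 | 16.0 | 7.0 | 0.0 | 2 | Even |
| 1 | 0.0 | 5.0 | 16.0 | 12.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.0 | ... | 0.0 | 8.0 | 16.0 | 16.0 | 16.0 | 16.0 | 4.0 | 0.0 | 2 | Even |
| 2 | 0.0 | 5.0 | 15.0 | 16.0 | 6.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.0 | ... | 0.0 | 6.0 | 16.0 | 16.0 | 16.0 | 13.0 | 3.0 | 0.0 | 2 | Even |
| 3 | 0.0 | 4.0 | 15.0 | 15.0 | 8.0 | 0.0 | 0.0 | 0.0 | 0.0 | 8.0 | ... | 0.0 | 7.0 | 14.0 | 11.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2 | Even |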


In [139]:
# Take the first row, keep only the feature columns p0..p63, and convert it to a dict
newob = pred.loc[:0,'p0':'p63'].to_dict(orient='records')[0]
#newob
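
To make the shape of `newob` concrete, here is a pure-Python sketch of the same selection without pandas. The record `row` is hypothetical and its pixel values are made up for illustration; only the column names (`p0` to `p63`, `target`, `target_OE`) come from the table above.

```python
# hypothetical record standing in for one row of `pred`; pixel values are made up
row = {f"p{i}": float(i % 17) for i in range(64)}
row.update({"target": 2, "target_OE": "Even"})

feature_names = [f"p{i}" for i in range(64)]
# keep only the feature columns; the label columns are left out of the instance
newob = {name: row[name] for name in feature_names}

print(len(newob), "target" in newob)
```

The result is a flat mapping from column name to value, which is the form the next cell converts into a protobuf `Value`.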

In [140]:
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value

instances=[json_format.ParseDict(newob, Value())]
parameters=json_format.ParseDict({}, Value())

### Prediction: Online

In [145]:
ename = 'job-02c-automl_20210908124535_endpoint'
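
The hard-coded endpoint name above matches the job's display name with an `_endpoint` suffix. As a sketch, the name could be derived rather than typed, assuming that naming pattern holds for your SDK version (it is inferred from the value in this notebook, not confirmed here).

```python
JOB_NAME = "job-02c-automl_20210908124535"  # value printed earlier in this notebook

# assumption: the endpoint display name is the job display name plus "_endpoint"
ename = f"{JOB_NAME}_endpoint"
print(ename)
```

If the lookup in the next cell returns an empty list, the assumed naming pattern does not apply and the display name should be checked in the Console.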

In [146]:
endpoint = aiplatform.Endpoint.list(filter=f'display_name={ename}')[0]

In [149]:
# get predictions
prediction = endpoint.predict(instances=instances,parameters=parameters)

In [150]:
prediction

Prediction(predictions=[{'classes': ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6'], 'scores': [8.749476837488501e-13, 1.520173764646415e-09, 1.218805039115978e-07, 3.909719481498541e-08, 2.406981731983837e-12, 1.94232666217431e-06, 0.9999978542327881, 1.186109521711387e-08, 1.392742032813032e-10, 9.13831094129236e-11]}], deployed_model_id='3566930069714632704', explanations=None)

In [151]:
import numpy as np

# Pick the class label whose score is highest
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'2'
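
The response lists classes and scores in parallel, so ranking all classes is a matter of pairing and sorting them. The sketch below does this in plain Python, using the values copied from the `Prediction` output above.

```python
# values copied from the Prediction output shown above
classes = ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6']
scores = [8.749476837488501e-13, 1.520173764646415e-09, 1.218805039115978e-07,
          3.909719481498541e-08, 2.406981731983837e-12, 1.94232666217431e-06,
          0.9999978542327881, 1.186109521711387e-08, 1.392742032813032e-10,
          9.13831094129236e-11]

# pair each class with its score and sort from most to least likely
ranked = sorted(zip(classes, scores), key=lambda pair: pair[1], reverse=True)
top_class, top_score = ranked[0]
top3 = [label for label, _ in ranked[:3]]
print(top_class, top3)
```

The first entry agrees with the `np.argmax` result, and the remaining entries show how far behind the runner-up classes are.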

### Explanation
Interpretation Guide
- https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular

In [152]:
# get explanations
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [153]:
explanation.predictions

[{'scores': [8.749476837488501e-13,
   1.520173764646415e-09,
   1.218805039115978e-07,
   3.909719481498541e-08,
   2.406981731983837e-12,
   1.94232666217431e-06,
   0.9999978542327881,
   1.186109521711387e-08,
   1.392742032813032e-10,
   9.13831094129236e-11],
  'classes': ['4', '5', '3', '1', '0', '8', '2', '7', '9', '6']}]

In [154]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)

attribution:
baseline output 0.002922810148447752
instance output 0.9999978542327881
output_index [6]
output display value 2
approximation error 0.0240365035235022
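
One way to read these numbers: the feature attributions are generally expected to account, approximately, for the gap between the instance output and the baseline output, with the approximation error indicating how loose that accounting is. The sketch below computes that gap from the printed values. The interpretation is an assumption based on how attribution methods of this kind usually behave; see the interpretation guide linked above for the authoritative description.

```python
import math

# values copied from the printed attribution above
baseline_output = 0.002922810148447752
instance_output = 0.9999978542327881
approximation_error = 0.0240365035235022

# amount the feature attributions are expected to account for, approximately
delta = instance_output - baseline_output
print(f"delta to explain: {delta:.6f}")
print(f"reported approximation error: {approximation_error:.4f}")
```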


In [21]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.0011145027820020914
instance_output_value: 1.0
feature_attributions {
  struct_value {
    fields {
      key: "p0"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p1"
      value {
        number_value: 0.005055854970123619
      }
    }
    fields {
      key: "p10"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p11"
      value {
        number_value: 0.00138039983409856
      }
    }
    fields {
      key: "p12"
      value {
        number_value: 0.001151186447324497
      }
    }
    fields {
      key: "p13"
      value {
        number_value: 0.07197858341221165
      }
    }
    fields {
      key: "p14"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p15"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p16"
      value {
        number_value: 0.0
      }
    }
    fields {
      key: "p17"
      value {
        number_value: 0.
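
The attribution output above is long and mostly zeros, so sorting by magnitude makes the influential pixels easier to spot. This is a minimal sketch using a plain dict with only the values visible in the truncated output; converting the real response object into a dict is left out.

```python
# attribution values copied from the (truncated) output above
feature_attributions = {
    "p0": 0.0,
    "p1": 0.005055854970123619,
    "p10": 0.0,
    "p11": 0.00138039983409856,
    "p12": 0.001151186447324497,
    "p13": 0.07197858341221165,
    "p14": 0.0,
    "p15": 0.0,
    "p16": 0.0,
}

# rank features by absolute attribution, largest first, and drop exact zeros
ranked = sorted(
    ((name, value) for name, value in feature_attributions.items() if value != 0.0),
    key=lambda item: abs(item[1]),
    reverse=True,
)
for name, value in ranked:
    print(f"{name:>4}  {value:.6f}")
```

Note that the keys in the raw output are ordered as strings (`p0`, `p1`, `p10`, ...), not numerically, so sorting by value is also a convenient way to avoid misreading that order.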

### Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [163]:
# batch predictions
batch = aiplatform.BatchPredictionJob.create(
    job_display_name='bq_digits_code_'+TIMESTAMP,
    model_name=endpoint.list_models()[0].model,
    instances_format="bigquery",
    predictions_format="bigquery",
    bigquery_source="bq://statmike-mlops.digits.digits_source",
    bigquery_destination_prefix="statmike-mlops",
    generate_explanation=True
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/1358699503791636480
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/691911073727/locations/us-central1/batchPredictionJobs/1358699503791636480')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/1358699503791636480?project=691911073727
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/1358699503791636480 current state:
JobState.JOB_STATE_PENDING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/1358699503791636480 current state:
JobState.JOB_STAT
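
The batch job above reads from a BigQuery table given as a `bq://project.dataset.table` URI. As a sketch, a small parser can validate such a URI before a job is submitted. The helper `parse_bq_uri` and the `BigQueryTable` type are hypothetical, and the parser assumes the project ID itself contains no dots.

```python
from typing import NamedTuple


class BigQueryTable(NamedTuple):
    project: str
    dataset: str
    table: str


def parse_bq_uri(uri: str) -> BigQueryTable:
    """Split a 'bq://project.dataset.table' URI into its three parts."""
    prefix = "bq://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}: {uri!r}")
    parts = uri[len(prefix):].split(".")
    # assumption: exactly three non-empty, dot-separated components
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected project.dataset.table after {prefix!r}: {uri!r}")
    return BigQueryTable(*parts)


source = parse_bq_uri("bq://statmike-mlops.digits.digits_source")
print(source)
```

Failing early on a malformed URI is cheaper than waiting for the batch prediction job to be rejected.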