# 02c - Vertex AI > Pipelines - AutoML with clients (code) in an automated pipeline

Use [Kubeflow](https://www.kubeflow.org/) Pipelines running on [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction) to orchestrate training a custom model with AutoML Tabular and deploying it to a Vertex AI Endpoint that serves online and batch predictions and explanations. This demonstrates how to automate the processes of (02a) or (02b) with pipeline orchestration.

**Prerequisites:**

-  01 -  BigQuery - Table Data Source

**Overview:**

-  Use Kubeflow Python SDK to build a pipeline
   -  Create pipeline using Google Cloud Pipeline Components (from google_cloud_pipeline_components import aiplatform as gcc_aip)
      -  Use gcc_aip.TabularDatasetCreateOp to register dataset from BigQuery table
      -  Train AutoML tabular model with gcc_aip.AutoMLTabularTrainingJobRunOp
      -  Deploy model to endpoint using gcc_aip.ModelDeployOp
   -  Compile the pipeline
      -  kfp.v2.compiler.Compiler().compile
   -  Copy the compiled pipeline to a GCS bucket
   -  Run the pipeline with google.cloud.aiplatform.PipelineJob
-  Online Predictions using Vertex AI Endpoint
-  Online Explanations using Vertex AI Endpoint
-  Batch Prediction Job for predictions and explanation with source and destination tables in BigQuery

**Resources:**

-  [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/build-pipeline#google-cloud-components): see aiplatform.PipelineJob
-  [Python Client for Vertex AI](https://googleapis.dev/python/aiplatform/latest/aiplatform.html)
-  [Kubeflow Pipelines Components for Google Cloud](https://github.com/kubeflow/pipelines/tree/master/components/google-cloud)

**Related Training:**

-  Codelab: [Vertex AI Pipelines Introduction](https://codelabs.developers.google.com/vertex-mlmd-pipelines#0)

---
## Vertex AI - Conceptual Flow

<img src="architectures/slides/02c_arch.png">

---
## Vertex AI - Workflow

<img src="architectures/slides/02c_console.png">

---
## Setup

inputs:

In [1]:
REGION = 'us-central1'
PROJECT_ID='statmike-mlops'
DATANAME = 'fraud'
NOTEBOOK = '02c'

# Resources
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [2]:
from google.cloud import aiplatform
from datetime import datetime
import kfp
import kfp.v2.dsl as dsl
from google_cloud_pipeline_components import aiplatform as gcc_aip

from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np

clients:

In [3]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bigquery = bigquery.Client()

parameters:

In [4]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{DATANAME}/models/{NOTEBOOK}"
DIR = f"temp/{NOTEBOOK}"
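As a quick check of how these parameters combine, the sketch below rebuilds the pipeline name, display name, and GCS locations that later cells derive from them. The fixed timestamp is the one from the run logged in this notebook; the helper name `build_paths` is hypothetical and introduced only for illustration.

```python
from datetime import datetime


def build_paths(project_id: str, dataname: str, notebook: str, timestamp: str) -> dict:
    """Compose the names and GCS locations used throughout this notebook."""
    bucket = project_id  # this notebook names the bucket after the project
    uri = f"gs://{bucket}/{dataname}/models/{notebook}"
    return {
        "pipeline_name": f"kfp-{notebook}-{dataname}-{timestamp}",
        "display_name": f"{notebook}_{dataname}_{timestamp}",
        "pipeline_root": f"{uri}/{timestamp}/kfp/",
        "template_path": f"{uri}/{timestamp}/kfp/{notebook}.json",
    }


# Same format string as TIMESTAMP above, applied to a fixed moment in time
stamp = datetime(2021, 9, 29, 19, 16, 34).strftime("%Y%m%d%H%M%S")
paths = build_paths("statmike-mlops", "fraud", "02c", stamp)
for key, value in paths.items():
    print(f"{key}: {value}")
```

Because every name embeds the timestamp, each run of the notebook writes to its own pipeline root and does not overwrite earlier runs.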

In [5]:
# Give the service account the roles/storage.objectAdmin role
# Console > IAM > select account <projectnumber>-compute@developer.gserviceaccount.com > edit > add role
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'691911073727-compute@developer.gserviceaccount.com'

environment:

In [6]:
!rm -rf {DIR}
!mkdir -p {DIR}

---
## Pipeline (KFP) Definition
- Flow
    - Create Vertex AI Dataset from link to BigQuery table
    - Create Vertex AI AutoML Tabular Training Job
    - Create Endpoint and Deploy trained model

In [7]:
@kfp.dsl.pipeline(name = f'kfp-{NOTEBOOK}-{DATANAME}-{TIMESTAMP}', pipeline_root = URI+'/'+str(TIMESTAMP)+'/kfp/')
def pipeline(
    project: str = PROJECT_ID,
    dataname: str = DATANAME,
    display_name: str = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    deploy_machine: str = DEPLOY_COMPUTE,
    bq_source: str = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    var_target: str = VAR_TARGET,
    var_omit: str = VAR_OMIT,
    label: str = NOTEBOOK 
):
    
    # dataset
    dataset = gcc_aip.TabularDatasetCreateOp(
        project = project,
        display_name = display_name,
        bq_source = bq_source,
        labels = {'notebook':f'{label}'}
    )
    
    # get feature names (this runs when the pipeline is compiled, so it uses the notebook-level VAR_OMIT and VAR_TARGET)
    from google.cloud import bigquery
    bigquery = bigquery.Client(project = PROJECT_ID)
    query = f"SELECT * FROM {DATANAME}.INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{DATANAME}_prepped'"
    schema = bigquery.query(query).to_dataframe()
    OMIT = VAR_OMIT.split() + [VAR_TARGET, 'splits']
    features = schema[~schema.column_name.isin(OMIT)].column_name.tolist()
    features = dict.fromkeys(features, 'auto')
    
    # training
    model = gcc_aip.AutoMLTabularTrainingJobRunOp(
        project = project,
        display_name = display_name,
        optimization_prediction_type = "classification",
        budget_milli_node_hours = 1000,
        disable_early_stopping=False,
        column_specs = features,
        dataset = dataset.outputs['dataset'],
        target_column = var_target,
        predefined_split_column_name = 'splits',
        #training_fraction_split = 0.8,
        #validation_fraction_split = 0.1,
        #test_fraction_split = 0.1,
        labels = {'notebook':f'{label}'}
    )
    
    # Endpoint Deployment
    endpoint = gcc_aip.ModelDeployOp(
        model = model.outputs["model"],
        project = project,
        #display_name = display_name,
        machine_type = deploy_machine,
        #labels = {'notebook':f'{label}'}
    )
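The feature-selection step in the pipeline function is plain Python that runs while the pipeline is being compiled, not inside a pipeline task. The sketch below isolates that logic without BigQuery so it can be checked on its own. The helper name `build_column_specs` and the short column list are hypothetical, chosen only for illustration.

```python
def build_column_specs(columns, var_omit, var_target, split_column="splits"):
    """Return {feature_name: 'auto'} for every column that should be a feature."""
    # var_omit is a space-delimited string, as in the inputs cell
    omit = set(var_omit.split()) | {var_target, split_column}
    return {name: "auto" for name in columns if name not in omit}


# Hypothetical subset of the prepped table's columns
columns = ["Time", "V1", "V2", "Amount", "Class", "transaction_id", "splits"]
column_specs = build_column_specs(columns, var_omit="transaction_id", var_target="Class")
print(column_specs)
```

The value 'auto' leaves the choice of transformation for each feature to AutoML, which matches the dict.fromkeys(features, 'auto') call in the pipeline definition.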

---
## Compile Pipeline

In [8]:
kfp.v2.compiler.Compiler().compile(
    pipeline_func = pipeline,
    package_path = f"{DIR}/{NOTEBOOK}.json"
)
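The compiled file is ordinary JSON, so it can be inspected before it is submitted. The following is a minimal sketch that lists the task names in a compiled spec. It assumes the tasks live under root > dag > tasks and that some SDK versions wrap the spec in a "pipelineSpec" key; both the layout handling and the task names in the stand-in document are assumptions, not output from this notebook.

```python
import json


def list_pipeline_tasks(spec: dict) -> list:
    """Return sorted task names from a compiled pipeline spec."""
    # Assumption: the spec may or may not be wrapped in a "pipelineSpec" key
    pipeline_spec = spec.get("pipelineSpec", spec)
    tasks = pipeline_spec.get("root", {}).get("dag", {}).get("tasks", {})
    return sorted(tasks)


# Tiny stand-in for a compiled file with hypothetical task names.
# A real file could be read with json.load() from the package_path used above.
example = json.loads("""
{"pipelineSpec": {"root": {"dag": {"tasks": {
    "create-dataset": {}, "train-model": {}, "deploy-model": {}
}}}}}
""")
print(list_pipeline_tasks(example))
```

A check like this is a cheap way to confirm that all three steps (dataset, training, deployment) made it into the compiled pipeline.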

Copy the compiled pipeline file to the GCS bucket:

In [9]:
!gsutil cp {DIR}/{NOTEBOOK}.json {URI}/{TIMESTAMP}/kfp/

Copying file://temp/02c/02c.json [Content-Type=application/json]...
/ [1 files][ 12.0 KiB/ 12.0 KiB]                                                
Operation completed over 1 objects/12.0 KiB.                                     


---
## Create Vertex AI Pipeline Job

In [10]:
pipeline = aiplatform.PipelineJob(
    display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    template_path = f"{URI}/{TIMESTAMP}/kfp/{NOTEBOOK}.json",
    pipeline_root = f"{URI}/{TIMESTAMP}/kfp/",
    labels = {'notebook':f'{NOTEBOOK}'}
)

In [11]:
response = pipeline.run(service_account = SERVICE_ACCOUNT)

INFO:google.cloud.aiplatform.pipeline_jobs:Creating PipelineJob
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob created. Resource name: projects/691911073727/locations/us-central1/pipelineJobs/kfp-02c-fraud-20210929191634-20210929191726
INFO:google.cloud.aiplatform.pipeline_jobs:To use this PipelineJob in another session:
INFO:google.cloud.aiplatform.pipeline_jobs:pipeline_job = aiplatform.PipelineJob.get('projects/691911073727/locations/us-central1/pipelineJobs/kfp-02c-fraud-20210929191634-20210929191726')
INFO:google.cloud.aiplatform.pipeline_jobs:View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/us-central1/pipelines/runs/kfp-02c-fraud-20210929191634-20210929191726?project=691911073727
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/691911073727/locations/us-central1/pipelineJobs/kfp-02c-fraud-20210929191634-20210929191726 current state:
PipelineState.PIPELINE_STATE_RUNNING
INFO:google.cloud.aiplatform.pipeline_jobs:PipelineJob projects/6
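The resource name printed in the log above carries the project number, region, and job ID. A small sketch, assuming the projects/.../locations/.../pipelineJobs/... layout seen in that log, splits the name into its parts and rebuilds the console link. The function names are hypothetical.

```python
def parse_pipeline_job_name(resource_name: str) -> dict:
    """Split 'projects/<p>/locations/<l>/pipelineJobs/<id>' into its parts."""
    parts = resource_name.split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "pipelineJobs"]:
        raise ValueError(f"unexpected resource name: {resource_name!r}")
    return {"project": parts[1], "location": parts[3], "job_id": parts[5]}


def console_url(resource_name: str) -> str:
    """Rebuild the console link in the same shape as the logged one."""
    info = parse_pipeline_job_name(resource_name)
    return (
        "https://console.cloud.google.com/vertex-ai/locations/"
        f"{info['location']}/pipelines/runs/{info['job_id']}?project={info['project']}"
    )


name = ("projects/691911073727/locations/us-central1/pipelineJobs/"
        "kfp-02c-fraud-20210929191634-20210929191726")
print(parse_pipeline_job_name(name))
print(console_url(name))
```

The same resource name is what aiplatform.PipelineJob.get expects when the job is picked up in another session, as the log message suggests.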

In [12]:
aiplatform.get_pipeline_df(pipeline = f'kfp-{NOTEBOOK}-{DATANAME}-{TIMESTAMP}')

| | pipeline_name | run_name | param.input:dataname | param.input:project | param.input:bq_source | param.input:var_omit | param.input:display_name | param.input:label | param.input:var_target | param.input:deploy_machine |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | kfp-02c-fraud-20210929191634 | kfp-02c-fraud-20210929191634-20210929191726 | fraud | statmike-mlops | bq://statmike-mlops.fraud.fraud_prepped | transaction_id | 02c_fraud_20210929191634 | 02c | Class | n1-standard-4 |


---
## Prediction

### Prepare a record for prediction: instance and parameters lists

In [13]:
pred = bigquery.query(query = f"SELECT * FROM {DATANAME}.{DATANAME}_prepped WHERE splits='TEST' LIMIT 10").to_dataframe()

In [14]:
pred.head(4)

| | Time | V1 | V2 | V3 | V4 | V5 | V6 | V7 | V8 | V9 | ... | V23 | V24 | V25 | V26 | V27 | V28 | Amount | Class | transaction_id | splits |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 46100 | 0.971963 | -0.064002 | 1.864457 | 2.52122 | -0.700822 | 1.660426 | -1.206327 | 0.717368 | 0.1709 | ... | -0.005019 | -0.259129 | 0.152646 | 0.207666 | 0.093585 | 0.022994 | 0.0 | 0 | 3eddd943-117e-4ba9-a09b-c0e2fe8c5647 | TEST |
| 1 | 80430 | -0.9297 | 0.194664 | 1.549227 | 1.69343 | 1.038639 | -0.214545 | -0.032843 | 0.248009 | -0.598265 | ... | 0.263542 | -0.060427 | -1.22793 | -0.634669 | 0.062523 | 0.343741 | 0.0 | 0 | 8f25f7a0-63a0-4a7b-a0de-cdebe813ccee | TEST |
| 2 | 123919 | 1.890586 | 0.271313 | -0.157228 | 4.064907 | -0.109914 | 0.150175 | -0.230044 | 0.061367 | -0.119159 | ... | 0.092487 | 0.030498 | 0.074608 | 0.134964 | -0.008043 | -0.048927 | 0.0 | 0 | a0da598e-391b-4b9b-a0ba-f8977d0b43b0 | TEST |
| 3 | 141191 | -2.857621 | -0.307727 | 1.521266 | 4.500119 | 1.812809 | 2.276221 | -0.425395 | 0.895603 | -1.564402 | ... | 1.374592 | -1.723268 | 0.127809 | 0.20751 | -0.08213 | -0.009999 | 0.0 | 0 | 7159db57-0ebd-4b99-a1b4-de6996ae0bd7 | TEST |


In [15]:
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
#newob

The request must match the input format that the deployed model expects, and AutoML may convert the type of some variables during training. The following cell converts Time to a string before the request is built:

In [16]:
newob['Time'] = str(newob['Time'])
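The record preparation in the two cells above can be sketched as one function: drop the omitted, target, and split columns, then cast selected fields to strings. The helper name `prepare_instance` and its `as_string` parameter are hypothetical; the sample record uses values from the first row shown earlier, trimmed to a few columns.

```python
def prepare_instance(record: dict, omit: list, as_string: tuple = ("Time",)) -> dict:
    """Drop non-feature columns and cast selected fields to str."""
    instance = {key: value for key, value in record.items() if key not in omit}
    for name in as_string:
        if name in instance:
            instance[name] = str(instance[name])
    return instance


record = {
    "Time": 46100, "V1": 0.971963, "Amount": 0.0, "Class": 0,
    "transaction_id": "3eddd943-117e-4ba9-a09b-c0e2fe8c5647", "splits": "TEST",
}
omit = "transaction_id".split() + ["Class", "splits"]
instance = prepare_instance(record, omit)
print(instance)
```

Which fields need casting depends on the trained model, so the `as_string` default here is an assumption based on the Time conversion in this notebook.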

In [17]:
instances = [json_format.ParseDict(newob, Value())]
parameters = json_format.ParseDict({}, Value())

### Get Predictions: Python Client

In [18]:
endpoint = aiplatform.Endpoint.list(filter=f'display_name={NOTEBOOK}_{DATANAME}_{TIMESTAMP}_endpoint')[0]
endpoint.display_name

'02c_fraud_20210929191634_endpoint'

In [19]:
prediction = endpoint.predict(instances=instances, parameters=parameters)
prediction

Prediction(predictions=[{'scores': [0.9375022053718567, 0.06249777600169182], 'classes': ['0', '1']}], deployed_model_id='6609956042933534720', explanations=None)

In [20]:
prediction.predictions[0]['classes'][np.argmax(prediction.predictions[0]['scores'])]

'0'
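The same lookup can be written without numpy, which may be convenient in a client that only has the prediction dictionary. This sketch uses the prediction returned above; the function names are hypothetical.

```python
def class_scores(prediction: dict) -> dict:
    """Pair each class label with its score."""
    return dict(zip(prediction["classes"], prediction["scores"]))


def top_class(prediction: dict) -> str:
    """Return the label with the highest score."""
    scores = class_scores(prediction)
    return max(scores, key=scores.get)


prediction = {
    "scores": [0.9375022053718567, 0.06249777600169182],
    "classes": ["0", "1"],
}
print(class_scores(prediction))
print(top_class(prediction))
```

Note that the class labels come back as strings, so compare against '0' and '1' rather than integers.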

### Get Predictions: REST

In [21]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))

In [22]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

{
  "predictions": [
    {
      "classes": [
        "0",
        "1"
      ],
      "scores": [
        0.93750220537185669,
        0.062497776001691818
      ]
    }
  ],
  "deployedModelId": "6609956042933534720",
  "model": "projects/691911073727/locations/us-central1/models/3699839035280195584",
  "modelDisplayName": "02c_fraud_20210929191634"
}
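The URL and body used by the curl call can also be assembled in Python before handing them to any HTTP client. This is a sketch of the string construction only; it sends nothing. The endpoint ID in the example is hypothetical, whereas in the notebook the full name comes from endpoint.resource_name.

```python
import json


def predict_request(region: str, endpoint_resource_name: str, instances: list):
    """Return the (url, body) pair for a Vertex AI online prediction call."""
    url = (
        f"https://{region}-aiplatform.googleapis.com/v1/"
        f"{endpoint_resource_name}:predict"
    )
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical endpoint ID, for illustration only
resource_name = "projects/691911073727/locations/us-central1/endpoints/1234567890"
url, body = predict_request(
    "us-central1", resource_name, [{"Time": "46100", "Amount": 0.0}]
)
print(url)
print(body)
```

The request still needs the Authorization header with a bearer token, as in the curl command above.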


### Get Predictions: gcloud (CLI)

In [23]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
[{'classes': ['0', '1'], 'scores': [0.9375022053718567, 0.06249777600169182]}]


### Batch Predictions: BigQuery Source to BigQuery Destination, with Explanations

In [24]:
batch = aiplatform.BatchPredictionJob.create(
    job_display_name = f'{NOTEBOOK}_{DATANAME}_{TIMESTAMP}',
    model_name = endpoint.list_models()[0].model,
    instances_format = "bigquery",
    predictions_format = "bigquery",
    bigquery_source = f'bq://{PROJECT_ID}.{DATANAME}.{DATANAME}_prepped',
    bigquery_destination_prefix = f"{PROJECT_ID}",
    generate_explanation=True,
    labels = {'notebook':f'{NOTEBOOK}'}
)

INFO:google.cloud.aiplatform.jobs:Creating BatchPredictionJob
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob created. Resource name: projects/691911073727/locations/us-central1/batchPredictionJobs/5522655591295614976
INFO:google.cloud.aiplatform.jobs:To use this BatchPredictionJob in another session:
INFO:google.cloud.aiplatform.jobs:bpj = aiplatform.BatchPredictionJob('projects/691911073727/locations/us-central1/batchPredictionJobs/5522655591295614976')
INFO:google.cloud.aiplatform.jobs:View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/5522655591295614976?project=691911073727
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/5522655591295614976 current state:
JobState.JOB_STATE_RUNNING
INFO:google.cloud.aiplatform.jobs:BatchPredictionJob projects/691911073727/locations/us-central1/batchPredictionJobs/5522655591295614976 current state:
JobState.JOB_STAT

---
## Explanations
Interpretation Guide
- https://cloud.google.com/vertex-ai/docs/predictions/interpreting-results-automl#tabular

In [25]:
explanation = endpoint.explain(instances=instances, parameters=parameters)

In [26]:
explanation.predictions

[{'classes': ['0', '1'], 'scores': [0.9375022053718567, 0.06249777600169182]}]

In [27]:
print("attribution:")
print("baseline output",explanation.explanations[0].attributions[0].baseline_output_value)
print("instance output",explanation.explanations[0].attributions[0].instance_output_value)
print("output_index",explanation.explanations[0].attributions[0].output_index)
print("output display value",explanation.explanations[0].attributions[0].output_display_name)
print("approximation error",explanation.explanations[0].attributions[0].approximation_error)

attribution:
baseline output 0.9419223070144653
instance output 0.9375022053718567
output_index [0]
output display value 0
approximation error 0.00978898435221329


In [28]:
explanation.explanations[0].attributions[0]

baseline_output_value: 0.9419223070144653
instance_output_value: 0.9375022053718567
feature_attributions {
  struct_value {
    fields {
      key: "Amount"
      value {
        number_value: 4.335244496663412e-05
      }
    }
    fields {
      key: "Time"
      value {
        number_value: 1.01062986585829e-05
      }
    }
    fields {
      key: "V1"
      value {
        number_value: -0.0002142257160610623
      }
    }
    fields {
      key: "V10"
      value {
        number_value: 0.002479215463002523
      }
    }
    fields {
      key: "V11"
      value {
        number_value: -0.001960290802849663
      }
    }
    fields {
      key: "V12"
      value {
        number_value: 0.001099851396348741
      }
    }
    fields {
      key: "V13"
      value {
        number_value: -1.141760084364149e-05
      }
    }
    fields {
      key: "V14"
      value {
        number_value: -0.0004734926753573947
      }
    }
    fields {
      key: "V15"
      value {
        numbe

---
## Remove Resources
See notebook "99 - Cleanup".