 ============================================================================== \
 Copyright 2020 Google LLC. This software is provided as-is, without warranty \
 or representation for any use or purpose. Your use of it is subject to your \
 agreement with Google. \
 ============================================================================== 
 
 Author: Elvin Zhu, Chanchal Chatterjee \
 Email: elvinzhu@google.com \
<img src="img/google-cloud-icon.jpg" alt="Drawing" style="width: 200px;"/>

### Import packages

In [None]:
# !python3 -m pip install google-cloud-aiplatform
# !python3 -m pip install google-cloud-storage==1.32
# !gcloud components update

In [129]:
# Import packages

import json
import logging
import pandas as pd
import numpy as np
from datetime import datetime
from pytz import timezone
from googleapiclient import discovery
from google.cloud import aiplatform

### Configure Global Variables

In [191]:
# Configure your global variables
PROJECT = 'img-seg-3d'          # Replace with your project ID
USER = 'elvinzhu'               # Replace with your user name
BUCKET_NAME = 'vapit_job'  # Replace with your gcs bucket name
FOLDER_NAME = 'xgb_train_job'   # Replace with your gcs folder name
REGION = 'us-central1'          # Replace with your GCP region
TIMEZONE = 'US/Pacific'         # Replace with your local timezone
PACKAGE_URIS = f"gs://{BUCKET_NAME}/trainer/trainer-0.1.tar.gz" # Replace with your python package uri

TRAIN_FEATURE_PATH = f"gs://{BUCKET_NAME}/data_split/mortgage_structured_x_train.csv" # Update with your gcs path
TRAIN_LABEL_PATH = f"gs://{BUCKET_NAME}/data_split/mortgage_structured_y_train.csv" # Update with your gcs path
TEST_FEATURE_PATH = f"gs://{BUCKET_NAME}/data_split/mortgage_structured_x_test.csv" # Update with your gcs path
TEST_LABEL_PATH = f"gs://{BUCKET_NAME}/data_split/mortgage_structured_y_test.csv" # Update with your gcs path


List your current GCP project name

In [131]:
!gcloud config list --format 'value(core.project)' 2>/dev/null

img-seg-3d


Create your bucket

In [165]:
!gsutil mb gs://$BUCKET_NAME -l $REGION 

CommandException: "mb" command does not support "file://" URLs. Did you mean to use a gs:// URL?


Build python package and upload to your bucket

In [133]:
# !cd /home/jupyter/vapit/ai-platform-xgboost
# !python3 -m build
# !gsutil cp ./dist/trainer-0.1.tar.gz $PACKAGE_URIS

In [192]:
# Freddie Mac public mortgage data (do not change)
INPUT_DATA = "gs://tuti_asset/datasets/mortgage_structured.csv" # public mortgage data 
TARGET_COLUMN = "TARGET" # Column name for target labels

-----------
### Dataset preprocessing

Preprocess the input data by:

    1. Dropping the unique ID column;
    2. Converting categorical columns into one-hot encodings;
    3. Counting the number of unique classes;
    4. Splitting the data into train and test sets;
    5. Saving the processed data to GCS.
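
The steps above can be sketched with pandas as follows. This is a minimal illustration and not the contents of `preprocessing.py`, which is not shown in this notebook: the function name `preprocess`, the 80/20 split ratio, and the random seed are assumptions. Only the `LOAN_SEQUENCE_NUMBER` and `TARGET` column names come from the notebook.

```python
import pandas as pd


def preprocess(df, id_column, target_column, test_fraction=0.2, seed=42):
    """Hypothetical sketch of the preprocessing steps (not preprocessing.py)."""
    # 1. Drop the unique ID column, which is not a useful feature.
    df = df.drop(columns=[id_column])

    # Separate the labels from the features.
    y = df[target_column]
    x = df.drop(columns=[target_column])

    # 2. One-hot encode the non-numeric (categorical) feature columns.
    categorical = x.select_dtypes(exclude=["number"]).columns.tolist()
    x = pd.get_dummies(x, columns=categorical)

    # 3. Count the number of unique classes in the labels.
    n_classes = y.nunique()

    # 4. Split into train and test sets by sampling row indices.
    test_index = x.sample(frac=test_fraction, random_state=seed).index
    x_train, x_test = x.drop(index=test_index), x.loc[test_index]
    y_train, y_test = y.drop(index=test_index), y.loc[test_index]

    return x_train, x_test, y_train, y_test, n_classes
```

Step 5 would then be a call such as `x_train.to_csv(TRAIN_FEATURE_PATH, index=False)` for each of the four frames; writing directly to a `gs://` path from pandas requires the `gcsfs` package.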

In [193]:
!python3 preprocessing.py \
    --input_file $INPUT_DATA \
    --x_train_name $TRAIN_FEATURE_PATH \
    --x_test_name $TEST_FEATURE_PATH \
    --y_train_name $TRAIN_LABEL_PATH \
    --y_test_name $TEST_LABEL_PATH \
    --target_column $TARGET_COLUMN

INFO:root:Preprocessing raw data:
INFO:root: => Drop id column:
INFO:root: => One hot encoding categorical features
INFO:root: => Count number of classes
INFO:root: => Perform train/test split
INFO:root:Reading raw data file: gs://tuti_asset/datasets/mortgage_structured.csv
INFO:root:Drop unique id column which is not an useful feature for ML: LOAN_SEQUENCE_NUMBER
INFO:root:Convert categorical columns into one-hot encodings
INFO:root:categorical feature: first_time_home_buyer_flag
INFO:root:categorical feature: occupancy_status
INFO:root:categorical feature: channel
INFO:root:categorical feature: property_state
INFO:root:categorical feature: property_type
INFO:root:categorical feature: loan_purpose
INFO:root:categorical feature: seller_name
INFO:root:categorical feature: service_name
INFO:root:Count number of unique classes ...
INFO:root:No. of Classes: 4
INFO:root:Perform train/test split ...
INFO:root:Get feature/label shapes ...
INFO:root:x_train shape = (93639, 149)
INFO:root:x_tes

------
### Training with Google Vertex AI 

For the full article, please visit: https://cloud.google.com/vertex-ai/docs

Where Vertex AI fits in the ML workflow \
The diagram below gives a high-level overview of the stages in an ML workflow. The blue-filled boxes indicate where Vertex AI provides managed services and APIs:

<img src="img/ml-workflow.svg" alt="Drawing">

As the diagram indicates, you can use Vertex AI to manage the following stages in the ML workflow:

- Train an ML model on your data:
 - Train model
 - Evaluate model accuracy
 - Tune hyperparameters
 
 
- Deploy your trained model.

- Send prediction requests to your model:
 - Online prediction
 - Batch prediction (for TensorFlow only)
 
 
- Monitor the predictions on an ongoing basis.

- Manage your models and model versions.


In [136]:
# Google Cloud AI Platform requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'xgb_train_{}_{}'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    ) # Unique job name

# We use the job names as folder names to store outputs.
JOB_DIR = 'gs://{}/{}/{}'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    JOBNAME,
    ) # gcs path to hold the outputs

# Get the initial set of hyperparameters
N_CLASSES = '4' 
BOOSTER = 'gbtree' # Booster type
MAX_DEPTH = '2'      # Depth of trees
N_ESTIMATORS = '10'  # No of estimators

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)

JOB_NAME =  xgb_train_elvinzhu_061321_2020
JOB_DIR =  gs://vapit_job/xgb_train_job/xgb_train_elvinzhu_061321_2020


#### Train locally

Before submitting training jobs to Cloud AI Platform, you can test your train.py code in the local environment. One option is to run the Python script from the command line; another, and perhaps better, option is the `gcloud ai-platform local train` command, which checks that your entire Python package is ready to be submitted to the remote VMs.

In [None]:
# Train on local machine with python command
!python3 trainer/train.py \
    --job-dir ./models \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --no_classes $N_CLASSES \
    --n_estimators $N_ESTIMATORS \
    --max_depth $MAX_DEPTH \
    --booster $BOOSTER

### Submit job to Vertex AI

#### Using aiplatform python SDK

In [137]:
executor_image_uri = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-2:latest'
python_module = "trainer.train"
api_endpoint = "us-central1-aiplatform.googleapis.com"
machine_type = "n1-standard-4"
        
# The AI Platform services require regional API endpoints.
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
# This client only needs to be created once, and can be reused for multiple requests.
client = aiplatform.gapic.JobServiceClient(client_options=client_options)
custom_job = {
    "display_name": JOBNAME,
    "job_spec": {
        "worker_pool_specs": [
            {
                "machine_spec": {
                    "machine_type": machine_type,
                },
                "replica_count": 1,
                "python_package_spec": {
                    "executor_image_uri": executor_image_uri,
                    "package_uris": [PACKAGE_URIS],
                    "python_module": python_module,
                    "args": [
                      '--job-dir',
                      JOB_DIR,
                      '--train_feature_name',
                      TRAIN_FEATURE_PATH,
                      '--train_label_name',
                      TRAIN_LABEL_PATH,
                      '--no_classes',
                      N_CLASSES,
                      '--n_estimators',
                      N_ESTIMATORS,
                      '--max_depth',
                      MAX_DEPTH,
                      '--booster',
                      BOOSTER
                    ],
                },
            }
        ]
    },
}
parent = f"projects/{PROJECT}/locations/{REGION}"
response = client.create_custom_job(parent=parent, custom_job=custom_job)
print("response:", response)


response: name: "projects/122476304848/locations/us-central1/customJobs/6139747696291872768"
display_name: "xgb_train_elvinzhu_061321_2020"
job_spec {
  worker_pool_specs {
    machine_spec {
      machine_type: "n1-standard-4"
    }
    replica_count: 1
    disk_spec {
      boot_disk_type: "pd-ssd"
      boot_disk_size_gb: 100
    }
    python_package_spec {
      executor_image_uri: "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-2:latest"
      package_uris: "gs://vapit_job/trainer/trainer-0.1.tar.gz"
      python_module: "trainer.train"
      args: "--job-dir"
      args: "gs://vapit_job/xgb_train_job/xgb_train_elvinzhu_061321_2020"
      args: "--train_feature_name"
      args: "gs://vapit_job/data_split/mortgage_structured_x_train.csv"
      args: "--train_label_name"
      args: "gs://vapit_job/data_split/mortgage_structured_y_train.csv"
      args: "--no_classes"
      args: "4"
      args: "--n_estimators"
      args: "10"
      args: "--max_depth"
      args: "2"
      args: 

------
### Hyperparameter Tuning

To use hyperparameter tuning in your training job, you must perform the following steps:

- Specify the hyperparameter tuning configuration for your training job by including a `study_spec` (the metric to optimize and the parameters to search) in your hyperparameter tuning job.

- Include the following code in your training application:

 - Parse the command-line arguments representing the hyperparameters you want to tune, and use the values to set the hyperparameters for your training trial.
 - Report your hyperparameter metric (here `roc_auc`) so that the tuning service can compare trials.
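
The two training-application steps above might look like the sketch below. `trainer/train_hpt.py` is not shown in this notebook, so everything here is an assumption about its shape: the helper names are hypothetical, the flag names simply mirror the `parameter_id` values used in the tuning job, and reporting through the `cloudml-hypertune` package is one common way to publish the metric rather than a confirmed detail of this trainer.

```python
import argparse


def _as_int(text):
    # Integer parameters may arrive formatted as floats (e.g. "19.0").
    # Accepting both forms is an assumption made for robustness.
    return int(float(text))


def parse_hyperparameters(argv=None):
    """Parse the hyperparameters that the tuning service passes to a trial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max_depth", type=_as_int, default=2)
    parser.add_argument("--n_estimators", type=_as_int, default=10)
    parser.add_argument(
        "--booster", choices=["gbtree", "gblinear", "dart"], default="gbtree"
    )
    # Other flags (data paths, --job-dir, ...) are ignored in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    return args


def report_metric(value, tag="roc_auc", step=1):
    """Report the trial metric; print it instead if hypertune is missing."""
    try:
        import hypertune  # provided by the cloudml-hypertune package
    except ImportError:
        print(f"{tag}={value}")
        return False
    hypertune.HyperTune().report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag=tag, metric_value=value, global_step=step
    )
    return True
```

The `tag` passed to `report_metric` has to match the `metric_id` in the study spec for the service to pick it up.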


In [None]:
# Gcloud training config

# Google Vertex AI requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'xgb_train_{}_{}_hpt'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    ) # define unique job name

# All trials of the tuning job write to one fixed output folder.
JOB_DIR = 'gs://{}/{}/jobdir'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    ) # gcs path to hold the outputs

N_CLASSES = '4'

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)

### Submit the hyperparameter job to Vertex AI

In [None]:
executor_image_uri = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-2:latest'
python_module =  "trainer.train_hpt"
api_endpoint = "us-central1-aiplatform.googleapis.com"
machine_type = "n1-standard-4"

# The AI Platform services require regional API endpoints.
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
# This client only needs to be created once, and can be reused for multiple requests.
client = aiplatform.gapic.JobServiceClient(client_options=client_options)

# study_spec
metric = {
    "metric_id": "roc_auc",
    "goal": aiplatform.gapic.StudySpec.MetricSpec.GoalType.MAXIMIZE,
}

max_depth = {
        "parameter_id": "max_depth",
        "integer_value_spec": {"min_value": 2, "max_value": 20},
        "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE,
}
n_estimators = {
        "parameter_id": "n_estimators",
        "integer_value_spec": {"min_value": 10, "max_value": 200},
        "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE,
}
booster = {
    "parameter_id": "booster",
    "categorical_value_spec": {"values": ["gbtree","gblinear","dart"]},
}

# trial_job_spec
machine_spec = {
    "machine_type": machine_type,
}
worker_pool_spec = {
    "machine_spec": machine_spec,
    "replica_count": 1,
    "python_package_spec": {
        "executor_image_uri": executor_image_uri,
        "package_uris": [PACKAGE_URIS],
        "python_module": python_module,
        "args": [
            '--job-dir',
            JOB_DIR,
            '--train_feature_name',
            TRAIN_FEATURE_PATH,
            '--train_label_name',
            TRAIN_LABEL_PATH,
            '--val_feature_name',
            TEST_FEATURE_PATH,
            '--val_label_name',
            TEST_LABEL_PATH,
            '--no_classes',
            N_CLASSES,
        ],
    },
}

# hyperparameter_tuning_job
hyperparameter_tuning_job = {
    "display_name": JOBNAME,
    "max_trial_count": 4,
    "parallel_trial_count": 2,
    "study_spec": {
        "metrics": [metric],
        "parameters": [max_depth, n_estimators, booster],
#         "algorithm": aiplatform.gapic.StudySpec.Algorithm.RANDOM_SEARCH,
    },
    "trial_job_spec": {"worker_pool_specs": [worker_pool_spec]},
}
parent = f"projects/{PROJECT}/locations/{REGION}"
response = client.create_hyperparameter_tuning_job(
    parent=parent, hyperparameter_tuning_job=hyperparameter_tuning_job
)
# print("response:", response)


#### Check the status of Long Running Operation (LRO) with Google API Client

Send an API request to Vertex AI to get detailed information about the job. The most interesting piece of information is the set of hyperparameter values from the trial with the best performance metric.

In [None]:
client_options = {"api_endpoint": api_endpoint}
client = aiplatform.gapic.JobServiceClient(client_options=client_options)
name = client.hyperparameter_tuning_job_path(
    project=PROJECT,
    location=REGION,
    hyperparameter_tuning_job=3537793011578568704,
)
response = client.get_hyperparameter_tuning_job(name=name)
# print("response:", response)
# print("response state: ", str(response.state))
if "JobState.JOB_STATE_SUCCEEDED" == str(response.state):
    print("Job state succeeded.")
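
The check above issues a single request, so it has to be re-run by hand until the job finishes. A minimal polling sketch is given below; the helper name `wait_for_job`, the default intervals, and the injected `sleep` parameter are assumptions for illustration, and the set of terminal states may not be exhaustive.

```python
import time

# State names as printed by the SDK, e.g. "JobState.JOB_STATE_SUCCEEDED".
TERMINAL_STATES = {
    "JobState.JOB_STATE_SUCCEEDED",
    "JobState.JOB_STATE_FAILED",
    "JobState.JOB_STATE_CANCELLED",
}


def wait_for_job(get_state, poll_seconds=60, timeout_seconds=3600, sleep=time.sleep):
    """Call get_state() until it returns a terminal state or the timeout passes."""
    waited = 0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in state {state} after {waited}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the client from the cell above, a call could look like `wait_for_job(lambda: str(client.get_hyperparameter_tuning_job(name=name).state))`.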

#### Get the hyperparameters associated with the best metrics

In [None]:
max_ind = 0
max_val = 0
for ind, trials in enumerate(response.trials):
    value = trials.final_measurement.metrics[0].value
    print("Metrics Value (larger is better):", value)
    if value > max_val:
        max_val = value
        max_ind = ind
        
param_dict = {}
for params in response.trials[max_ind].parameters:
    param_dict[params.parameter_id] = params.value
    
BOOSTER=param_dict['booster']
MAX_DEPTH=param_dict['max_depth']
N_ESTIMATORS=param_dict['n_estimators']

print("BOOSTER", BOOSTER)
print("MAX_DEPTH", MAX_DEPTH)
print("N_ESTIMATORS",N_ESTIMATORS)
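
The loop above starts from `max_val = 0`, which is fine for a metric such as ROC AUC that is never negative. A more general selection is sketched below on plain Python data: `trials` is a list of `(metric_value, parameters_dict)` pairs standing in for the fields read from `response.trials`, and both helper names are hypothetical. The second helper converts tuned values back to strings, since integer parameters come back as floats such as `19.0`.

```python
def best_trial_parameters(trials, maximize=True):
    """Return the (metric_value, parameters) pair of the best trial."""
    if not trials:
        raise ValueError("no completed trials to choose from")
    pick = max if maximize else min
    return pick(trials, key=lambda trial: trial[0])


def as_job_args(parameters, integer_keys=("max_depth", "n_estimators")):
    """Convert tuned values to the strings that job args expect."""
    return {
        key: str(int(value)) if key in integer_keys else str(value)
        for key, value in parameters.items()
    }
```

For example, `as_job_args({"booster": "gbtree", "max_depth": 19.0, "n_estimators": 110.0})` yields string values that can be placed directly in the `args` list of a custom job.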

------
### Training with Tuned Parameters

Once your hyperparameter tuning jobs are done, you can take the best combination of hyperparameters from your trials and start a single training job on Cloud AI Platform to train your final model.

In [128]:
# Google Cloud AI Platform requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'xgb_train_{}_{}'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    )
# We use the job names as folder names to store outputs.
JOB_DIR = 'gs://{}/{}/{}'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    JOBNAME,
    )

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)

N_CLASSES = '4'
print("TRAIN_FEATURE_PATH = ", TRAIN_FEATURE_PATH)
print("TRAIN_LABEL_PATH = ", TRAIN_LABEL_PATH)
print("N_CLASSES = ", N_CLASSES)
print("BOOSTER = ", BOOSTER)
print("MAX_DEPTH = ", MAX_DEPTH)
print("N_ESTIMATORS = ", N_ESTIMATORS)

JOB_NAME =  xgb_train_elvinzhu_061321_2019
JOB_DIR =  gs://vapit_job/xgb_train_job/xgb_train_elvinzhu_061321_2019
TRAIN_FEATURE_PATH =  gs://vapit_job/data_split/mortgage_structured_x_train.csv
TRAIN_LABEL_PATH =  gs://vapit_job/data_split/mortgage_structured_y_train.csv
N_CLASSES =  4
BOOSTER =  gbtree
MAX_DEPTH =  19.0
N_ESTIMATORS =  110.0


In [None]:
executor_image_uri = 'us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-2:latest'
python_module = "trainer.train"
api_endpoint = "us-central1-aiplatform.googleapis.com"
machine_type = "n1-standard-4"
        
# The AI Platform services require regional API endpoints.
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
# This client only needs to be created once, and can be reused for multiple requests.
client = aiplatform.gapic.JobServiceClient(client_options=client_options)
custom_job = {
    "display_name": JOBNAME,
    "job_spec": {
        "worker_pool_specs": [
            {
                "machine_spec": {
                    "machine_type": machine_type,
                },
                "replica_count": 1,
                "python_package_spec": {
                    "executor_image_uri": executor_image_uri,
                    "package_uris": [PACKAGE_URIS],
                    "python_module": python_module,
                    "args": [
                      '--job-dir',
                      JOB_DIR,
                      '--train_feature_name',
                      TRAIN_FEATURE_PATH,
                      '--train_label_name',
                      TRAIN_LABEL_PATH,
                      '--no_classes',
                      N_CLASSES,
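                      '--n_estimators',
                      str(int(float(N_ESTIMATORS))),  # tuned values may be floats such as 110.0
                      '--max_depth',
                      str(int(float(MAX_DEPTH))),     # job args must be strings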
                      '--booster',
                      BOOSTER
                    ],
                },
            }
        ]
    },
}
parent = f"projects/{PROJECT}/locations/{REGION}"
response = client.create_custom_job(parent=parent, custom_job=custom_job)
print("response:", response)


Check the training job status

In [141]:
# check the training job status
client_options = {"api_endpoint": api_endpoint}
client = aiplatform.gapic.JobServiceClient(client_options=client_options)
name = client.custom_job_path(
    project=PROJECT,
    location=REGION,
    custom_job=6139747696291872768,
)
response = client.get_custom_job(name=name)
print(response.state)

JobState.JOB_STATE_SUCCEEDED


--------
### Deploy the Model

Vertex AI provides tools to upload your trained ML model to the cloud, so that you can send prediction requests to the model.

In order to deploy your trained model on Vertex AI, you must save your trained model using the tools provided by your machine learning framework. This involves serializing the information that represents your trained model into a file which you can deploy for prediction in the cloud.

Then you upload the saved model to a Cloud Storage bucket, and create a model resource on Vertex AI, specifying the Cloud Storage path to your saved model.

When you deploy your model, you can also provide custom code (beta) to customize how it handles prediction requests.



#### Import model artifacts to Vertex AI 

When you import a model, you associate it with a container for Vertex AI to run prediction requests. You can use pre-built containers provided by Vertex AI, or use your own custom containers that you build and push to Container Registry or Artifact Registry.

You can use a pre-built container if your model meets the following requirements:

- Trained in Python 3.7 or later
- Trained using TensorFlow, scikit-learn, or XGBoost
- Exported to meet framework-specific requirements for one of the pre-built prediction containers

The link to the list of pre-built predict container images:

https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers

In [185]:
MODEL_NAME = "my_first_XGBoost_model"

aiplatform.Model.upload(
    display_name = MODEL_NAME,
    serving_container_image_uri = 'us-docker.pkg.dev/vertex-ai/prediction/xgboost-cpu.1-3:latest',
    artifact_uri = "gs://vapit_job/xgb_train_job/xgb_train_elvinzhu_061321_2020"
)

INFO:google.cloud.aiplatform.models:Creating Model
INFO:google.cloud.aiplatform.models:Create Model backing LRO: projects/122476304848/locations/us-central1/models/1143360151491706880/operations/4675220473204703232
INFO:google.cloud.aiplatform.models:Model created. Resource name: projects/122476304848/locations/us-central1/models/1143360151491706880
INFO:google.cloud.aiplatform.models:To use this Model in another session:
INFO:google.cloud.aiplatform.models:model = aiplatform.Model('projects/122476304848/locations/us-central1/models/1143360151491706880')


<google.cloud.aiplatform.models.Model object at 0x7f119a4fced0> 
resource name: projects/122476304848/locations/us-central1/models/1143360151491706880

#### Create Endpoint

You need the endpoint ID to deploy the model.

In [143]:
MODEL_ENDPOINT_DISPLAY_NAME = "my_first_XGBoost_model_endpoint"

aiplatform.init(project=PROJECT, location=REGION)
endpoint = aiplatform.Endpoint.create(
    display_name=MODEL_ENDPOINT_DISPLAY_NAME, project=PROJECT, location=REGION,
)

endpoint_id = endpoint.resource_name.split('/')[-1]

INFO:google.cloud.aiplatform.models:Creating Endpoint
INFO:google.cloud.aiplatform.models:Create Endpoint backing LRO: projects/122476304848/locations/us-central1/endpoints/7700645189408260096/operations/8358320543463636992
INFO:google.cloud.aiplatform.models:Endpoint created. Resource name: projects/122476304848/locations/us-central1/endpoints/7700645189408260096
INFO:google.cloud.aiplatform.models:To use this Endpoint in another session:
INFO:google.cloud.aiplatform.models:endpoint = aiplatform.Endpoint('projects/122476304848/locations/us-central1/endpoints/7700645189408260096')


#### Deploy Model to the endpoint

You must deploy a model to an endpoint before that model can be used to serve online predictions; deploying a model associates physical resources with the model so it can serve online predictions with low latency. An undeployed model can serve batch predictions, which do not have the same low latency requirements.

In [None]:
MODEL_NAME = "my_first_XGBoost_model"
DEPLOYED_MODEL_DISPLAY_NAME = "my_first_XGBoost_model_deployed"
aiplatform.init(project=PROJECT, location=REGION)

model = aiplatform.Model(model_name='1143360151491706880')

# The explanation_metadata and explanation_parameters should only be
# provided for a custom trained model and not an AutoML model.
model.deploy(
    endpoint=endpoint,
    deployed_model_display_name=DEPLOYED_MODEL_DISPLAY_NAME,
    machine_type = "n1-standard-4",
    sync=True
)

model.wait()

------
### Send inference requests to your model

Vertex AI provides the services you need to request predictions from your model in the cloud.

There are two ways to get predictions from trained models: online prediction (sometimes called HTTP prediction) and batch prediction. In both cases, you pass input data to a cloud-hosted machine-learning model and get inferences for each data instance.

Vertex AI online prediction is a service optimized to run your data through hosted models with as little latency as possible. You send small batches of data to the service and it returns your predictions in the response.

#### Load testing data

In [194]:
# Load test feature and labels
x_test = pd.read_csv(TEST_FEATURE_PATH)
y_test = pd.read_csv(TEST_LABEL_PATH)

# Fill nan value with zeros (Prediction lacks the ability to handle nan values for now)
x_test = x_test.fillna(0)

# Create a temporary json file to contain data to be predicted
JSON_TEMP = 'xgb_test_data.json' # temp json file name to hold the inference data
batch_size = 100                # data batch size
start = 0
end = min(start+batch_size, len(x_test))
body={'instances': x_test.iloc[start:end].values.tolist()}
# body = json.dumps(body).encode().decode()
body = json.dumps(body)
with open(JSON_TEMP, 'w') as fp:
    fp.write(body)
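
The cell above writes only the first batch of rows. To send the whole test set in small batches, the request bodies could be generated one batch at a time, as in this sketch; the generator name `iter_request_bodies` is hypothetical.

```python
import json


def iter_request_bodies(rows, batch_size=100):
    """Yield one JSON request body per batch of rows."""
    for start in range(0, len(rows), batch_size):
        end = min(start + batch_size, len(rows))
        yield json.dumps({"instances": rows[start:end]})
```

With the data loaded above, the input would be `x_test.values.tolist()`, and each yielded string can be written to a file or sent as a request body.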

#### Call Google API for online inference

In [225]:
!curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://us-central1-aiplatform.googleapis.com/v1/projects/$PROJECT/locations/us-central1/endpoints/$endpoint_id:predict \
-d "@$JSON_TEMP"
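
The response to the request above is a JSON document with a `predictions` field. Turning it into class indices might look like the sketch below. The shape of each prediction depends on the objective the model was trained with, which this notebook does not show, so the function (a hypothetical helper) accepts either a list of per-class scores or a single class label.

```python
import json


def predicted_classes(response_text):
    """Map each entry of a :predict response to a class index."""
    predictions = json.loads(response_text)["predictions"]
    classes = []
    for prediction in predictions:
        if isinstance(prediction, list):
            # Per-class scores: take the index of the largest score.
            classes.append(max(range(len(prediction)), key=prediction.__getitem__))
        else:
            # Already a class label.
            classes.append(int(prediction))
    return classes
```

The resulting list can be compared with the corresponding rows of `y_test` to check the deployed model against the held-out labels.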