 ============================================================================== \
 Copyright 2020 Google LLC. This software is provided as-is, without warranty \
 or representation for any use or purpose. Your use of it is subject to your \
 agreement with Google. \
 ============================================================================== 
 
 Author: Elvin Zhu, Chanchal Chatterjee \
 Email: elvinzhu@google.com \
<img src="img/google-cloud-icon.jpg" alt="Drawing" style="width: 200px;"/>

### List your current GCP project name

In [9]:
!gcloud config list --format 'value(core.project)' 2>/dev/null

img-seg-3d


In [20]:
# Import packages

import json
import logging
import pandas as pd
import numpy as np
from datetime import datetime
from pytz import timezone
import googleapiclient
from googleapiclient import discovery

-----------
### Dataset preprocessing

Preprocess the input data by:

    1. Dropping the unique ID column;
    2. Converting categorical features into one-hot encodings;
    3. Counting the number of unique classes;
    4. Splitting the data into train and test sets;
    5. Removing rows with missing (None) values;
    6. Saving the processed data to GCS.
    
How does this differ from the XGBoost preprocessing?
1. Missing-value removal is added (XGBoost handles missing values automatically);
2. Labels are one-hot encoded (XGBoost uses integer labels).
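
The steps listed above can be sketched with pandas as follows. This is only an illustration and not the actual `preprocessing.py`: the function name `preprocess`, its parameters, the 20% test fraction, and the fixed seed are all assumptions, and missing values are dropped before encoding for simplicity.

```python
import pandas as pd


def preprocess(df, target_column, id_column, test_fraction=0.2, seed=42):
    """Hypothetical sketch of the preprocessing steps (not preprocessing.py)."""
    # Drop the unique ID column and any row with missing values.
    df = df.drop(columns=[id_column]).dropna()

    # Separate the label, then one-hot encode the categorical feature columns.
    labels = df[target_column]
    features = pd.get_dummies(df.drop(columns=[target_column]))

    # Count the classes and one-hot encode the labels for the neural network.
    n_classes = labels.nunique()
    one_hot_labels = pd.get_dummies(labels)

    # Shuffle the row index and split it into test and train parts.
    shuffled = features.sample(frac=1.0, random_state=seed).index
    n_test = int(len(shuffled) * test_fraction)
    test_idx, train_idx = shuffled[:n_test], shuffled[n_test:]

    return (features.loc[train_idx], one_hot_labels.loc[train_idx],
            features.loc[test_idx], one_hot_labels.loc[test_idx], n_classes)
```

Saving each returned frame to GCS (step 6) would then be a `to_csv` call per frame, which is left out here.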

In [21]:
INPUT_DATA = "gs://tuti_asset/datasets/mortgage_structured.csv"
TARGET_COLUMN = "TARGET"

# TODO: Update gcs path before proceeding
TRAIN_FEATURE_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_train.csv"
TRAIN_LABEL_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_train.csv"
TEST_FEATURE_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_test.csv"
TEST_LABEL_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_test.csv"

!python3 preprocessing.py \
    --input_file $INPUT_DATA \
    --x_train_name $TRAIN_FEATURE_PATH \
    --y_train_name $TRAIN_LABEL_PATH \
    --x_test_name $TEST_FEATURE_PATH \
    --y_test_name $TEST_LABEL_PATH \
    --target_column $TARGET_COLUMN

INFO:root:Preprocessing raw data:
INFO:root: => Drop id column:
INFO:root: => One hot encoding categorical features
INFO:root: => Count number of classes
INFO:root: => Perform train/test split
INFO:root:Reading raw data file: gs://tuti_asset/datasets/mortgage_structured.csv
INFO:root:Drop unique id column which is not an useful feature for ML: LOAN_SEQUENCE_NUMBER
INFO:root:Convert categorical columns into one-hot encodings
INFO:root:categorical feature: first_time_home_buyer_flag
INFO:root:categorical feature: occupancy_status
INFO:root:categorical feature: channel
INFO:root:categorical feature: property_state
INFO:root:categorical feature: property_type
INFO:root:categorical feature: loan_purpose
INFO:root:categorical feature: seller_name
INFO:root:categorical feature: service_name
INFO:root:Count number of unique classes ...
INFO:root:No. of Classes: 4
INFO:root:Perform train/test split ...
INFO:root:Get feature/label shapes ...
INFO:root:x_train shape = (93639, 149)
INFO:root:x_tes

------
### Training with Google AI Platform

For the full article, please visit: https://cloud.google.com/ai-platform/docs/technical-overview

Where AI Platform fits in the ML workflow \
The diagram below gives a high-level overview of the stages in an ML workflow. The blue-filled boxes indicate where AI Platform provides managed services and APIs:

<img src="img/ml-workflow.svg" alt="Drawing">

As the diagram indicates, you can use AI Platform to manage the following stages in the ML workflow:

- Train an ML model on your data:
 - Train model
 - Evaluate model accuracy
 - Tune hyperparameters
 
 
- Deploy your trained model.

- Send prediction requests to your model:
 - Online prediction
 - Batch prediction (for TensorFlow only)
 
 
- Monitor the predictions on an ongoing basis.


- Manage your models and model versions.



In [22]:
PROJECT_ID = '<YOUR-PROJECT-ID>'     # Replace with your project ID
USER = '<YOUR-USERNAME>'             # Replace with your user name
BUCKET_NAME = '<YOUR-BUCKET>'    # Replace with your gcs bucket name
FOLDER_NAME = 'tf_train_job' # Replace with your gcs folder name
REGION = 'us-central1'        # Replace with your GCP region
TIMEZONE = 'US/Pacific'       # Replace with your local timezone

# Google Cloud AI Platform requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'tf_train_{}_{}'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    ) # Unique job name

# We use the job names as folder names to store outputs.
JOB_DIR = 'gs://{}/{}/{}'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    JOBNAME,
    ) # gcs path to hold the outputs

# This is the AI Platform configuration for training, created in the setup step
JOB_CONFIG = "./config/config.yaml" # local path to training config file

# Path to your input feature and labels
TRAIN_FEATURE_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_train.csv'
TRAIN_LABEL_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_train.csv'
TEST_FEATURE_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_test.csv"
TEST_LABEL_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_test.csv"

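# Number of target classes (reported by the preprocessing step)
N_CLASSES = 4 
# XGBoost-style settings; none of the commands in this notebook pass them to the TensorFlow trainer
BOOSTER = 'gbtree' # Booster type
MAX_DEPTH = 2      # Depth of trees
N_ESTIMATORS = 10  # No of estimators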

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)
print("JOB_CONFIG = ", JOB_CONFIG)

JOB_NAME =  tf_train_elvinzhu_051021_1232
JOB_DIR =  gs://tuti_job/tf_train_job/tf_train_elvinzhu_051021_1232
JOB_CONFIG =  ./config/config.yaml


#### Train locally

Before submitting training jobs to Cloud AI Platform, you can test your train.py code in your local environment. You can run the Python script directly from the command line, but a better choice may be the `gcloud ai-platform local train` command, which checks that your entire Python package is ready to be submitted to the remote VMs.

In [23]:
# Train on local machine with python command
!python3 trainer/train.py \
    --job-dir ./models \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --test_feature_name $TEST_FEATURE_PATH \
    --test_label_name $TEST_LABEL_PATH


2021-05-10 19:32:46.840859: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2021-05-10 19:32:46.840985: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2021-05-10 19:32:46.840998: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Namespace(batch_size=4, depth=3, dropout_rate=0.02, epochs=1, job_dir='./models', learnin

In [24]:
# Train on local machine with gcloud command
!gcloud ai-platform local train \
    --job-dir ./models \
    --package-path $(pwd)/trainer \
    --module-name trainer.train \
    -- \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --test_feature_name $TEST_FEATURE_PATH \
    --test_label_name $TEST_LABEL_PATH \
    --depth 3 \
    --dropout_rate 0.02 \
    --learning_rate 0.0001 \
    --batch_size 4 \
    --epochs 1

2021-05-10 19:33:59.233378: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2021-05-10 19:33:59.233502: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/lib64:/usr/local/nccl2/lib:/usr/local/cuda/extras/CUPTI/lib64
2021-05-10 19:33:59.233533: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Namespace(batch_size=4, depth=3, dropout_rate=0.02, epochs=1, job_dir='./models', learnin

#### Submit jobs to AI Platform
See the link below for a full list of arguments: \
https://cloud.google.com/sdk/gcloud/reference/ai-platform/jobs/submit/training

In [25]:
# submit the training job to AI Platform
! gcloud ai-platform jobs submit training $JOBNAME \
    --job-dir $JOB_DIR \
    --package-path $(pwd)/trainer \
    --module-name trainer.train \
    --region $REGION \
    --python-version 3.7 \
    --runtime-version 2.2 \
    --config $JOB_CONFIG \
    -- \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --test_feature_name $TEST_FEATURE_PATH \
    --test_label_name $TEST_LABEL_PATH \
    --depth 3 \
    --dropout_rate 0.02 \
    --learning_rate 0.0001 \
    --batch_size 4 \
    --epochs 5 

Job [tf_train_elvinzhu_051021_1232] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_train_elvinzhu_051021_1232

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_train_elvinzhu_051021_1232
jobId: tf_train_elvinzhu_051021_1232
state: QUEUED


In [26]:
!gcloud ai-platform jobs describe $JOBNAME

createTime: '2021-05-10T19:35:09Z'
etag: jW5sUEEG4Nc=
jobId: tf_train_elvinzhu_051021_1232
state: PREPARING
trainingInput:
  args:
  - --train_feature_name
  - gs://tuti_job/data_split/mortgage_structured_x_train.csv
  - --train_label_name
  - gs://tuti_job/data_split/mortgage_structured_y_train.csv
  - --test_feature_name
  - gs://tuti_job/data_split/mortgage_structured_x_test.csv
  - --test_label_name
  - gs://tuti_job/data_split/mortgage_structured_y_test.csv
  - --depth
  - '3'
  - --dropout_rate
  - '0.02'
  - --learning_rate
  - '0.0001'
  - --batch_size
  - '4'
  - --epochs
  - '5'
  jobDir: gs://tuti_job/tf_train_job/tf_train_elvinzhu_051021_1232
  packageUris:
  - gs://tuti_job/tf_train_job/tf_train_elvinzhu_051021_1232/packages/c54b00d83480f177f19013665b0db22ba09f5137f103daa45e5a5ab8e62c55d1/trainer-0.1.tar.gz
  pythonModule: trainer.train
  pythonVersion: '3.7'
  region: us-central1
  runtimeVersion: '2.2'
  scaleTier: STANDARD_1
trainingOutput: {}

View job in the Cloud Con

------
### Hyperparameter Tuning

To use hyperparameter tuning in your training job you must perform the following steps:

- Specify the hyperparameter tuning configuration for your training job by including a HyperparameterSpec in your TrainingInput object.

- Include the following code in your training application:

 - Parse the command-line arguments representing the hyperparameters you want to tune, and use the values to set the hyperparameters for your training trial.
 - Add your hyperparameter metric to the summary for your graph.
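
The two code-side steps above could look roughly like the sketch below. It is not the notebook's `trainer/train_hpt.py`, which is not shown here. The flag names and the `accuracy` metric tag follow the tuning configuration printed by `gcloud ai-platform jobs describe`, while the default values and the helper names `parse_args` and `report_metric` are assumptions. Instead of writing a TensorFlow summary, the sketch reports the metric through the separate `cloudml-hypertune` package, which is a swapped-in alternative.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that the tuning service passes to each trial."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model_depth", type=int, default=3)
    parser.add_argument("--dropout_rate", type=float, default=0.02)
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--batch_size", type=int, default=4)
    parser.add_argument("--epochs", type=int, default=1)
    # Ignore flags this sketch does not declare, such as --job-dir.
    args, _unknown = parser.parse_known_args(argv)
    return args


def report_metric(accuracy, step):
    """Report the tuning metric (needs `pip install cloudml-hypertune`)."""
    import hypertune  # imported lazily so parse_args works without the package

    hpt = hypertune.HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag="accuracy",  # must match hyperparameterMetricTag
        metric_value=accuracy,
        global_step=step,
    )
```

A trial would call `parse_args()` once at start-up, build the model from the returned values, and call `report_metric` after evaluation.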


In [27]:
# Gcloud training config
PROJECT_ID = '<YOUR-PROJECT-ID>'     # Replace with your project ID
USER = '<YOUR-USERNAME>'             # Replace with your user name
BUCKET_NAME = '<YOUR-BUCKET>'    # Replace with your gcs bucket name
FOLDER_NAME = 'tf_train_job' # Replace with your gcs folder name
REGION = 'us-central1'        # Replace with your GCP region
TIMEZONE = 'US/Pacific'       # Replace with your local timezone

# Google Cloud AI Platform requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'tf_train_{}_{}_hpt'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    ) # define unique job name

# All trials of the tuning job write to one shared job directory on GCS.
JOB_DIR = 'gs://{}/{}/jobdir'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    ) # gcs path to hold the outputs of the tuning job

# This is the AI Platform configuration for hypertune, created in the setup step
JOB_CONFIG = "./config/config_hpt.yaml" # local path to hypertune config file

# Path to your input feature and labels (Train/validation)
TRAIN_FEATURE_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_train.csv'
TRAIN_LABEL_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_train.csv'
TEST_FEATURE_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_test.csv"
TEST_LABEL_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_test.csv"

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)
print("JOB_CONFIG = ", JOB_CONFIG)

JOB_NAME =  tf_train_elvinzhu_051021_1235_hpt
JOB_DIR =  gs://tuti_job/tf_train_job/jobdir
JOB_CONFIG =  ./config/config_hpt.yaml


In [28]:
# submit the hyperparameter training job
!gcloud ai-platform jobs submit training $JOBNAME \
    --package-path $(pwd)/trainer \
    --module-name trainer.train_hpt \
    --python-version 3.7 \
    --runtime-version 2.2 \
    --job-dir $JOB_DIR \
    --region $REGION \
    --config $JOB_CONFIG \
    -- \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --test_feature_name $TEST_FEATURE_PATH \
    --test_label_name $TEST_LABEL_PATH \
    --epochs 5 

Job [tf_train_elvinzhu_051021_1235_hpt] submitted successfully.
Your job is still active. You may view the status of your job with the command

  $ gcloud ai-platform jobs describe tf_train_elvinzhu_051021_1235_hpt

or continue streaming the logs with the command

  $ gcloud ai-platform jobs stream-logs tf_train_elvinzhu_051021_1235_hpt
jobId: tf_train_elvinzhu_051021_1235_hpt
state: QUEUED


#### Check the status of Long Running Operation (LRO) a.k.a. jobs

In [29]:
!gcloud ai-platform jobs describe $JOBNAME

createTime: '2021-05-10T19:35:13Z'
etag: AcULn0Ne9TE=
jobId: tf_train_elvinzhu_051021_1235_hpt
state: PREPARING
trainingInput:
  args:
  - --train_feature_name
  - gs://tuti_job/data_split/mortgage_structured_x_train.csv
  - --train_label_name
  - gs://tuti_job/data_split/mortgage_structured_y_train.csv
  - --test_feature_name
  - gs://tuti_job/data_split/mortgage_structured_x_test.csv
  - --test_label_name
  - gs://tuti_job/data_split/mortgage_structured_y_test.csv
  - --epochs
  - '5'
  hyperparameters:
    enableTrialEarlyStopping: true
    goal: MAXIMIZE
    hyperparameterMetricTag: accuracy
    maxParallelTrials: 2
    maxTrials: 5
    params:
    - maxValue: 10.0
      minValue: 2.0
      parameterName: model_depth
      scaleType: UNIT_LINEAR_SCALE
      type: INTEGER
    - maxValue: 0.01
      minValue: 0.001
      parameterName: dropout_rate
      scaleType: UNIT_LOG_SCALE
      type: DOUBLE
    - maxValue: 0.01
      minValue: 1e-05
      parameterName: learning_rate
      sc

#### Check the status of Long Running Operation (LRO) with Google API Client

Send an API request to Cloud AI Platform to get detailed information about the job. The most interesting part is the set of hyperparameter values from the trial with the best performance metric.

In [None]:
# Format the project ID and the job name from the previous step for the API request
job_id = 'projects/{}/jobs/{}'.format(PROJECT_ID, JOBNAME)
# Build the service
ml = discovery.build('ml', 'v1', cache_discovery=False)
# Execute the request; `request` holds the parsed JSON response
request = ml.projects().jobs().get(name=job_id).execute()
# Print the response (logging.info output is hidden at the default WARNING log level)
print(json.dumps(request, indent=4))

In [None]:
# Parse request response and sort experiments based on final metrics
trials = request['trainingOutput']['trials']
trials = pd.DataFrame(trials)
trials['hyperparameters.model_depth'] = trials['hyperparameters'].apply(lambda x: x['model_depth'])
trials['hyperparameters.dropout_rate'] = trials['hyperparameters'].apply(lambda x: x['dropout_rate'])
trials['hyperparameters.learning_rate'] = trials['hyperparameters'].apply(lambda x: x['learning_rate'])
trials['hyperparameters.batch_size'] = trials['hyperparameters'].apply(lambda x: x['batch_size'])
trials['finalMetric.trainingStep'] = trials['finalMetric'].apply(lambda x: x['trainingStep'])
trials['finalMetric.objectiveValue'] = trials['finalMetric'].apply(lambda x: x['objectiveValue'])
trials = trials.sort_values(['finalMetric.objectiveValue'], ascending=False)

In [None]:
trials
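
As a lighter-weight alternative to the DataFrame code above, the best trial can be picked straight from the response dictionary. This is a minimal sketch: the helper name `best_trial` is an assumption, and `sample_response` is a made-up stand-in for the real API response, which only exists once the tuning job has finished.

```python
def best_trial(response, maximize=True):
    """Return the trial with the best final objective value."""
    trials = response.get("trainingOutput", {}).get("trials", [])
    # Trials without a final metric (for example failed ones) are skipped.
    finished = [t for t in trials if "finalMetric" in t]
    if not finished:
        raise ValueError("no finished trials in the response")
    pick = max if maximize else min
    return pick(finished, key=lambda t: float(t["finalMetric"]["objectiveValue"]))


# Hypothetical response fragment, for illustration only.
sample_response = {
    "trainingOutput": {
        "trials": [
            {"trialId": "1",
             "hyperparameters": {"model_depth": "4", "learning_rate": "0.001"},
             "finalMetric": {"trainingStep": "5", "objectiveValue": 0.81}},
            {"trialId": "2",
             "hyperparameters": {"model_depth": "6", "learning_rate": "0.0005"},
             "finalMetric": {"trainingStep": "5", "objectiveValue": 0.86}},
            {"trialId": "3",
             "hyperparameters": {"model_depth": "2", "learning_rate": "0.01"}},
        ]
    }
}

best = best_trial(sample_response)
print(best["trialId"], best["hyperparameters"])
```

Pass `maximize=False` when the tuning goal is `MINIMIZE`.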

------
### Training with Tuned Parameters

Once your hyperparameter tuning job is done, you can take the best combination of hyperparameters from its trials and start a single training job on Cloud AI Platform to train your final model.

In [None]:
PROJECT_ID = '<YOUR-PROJECT-ID>' # Replace with your project ID
USER = '<YOUR-USERNAME>' # Replace with your User name
BUCKET_NAME = '<YOUR-BUCKET>' # Replace with your bucket name
FOLDER_NAME = 'tf_train_job' # Replace with your Folder name
REGION = 'us-central1' # Replace with your region
TIMEZONE = 'US/Pacific'

# Google Cloud AI Platform requires each job to have a unique name,
# so we combine a prefix with a timestamp to form job names.
JOBNAME = 'tf_train_{}_{}'.format(
    USER,
    datetime.now(timezone(TIMEZONE)).strftime("%m%d%y_%H%M")
    )
# We use the job names as folder names to store outputs.
JOB_DIR = 'gs://{}/{}/{}'.format(
    BUCKET_NAME,
    FOLDER_NAME,
    JOBNAME,
    )

# This is the AI Platform configuration for training, created in the setup step
JOB_CONFIG = "./config/config.yaml" # local path to train config file

print("JOB_NAME = ", JOBNAME)
print("JOB_DIR = ", JOB_DIR)
print("JOB_CONFIG = ", JOB_CONFIG)

# Path to your input feature and labels (Train/validation)
TRAIN_FEATURE_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_train.csv'
TRAIN_LABEL_PATH = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_train.csv'
TEST_FEATURE_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_test.csv"
TEST_LABEL_PATH = "gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_test.csv"

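# Get the best hypertuned model parameters.
# `trials` is sorted best-first, so select by position with .iloc[0];
# a label lookup with [0] would return the first trial of the unsorted response.
N_CLASSES = 4
DEPTH = trials['hyperparameters'].iloc[0]['model_depth']
DROPOUT_RATE = trials['hyperparameters'].iloc[0]['dropout_rate']
LEARNING_RATE = trials['hyperparameters'].iloc[0]['learning_rate']
BATCH_SIZE = trials['hyperparameters'].iloc[0]['batch_size']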


print("TRAIN_FEATURE_PATH = ", TRAIN_FEATURE_PATH)
print("TRAIN_LABEL_PATH = ", TRAIN_LABEL_PATH)
print("N_CLASSES = ", N_CLASSES)
print("DEPTH = ", DEPTH)
print("DROPOUT_RATE = ", DROPOUT_RATE)
print("LEARNING_RATE = ", LEARNING_RATE)
print("BATCH_SIZE = ", BATCH_SIZE)

In [None]:
# submit the training job
! gcloud ai-platform jobs submit training $JOBNAME \
    --job-dir $JOB_DIR \
    --package-path $(pwd)/trainer \
    --module-name trainer.train \
    --region $REGION \
    --python-version 3.7 \
    --runtime-version 2.2 \
    --config $JOB_CONFIG \
    -- \
    --train_feature_name $TRAIN_FEATURE_PATH \
    --train_label_name $TRAIN_LABEL_PATH \
    --test_feature_name $TEST_FEATURE_PATH \
    --test_label_name $TEST_LABEL_PATH \
    --depth $DEPTH \
    --dropout_rate $DROPOUT_RATE \
    --learning_rate $LEARNING_RATE \
    --batch_size $BATCH_SIZE \
    --epochs 5 

In [None]:
# check the training job status
! gcloud ai-platform jobs describe $JOBNAME

--------
### Deploy the Model

AI Platform provides tools to upload your trained ML model to the cloud, so that you can send prediction requests to the model.

In order to deploy your trained model on AI Platform, you must save your trained model using the tools provided by your machine learning framework. This involves serializing the information that represents your trained model into a file which you can deploy for prediction in the cloud.

Then you upload the saved model to a Cloud Storage bucket, and create a model resource on AI Platform, specifying the Cloud Storage path to your saved model.

When you deploy your model, you can also provide custom code (beta) to customize how it handles prediction requests.



In [None]:
MODEL_NAME = "tensorflow_model"                # Model name of your choice to deploy
MODEL_VERSION = "tensorflow_v0_1" # Model version name of your choice to deploy
REGION = "global"                       # The deployed model region
MODEL_FRAMEWORK = "tensorflow"             # The deployed model framework (tensorflow, sklearn, xgboost)
MODEL_DESCRIPTION = "tensorflow_hpt_best"      # The description of your model

In [None]:
# Create the model resource if it does not exist yet
!gcloud ai-platform models create $MODEL_NAME --region $REGION --enable-logging

In [None]:
# List the versions under the model
!gcloud ai-platform versions list --model $MODEL_NAME --region $REGION

In [None]:
# The GCS path that contains your latest trained model
LATEST_MODEL_DIR = "gs://{}/{}/{}".format(BUCKET_NAME, FOLDER_NAME, JOBNAME)
print("LATEST_MODEL_DIR: ", LATEST_MODEL_DIR)

In [None]:
# Deploy the model to endpoint
! gcloud beta ai-platform versions create $MODEL_VERSION \
  --model=$MODEL_NAME \
  --origin=$LATEST_MODEL_DIR \
  --runtime-version=2.2 \
  --python-version=3.7 \
  --framework=$MODEL_FRAMEWORK \
  --description=$MODEL_DESCRIPTION \
  --region=$REGION 


In [None]:
# List all models
!gcloud ai-platform models list --region $REGION
# List all versions of the created model
!gcloud ai-platform versions list --model $MODEL_NAME --region $REGION
# Describe the Model
!gcloud ai-platform models describe $MODEL_NAME --region $REGION

------
### Send inference requests to your model

AI Platform provides the services you need to request predictions from your model in the cloud.

There are two ways to get predictions from trained models: online prediction (sometimes called HTTP prediction) and batch prediction. In both cases, you pass input data to a cloud-hosted machine-learning model and get inferences for each data instance.



#### Load testing data

In [None]:
# Load test feature and labels
test_feature_url = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_x_test.csv'
test_label_url = 'gs://<YOUR-BUCKET>/data_split/mortgage_structured_y_test.csv'

x_test = pd.read_csv(test_feature_url)
y_test = pd.read_csv(test_label_url, header=None)

#### Call Google API for online inference

In [None]:
# Create google API client 
PROJECT_ID = "<YOUR-PROJECT-ID>" # Your project id
MODEL_NAME = "tensorflow_model"  # The model name from previous step
VERSION = "tensorflow_v0_1" # The model version from previous step
batch_size = 1000

# Create model inference with Google API Client 
# Model endpoint name
model_name = 'projects/{}/models/{}/versions/{}'.format(
    PROJECT_ID, 
    MODEL_NAME, 
    VERSION
    )

# Build the service
service = discovery.build(
    'ml', 
    'v1', 
    cache_discovery=False, 
    cache=False
    )

prediction_list = []

for ind in range(0, len(x_test), batch_size):
    start = ind
    end = min(ind+batch_size, len(x_test))
    response = service.projects().predict(
        name=model_name,
        body={'instances': x_test.iloc[start:end].values.tolist()}
        ).execute()
    response = response['predictions']
    # 'dense_6' is the output key of this particular model; adjust it to match your model
    response = [x['dense_6'] for x in response]
    prediction_list += response
    
prediction_list = np.array(prediction_list)

#### Alternative: call the Cloud AI Platform API with the gcloud command for prediction

In [None]:
def post_process(predict, n_sample, n_class):  
    """Parse the response of an inference request.
    Args:
        predict: List of strings, the inference request response;
        n_sample: No. of samples for inference;
        n_class: No. of classes
    Returns:
        List of predicted labels
    """
    predictions = np.empty([n_sample, n_class])
    # Skip the first entry, then treat every remaining "key: value" line as an
    # assignment into `predictions`. exec runs arbitrary code, so only use this
    # on output you trust.
    for entry in predict[1:]:
        key, value = entry.split(":")
        exec("{} = {}".format(key, value))
    predictions = np.argmax(predictions, axis=1)
    return predictions.tolist()

def accuracy_score(y_true, y_pred):
    """Compute the accuracy score.
    Args:
        y_true: list of ground truth labels,
        y_pred: list of predicted labels,
    Returns:
        float, accuracy score
    """
    from sklearn import metrics
    return metrics.accuracy_score(y_true, y_pred)
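
The same parsing can be done without `exec`. The sketch below assumes each useful output line has the form `predictions[i][j]: value`, which is what the `exec`-based parser above implies but is not confirmed by any output in this notebook. The name `parse_predictions` is hypothetical.

```python
import re

# Assumed line format: predictions[<row>][<column>]: <score>
_LINE = re.compile(r"^predictions\[(\d+)\]\[(\d+)\]:\s*(\S+)$")


def parse_predictions(lines, n_sample, n_class):
    """Return the arg-max class per sample from 'predictions[i][j]: v' lines."""
    scores = [[0.0] * n_class for _ in range(n_sample)]
    for line in lines:
        match = _LINE.match(line.strip())
        if match is None:
            continue  # skip headers and any line that is not a score
        i, j = int(match.group(1)), int(match.group(2))
        scores[i][j] = float(match.group(3))
    # The index of the largest score in each row is the predicted label.
    return [max(range(n_class), key=sample.__getitem__) for sample in scores]
```

Because lines that do not match the pattern are ignored, no manual slicing of header lines is needed before calling it.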

In [None]:
PROJECT_ID = "<YOUR-PROJECT-ID>"        # Project ID
MODEL_NAME = "tensorflow_model"         # Model name from previous step
VERSION = "tensorflow_v0_1"     # Model version from previous step
JSON_TEMP = 'test_data.json' # temp json file name to hold the inference data
batch_size = 1000                # data batch size

y_pred = []
for ind in range(0, len(x_test), batch_size):
    start = ind
    end = min(ind+batch_size, len(x_test))
    body={'instances': x_test.iloc[start:end].values.tolist()}
    with open(JSON_TEMP, 'w') as fp:
        json.dump(body, fp)
    
    predict = !gcloud ai-platform predict \
      --model=$MODEL_NAME \
      --version=$VERSION \
      --format='text' \
      --json-request=$JSON_TEMP \
      --region=$REGION
    
    y_pred += post_process(predict[1:], end-start, N_CLASSES)

In [None]:
accuracy = accuracy_score([np.where(r==1)[0][0] for r in y_test.to_numpy()], y_pred)
print("Accuracy: ", accuracy)
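
The online-inference cell earlier collects `prediction_list` but never scores it. A minimal sketch of scoring per-class scores against one-hot labels is shown below; `accuracy_from_scores` and `argmax` are hypothetical helper names, and the inputs are plain nested lists.

```python
def argmax(values):
    """Index of the largest element in a sequence."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy_from_scores(one_hot_labels, scores):
    """Fraction of rows whose highest score matches the one-hot label."""
    if len(one_hot_labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if not scores:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = sum(argmax(y) == argmax(p) for y, p in zip(one_hot_labels, scores))
    return correct / len(scores)
```

With the notebook's variables this would be called as `accuracy_from_scores(y_test.to_numpy().tolist(), prediction_list.tolist())`.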