**Last Updated 29/9/23:**

This notebook demonstrates how to programmatically configure a Vertex AI custom container job and a hyperparameter tuning job. The main benefit is for larger jobs, such as backtesting model performance across several months, where configuring each run in the GCP console is tedious. In general, the SDK is easy to use: instantiate a CustomJob with a dictionary of the relevant parameters and, for hyperparameter tuning, pass that CustomJob as an argument to the HyperparameterTuningJob class. Some settings, such as location, can be set once when initializing aiplatform; passing them again to an individual job only overrides the initial value.

# Utilities:

Mainly a function to clear the BigQuery table that stores the backtesting results.

In [1]:
import pandas as pd 
import numpy as np
import pickle5 as pickle
import gcsfs
from google.cloud import bigquery
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/jupyter/ficc/isaac_creds.json"
fs = gcsfs.GCSFileSystem(project='eng-reactor-287421')


In [7]:
# def load_data_from_pickle(path, bucket = 'isaac_data'):
#     if os.path.isfile(path):
#         print('File available, loading pickle')
#         with open(path, 'rb') as f:
#             data = pickle.load(f)
#     else:
#         print(f'File not available, downloading from cloud storage and saving to {path}')
#         gc_path = os.path.join(bucket, path)
#         print(gc_path)
#         with fs.open(gc_path) as gf:
#             data = pd.read_pickle(gf)
#         with open(path, 'wb') as f:
#             pickle.dump(data, f)
#     return data

# path = '/home/jupyter/ficc/ml_models/sequence_predictors/isaac_experiments/New Features/data_latest_01-08_no_exclusions.pkl'
# data = load_data_from_pickle(path)

# data.to_pickle('gs://custom-train-job-test/large_data.pkl')

In [2]:
def getSchema():
    schema = [bigquery.SchemaField("rtrs_control_number", "INTEGER"),
              bigquery.SchemaField("cusip", "STRING"),
              bigquery.SchemaField("trade_date", "DATE"),
              bigquery.SchemaField("yield", "FLOAT"),
              bigquery.SchemaField("new_ys", "FLOAT"),
              bigquery.SchemaField("new_ficc_ycl", "FLOAT"),
              bigquery.SchemaField("dollar_price", "FLOAT"),
              bigquery.SchemaField("new_ys_prediction", "FLOAT"),
              bigquery.SchemaField("prediction_datetime", "DATETIME")]
    return schema

def uploadData(df, TABLE_ID, schema, write_disposition="WRITE_APPEND"):
    client = bigquery.Client(project='eng-reactor-287421', location="US")
    job_config = bigquery.LoadJobConfig(schema=schema, write_disposition=write_disposition)

    job = client.load_table_from_dataframe(df, TABLE_ID, job_config=job_config)

    try:
        # Block until the load job finishes; raises if the load failed
        job.result()
        print("BQ upload Successful")
    except Exception as e:
        # Include the error so a failed upload can be diagnosed
        print(f"BQ failed to Upload: {e}")
        
def clear_bq():
    save_cols = ['rtrs_control_number', 'cusip', 'trade_date', 'dollar_price', 'yield', 'new_ficc_ycl', 'new_ys', 'new_ys_prediction', 'prediction_datetime']
    df = pd.DataFrame(columns=save_cols)
    uploadData(df, 
               "eng-reactor-287421.historical_predictions_test.historical_predictions_test",
               getSchema(),
               'WRITE_TRUNCATE')

In [3]:
# WARNING: CLEARS THE BIGQUERY TABLE. BE CAREFUL WHEN RUNNING.
clear_bq()

BQ upload Successful


# VERTEX AI SDK

In [2]:
import google.cloud.aiplatform as aip
from google.cloud.aiplatform import hyperparameter_tuning as hpt

Arguments for HPT job

In [3]:
LOCATION = "us-central1"
STAGING_BUCKET = "gs://ficc-historical-results"
PROJECT_ID ='eng-reactor-287421'
SERVICE_ACCOUNT = "964018767272-compute@developer.gserviceaccount.com"
JOB_NAME = "model_backtest_202301_202309" 
CONTAINER_URI = "us-east4-docker.pkg.dev/eng-reactor-287421/custom-train-job/ficc-historical-models:latest"

aip.init(project=PROJECT_ID,
         staging_bucket=STAGING_BUCKET,
         location=LOCATION)

disk_spec = {
    "boot_disk_type": "pd-ssd"  ,
    "boot_disk_size_gb": 100
}

machine_spec = {
    "machine_type": "n1-highmem-16",
    "accelerator_type": "NVIDIA_TESLA_T4",
    "accelerator_count": 1
}


containerSpec = {
    "image_uri": CONTAINER_URI,
    "args": [
        "--train_months=6",
        "--NUM_EPOCHS=150",
        "--VALIDATION_SPLIT=0.1",
        "--bucket=ficc-historical-results",
        "--file=processed_data_2022-09_2023-09.pkl",
        "--BATCH_SIZE=10000",
        "--LEARNING_RATE=0.0007"
    ]
}

worker_pool_spec = [
    {
        "replica_count": 1,
        "machine_spec": machine_spec,
        "disk_spec": disk_spec,
        "container_spec": containerSpec
    }
]
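Because the container arguments are plain strings, they can also be generated from a dictionary, which is convenient when the same job is configured repeatedly with different settings. The following is a minimal sketch; `build_container_args` is a hypothetical helper and not part of the Vertex AI SDK, and the keys simply mirror a few of the flags used above.

```python
def build_container_args(params):
    """Turn a dict of settings into a list of '--key=value' strings."""
    return [f"--{key}={value}" for key, value in params.items()]


# Hypothetical usage mirroring some of the flags in containerSpec above.
example_args = build_container_args({
    "train_months": 6,
    "NUM_EPOCHS": 150,
    "LEARNING_RATE": 0.0007,
})
print(example_args)
```

The resulting list could be assigned to the "args" key of the container spec in place of the hand-written list.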

Creating custom job with arguments

In [4]:
job = aip.CustomJob(display_name=JOB_NAME, 
                worker_pool_specs=worker_pool_spec)

In [6]:
target_dates = pd.bdate_range('2023-03-01', '2023-06-30').strftime('%Y-%m-%d').to_list()
print(len(target_dates))

88
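Note that `pd.bdate_range` only skips weekends, so the 88 dates include public holidays on which there may be no trades to backtest. If that matters, one option is a custom business-day offset. The sketch below assumes the US federal holiday calendar shipped with pandas is an acceptable approximation; it does not cover market-specific closures, and `trading_dates` is a hypothetical helper name.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay


def trading_dates(start, end):
    """Return weekday dates between start and end, skipping US federal holidays."""
    # Assumption: the federal calendar approximates the market calendar.
    us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())
    return pd.date_range(start, end, freq=us_bday).strftime('%Y-%m-%d').to_list()


filtered_dates = trading_dates('2023-03-01', '2023-06-30')
print(len(filtered_dates))
```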


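Select the backtesting dates as a categorical hyperparameter. Vertex AI passes one value to each trial, so with the number of trials equal to the number of dates (N), the intent is that every date is tested once. Note that the default search algorithm may suggest duplicate values; the SDK's search_algorithm="grid" option for HyperparameterTuningJob is intended for enumerating the feasible space.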

In [None]:
# target_dates = [
#     '2023-03-01',
#                 # '2023-04-03',
#                 '2023-05-01',
#                 '2023-06-01',
#                 # '2023-07-03',
#                 # '2023-08-01',
#                 # '2023-09-01',
#                 '2023-09-29',
#                 # '2023-05-29'
# ]


In [7]:
N = len(target_dates)
hpt_dates = hpt.CategoricalParameterSpec(target_dates)

Instantiate the HyperparameterTuningJob class with these arguments, then run it under the correct service account.

In [8]:
hpt_job = aip.HyperparameterTuningJob(
    display_name=JOB_NAME,
    custom_job=job,
    metric_spec={
        "mae": "minimize",
    },
    parameter_spec={
        "target_date": hpt_dates,
    },
    max_trial_count=N,
    parallel_trial_count=min(12,N),
)
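For this setup to work, the training script inside the container has to accept the tuned parameter as a command-line flag, since Vertex AI passes each hyperparameter to the container as `--<parameter_id>=<value>`, and it has to report the metric named in metric_spec. The container code is not shown in this notebook, so the following is only a sketch of what its entry point might look like. `parse_args` and `report_mae` are hypothetical names, only a subset of the flags is declared, and metric reporting assumes the cloudml-hypertune package is installed in the image.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags set in containerSpec plus the tuned --target_date."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--target_date", type=str, required=True)
    parser.add_argument("--train_months", type=int, default=6)
    parser.add_argument("--LEARNING_RATE", type=float, default=0.0007)
    # parse_known_args ignores flags this sketch does not declare (e.g. --bucket)
    args, _unknown = parser.parse_known_args(argv)
    return args


def report_mae(mae, step=0):
    """Report the 'mae' metric so the tuning service can record the trial."""
    import hypertune  # from cloudml-hypertune; imported lazily on purpose

    hypertune.HyperTune().report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag="mae",
        metric_value=mae,
        global_step=step,
    )


# Simulate the arguments one trial might receive.
args = parse_args(["--target_date=2023-03-01", "--train_months=6", "--bucket=some-bucket"])
print(args.target_date, args.train_months)
```

The metric tag passed to `report_mae` has to match the key used in metric_spec ("mae" here).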


In [9]:
hpt_job.run(service_account = SERVICE_ACCOUNT)

INFO:google.cloud.aiplatform.jobs:Creating HyperparameterTuningJob
INFO:google.cloud.aiplatform.jobs:HyperparameterTuningJob created. Resource name: projects/964018767272/locations/us-central1/hyperparameterTuningJobs/6947764841490677760
INFO:google.cloud.aiplatform.jobs:To use this HyperparameterTuningJob in another session:
INFO:google.cloud.aiplatform.jobs:hpt_job = aiplatform.HyperparameterTuningJob.get('projects/964018767272/locations/us-central1/hyperparameterTuningJobs/6947764841490677760')
INFO:google.cloud.aiplatform.jobs:View HyperparameterTuningJob:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/6947764841490677760?project=964018767272


KeyboardInterrupt: 
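By default `run()` blocks until the job finishes, which is why the cell above was interrupted; the job itself is created on Vertex AI before polling starts, and it can be retrieved later with the `HyperparameterTuningJob.get(...)` call shown in the log. Once trials have completed, the job's `trials` property can be summarized to check which dates were actually tested. The sketch below is an assumption about how that summary might be built: `trials_to_frame` is a hypothetical helper, and stand-in objects are used so it runs without a live job.

```python
from types import SimpleNamespace

import pandas as pd


def trials_to_frame(trials):
    """Flatten trial objects into one row per trial (parameters and final metrics)."""
    rows = []
    for trial in trials:
        row = {"trial_id": trial.id}
        for param in trial.parameters:
            row[param.parameter_id] = param.value
        for metric in trial.final_measurement.metrics:
            row[metric.metric_id] = metric.value
        rows.append(row)
    return pd.DataFrame(rows)


def fake_trial(trial_id, target_date, mae):
    """Stand-in with the same attribute names as a Vertex AI trial (illustrative values)."""
    return SimpleNamespace(
        id=trial_id,
        parameters=[SimpleNamespace(parameter_id="target_date", value=target_date)],
        final_measurement=SimpleNamespace(
            metrics=[SimpleNamespace(metric_id="mae", value=mae)]
        ),
    )


summary = trials_to_frame([
    fake_trial("1", "2023-03-01", 10.0),
    fake_trial("2", "2023-03-02", 12.5),
])
print(summary)
```

With a real job, `trials_to_frame(hpt_job.trials)` would be the equivalent call, and comparing `summary["target_date"].nunique()` against N shows whether every date was covered.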